ArcticDB_demo_snapshots
Snapshots: how to use them and why they are useful¶
An Introduction to Snapshots¶
In order to understand snapshots we first need to be clear about versions.
In ArcticDB, every time a change is made to a symbol a new version is created. So each symbol has a sequence of versions through time.
In a library there will typically be many symbols with each having many versions.
Suppose we reach a point where we wish to record the current state of the data in the library. This is exactly the purpose of a snapshot.
A snapshot records the current versions of all the symbols in the library (or a custom set of versions, see below).
The data recorded in the snapshot can then be read back by passing the snapshot name as the as_of parameter of read().
Versions that are part of a snapshot are protected from deletion, even if their symbol is deleted.
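The idea can first be sketched as a toy in-memory model in plain Python. This is a hypothetical illustration, not ArcticDB's implementation: the class `ToyLibrary` and its methods are invented names that only echo write/snapshot/read. Each write appends a new version, a snapshot records the latest version number of every symbol, and `read` resolves `as_of` either as a version number or as a snapshot name. ArcticDB's `as_of` also accepts a timestamp, which this toy leaves out.

```python
from typing import Dict, List, Optional, Union


class ToyLibrary:
    """Hypothetical model of versions and snapshots (not ArcticDB's implementation)."""

    def __init__(self) -> None:
        # symbol -> list of data; the list index is the version number
        self.versions: Dict[str, List[object]] = {}
        # snapshot name -> {symbol: version number}
        self.snapshots: Dict[str, Dict[str, int]] = {}

    def write(self, symbol: str, data: object) -> int:
        # Every write appends a new version; earlier versions are kept.
        self.versions.setdefault(symbol, []).append(data)
        return len(self.versions[symbol]) - 1

    def snapshot(self, name: str) -> None:
        # Record the latest version number of every symbol.
        self.snapshots[name] = {sym: len(vs) - 1 for sym, vs in self.versions.items()}

    def read(self, symbol: str, as_of: Optional[Union[int, str]] = None) -> object:
        if as_of is None:
            version = len(self.versions[symbol]) - 1   # latest version
        elif isinstance(as_of, str):
            version = self.snapshots[as_of][symbol]    # version recorded in the snapshot
        else:
            version = as_of                            # explicit version number
        return self.versions[symbol][version]


toy = ToyLibrary()
toy.write("sym_0", 0)
toy.snapshot("snap")
toy.write("sym_0", 10)
print(toy.read("sym_0"))                # 10 (latest)
print(toy.read("sym_0", as_of="snap"))  # 0 (as recorded by the snapshot)
print(toy.read("sym_0", as_of=0))       # 0 (explicit version)
```

The point of the model is that a snapshot stores only version references, not copies of the data.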
Below is a simple example that demonstrates snapshots in action.
Installs and Imports¶
!pip install arcticdb
import pandas as pd
import logging
import arcticdb as adb
Set up ArcticDB¶
Note: In this example we delete the library if it already exists. This is not normal practice; we do it only to guarantee a clean library for the demo.
Don't copy those lines unless you are sure that is what you need.
lib_name = 'demo'
arctic = adb.Arctic("lmdb://arcticdb_snapshot_demo")
# start from a clean library (demo only)
if lib_name in arctic.list_libraries():
    arctic.delete_library(lib_name)
lib = arctic.get_library(lib_name, create_if_missing=True)
Create some symbols¶
num_symbols = 4
symbols = [f"sym_{idx}" for idx in range(num_symbols)]
half_symbols = symbols[:num_symbols // 2]
print(symbols)
print(half_symbols)
['sym_0', 'sym_1', 'sym_2', 'sym_3']
['sym_0', 'sym_1']
# write data for each symbol
for idx, symbol in enumerate(symbols):
    lib.write(symbol, pd.DataFrame({"col": [idx]}))
# write data only for the first half of the symbols
for idx, symbol in enumerate(half_symbols):
    lib.write(symbol, pd.DataFrame({"col": [idx + 10]}))
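To see which versions the snapshot below will capture, the two write loops can be replayed in plain Python without ArcticDB. This is a hypothetical stand-in: each symbol's version history is modelled as a list of "col" values, so the list index plays the role of the version number.

```python
# Replay the demo's two write loops, modelling each symbol's history as a list.
num_symbols = 4
symbols = [f"sym_{idx}" for idx in range(num_symbols)]
half_symbols = symbols[:num_symbols // 2]

history = {}  # symbol -> list of "col" values, one entry per version
for idx, symbol in enumerate(symbols):
    history.setdefault(symbol, []).append(idx)
for idx, symbol in enumerate(half_symbols):
    history[symbol].append(idx + 10)

# Latest (version number, value) per symbol: what a snapshot taken now would record.
latest = {sym: (len(vals) - 1, vals[-1]) for sym, vals in history.items()}
print(latest)
# {'sym_0': (1, 10), 'sym_1': (1, 11), 'sym_2': (0, 2), 'sym_3': (0, 3)}
```

This matches what the demo reports later: the snapshot holds v1 of sym_0 and sym_1 but v0 of sym_2 and sym_3, and reading sym_0 and sym_3 from it returns 10 and 3.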
Create the snapshot¶
The metadata argument is optional.
lib.snapshot("snapshot_0", metadata="this is the core of the demo")
Functions to discover and inspect snapshots¶
# list all snapshots
lib.list_snapshots()
{'snapshot_0': 'this is the core of the demo'}
# list the symbols in a snapshot
lib.list_symbols(snapshot_name="snapshot_0")
['sym_2', 'sym_1', 'sym_0', 'sym_3']
# list the versions in a snapshot
lib.list_versions(snapshot="snapshot_0")
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']),
sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']),
sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']),
sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0'])}
# list all versions in the library, with associated snapshots
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']),
sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']),
sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']),
sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00),
sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0']),
sym_0_v0: (date=2023-11-20 10:24:45.041944641+00:00)}
Reading a snapshot version of a symbol¶
vit = lib.read("sym_0", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
vit = lib.read("sym_3", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_3', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=0, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0    3
Demonstration that snapshot versions are protected from deletion¶
# delete the symbol sym_0
lib.delete("sym_0")
# show that sym_0 has been deleted
lib.list_symbols()
['sym_2', 'sym_1', 'sym_3']
# sym_0 does not appear in the current library versions
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']),
sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']),
sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']),
sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
# however we can still read the version of sym_0 that was recorded in the snapshot
vit = lib.read("sym_0", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
Deleting a snapshot¶
When we delete a snapshot, any versions that are only referenced by that snapshot will be deleted.
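The deletion rules can be sketched as set operations. This is a hypothetical model, not how ArcticDB stores its data: `live`, `snapshots`, `readable`, `delete_symbol` and `delete_snapshot` are invented names. A version stays readable while it is either live or referenced by at least one snapshot. Deleting a symbol removes only its live versions, and deleting a snapshot makes unreadable exactly those versions that nothing else still references. The starting state mirrors the demo's list_versions output after snapshot_0 was taken.

```python
from typing import Dict, Set, Tuple

Version = Tuple[str, int]  # (symbol, version number)

# Hypothetical state mirroring the demo after snapshot_0 was created.
live: Set[Version] = {("sym_0", 0), ("sym_0", 1), ("sym_1", 0),
                      ("sym_1", 1), ("sym_2", 0), ("sym_3", 0)}
snapshots: Dict[str, Set[Version]] = {
    "snapshot_0": {("sym_0", 1), ("sym_1", 1), ("sym_2", 0), ("sym_3", 0)},
}


def readable() -> Set[Version]:
    # A version can be read if it is live or referenced by any snapshot.
    return live | set().union(*snapshots.values())


def delete_symbol(symbol: str) -> None:
    # Deleting a symbol removes its versions from the live set only.
    live.difference_update({v for v in live if v[0] == symbol})


def delete_snapshot(name: str) -> Set[Version]:
    # Return the versions that were kept readable only by this snapshot.
    before = readable()
    del snapshots[name]
    return before - readable()


delete_symbol("sym_0")
print(("sym_0", 1) in readable())  # True: still protected by snapshot_0
print(("sym_0", 0) in readable())  # False: deleted and in no snapshot
removed = delete_snapshot("snapshot_0")
print(removed)                     # {('sym_0', 1)}
```

The surviving set matches the demo's final list_versions output: both versions of sym_1 plus v0 of sym_2 and sym_3.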
lib.delete_snapshot("snapshot_0")
lib.list_snapshots()
{}
# version 1 of sym_0, which was kept alive only by the snapshot, has now been deleted
try:
    vit = lib.read("sym_0", as_of=1)
    print(vit)
    print(vit.data)
except adb.exceptions.NoSuchVersionException:
    logging.error("Version not found")
ERROR:root:Version not found
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00),
sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00),
sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00),
sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
Snapshot names must be unique¶
Creating a snapshot with a name that is already in use raises an error.
lib.snapshot("snapshot_1", metadata="demo snapshot names need to be unique")
try:
    lib.snapshot("snapshot_1")
except Exception as e:
    logging.error(e)
ERROR:root:E_ASSERTION_FAILURE Snapshot with name snapshot_1 already exists
lib.list_snapshots()
{'snapshot_1': 'demo snapshot names need to be unique'}
Modifiers for snapshot creation: exclude or include symbols¶
# exclude sym_1 from snapshot
lib.snapshot("snapshot_2", skip_symbols=["sym_1"], metadata="demo skip_symbols")
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_1', 'snapshot_2']),
sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2']),
sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_1']),
sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
# include specific versions of sym_1 and sym_2 in the snapshot
lib.snapshot("snapshot_3", versions={"sym_1": 0, "sym_2": 0}, metadata="demo versions")
lib.list_versions(snapshot="snapshot_3")
{sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2', 'snapshot_3']),
sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00, snapshots=['snapshot_3'])}
lib.list_snapshots()
{'snapshot_1': 'demo snapshot names need to be unique',
'snapshot_2': 'demo skip_symbols',
'snapshot_3': 'demo versions'}
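How skip_symbols and versions shape a snapshot can be modelled with a small selection function. This is a hypothetical sketch: `select_versions` is an invented helper, not part of ArcticDB. By default it takes the latest version of every symbol minus any in skip_symbols; if versions is given, it records exactly those versions. The toy rejects passing both arguments rather than guess how ArcticDB handles that combination.

```python
from typing import Dict, Iterable, Optional


def select_versions(
    latest: Dict[str, int],
    skip_symbols: Optional[Iterable[str]] = None,
    versions: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Hypothetical model of the symbol -> version pairs a snapshot records."""
    if skip_symbols and versions:
        # Toy choice: refuse the combination instead of guessing its semantics.
        raise ValueError("pass either skip_symbols or versions, not both")
    if versions is not None:
        return dict(versions)  # exactly the versions requested
    skipped = set(skip_symbols or ())
    return {sym: v for sym, v in latest.items() if sym not in skipped}


# Latest versions after sym_0 was deleted (from the demo's list_versions output).
latest = {"sym_1": 1, "sym_2": 0, "sym_3": 0}
print(select_versions(latest, skip_symbols=["sym_1"]))             # like snapshot_2
print(select_versions(latest, versions={"sym_1": 0, "sym_2": 0}))  # like snapshot_3
```

The two calls reproduce the contents of snapshot_2 (v0 of sym_2 and sym_3) and snapshot_3 (v0 of sym_1 and sym_2).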
Snapshots: why and why not to use them¶
Why¶
- Snapshots record the current state of the library
- They can be thought of as recoverable checkpoints in the evolution of the data
- Snapshots can create an audit trail
- Snapshots protect their data from deletion by other activity in the library
Why Not¶
- Generally we encourage the use of snapshots
- However, if many snapshots are created, they can impose a slight performance penalty on some operations because of the deletion protection
- Snapshots can also increase the storage used by ArcticDB by protecting older versions that would otherwise be deleted
- Use snapshots in a considered fashion and delete them when they are no longer needed
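The storage cost can be estimated with a rough counting model. This is a hypothetical sketch, not ArcticDB code: `retained_versions` is an invented helper that assumes older versions are pruned on every write, except those pinned by a snapshot. Without snapshots only the latest version survives; each snapshot keeps one more version alive until it is deleted.

```python
def retained_versions(num_writes: int, snapshot_every: int) -> set:
    """Hypothetical model: versions still stored when each write prunes older
    versions, except versions referenced by a snapshot."""
    snapshot_refs = set()
    latest = None
    for version in range(num_writes):
        latest = version                # each write creates a new latest version
        if snapshot_every and version % snapshot_every == 0:
            snapshot_refs.add(version)  # a snapshot pins this version
    return ({latest} if latest is not None else set()) | snapshot_refs


print(sorted(retained_versions(10, 0)))  # [9]: no snapshots, only the latest is kept
print(sorted(retained_versions(10, 3)))  # [0, 3, 6, 9]: snapshots pin older versions
```

Deleting snapshots that are no longer needed lets the pinned versions be cleaned up again.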
Further Info / Extras¶
For full descriptions of the functions used above, please see the ArcticDB documentation:
snapshot(): https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.snapshot
list_snapshots(): https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.list_snapshots
list_versions(): https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.list_versions