ArcticDB_demo_snapshots
Snapshots: how to use them and why they are useful
An Introduction to Snapshots¶
In order to understand snapshots we first need to be clear about versions.
In ArcticDB, every time a change is made to a symbol a new version is created. So each symbol has a sequence of versions through time.
In a library there will typically be many symbols with each having many versions.
Suppose we reach a point where we wish to record the current state of the data in the library. This is exactly the purpose of a snapshot.
A snapshot records the current versions of all the symbols in the library (or a custom set of versions, see below).
The data recorded in the snapshot can then be read back using the as_of parameter of read.
Versions that are part of a snapshot are protected from deletion, even if their symbol is deleted.
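To make the relationship between versions and snapshots concrete, here is a minimal in-memory sketch. It is a toy model, not ArcticDB's implementation: the class name ToyLibrary and its internals are invented for illustration, and it leaves out deletion and metadata entirely. It only shows that each write creates a new version and that a snapshot pins whichever versions were current when it was taken.

```python
class ToyLibrary:
    """In-memory toy (hypothetical, NOT ArcticDB): models versions and snapshots only."""

    def __init__(self):
        self._versions = {}   # symbol -> list of values; the list index is the version number
        self._snapshots = {}  # snapshot name -> {symbol: version number}

    def write(self, symbol, value):
        """Every write appends a new version and returns its number."""
        self._versions.setdefault(symbol, []).append(value)
        return len(self._versions[symbol]) - 1

    def snapshot(self, name):
        """Record the latest version number of every symbol under `name`."""
        if name in self._snapshots:
            raise ValueError(f"snapshot {name!r} already exists")
        self._snapshots[name] = {s: len(v) - 1 for s, v in self._versions.items()}

    def read(self, symbol, as_of=None):
        """Return (version, value) for the latest version, a version number, or a snapshot name."""
        if isinstance(as_of, str):
            version = self._snapshots[as_of][symbol]
        elif as_of is None:
            version = len(self._versions[symbol]) - 1
        else:
            version = as_of
        return version, self._versions[symbol][version]


toy = ToyLibrary()
toy.write("sym_0", 0)       # version 0
toy.write("sym_0", 10)      # version 1
toy.snapshot("snapshot_0")  # pins sym_0 at version 1
toy.write("sym_0", 20)      # version 2, written after the snapshot

print(toy.read("sym_0"))                      # (2, 20): the latest version
print(toy.read("sym_0", as_of="snapshot_0"))  # (1, 10): the state captured by the snapshot
```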
Below is a simple example that demonstrates snapshots in action.
Installs and Imports¶
!pip install arcticdb
import pandas as pd
import logging
import arcticdb as adb
Set up ArcticDB¶
Note: In this example we delete the library if it exists. That is not normal practice, but here we want to make sure we start with a clean library.
Don't copy those lines unless you are sure that is what you need.
lib_name = 'demo'
arctic = adb.Arctic("lmdb://arcticdb_snapshot_demo")
if lib_name in arctic.list_libraries():
    arctic.delete_library(lib_name)
lib = arctic.get_library(lib_name, create_if_missing=True)
Create some symbols¶
num_symbols = 4
symbols = [f"sym_{idx}" for idx in range(num_symbols)]
half_symbols = symbols[:num_symbols // 2]
print(symbols)
print(half_symbols)
['sym_0', 'sym_1', 'sym_2', 'sym_3']
['sym_0', 'sym_1']
# write data for each symbol
for idx, symbol in enumerate(symbols):
    lib.write(symbol, pd.DataFrame({"col": [idx]}))
# write data only for the first half of the symbols
for idx, symbol in enumerate(half_symbols):
    lib.write(symbol, pd.DataFrame({"col": [idx+10]}))
Create the snapshot¶
The metadata argument is optional.
lib.snapshot("snapshot_0", metadata="this is the core of the demo")
Functions to discover and inspect snapshots¶
# list all snapshots
lib.list_snapshots()
{'snapshot_0': 'this is the core of the demo'}
# list the symbols in a snapshot
lib.list_symbols(snapshot_name="snapshot_0")
['sym_2', 'sym_1', 'sym_0', 'sym_3']
# list the versions in a snapshot
lib.list_versions(snapshot="snapshot_0")
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']), sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0'])}
# list all versions in the library, with associated snapshots
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00), sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0']), sym_0_v0: (date=2023-11-20 10:24:45.041944641+00:00)}
Reading a snapshot version of a symbol¶
vit = lib.read("sym_0", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
vit = lib.read("sym_3", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_3', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=0, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0    3
Demonstration that snapshot versions are protected from deletion¶
# delete the symbol sym_0
lib.delete("sym_0")
# show that sym_0 has been deleted
lib.list_symbols()
['sym_2', 'sym_1', 'sym_3']
# sym_0 does not appear in the current library versions
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
# however we can still read the version of sym_0 that was recorded in the snapshot
vit = lib.read("sym_0", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
Although it works, we advise not to read snapshot versions directly using the version number¶
These versions only exist because they are in a snapshot, so code that accesses them via the snapshot makes that dependency explicit.
Code that accesses snapshot-protected versions by version number will fail once the snapshot is deleted, and the cause of the failure will be difficult to understand.
vit = lib.read("sym_0", as_of=1)
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
# version 0 was not in the snapshot, so it has been removed
try:
    vit = lib.read("sym_0", as_of=0)
    print(vit)
    print(vit.data)
except adb.exceptions.NoSuchVersionException:
    logging.error("Version not found")
ERROR:root:Version not found
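One possible way to follow the advice in this section is to wrap snapshot reads in a small helper that names the snapshot explicitly and fails with a readable message when it is missing. The function read_from_snapshot is a hypothetical helper, not part of ArcticDB; it assumes only that lib offers list_snapshots() and read(symbol, as_of=...) as used elsewhere in this notebook.

```python
def read_from_snapshot(lib, symbol, snapshot_name):
    """Read `symbol` as recorded in `snapshot_name`, failing with a clear message.

    Hypothetical helper: `lib` is assumed to be an ArcticDB library, or any
    object with list_snapshots() and read(symbol, as_of=...) as in this notebook.
    """
    # list_snapshots() returns a mapping keyed by snapshot name
    if snapshot_name not in lib.list_snapshots():
        raise KeyError(
            f"Snapshot {snapshot_name!r} does not exist, so {symbol!r} cannot be read from it"
        )
    return lib.read(symbol, as_of=snapshot_name)
```

With the library from this notebook it could be called as read_from_snapshot(lib, "sym_0", "snapshot_0"). The extra list_snapshots() call costs one more round trip, which seems a fair price for an error that points at the missing snapshot rather than at a version number.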
Deleting a snapshot¶
When we delete a snapshot, any versions that are only referenced by that snapshot will be deleted.
lib.delete_snapshot("snapshot_0")
lib.list_snapshots()
{}
# version 1, which was kept as part of the snapshot, has now been deleted
try:
    vit = lib.read("sym_0", as_of=1)
    print(vit)
    print(vit.data)
except adb.exceptions.NoSuchVersionException:
    logging.error("Version not found")
ERROR:root:Version not found
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
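The rule illustrated above can be sketched as a small pure-Python function. This is a toy model of the behaviour seen in this notebook, not ArcticDB's internal logic: the function name, the "live" flag and the dictionary layout are all assumptions made for illustration. A version disappears with a snapshot only if that snapshot was its sole reference and the version is no longer part of its symbol.

```python
def versions_removed_by(snapshot_name, versions):
    """Return the version keys that would disappear if `snapshot_name` were deleted.

    Toy model (hypothetical layout). `versions` maps a key such as "sym_0_v1" to:
      "live": True if the version still belongs to its symbol (it was not deleted),
      "snapshots": the names of the snapshots that reference it.
    """
    removed = []
    for key, info in versions.items():
        only_reference = set(info["snapshots"]) == {snapshot_name}
        if only_reference and not info["live"]:
            removed.append(key)
    return sorted(removed)


# State mirroring this notebook just before delete_snapshot("snapshot_0"):
# sym_0 was deleted, so its version 1 survives only through the snapshot.
state = {
    "sym_0_v1": {"live": False, "snapshots": ["snapshot_0"]},
    "sym_1_v0": {"live": True, "snapshots": []},
    "sym_1_v1": {"live": True, "snapshots": ["snapshot_0"]},
    "sym_2_v0": {"live": True, "snapshots": ["snapshot_0"]},
    "sym_3_v0": {"live": True, "snapshots": ["snapshot_0"]},
}
print(versions_removed_by("snapshot_0", state))  # ['sym_0_v1']
```

This matches the output above: only version 1 of sym_0 is gone, while the versions of sym_1, sym_2 and sym_3 remain because their symbols still exist.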
Snapshot names must be unique¶
Creating a snapshot with a name that already has a snapshot causes an error.
lib.snapshot("snapshot_1", metadata="demo snapshot names need to be unique")
try:
    lib.snapshot("snapshot_1")
except Exception as e:
    logging.error(e)
ERROR:root:E_ASSERTION_FAILURE Snapshot with name snapshot_1 already exists
lib.list_snapshots()
{'snapshot_1': 'demo snapshot names need to be unique'}
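If a script may run more than once, it can check for an existing name before creating the snapshot instead of catching the error. The helper below is a hypothetical sketch, not an ArcticDB function; it relies only on list_snapshots() and snapshot(name, metadata=...) as used in this notebook. The check and the creation are two separate calls, so it does not guard against another process creating the same name in between.

```python
def snapshot_if_absent(lib, name, metadata=None):
    """Create snapshot `name` unless it already exists; return True if it was created.

    Hypothetical helper: `lib` is assumed to expose list_snapshots() and
    snapshot(name, metadata=...) as in this notebook. Not atomic.
    """
    if name in lib.list_snapshots():
        return False
    lib.snapshot(name, metadata=metadata)
    return True
```

For example, snapshot_if_absent(lib, "snapshot_1") would return False here, because snapshot_1 already exists.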
Modifiers for snapshot creation: exclude or include symbols¶
# exclude sym_1 from snapshot
lib.snapshot("snapshot_2", skip_symbols=["sym_1"], metadata="demo skip_symbols")
lib.list_versions()
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_1', 'snapshot_2']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_1']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
# include specific versions of sym_1 and sym_2 in the snapshot
lib.snapshot("snapshot_3", versions={"sym_1": 0, "sym_2": 0}, metadata="demo versions")
lib.list_versions(snapshot="snapshot_3")
{sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2', 'snapshot_3']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00, snapshots=['snapshot_3'])}
lib.list_snapshots()
{'snapshot_1': 'demo snapshot names need to be unique', 'snapshot_2': 'demo skip_symbols', 'snapshot_3': 'demo versions'}
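The effect of the two modifiers, as observed in the outputs above, can be summarised in a toy function. It models only what this notebook shows: skip_symbols removes symbols from the default "latest version of everything" set, and versions replaces that set with an explicit mapping. The function name and the choice to let versions take precedence when both are given are assumptions of this sketch, not documented ArcticDB behaviour.

```python
def snapshot_contents(latest, skip_symbols=None, versions=None):
    """Toy model (hypothetical) of which versions end up in a snapshot.

    latest: symbol -> latest version number currently in the library.
    skip_symbols: symbols to leave out of the snapshot.
    versions: explicit symbol -> version mapping; when given it is used as-is
              (assumption of this sketch: it takes precedence over skip_symbols).
    """
    if versions:
        return dict(versions)
    skipped = set(skip_symbols or [])
    return {sym: ver for sym, ver in latest.items() if sym not in skipped}


# Latest versions in the library at this point in the notebook
latest = {"sym_1": 1, "sym_2": 0, "sym_3": 0}
print(snapshot_contents(latest, skip_symbols=["sym_1"]))             # like snapshot_2
print(snapshot_contents(latest, versions={"sym_1": 0, "sym_2": 0}))  # like snapshot_3
```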
Snapshots: why and why not to use them¶
Why¶
- Snapshots record the current state of the library
- They can be thought of as recoverable checkpoints in the evolution of the data
- Snapshots can create an audit trail
- Snapshots protect their data from deletion by other activity in the library
Why Not¶
- Generally we encourage the use of snapshots
- However if many snapshots are created they can impose a slight performance penalty on some operations due to the deletion protection
- Snapshots can also increase the storage used by ArcticDB, through protecting older versions that would otherwise be deleted
- Use snapshots in a considered fashion and delete them when they are no longer needed
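As a sketch of the last point, a periodic clean-up could delete every snapshot that is not on an explicit keep list. The helper name delete_snapshots_except is hypothetical; it assumes only list_snapshots() and delete_snapshot(name) as used in this notebook. Deleting a snapshot can permanently remove the versions it alone protected, so the keep list deserves care.

```python
def delete_snapshots_except(lib, keep):
    """Delete every snapshot whose name is not in `keep`; return the deleted names.

    Hypothetical helper: `lib` is assumed to expose list_snapshots() and
    delete_snapshot(name) as in this notebook.
    """
    keep = set(keep)
    deleted = []
    # Copy the names first so deletion does not interfere with iteration
    for name in list(lib.list_snapshots()):
        if name not in keep:
            lib.delete_snapshot(name)
            deleted.append(name)
    return deleted
```

With the library from this notebook, delete_snapshots_except(lib, keep=["snapshot_3"]) would remove snapshot_1 and snapshot_2 and leave snapshot_3 in place.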
Further Info / Extras¶
For full descriptions of the functions used above, please see the ArcticDB documentation:
snapshot()
https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.snapshot
list_snapshots()
https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.list_snapshots
list_versions()
https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.list_versions