Library API¶
This page documents the arcticdb.version_store.library module. This module is the main interface exposing read/write functionality within a given Arctic instance.
The key functionality is exposed through arcticdb.version_store.library.Library instances. See the Arctic API section for notes on how to create these. The other types exposed in this module are less important and are used as part of the signatures of arcticdb.version_store.library.Library instance methods.
arcticdb.version_store.library.Library ¶
The main interface exposing read/write functionality within a given Arctic instance.
Arctic libraries contain named symbols which are the atomic unit of data storage within Arctic. Symbols contain data that in most cases resembles a DataFrame and are versioned such that all modifying operations can be tracked and reverted.
Instances of this class provide a number of primitives to write, modify and remove symbols, as well as methods to manage library snapshots. For more information on snapshots please see the snapshot method.
Arctic libraries support concurrent writes and reads to multiple symbols as well as concurrent reads to a single symbol. However, concurrent writers to a single symbol are not supported other than for primitives that explicitly state support for single-symbol concurrent writes.
METHOD | DESCRIPTION
---|---
`append` | Appends the given data to the existing, stored data. Append always appends along the index.
`append_batch` | Append data to multiple symbols in a batch fashion. This is more efficient than making multiple `append` calls in succession.
`compact_symbol_list` | Compact the symbol list cache into a single key in the storage.
`defragment_symbol_data` | Compacts fragmented segments by merging row-sliced segments (https://docs.arcticdb.io/technical/on_disk_storage/#data-layer).
`delete` | Delete all versions of the symbol from the library, unless `versions` is specified, in which case only those versions are deleted.
`delete_data_in_range` | Delete data within the given date range, creating a new version of the symbol.
`delete_snapshot` | Delete a named snapshot.
`delete_staged_data` | Removes staged data.
`enterprise_options` | Enterprise library options set on this library.
`finalize_staged_data` | Finalizes staged data, making it available for reads. All staged segments must be ordered and non-overlapping.
`get_description` | Returns descriptive data for a symbol.
`get_description_batch` | Returns descriptive data for a list of symbols.
`get_staged_symbols` | Returns all symbols with staged, unfinalized data.
`has_symbol` | Whether this library contains the given symbol.
`head` | Read the first n rows of data for the named symbol.
`is_symbol_fragmented` | Check whether the number of segments that would be reduced by compaction meets the configured threshold.
`list_snapshots` | List the snapshots in the library.
`list_symbols` | Return the symbols in this library.
`list_versions` | Get the versions in this library, filtered by the passed in parameters.
`options` | Library options set on this library.
`prune_previous_versions` | Removes all (non-snapshotted) versions from the database for the given symbol, except the latest.
`read` | Read data for the named symbol.
`read_batch` | Reads multiple symbols.
`read_metadata` | Return the metadata saved for a symbol. This method is faster than read as it only loads the metadata.
`read_metadata_batch` | Reads the metadata of multiple symbols.
`reload_symbol_list` | Forces the symbol list cache to be reloaded.
`snapshot` | Creates a named snapshot of the data within a library.
`sort_and_finalize_staged_data` | Sorts and merges all staged data, making it available for reads.
`stage` | Write a staged data chunk to storage that will not be visible until finalize_staged_data is called on the symbol.
`tail` | Read the last n rows of data for the named symbol.
`update` | Overwrites existing symbol data with the contents of the given data.
`write` | Write data to the specified symbol.
`write_batch` | Write a batch of multiple symbols.
`write_metadata` | Write metadata under the specified symbol name to this library. The data will remain unchanged.
`write_metadata_batch` | Write metadata to multiple symbols in a batch fashion.
`write_pickle` | See `write`. Differs from `write` only in that the data can be of any type that is serialisable via the Pickle library.
`write_pickle_batch` | Write a batch of multiple symbols, pickling their data if necessary.

ATTRIBUTE | DESCRIPTION
---|---
`name` | The name of this library.
append ¶
append(
symbol: str,
data: NormalizableType,
metadata: Any = None,
prune_previous_versions: bool = False,
validate_index: bool = True,
) -> Optional[VersionedItem]
Appends the given data to the existing, stored data. Append always appends along the index. A new version will be created to reference the newly-appended data. Append only accepts data for which the index of the first row is equal to or greater than the index of the last row in the existing data.
Appends containing differing column sets to the existing data are only possible if the library has been configured to support dynamic schemas.
If append is called on a symbol that does not exist, it will create it. This is convenient when setting up a new symbol, but be careful - it will not work for creating a new version of an existing symbol. Use write in that case.
Note that append is not designed for multiple concurrent writers over a single symbol.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`data` | `NormalizableType` | Data to be written.
`metadata` | `Any`, default `None` | Optional metadata to persist along with the new symbol version. Note that the metadata is not combined in any way with the metadata stored in the previous version.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
`validate_index` | `bool`, default `True` | If True, verify that the index of `data` supports date range searches and update operations. This requires that the data is sorted in ascending order.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the written symbol in the store.

RAISES | DESCRIPTION
---|---
`UnsortedDataException` | If data is unsorted, when validate_index is set to True.
Examples:
>>> df = pd.DataFrame(
... {'column': [1,2,3]},
... index=pd.date_range(start='1/1/2018', end='1/03/2018')
... )
>>> df
column
2018-01-01 1
2018-01-02 2
2018-01-03 3
>>> lib.write("symbol", df)
>>> to_append_df = pd.DataFrame(
... {'column': [4,5,6]},
... index=pd.date_range(start='1/4/2018', end='1/06/2018')
... )
>>> to_append_df
column
2018-01-04 4
2018-01-05 5
2018-01-06 6
>>> lib.append("symbol", to_append_df)
>>> lib.read("symbol").data
column
2018-01-01 1
2018-01-02 2
2018-01-03 3
2018-01-04 4
2018-01-05 5
2018-01-06 6
append_batch ¶
append_batch(
append_payloads: List[WritePayload],
prune_previous_versions: bool = False,
validate_index=True,
) -> List[Union[VersionedItem, DataError]]
Append data to multiple symbols in a batch fashion. This is more efficient than making multiple append calls in succession as some constant-time operations can be executed only once rather than once for each element of append_payloads.
Note that this isn't an atomic operation - it's possible for one symbol to be fully written and readable before another symbol.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`append_payloads` | `List[WritePayload]` | Symbols and their corresponding data. There must not be any duplicate symbols in `append_payloads`.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
`validate_index` | default `True` | Verify that each entry in the batch has an index that supports date range searches and update operations. This tests that the data is sorted in ascending order, using Pandas DataFrame.index.is_monotonic_increasing.

RETURNS | DESCRIPTION
---|---
`List[Union[VersionedItem, DataError]]` | List of versioned items. The i-th entry corresponds to the i-th element of `append_payloads`.

RAISES | DESCRIPTION
---|---
`ArcticDuplicateSymbolsInBatchException` | When duplicate symbols appear in the payload.
`ArcticUnsupportedDataTypeException` | If data that is not of NormalizableType appears in any of the payloads.
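An illustrative sketch of a batch append, assuming `symbol_1` and `symbol_2` already hold date-indexed data ending before 2018-01-03 and that `adb` is the imported arcticdb package, as in the other batch examples:
>>> new_rows = pd.DataFrame({"column": [4, 5]}, index=pd.date_range("2018-01-03", periods=2))
>>> payloads = [adb.WritePayload("symbol_1", new_rows), adb.WritePayload("symbol_2", new_rows)]
>>> results = lib.append_batch(payloads)
>>> [r.symbol for r in results]
['symbol_1', 'symbol_2']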
compact_symbol_list ¶
compact_symbol_list() -> None
Compact the symbol list cache into a single key in the storage.

RETURNS | DESCRIPTION
---|---
 | The number of symbol list keys prior to compaction.

RAISES | DESCRIPTION
---|---
`PermissionException` | Library has been opened in read-only mode.
`InternalException` | Storage lock required to compact the symbol list could not be acquired.
defragment_symbol_data ¶
defragment_symbol_data(
symbol: str,
segment_size: Optional[int] = None,
prune_previous_versions: bool = False,
) -> VersionedItem
Compacts fragmented segments by merging row-sliced segments (https://docs.arcticdb.io/technical/on_disk_storage/#data-layer).
This method calls is_symbol_fragmented to determine whether to proceed with the defragmentation operation.
CAUTION - Please note that a major restriction of this method at present is that any column slicing present on the data will be removed in the new version created as a result of this method.
As a result, if the impacted symbol has more than 127 columns (the default value), the performance of selecting individual columns of the symbol (by using the columns parameter) may be negatively impacted in the defragmented version.
If your symbol has fewer than 127 columns this caveat does not apply.
For more information, please see columns_per_segment here:
https://docs.arcticdb.io/api/arcticdb/arcticdb.LibraryOptions
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`segment_size` | `Optional[int]` | Target maximum number of rows per segment after compaction. If the parameter is not provided, the library option "segment_row_size" is used. Note that the number of rows per segment after compaction may exceed the target; the aim is to achieve the smallest number of segments after compaction. Please refer to the example below for further explanation.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the defragmented symbol in the store.

RAISES | DESCRIPTION
---|---
`1002 ErrorCategory.INTERNAL:E_ASSERTION_FAILURE` | If
`2001 ErrorCategory.NORMALIZATION:E_UNIMPLEMENTED_INPUT_TYPE` | If the library option "bucketize_dynamic" is enabled.
Examples:
>>> lib.write("symbol", pd.DataFrame({"A": [0]}, index=[pd.Timestamp(0)]))
>>> lib.append("symbol", pd.DataFrame({"A": [1, 2]}, index=[pd.Timestamp(1), pd.Timestamp(2)]))
>>> lib.append("symbol", pd.DataFrame({"A": [3]}, index=[pd.Timestamp(3)]))
>>> lib_tool = lib._dev_tools.library_tool()
>>> lib_tool.read_index("symbol")
start_index end_index version_id stream_id creation_ts content_hash index_type key_type start_col end_col start_row end_row
1970-01-01 00:00:00.000000000 1970-01-01 00:00:00.000000001 20 b'sym' 1678974096622685727 6872717287607530038 84 2 1 2 0 1
1970-01-01 00:00:00.000000001 1970-01-01 00:00:00.000000003 21 b'sym' 1678974096931527858 12345256156783683504 84 2 1 2 1 3
1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000004 22 b'sym' 1678974096970045987 7952936283266921920 84 2 1 2 3 4
>>> lib.defragment_symbol_data("symbol", 2)
>>> lib_tool.read_index("symbol") # Returns two segments rather than three as a result of the defragmentation operation
start_index end_index version_id stream_id creation_ts content_hash index_type key_type start_col end_col start_row end_row
1970-01-01 00:00:00.000000000 1970-01-01 00:00:00.000000003 23 b'sym' 1678974097067271451 5576804837479525884 84 2 1 2 0 3
1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000004 23 b'sym' 1678974097067427062 7952936283266921920 84 2 1 2 3 4
Notes
Config map setting - SymbolDataCompact.SegmentCount will be replaced by a library setting in the future. This API will allow overriding the setting as well.
delete ¶
delete(
symbol: str,
versions: Optional[Union[int, Iterable[int]]] = None,
)
Delete all versions of the symbol from the library, unless versions is specified, in which case only those versions are deleted.
This may not actually delete the underlying data if a snapshot still references the version. See snapshot for more detail.
Note that this may require data to be removed from the underlying storage which can be slow.
This method does not remove any staged data, use delete_staged_data for that.
If no symbol called symbol exists then this is a no-op. In particular this method does not raise in this case.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol to delete.
`versions` | `Optional[Union[int, Iterable[int]]]` | Version or versions of the symbol to delete. If `None`, all versions are deleted.
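A brief illustrative sketch, assuming `df` is a normalizable DataFrame and the symbol is not referenced by any snapshot:
>>> lib.write("symbol", df)                                  # version 0
>>> lib.write("symbol", df, prune_previous_versions=False)   # version 1
>>> lib.delete("symbol", versions=0)   # remove only version 0
>>> lib.delete("symbol")               # remove every remaining version
>>> lib.has_symbol("symbol")
False
>>> lib.delete("symbol")               # deleting a missing symbol is a no-op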
delete_data_in_range ¶
delete_data_in_range(
symbol: str,
date_range: Tuple[
Optional[Timestamp], Optional[Timestamp]
],
prune_previous_versions: bool = False,
)
Delete data within the given date range, creating a new version of symbol.
The existing symbol version must be timeseries-indexed.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`date_range` | `Tuple[Optional[Timestamp], Optional[Timestamp]]` | The date range in which to delete data. Leaving any part of the tuple as None leaves that part of the range open-ended.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
Examples:
>>> df = pd.DataFrame({"column": [5, 6, 7, 8]}, index=pd.date_range(start="1/1/2018", end="1/4/2018"))
>>> lib.write("symbol", df)
>>> lib.delete_data_in_range("symbol", date_range=(datetime.datetime(2018, 1, 1), datetime.datetime(2018, 1, 2)))
>>> lib["symbol"].version
1
>>> lib["symbol"].data
column
2018-01-03 7
2018-01-04 8
delete_snapshot ¶
delete_snapshot(snapshot_name: str) -> None
Delete a named snapshot. This may take time if the given snapshot is the last reference to the underlying symbol(s) as the underlying data will be removed as well.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`snapshot_name` | `str` | The snapshot name to delete.

RAISES | DESCRIPTION
---|---
`Exception` | If the named snapshot does not exist.
delete_staged_data ¶
delete_staged_data(symbol: str)
Removes staged data.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol to remove staged data for.

See Also
write : Documentation on the staged parameter explains the concept of staged data in more detail.
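An illustrative sketch combining staged writes with get_staged_symbols and delete_staged_data, assuming `df` is a normalizable DataFrame:
>>> lib.write("symbol", df, staged=True)   # staged segment, not yet readable
>>> lib.get_staged_symbols()
['symbol']
>>> lib.delete_staged_data("symbol")       # discard the staged segment instead of finalizing it
>>> lib.get_staged_symbols()
[]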
enterprise_options ¶
enterprise_options() -> EnterpriseLibraryOptions
Enterprise library options set on this library. See also options for non-enterprise options.
finalize_staged_data ¶
finalize_staged_data(
symbol: str,
mode: Optional[StagedDataFinalizeMethod] = WRITE,
prune_previous_versions: bool = False,
metadata: Any = None,
validate_index=True,
delete_staged_data_on_failure: bool = False,
) -> VersionedItem
Finalizes staged data, making it available for reads. All staged segments must be ordered and non-overlapping. finalize_staged_data is less time consuming than sort_and_finalize_staged_data.
If mode is StagedDataFinalizeMethod.APPEND the index of the first row of the new segment must be equal to or greater than the index of the last row in the existing data.
If Static Schema is used all staged blocks must have a matching schema (same column names, same dtype, same column ordering) and must match the existing data if mode is StagedDataFinalizeMethod.APPEND. For more information about schema options see the documentation for arcticdb.LibraryOptions.dynamic_schema.
If the symbol does not exist both StagedDataFinalizeMethod.APPEND and StagedDataFinalizeMethod.WRITE will create it.
Calling finalize_staged_data without having staged data for the symbol will throw UserInputException. Use get_staged_symbols to check if there are staged segments for the symbol.
Calling finalize_staged_data if any of the staged segments contains NaT in its index will throw SortingException.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol to finalize data for.
`mode` | `Optional[StagedDataFinalizeMethod]`, default `WRITE` | Finalize mode. Valid options are WRITE or APPEND. Write collects the staged data and writes them to a new version. Append collects the staged data and appends them to the latest version.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
`metadata` | `Any`, default `None` | Optional metadata to persist along with the symbol.
`validate_index` | default `True` | If True, and staged segments are timeseries, will verify that the index of the symbol after this operation supports date range searches and update operations. This requires that the indexes of the staged segments are non-overlapping with each other and, in the case of StagedDataFinalizeMethod.APPEND, that the new data begins at or after the end of the existing data.
`delete_staged_data_on_failure` | `bool`, default `False` | Determines the handling of staged data when an exception occurs during finalization. To manually delete staged data, use the delete_staged_data method.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the written symbol in the store. The data member will be None.

RAISES | DESCRIPTION
---|---
`SortingException` | If the staged segments are not ordered and non-overlapping, or if any staged segment contains NaT in its index.
`UserInputException` | If there is no staged data for the symbol.
`SchemaException` | If static schema is used and the staged segments do not all share the same schema, or do not match the existing data when mode is StagedDataFinalizeMethod.APPEND.

See Also
write : Documentation on the staged parameter explains the concept of staged data in more detail.
Examples:
>>> lib.write("sym", pd.DataFrame({"col": [3, 4]}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 3), pd.Timestamp(2024, 1, 4)])), staged=True)
>>> lib.write("sym", pd.DataFrame({"col": [1, 2]}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2)])), staged=True)
>>> lib.finalize_staged_data("sym")
>>> lib.read("sym").data
col
2024-01-01 1
2024-01-02 2
2024-01-03 3
2024-01-04 4
get_description ¶
get_description(
symbol: str, as_of: Optional[AsOf] = None
) -> SymbolDescription
Returns descriptive data for symbol.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`as_of` | `Optional[AsOf]`, default `None` | See documentation on `read`.

RETURNS | DESCRIPTION
---|---
`SymbolDescription` | Named tuple containing the descriptive data.
See Also
SymbolDescription For documentation on each field.
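An illustrative sketch of fetching the description for the latest and an earlier version of a symbol; the field access below is shown purely for illustration, refer to SymbolDescription for the actual field names:
>>> lib.write("symbol", df)
>>> description = lib.get_description("symbol")                # latest version
>>> old_description = lib.get_description("symbol", as_of=0)   # pinned to version 0
>>> n_rows = description.row_count  # illustrative field access; see SymbolDescription for the available fields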
get_description_batch ¶
get_description_batch(
symbols: List[Union[str, ReadInfoRequest]]
) -> List[Union[SymbolDescription, DataError]]
Returns descriptive data for a list of symbols.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbols` | `List[Union[str, ReadInfoRequest]]` | List of symbols to read.

RETURNS | DESCRIPTION
---|---
`List[Union[SymbolDescription, DataError]]` | A list of the descriptive data, whose i-th element corresponds to the i-th element of the `symbols` parameter.
See Also
SymbolDescription For documentation on each field.
get_staged_symbols ¶
get_staged_symbols() -> List[str]
Returns all symbols with staged, unfinalized data.
RETURNS | DESCRIPTION
---|---
`List[str]` | Symbol names.

See Also
write : Documentation on the staged parameter explains the concept of staged data in more detail.
has_symbol ¶
has_symbol(
symbol: str, as_of: Optional[AsOf] = None
) -> bool
Whether this library contains the given symbol.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name for the item.
`as_of` | `Optional[AsOf]`, default `None` | Return the data as it was as_of the point in time. See documentation on `read`.

RETURNS | DESCRIPTION
---|---
`bool` | True if the symbol is in the library, False otherwise.
Examples:
>>> lib.write("symbol", pd.DataFrame())
>>> lib.has_symbol("symbol")
True
>>> lib.has_symbol("another_symbol")
False
The contains operator also checks whether a symbol exists in this library as of now:
>>> "symbol" in lib
True
>>> "another_symbol" in lib
False
head ¶
head(
symbol: str,
n: int = 5,
as_of: Optional[AsOf] = None,
columns: List[str] = None,
lazy: bool = False,
) -> Union[VersionedItem, LazyDataFrame]
Read the first n rows of data for the named symbol. If n is negative, return all rows except the last n rows.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`n` | `int`, default `5` | Number of rows to select if non-negative, otherwise number of rows to exclude.
`as_of` | `Optional[AsOf]`, default `None` | See documentation on `read`.
`columns` | `List[str]`, default `None` | See documentation on `read`.
`lazy` | `bool`, default `False` | See documentation on `read`.

RETURNS | DESCRIPTION
---|---
`Union[VersionedItem, LazyDataFrame]` | If lazy is False, VersionedItem object that contains a .data and .metadata element. If lazy is True, a LazyDataFrame object on which further querying can be performed prior to collect.
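An illustrative sketch, reusing the six-row "symbol" built up in the append example above:
>>> lib.head("symbol", n=2).data
column
2018-01-01 1
2018-01-02 2
>>> lib.head("symbol", n=-4).data  # everything except the last 4 rows
column
2018-01-01 1
2018-01-02 2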
is_symbol_fragmented ¶
is_symbol_fragmented(
symbol: str, segment_size: Optional[int] = None
) -> bool
Check whether the number of segments that would be reduced by compaction is more than or equal to the value specified by the configuration option "SymbolDataCompact.SegmentCount" (defaults to 100).
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`segment_size` | `Optional[int]` | Target maximum number of rows per segment after compaction. If the parameter is not provided, the library option for a segment's maximum row size is used.

Notes
Config map setting - SymbolDataCompact.SegmentCount will be replaced by a library setting in the future. This API will allow overriding the setting as well.

RETURNS | DESCRIPTION
---|---
`bool` | True if the number of segments that would be reduced by compaction is more than or equal to the configured value, False otherwise.
list_snapshots ¶
list_snapshots(
load_metadata: Optional[bool] = True,
) -> Union[List[str], Dict[str, Any]]
List the snapshots in the library.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`load_metadata` | `Optional[bool]`, default `True` | Load the snapshot metadata. May be slow so opt for False if you don't need it.

RETURNS | DESCRIPTION
---|---
`Union[List[str], Dict[str, Any]]` | Snapshots in the library. Returns a list of snapshot names if load_metadata is False, otherwise returns a dictionary where keys are snapshot names and values are metadata associated with that snapshot.
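An illustrative sketch, assuming a library containing a single snapshot created as shown:
>>> lib.snapshot("snap", metadata={"reason": "end of day"})
>>> lib.list_snapshots()
{'snap': {'reason': 'end of day'}}
>>> lib.list_snapshots(load_metadata=False)
['snap']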
list_symbols ¶
list_symbols(
snapshot_name: Optional[str] = None,
regex: Optional[str] = None,
) -> List[str]
Return the symbols in this library.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`snapshot_name` | `Optional[str]`, default `None` | Return the symbols available under the snapshot. If None then considers symbols that are live in the library as of the current time.
`regex` | `Optional[str]`, default `None` | If passed, returns only the symbols which match the regex.

RETURNS | DESCRIPTION
---|---
`List[str]` | Symbols in the library.
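An illustrative sketch in an otherwise empty library; the symbol names are arbitrary:
>>> lib.write("stock_us", pd.DataFrame())
>>> lib.write("stock_uk", pd.DataFrame())
>>> lib.write("fx_rates", pd.DataFrame())
>>> sorted(lib.list_symbols())
['fx_rates', 'stock_uk', 'stock_us']
>>> sorted(lib.list_symbols(regex="^stock"))
['stock_uk', 'stock_us']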
list_versions ¶
list_versions(
symbol: Optional[str] = None,
snapshot: Optional[str] = None,
latest_only: bool = False,
skip_snapshots: bool = False,
) -> Dict[SymbolVersion, VersionInfo]
Get the versions in this library, filtered by the passed in parameters.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `Optional[str]`, default `None` | Symbol to return versions for. If None returns versions across all symbols in the library.
`snapshot` | `Optional[str]`, default `None` | Only return the versions contained in the named snapshot.
`latest_only` | `bool`, default `False` | Only include the latest version for each returned symbol.
`skip_snapshots` | `bool`, default `False` | Don't populate the version list with snapshot information. Can improve performance significantly if there are many snapshots.

RETURNS | DESCRIPTION
---|---
`Dict[SymbolVersion, VersionInfo]` | Dictionary describing the version for each symbol-version pair in the library. Since SymbolVersion is a (named) tuple you can index into the dictionary simply as shown in the examples below.
Examples:
>>> df = pd.DataFrame()
>>> lib.write("symbol", df, metadata=10)
>>> lib.write("symbol", df, metadata=11, prune_previous_versions=False)
>>> lib.snapshot("snapshot")
>>> lib.write("symbol", df, metadata=12, prune_previous_versions=False)
>>> lib.delete("symbol", versions=(1, 2))
>>> versions = lib.list_versions("symbol")
>>> versions["symbol", 1].deleted
True
>>> versions["symbol", 1].snapshots
["my_snap"]
options ¶
options() -> LibraryOptions
Library options set on this library. See also enterprise_options.
prune_previous_versions ¶
prune_previous_versions(symbol)
Removes all (non-snapshotted) versions from the database for the given symbol, except the latest.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name to prune.
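An illustrative sketch, assuming `df` is a normalizable DataFrame and no snapshots reference the symbol:
>>> lib.write("symbol", df, metadata=1)
>>> lib.write("symbol", df, metadata=2, prune_previous_versions=False)
>>> lib.prune_previous_versions("symbol")
>>> [version for (symbol, version) in lib.list_versions("symbol")]  # only the latest version survives
[1]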
read ¶
read(
symbol: str,
as_of: Optional[AsOf] = None,
date_range: Optional[
Tuple[Optional[Timestamp], Optional[Timestamp]]
] = None,
row_range: Optional[Tuple[int, int]] = None,
columns: Optional[List[str]] = None,
query_builder: Optional[QueryBuilder] = None,
lazy: bool = False,
) -> Union[VersionedItem, LazyDataFrame]
Read data for the named symbol. Returns a VersionedItem object with a data and metadata element (as passed into write).
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`as_of` | `Optional[AsOf]`, default `None` | Return the data as it was as of the point in time.
`date_range` | `Optional[Tuple[Optional[Timestamp], Optional[Timestamp]]]`, default `None` | Date range to restrict read data to. Applicable only for time-indexed Pandas dataframes or series. Returns only the part of the data that falls within the given range (inclusive). None on either end leaves that part of the range open-ended. Only one of date_range or row_range can be provided.
`row_range` | `Optional[Tuple[int, int]]`, default `None` | Row range to read data for. Inclusive of the lower bound, exclusive of the upper bound. `lib.read(symbol, row_range=(start, end)).data` should behave the same as `df.iloc[start:end]`, including in the handling of negative start/end values. Only one of date_range or row_range can be provided.
`columns` | `Optional[List[str]]`, default `None` | Applicable only for Pandas data. Determines which columns to return data for.
`query_builder` | `Optional[QueryBuilder]`, default `None` | A QueryBuilder object to apply to the dataframe before it is returned. For more information see the documentation for the QueryBuilder class.
`lazy` | `bool`, default `False` | Defer query execution until collect is called on the returned object.

RETURNS | DESCRIPTION
---|---
`Union[VersionedItem, LazyDataFrame]` | If lazy is False, VersionedItem object that contains a .data and .metadata element. If lazy is True, a LazyDataFrame object on which further querying can be performed prior to collect.
Examples:
>>> df = pd.DataFrame({'column': [5,6,7]})
>>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
>>> lib.read("symbol").data
column
0 5
1 6
2 7
The default read behaviour is also available through subscripting:
>>> lib["symbol"].data
column
0 5
1 6
2 7
read_batch ¶
read_batch(
symbols: List[Union[str, ReadRequest]],
query_builder: Optional[QueryBuilder] = None,
lazy: bool = False,
) -> Union[
List[Union[VersionedItem, DataError]],
LazyDataFrameCollection,
]
Reads multiple symbols.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbols` | `List[Union[str, ReadRequest]]` | List of symbols to read.
`query_builder` | `Optional[QueryBuilder]`, default `None` | A single QueryBuilder to apply to all the dataframes before they are returned. If this argument is passed then none of the entries in `symbols` may specify their own query builder.
`lazy` | `bool`, default `False` | Defer query execution until collect is called on the returned object.

RETURNS | DESCRIPTION
---|---
`Union[List[Union[VersionedItem, DataError]], LazyDataFrameCollection]` | If lazy is False, a list of the read results, whose i-th element corresponds to the i-th element of the `symbols` parameter. If lazy is True, a LazyDataFrameCollection on which further querying can be performed prior to collect.

RAISES | DESCRIPTION
---|---
`ArcticInvalidApiUsageException` | If the kwarg query_builder and per-symbol query builders are both used.
Examples:
>>> lib.write("s1", pd.DataFrame())
>>> lib.write("s2", pd.DataFrame({"col": [1, 2, 3]}))
>>> lib.write("s2", pd.DataFrame(), prune_previous_versions=False)
>>> lib.write("s3", pd.DataFrame())
>>> batch = lib.read_batch(["s1", adb.ReadRequest("s2", as_of=0), "s3", adb.ReadRequest("s2", as_of=1000)])
>>> batch[0].data.empty
True
>>> batch[1].data.empty
False
>>> batch[2].data.empty
True
>>> batch[3].symbol
"s2"
>>> isinstance(batch[3], adb.DataError)
True
>>> batch[3].version_request_type
VersionRequestType.SPECIFIC
>>> batch[3].version_request_data
1000
>>> batch[3].error_code
ErrorCode.E_NO_SUCH_VERSION
>>> batch[3].error_category
ErrorCategory.MISSING_DATA
See Also
read
read_metadata ¶
read_metadata(
symbol: str, as_of: Optional[AsOf] = None
) -> VersionedItem
Return the metadata saved for a symbol. This method is faster than read as it only loads the metadata, not the data itself.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`as_of` | `Optional[AsOf]`, default `None` | Return the metadata as it was as of the point in time. See documentation on `read`.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the affected symbol in the store. The data attribute will be None.
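An illustrative sketch, assuming `df` is a normalizable DataFrame:
>>> lib.write("symbol", df, metadata={"source": "csv"})
>>> lib.read_metadata("symbol").metadata
{'source': 'csv'}
>>> lib.read_metadata("symbol").data is None   # only the metadata is loaded
True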
read_metadata_batch ¶
read_metadata_batch(
symbols: List[Union[str, ReadInfoRequest]]
) -> List[Union[VersionedItem, DataError]]
Reads the metadata of multiple symbols.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbols` | `List[Union[str, ReadInfoRequest]]` | List of symbols to read metadata for.

RETURNS | DESCRIPTION
---|---
`List[Union[VersionedItem, DataError]]` | A list of the read metadata results, whose i-th element corresponds to the i-th element of the `symbols` parameter.
See Also
read_metadata
reload_symbol_list ¶
reload_symbol_list()
Forces the symbol list cache to be reloaded.
This can take a long time on large libraries or certain S3 implementations, and once started, it cannot be safely interrupted. If the call is interrupted somehow (exception/process killed), please call this again ASAP.
snapshot ¶
snapshot(
snapshot_name: str,
metadata: Any = None,
skip_symbols: Optional[List[str]] = None,
versions: Optional[Dict[str, int]] = None,
) -> None
Creates a named snapshot of the data within a library.
By default, the latest version of every symbol that has not been deleted will be contained within the snapshot. You can change this behaviour with either versions (an allow-list) or with skip_symbols (a deny-list).
Concurrent writes with prune previous versions set while the snapshot is being taken can potentially lead to corruption of the affected symbols in the snapshot.
The symbols and versions contained within the snapshot will persist regardless of new symbols and versions being written to the library afterwards. If a version or symbol referenced in a snapshot is deleted then the underlying data will be preserved to ensure the snapshot is still accessible. Only once all referencing snapshots have been removed will the underlying data be removed as well.
At most one of skip_symbols and versions may be truthy.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`snapshot_name` | `str` | Name of the snapshot.
`metadata` | `Any`, default `None` | Optional metadata to persist along with the snapshot.
`skip_symbols` | `Optional[List[str]]`, default `None` | Optional symbols to be excluded from the snapshot.
`versions` | `Optional[Dict[str, int]]`, default `None` | Optional dictionary of versions of symbols to snapshot, mapping symbol name to version number.

RAISES | DESCRIPTION
---|---
`InternalException` | If a snapshot already exists with the given snapshot_name.
`MissingDataException` | If a symbol or the version of a symbol specified in versions does not exist or has been deleted in the library, or the library has no symbols.
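An illustrative sketch, assuming `df` is a normalizable DataFrame and that a snapshot name may be passed as the `as_of` argument to read:
>>> lib.write("a", df, metadata=1)
>>> lib.write("b", df, metadata=2)
>>> lib.snapshot("before_update", metadata={"reason": "pre-update state"})
>>> lib.write("a", df, metadata=3)                  # later writes do not affect the snapshot
>>> lib.read("a", as_of="before_update").metadata   # assumes a snapshot name is a valid as_of value
1
>>> lib.list_snapshots(load_metadata=False)
['before_update']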
sort_and_finalize_staged_data ¶
sort_and_finalize_staged_data(
symbol: str,
mode: Optional[StagedDataFinalizeMethod] = WRITE,
prune_previous_versions: bool = False,
metadata: Any = None,
delete_staged_data_on_failure: bool = False,
) -> VersionedItem
Sorts and merges all staged data, making it available for reads. This differs from finalize_staged_data in that it can support staged segments with interleaved time periods and staged segments which are not internally sorted. The end result will be sorted. This requires performing a full sort in memory so can be time consuming.
If mode is StagedDataFinalizeMethod.APPEND the index of the first row of the sorted block must be equal to or greater than the index of the last row in the existing data.
If Static Schema is used all staged blocks must have a matching schema (same column names, same dtype, same column ordering) and must match the existing data if mode is StagedDataFinalizeMethod.APPEND. For more information about schema options see the documentation for arcticdb.LibraryOptions.dynamic_schema.
If the symbol does not exist both StagedDataFinalizeMethod.APPEND and StagedDataFinalizeMethod.WRITE will create it.
Calling sort_and_finalize_staged_data without having staged data for the symbol will throw UserInputException. Use get_staged_symbols to check if there are staged segments for the symbol.
Calling sort_and_finalize_staged_data if any of the staged segments contains NaT in its index will throw SortingException.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol to finalize data for.
`mode` | `Optional[StagedDataFinalizeMethod]`, default `WRITE` | Finalize mode. Valid options are WRITE or APPEND. Write collects the staged data and writes them to a new timeseries. Append collects the staged data and appends them to the latest version.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
`metadata` | `Any`, default `None` | Optional metadata to persist along with the symbol.
`delete_staged_data_on_failure` | `bool`, default `False` | Determines the handling of staged data when an exception occurs during finalization. To manually delete staged data, use the delete_staged_data method.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the written symbol in the store. The data member will be None.

RAISES | DESCRIPTION
---|---
`SortingException` | If any staged segment contains NaT in its index.
`UserInputException` | If there is no staged data for the symbol.
`SchemaException` | If static schema is used and the staged segments do not all share the same schema, or do not match the existing data when mode is StagedDataFinalizeMethod.APPEND.

See Also
write : Documentation on the staged parameter explains the concept of staged data in more detail.
Examples:
>>> lib.write("sym", pd.DataFrame({"col": [2, 4]}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 2), pd.Timestamp(2024, 1, 4)])), staged=True)
>>> lib.write("sym", pd.DataFrame({"col": [3, 1]}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 3), pd.Timestamp(2024, 1, 1)])), staged=True)
>>> lib.sort_and_finalize_staged_data("sym")
>>> lib.read("sym").data
col
2024-01-01 1
2024-01-02 2
2024-01-03 3
2024-01-04 4
stage ¶
stage(
symbol: str,
data: NormalizableType,
validate_index=True,
sort_on_index=False,
sort_columns: List[str] = None,
)
Write a staged data chunk to storage, that will not be visible until finalize_staged_data is called on the symbol. Equivalent to write() with staged=True.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name. Limited to 255 characters. The following characters are not supported in symbols:
`data` | `NormalizableType` | Data to be written. Staged data must be normalizable.
`validate_index` | default `True` | Check that the index is sorted prior to writing. In the case of unsorted data, throw an UnsortedDataException.
`sort_on_index` | default `False` | If an appropriate index is present, sort the data on it. In combination with sort_columns the index will be used as the primary sort column, and the others as secondaries.
`sort_columns` | `List[str]`, default `None` | Sort the data by specific columns prior to writing.
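An illustrative sketch, mirroring the finalize_staged_data example but using stage instead of write(..., staged=True):
>>> lib.stage("sym", pd.DataFrame({"col": [3, 4]}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 3), pd.Timestamp(2024, 1, 4)])))
>>> lib.stage("sym", pd.DataFrame({"col": [1, 2]}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2)])))
>>> lib.finalize_staged_data("sym")
>>> lib.read("sym").data
col
2024-01-01 1
2024-01-02 2
2024-01-03 3
2024-01-04 4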
tail ¶
tail(
symbol: str,
n: int = 5,
as_of: Optional[Union[int, str]] = None,
columns: List[str] = None,
lazy: bool = False,
) -> Union[VersionedItem, LazyDataFrame]
Read the last n rows of data for the named symbol. If n is negative, return all rows except the first n rows.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`n` | `int`, default `5` | Number of rows to select if non-negative, otherwise number of rows to exclude.
`as_of` | `Optional[Union[int, str]]`, default `None` | See documentation on `read`.
`columns` | `List[str]`, default `None` | See documentation on `read`.
`lazy` | `bool`, default `False` | See documentation on `read`.

RETURNS | DESCRIPTION
---|---
`Union[VersionedItem, LazyDataFrame]` | If lazy is False, VersionedItem object that contains a .data and .metadata element. If lazy is True, a LazyDataFrame object on which further querying can be performed prior to collect.
update ¶
update(
symbol: str,
data: Union[DataFrame, Series],
metadata: Any = None,
upsert: bool = False,
date_range: Optional[
Tuple[Optional[Timestamp], Optional[Timestamp]]
] = None,
prune_previous_versions: bool = False,
) -> VersionedItem
Overwrites existing symbol data with the contents of data. The entire range between the first and last index entry in data is replaced in its entirety with the contents of data, adding additional index entries if required. update only operates over the outermost index level - this means secondary index rows will be removed if not contained in data.
Both the existing symbol version and data must be timeseries-indexed.
In the case where data has zero rows, nothing will be done and no new version will be created. This means that update cannot be used with date_range to just delete a subset of the data. We have delete_data_in_range for exactly this purpose and to make it very clear when deletion is intended.
Note that update is not designed for multiple concurrent writers over a single symbol.
If using static schema then all the column names of data, their order, and their type must match the columns already in storage.
If dynamic schema is used then data will override everything in storage for the entire index of data. update will not keep columns from storage which are not in data.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name.
`data` | `Union[DataFrame, Series]` | Timeseries-indexed data to use for the update.
`metadata` | `Any`, default `None` | Metadata to persist along with the new symbol version.
`upsert` | `bool`, default `False` | If True, will write the data even if the symbol does not exist.
`date_range` | `Optional[Tuple[Optional[Timestamp], Optional[Timestamp]]]`, default `None` | If a range is specified, it will delete the stored value within the range and overwrite it with the data in `data`.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the written symbol in the store.
Examples:
>>> df = pd.DataFrame(
... {'column': [1,2,3,4]},
... index=pd.date_range(start='1/1/2018', end='1/4/2018')
... )
>>> df
column
2018-01-01 1
2018-01-02 2
2018-01-03 3
2018-01-04 4
>>> lib.write("symbol", df)
>>> update_df = pd.DataFrame(
... {'column': [400, 40]},
... index=pd.date_range(start='1/1/2018', end='1/3/2018', freq='2D')
... )
>>> update_df
column
2018-01-01 400
2018-01-03 40
>>> lib.update("symbol", update_df)
>>> # Note that 2018-01-02 is gone despite not being in update_df
>>> lib.read("symbol").data
column
2018-01-01 400
2018-01-03 40
2018-01-04 4
write ¶
write(
symbol: str,
data: NormalizableType,
metadata: Any = None,
prune_previous_versions: bool = False,
staged=False,
validate_index=True,
) -> VersionedItem
Write data to the specified symbol. If symbol already exists then a new version will be created to reference the newly written data. For more information on versions see the documentation for the read primitive.
data must be of a format that can be normalised into Arctic's internal storage structure. Pandas DataFrames, Pandas Series and Numpy NDArrays can all be normalised. Normalised data will be split along both the columns and rows into segments. By default, a segment will contain 100,000 rows and 127 columns.
If this library has write_deduplication enabled then segments will be deduplicated against storage prior to write to reduce required IO operations and storage requirements. Data will be effectively deduplicated for all segments up until the first differing row when compared to storage. As a result, modifying the beginning of data with respect to previously written versions may significantly reduce the effectiveness of deduplication.
Note that write is not designed for multiple concurrent writers over a single symbol unless the staged keyword argument is set to True. If staged is True, written segments will be staged and left in an "incomplete" state, unable to be read until they are finalized. This enables multiple writers to a single symbol - all writing staged data at the same time - with one process able to later finalize all staged data, rendering the data readable by clients. To finalize staged data, see finalize_staged_data.
Note: ArcticDB will use the 0-th level index of the Pandas DataFrame for its on-disk index.
Any non-DatetimeIndex will be converted into an internal RowCount index. That is, ArcticDB will assign each row a monotonically increasing integer identifier and that will be used for the index.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name. Limited to 255 characters. The following characters are not supported in symbols:
`data` | `NormalizableType` | Data to be written. To write non-normalizable data, use `write_pickle`.
`metadata` | `Any`, default `None` | Optional metadata to persist along with the symbol.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
`staged` | default `False` | Whether to write to a staging area rather than immediately to the library. See the notes on staged data above.
`validate_index` | default `True` | If True, verify that the index of `data` supports date range searches and update operations. This tests that the data is sorted in ascending order.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the written symbol in the store.

RAISES | DESCRIPTION
---|---
`ArcticUnsupportedDataTypeException` | If `data` is not of NormalizableType.
`UnsortedDataException` | If data is unsorted and validate_index is set to True.
Examples:
>>> df = pd.DataFrame({'column': [5,6,7]})
>>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
>>> lib.read("symbol").data
column
0 5
1 6
2 7
Staging data for later finalisation (enables concurrent writes):
>>> df = pd.DataFrame({'column': [5,6,7]}, index=pd.date_range(start='1/1/2000', periods=3))
>>> lib.write("staged", df, staged=True) # Multiple staged writes can occur in parallel
>>> lib.finalize_staged_data("staged", StagedDataFinalizeMethod.WRITE) # Must be run after all staged writes have completed
>>> lib.read("staged").data # Would return error if run before finalization
column
2000-01-01 5
2000-01-02 6
2000-01-03 7
WritePayload objects can be unpacked and used as parameters:
>>> w = adb.WritePayload("symbol", df, metadata={'the': 'metadata'})
>>> lib.write(*w, staged=True)
write_batch ¶
write_batch(
payloads: List[WritePayload],
prune_previous_versions: bool = False,
validate_index=True,
) -> List[Union[VersionedItem, DataError]]
Write a batch of multiple symbols.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`payloads` | `List[WritePayload]` | Symbols and their corresponding data. There must not be any duplicate symbols in `payloads`.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.
`validate_index` | default `True` | Verify that each entry in the batch has an index that supports date range searches and update operations. This tests that the data is sorted in ascending order, using Pandas DataFrame.index.is_monotonic_increasing.

RETURNS | DESCRIPTION
---|---
`List[Union[VersionedItem, DataError]]` | List of versioned items. The data attribute will be None for each versioned item. The i-th entry corresponds to the i-th element of `payloads`.

RAISES | DESCRIPTION
---|---
`ArcticDuplicateSymbolsInBatchException` | When duplicate symbols appear in the payload.
`ArcticUnsupportedDataTypeException` | If data that is not of NormalizableType appears in any of the payloads.
See Also
write: For more detailed documentation.
Examples:
Writing a simple batch:
>>> df_1 = pd.DataFrame({'column': [1,2,3]})
>>> df_2 = pd.DataFrame({'column': [4,5,6]})
>>> payload_1 = adb.WritePayload("symbol_1", df_1, metadata={'the': 'metadata'})
>>> payload_2 = adb.WritePayload("symbol_2", df_2)
>>> items = lib.write_batch([payload_1, payload_2])
>>> lib.read("symbol_1").data
column
0 1
1 2
2 3
>>> lib.read("symbol_2").data
column
0 4
1 5
2 6
>>> items[0].symbol, items[1].symbol
('symbol_1', 'symbol_2')
write_metadata ¶
write_metadata(
symbol: str,
metadata: Any,
prune_previous_versions: bool = False,
) -> VersionedItem
Write metadata under the specified symbol name to this library. The data will remain unchanged. A new version will be created.
If the symbol is missing, it causes a write with empty data (None, pickled, can't append) and the supplied metadata.
This method should be faster than write as it involves no data segment read/write operations.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | Symbol name for the item.
`metadata` | `Any` | Metadata to persist along with the symbol.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database. Note that metadata is versioned alongside the data it is referring to, and so this operation removes old versions of data as well as metadata.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | Structure containing metadata and version number of the affected symbol in the store.
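An illustrative sketch, assuming `df` is a normalizable DataFrame:
>>> lib.write("symbol", df, metadata={"note": "initial"})
>>> lib.write_metadata("symbol", {"note": "corrected"})
>>> lib.read_metadata("symbol").metadata
{'note': 'corrected'}
>>> lib.read("symbol").data.equals(df)   # the underlying data is unchanged
True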
write_metadata_batch ¶
write_metadata_batch(
write_metadata_payloads: List[WriteMetadataPayload],
prune_previous_versions: bool = False,
) -> List[Union[VersionedItem, DataError]]
Write metadata to multiple symbols in a batch fashion. This is more efficient than making multiple write_metadata calls in succession as some constant-time operations can be executed only once rather than once for each element of write_metadata_payloads.
Note that this isn't an atomic operation - it's possible for the metadata for one symbol to be fully written and readable before another symbol.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`write_metadata_payloads` | `List[WriteMetadataPayload]` | Symbols and their corresponding metadata. There must not be any duplicate symbols in `write_metadata_payloads`.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database. Note that metadata is versioned alongside the data it is referring to, and so this operation removes old versions of data as well as metadata.

RETURNS | DESCRIPTION
---|---
`List[Union[VersionedItem, DataError]]` | List of versioned items. The data attribute will be None for each versioned item. The i-th entry corresponds to the i-th element of `write_metadata_payloads`.

RAISES | DESCRIPTION
---|---
`ArcticDuplicateSymbolsInBatchException` | When duplicate symbols appear in write_metadata_payloads.
Examples:
Writing a simple batch:
>>> payload_1 = adb.WriteMetadataPayload("symbol_1", {'the': 'metadata_1'})
>>> payload_2 = adb.WriteMetadataPayload("symbol_2", {'the': 'metadata_2'})
>>> items = lib.write_metadata_batch([payload_1, payload_2])
>>> lib.read_metadata("symbol_1")
{'the': 'metadata_1'}
>>> lib.read_metadata("symbol_2")
{'the': 'metadata_2'}
write_pickle ¶
write_pickle(
symbol: str,
data: Any,
metadata: Any = None,
prune_previous_versions: bool = False,
staged=False,
) -> VersionedItem
See write. This method differs from write only in that data can be of any type that is serialisable via the Pickle library. There are significant downsides to storing data in this way:
- Retrieval can only be done in bulk. Calls to read will not support date_range, query_builder or columns.
- The data cannot be updated or appended to via the update and append methods.
- Writes cannot be deduplicated in any way.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`symbol` | `str` | See documentation on `write`.
`data` | `Any` | Data to be written.
`metadata` | `Any`, default `None` | See documentation on `write`.
`prune_previous_versions` | `bool`, default `False` | See documentation on `write`.
`staged` | default `False` | See documentation on `write`.

RETURNS | DESCRIPTION
---|---
`VersionedItem` | See documentation on `write`.
Examples:
>>> lib.write_pickle("symbol", [1,2,3])
>>> lib.read("symbol").data
[1, 2, 3]
See Also
write: For more detailed documentation.
write_pickle_batch ¶
write_pickle_batch(
payloads: List[WritePayload],
prune_previous_versions: bool = False,
) -> List[Union[VersionedItem, DataError]]
Write a batch of multiple symbols, pickling their data if necessary.
PARAMETER | TYPE | DESCRIPTION
---|---|---
`payloads` | `List[WritePayload]` | Symbols and their corresponding data. There must not be any duplicate symbols in `payloads`.
`prune_previous_versions` | `bool`, default `False` | Removes previous (non-snapshotted) versions from the database.

RETURNS | DESCRIPTION
---|---
`List[Union[VersionedItem, DataError]]` | Structures containing metadata and version number of the written symbols in the store, in the same order as `payloads`.

RAISES | DESCRIPTION
---|---
`ArcticDuplicateSymbolsInBatchException` | When duplicate symbols appear in the payload.
See Also
write: For more detailed documentation. write_pickle: For information on the implications of providing data that needs to be pickled.
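An illustrative sketch, assuming `adb` is the imported arcticdb package as in the other batch examples:
>>> payloads = [adb.WritePayload("symbol_1", [1, 2, 3]), adb.WritePayload("symbol_2", {"a": 1})]
>>> items = lib.write_pickle_batch(payloads)
>>> lib.read("symbol_1").data
[1, 2, 3]
>>> lib.read("symbol_2").data
{'a': 1}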