Library API

This page documents the arcticdb.version_store.library module. This module is the main interface exposing read/write functionality within a given Arctic instance.

The key functionality is exposed through arcticdb.version_store.library.Library instances. See the Arctic API section for notes on how to create these. The other types exposed in this module are less important and are used as part of the signature of arcticdb.version_store.library.Library instance methods.

arcticdb.version_store.library.Library

The main interface exposing read/write functionality within a given Arctic instance.

Arctic libraries contain named symbols which are the atomic unit of data storage within Arctic. Symbols contain data that in most cases resembles a DataFrame and are versioned such that all modifying operations can be tracked and reverted.

Instances of this class provide primitives to write, modify and remove symbols, as well as methods to manage library snapshots. For more information on snapshots, see the snapshot method.

Arctic libraries support concurrent writes and reads to multiple symbols as well as concurrent reads to a single symbol. However, concurrent writers to a single symbol are not supported other than for primitives that explicitly state support for single-symbol concurrent writes.

name property

name

The name of this library.

append

append(
    symbol: str,
    data: NormalizableType,
    metadata: Any = None,
    prune_previous_versions: bool = False,
    validate_index: bool = True,
) -> Optional[VersionedItem]

Appends the given data to the existing, stored data. Append always appends along the index. A new version will be created to reference the newly-appended data. Append only accepts data for which the index of the first row is equal to or greater than the index of the last row in the existing data.

Appends containing differing column sets to the existing data are only possible if the library has been configured to support dynamic schemas.

Note that append is not designed for multiple concurrent writers over a single symbol.

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

data

Data to be written.

TYPE: NormalizableType

metadata

Optional metadata to persist along with the new symbol version. Note that the metadata is not combined in any way with the metadata stored in the previous version.

TYPE: Any DEFAULT: None

prune_previous_versions

Removes previous (non-snapshotted) versions from the database when True.

TYPE: bool DEFAULT: False

validate_index

If True, verifies that the resulting symbol will support date range searches and update operations. In effect, this tests that both the previous version of the data and the new data are sorted in ascending order. ArcticDB relies on Pandas to detect whether data is sorted: you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see whether Pandas considers the data sorted.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
VersionedItem

Structure containing metadata and version number of the written symbol in the store.

RAISES DESCRIPTION
UnsortedDataException

If data is unsorted, when validate_index is set to True.

Examples:

>>> df = pd.DataFrame(
...    {'column': [1,2,3]},
...    index=pd.date_range(start='1/1/2018', end='1/03/2018')
... )
>>> df
            column
2018-01-01       1
2018-01-02       2
2018-01-03       3
>>> lib.write("symbol", df)
>>> to_append_df = pd.DataFrame(
...    {'column': [4,5,6]},
...    index=pd.date_range(start='1/4/2018', end='1/06/2018')
... )
>>> to_append_df
            column
2018-01-04       4
2018-01-05       5
2018-01-06       6
>>> lib.append("symbol", to_append_df)
>>> lib.read("symbol").data
            column
2018-01-01       1
2018-01-02       2
2018-01-03       3
2018-01-04       4
2018-01-05       5
2018-01-06       6
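
The ordering rule described above can be modelled in a few lines. This is a minimal sketch and not ArcticDB's implementation: check_appendable and is_sorted_ascending are hypothetical helpers, and they work on plain lists of comparable index values (for a Pandas index, pass list(df.index)).

```python
from datetime import date


def is_sorted_ascending(index):
    """True if every index value is >= the one before it."""
    return all(a <= b for a, b in zip(index, index[1:]))


def check_appendable(existing_index, new_index):
    """Hypothetical pre-flight check mirroring the documented append rules.

    Returns None when the append looks acceptable, otherwise a short reason.
    """
    if not is_sorted_ascending(new_index):
        return "new data is not sorted in ascending order"
    if existing_index and new_index and new_index[0] < existing_index[-1]:
        return "first new index value is before the last stored index value"
    return None


stored = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 3)]
print(check_appendable(stored, [date(2018, 1, 4), date(2018, 1, 5)]))  # None
print(check_appendable(stored, [date(2018, 1, 2), date(2018, 1, 5)]))  # reason string
```

A check like this only helps to fail early on the client side; the library's own validation still decides whether an append is accepted.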

append_batch

append_batch(
    append_payloads: List[WritePayload],
    prune_previous_versions: bool = False,
    validate_index=True,
) -> List[Union[VersionedItem, DataError]]

Append data to multiple symbols in a batch fashion. This is more efficient than making multiple append calls in succession as some constant-time operations can be executed only once rather than once for each element of append_payloads. Note that this isn't an atomic operation - it's possible for one symbol to be fully written and readable before another symbol.

PARAMETER DESCRIPTION
append_payloads

Symbols and their corresponding data. There must not be any duplicate symbols in append_payloads.

TYPE: `List[WritePayload]`

prune_previous_versions

Removes previous (non-snapshotted) versions from the database.

TYPE: bool DEFAULT: False

validate_index

If True, verifies for each entry in the batch that the index of the data supports date range searches and update operations. In effect, this tests that the data is sorted in ascending order. ArcticDB relies on Pandas to detect whether data is sorted: you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see whether Pandas considers the data sorted.

DEFAULT: True

RETURNS DESCRIPTION
List[Union[VersionedItem, DataError]]

List of versioned items. The i-th entry corresponds to the i-th element of append_payloads. Each result is a structure containing metadata and the version number of the affected symbol in the store. If a key error or any other internal exception is raised, a DataError object is returned instead, with symbol, error_code, error_category, and exception_string properties.

RAISES DESCRIPTION
ArcticDuplicateSymbolsInBatchException

When duplicate symbols appear in payload.

ArcticUnsupportedDataTypeException

If data that is not of NormalizableType appears in any of the payloads.
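
Because a batch call returns a mix of results and DataError objects rather than raising per symbol, callers usually need to separate the two. The sketch below shows one way to do that. split_batch_results is a hypothetical helper, and the Fake* classes are stand-ins so the example runs without a library; with ArcticDB you would pass the real DataError class (from arcticdb import DataError).

```python
def split_batch_results(results, error_type):
    """Separate successful items from errors, keyed by position in the batch.

    `error_type` is the class used for failures; with ArcticDB this would be
    DataError.
    """
    succeeded, failed = {}, {}
    for position, result in enumerate(results):
        target = failed if isinstance(result, error_type) else succeeded
        target[position] = result
    return succeeded, failed


# Stand-in classes so the sketch is runnable; they are NOT ArcticDB types.
class FakeDataError:
    def __init__(self, symbol):
        self.symbol = symbol


class FakeVersionedItem:
    def __init__(self, symbol, version):
        self.symbol, self.version = symbol, version


results = [FakeVersionedItem("s1", 3), FakeDataError("s2"), FakeVersionedItem("s3", 0)]
ok, errors = split_batch_results(results, FakeDataError)
print(sorted(ok))                           # positions that succeeded
print([e.symbol for e in errors.values()])  # symbols that failed
```

Keeping the position as the key preserves the documented correspondence between the i-th result and the i-th payload.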

defragment_symbol_data

defragment_symbol_data(
    symbol: str, segment_size: Optional[int] = None
) -> VersionedItem

Compacts fragmented segments by merging row-sliced segments (https://docs.arcticdb.io/technical/on_disk_storage/#data-layer). This method calls is_symbol_fragmented to determine whether to proceed with the defragmentation operation.

CAUTION - A major restriction of this method at present is that any column slicing present on the data will be removed in the new version it creates. As a result, if the impacted symbol has more than 127 columns (the default value), the performance of selecting individual columns of the symbol (by using the columns parameter) may be negatively impacted in the defragmented version. If your symbol has fewer than 127 columns this caveat does not apply. For more information, see columns_per_segment here:

https://docs.arcticdb.io/api/arcticdb/arcticdb.LibraryOptions

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

segment_size

Target maximum number of rows per segment after compaction. If this parameter is not provided, the library option "segment_row_size" is used. Note that the number of rows per segment after compaction may exceed the target, in order to achieve the smallest number of segments after compaction. See the example below for further explanation.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
VersionedItem

Structure containing metadata and version number of the defragmented symbol in the store.

RAISES DESCRIPTION
1002 ErrorCategory.INTERNAL:E_ASSERTION_FAILURE

If is_symbol_fragmented returns false.

2001 ErrorCategory.NORMALIZATION:E_UNIMPLEMENTED_INPUT_TYPE

If the library option "bucketize_dynamic" is ON.

Examples:

>>> lib.write("symbol", pd.DataFrame({"A": [0]}, index=[pd.Timestamp(0)]))
>>> lib.append("symbol", pd.DataFrame({"A": [1, 2]}, index=[pd.Timestamp(1), pd.Timestamp(2)]))
>>> lib.append("symbol", pd.DataFrame({"A": [3]}, index=[pd.Timestamp(3)]))
>>> lib.read_index(sym)
                    start_index                     end_index  version_id stream_id          creation_ts          content_hash  index_type  key_type  start_col  end_col  start_row  end_row
1970-01-01 00:00:00.000000000 1970-01-01 00:00:00.000000001          20    b'sym'  1678974096622685727   6872717287607530038          84         2          1        2          0        1
1970-01-01 00:00:00.000000001 1970-01-01 00:00:00.000000003          21    b'sym'  1678974096931527858  12345256156783683504          84         2          1        2          1        3
1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000004          22    b'sym'  1678974096970045987   7952936283266921920          84         2          1        2          3        4
>>> lib.version_store.defragment_symbol_data("symbol", 2)
>>> lib.read_index(sym)  # Returns two segments rather than three as a result of the defragmentation operation
                    start_index                     end_index  version_id stream_id          creation_ts         content_hash  index_type  key_type  start_col  end_col  start_row  end_row
1970-01-01 00:00:00.000000000 1970-01-01 00:00:00.000000003          23    b'sym'  1678974097067271451  5576804837479525884          84         2          1        2          0        3
1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000004          23    b'sym'  1678974097067427062  7952936283266921920          84         2          1        2          3        4

Notes

Config map setting - SymbolDataCompact.SegmentCount will be replaced by a library setting in the future. This API will allow overriding the setting as well.
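
Since defragment_symbol_data raises when is_symbol_fragmented returns false, callers may prefer to check first. The sketch below is one possible wrapper. defragment_if_needed is a hypothetical helper, and StubLibrary is a stand-in used only to make the example runnable; in real use you would pass a Library instance.

```python
def defragment_if_needed(lib, symbol, segment_size=None):
    """Defragment `symbol` only when the library reports it as fragmented.

    Returns the result of defragment_symbol_data, or None when nothing was done.
    """
    if not lib.is_symbol_fragmented(symbol, segment_size):
        return None
    return lib.defragment_symbol_data(symbol, segment_size)


class StubLibrary:
    """Stand-in used only to make this sketch runnable; not an ArcticDB class."""

    def __init__(self, fragmented):
        self.fragmented = fragmented
        self.calls = []

    def is_symbol_fragmented(self, symbol, segment_size=None):
        self.calls.append("is_symbol_fragmented")
        return self.fragmented

    def defragment_symbol_data(self, symbol, segment_size=None):
        self.calls.append("defragment_symbol_data")
        return "new-version"


print(defragment_if_needed(StubLibrary(fragmented=False), "symbol"))    # None
print(defragment_if_needed(StubLibrary(fragmented=True), "symbol", 2))  # new-version
```

Passing the same segment_size to both calls keeps the check and the compaction consistent with each other.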

delete

delete(
    symbol: str,
    versions: Optional[Union[int, Iterable[int]]] = None,
)

Delete all versions of the symbol from the library, unless versions is specified, in which case only those versions are deleted.

This may not actually delete the underlying data if a snapshot still references the version. See snapshot for more detail.

Note that this may require data to be removed from the underlying storage which can be slow.

If no symbol called symbol exists then this is a no-op. In particular this method does not raise in this case.

PARAMETER DESCRIPTION
symbol

Symbol to delete.

TYPE: str

versions

Version or versions of symbol to delete. If None then all versions will be deleted.

TYPE: Optional[Union[int, Iterable[int]]] DEFAULT: None

delete_data_in_range

delete_data_in_range(
    symbol: str,
    date_range: Tuple[
        Optional[Timestamp], Optional[Timestamp]
    ],
)

Delete data within the given date range, creating a new version of symbol.

The existing symbol version must be timeseries-indexed.

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

date_range

The date range in which to delete data. Leaving any part of the tuple as None leaves that part of the range open ended.

TYPE: Tuple[Optional[Timestamp], Optional[Timestamp]]

Examples:

>>> df = pd.DataFrame({"column": [5, 6, 7, 8]}, index=pd.date_range(start="1/1/2018", end="1/4/2018"))
>>> lib.write("symbol", df)
>>> lib.delete_data_in_range("symbol", date_range=(datetime.datetime(2018, 1, 1), datetime.datetime(2018, 1, 2)))
>>> lib["symbol"].version
1
>>> lib["symbol"].data
            column
2018-01-03       7
2018-01-04       8

delete_snapshot

delete_snapshot(snapshot_name: str) -> None

Delete a named snapshot. This may take time if the given snapshot is the last reference to the underlying symbol(s) as the underlying data will be removed as well.

PARAMETER DESCRIPTION
snapshot_name

The snapshot name to delete.

TYPE: str

RAISES DESCRIPTION
Exception

If the named snapshot does not exist.

delete_staged_data

delete_staged_data(symbol: str)

Removes staged data.

PARAMETER DESCRIPTION
symbol

Symbol to remove staged data for.

TYPE: `str`

See Also

write Documentation on the staged parameter explains the concept of staged data in more detail.

finalize_staged_data

finalize_staged_data(
    symbol: str,
    mode: Optional[
        StagedDataFinalizeMethod
    ] = StagedDataFinalizeMethod.WRITE,
    prune_previous_versions: Optional[bool] = False,
)

Finalises staged data, making it available for reads.

PARAMETER DESCRIPTION
symbol

Symbol to finalize data for.

TYPE: `str`

mode

Finalise mode. Valid options are WRITE or APPEND. Write collects the staged data and writes them to a new timeseries. Append collects the staged data and appends them to the latest version.

TYPE: `StagedDataFinalizeMethod` DEFAULT: StagedDataFinalizeMethod.WRITE

prune_previous_versions

Removes previous (non-snapshotted) versions from the database.

TYPE: Optional[bool] DEFAULT: False

See Also

write Documentation on the staged parameter explains the concept of staged data in more detail.
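
The staging workflow pairs write(..., staged=True) with a single finalize call. The sketch below shows that sequence in one process for brevity; in practice the staging writes would typically come from several writers. write_staged_chunks is a hypothetical helper and StubLibrary is a stand-in that only records calls, so the example runs without a library.

```python
def write_staged_chunks(lib, symbol, chunks):
    """Stage each chunk, then finalize once so the data becomes readable."""
    for chunk in chunks:
        lib.write(symbol, chunk, staged=True)
    # Relies on the default mode, StagedDataFinalizeMethod.WRITE.
    return lib.finalize_staged_data(symbol)


class StubLibrary:
    """Stand-in that records calls; not an ArcticDB class."""

    def __init__(self):
        self.staged = []
        self.finalized = []

    def write(self, symbol, data, staged=False):
        if staged:
            self.staged.append((symbol, data))

    def finalize_staged_data(self, symbol):
        self.finalized.append(symbol)


lib = StubLibrary()
write_staged_chunks(lib, "symbol", ["chunk-for-january", "chunk-for-february"])
print(len(lib.staged), lib.finalized)  # 2 ['symbol']
```

Until finalize_staged_data runs, the staged chunks are not visible to readers, which is why the helper finalizes only after every chunk has been staged.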

get_description

get_description(
    symbol: str, as_of: Optional[AsOf] = None
) -> SymbolDescription

Returns descriptive data for symbol.

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

as_of

See documentation on read.

TYPE: AsOf DEFAULT: None

RETURNS DESCRIPTION
SymbolDescription

Named tuple containing the descriptive data.

See Also

SymbolDescription For documentation on each field.

get_description_batch

get_description_batch(
    symbols: List[Union[str, ReadInfoRequest]]
) -> List[Union[SymbolDescription, DataError]]

Returns descriptive data for a list of symbols.

PARAMETER DESCRIPTION
symbols

List of symbols to read.

TYPE: List[Union[str, ReadInfoRequest]]

RETURNS DESCRIPTION
List[Union[SymbolDescription, DataError]]

A list of the descriptive data, whose i-th element corresponds to the i-th element of the symbols parameter. If the specified version does not exist, a DataError object is returned, with symbol, version_request_type, version_request_data, error_code, error_category, and exception_string properties. If a key error or any other internal exception occurs, a DataError object is returned in the same way.

See Also

SymbolDescription For documentation on each field.

get_staged_symbols

get_staged_symbols() -> List[str]

Returns all symbols with staged, unfinalized data.

RETURNS DESCRIPTION
List[str]

Symbol names.

See Also

write Documentation on the staged parameter explains the concept of staged data in more detail.

has_symbol

has_symbol(
    symbol: str, as_of: Optional[AsOf] = None
) -> bool

Whether this library contains the given symbol.

PARAMETER DESCRIPTION
symbol

Symbol name for the item

TYPE: str

as_of

Return the data as it was as_of the point in time. See read for more documentation. If absent then considers symbols that are live in the library as of the current time.

TYPE: AsOf DEFAULT: None

RETURNS DESCRIPTION
bool

True if the symbol is in the library, False otherwise.

Examples:

>>> lib.write("symbol", pd.DataFrame())
>>> lib.has_symbol("symbol")
True
>>> lib.has_symbol("another_symbol")
False

The contains operator also checks whether a symbol exists in this library as of now:

>>> "symbol" in lib
True
>>> "another_symbol" in lib
False

head

head(
    symbol: str,
    n: int = 5,
    as_of: Optional[AsOf] = None,
    columns: List[str] = None,
) -> VersionedItem

Read the first n rows of data for the named symbol. If n is negative, return all rows except the last n rows.

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

n

Number of rows to select if non-negative, otherwise number of rows to exclude.

TYPE: int DEFAULT: 5

as_of

See documentation on read.

TYPE: AsOf DEFAULT: None

columns

See documentation on read.

TYPE: List[str] DEFAULT: None

RETURNS DESCRIPTION
VersionedItem object that contains a .data and .metadata element.

is_symbol_fragmented

is_symbol_fragmented(
    symbol: str, segment_size: Optional[int] = None
) -> bool

Check whether the number of segments that would be reduced by compaction is more than or equal to the value specified by the configuration option "SymbolDataCompact.SegmentCount" (defaults to 100).

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

segment_size

Target maximum number of rows per segment after compaction. If this parameter is not provided, the library option for the segment's maximum row size is used.

TYPE: Optional[int] DEFAULT: None

Notes

Config map setting - SymbolDataCompact.SegmentCount will be replaced by a library setting in the future. This API will allow overriding the setting as well.

RETURNS DESCRIPTION
bool

list_snapshots

list_snapshots() -> Dict[str, Any]

List the snapshots in the library.

RETURNS DESCRIPTION
Dict[str, Any]

Snapshots in the library. Keys are snapshot names, values are metadata associated with that snapshot.

list_symbols

list_symbols(
    snapshot_name: Optional[str] = None,
) -> List[str]

Return the symbols in this library.

PARAMETER DESCRIPTION
snapshot_name

Return the symbols available under the snapshot. If None then considers symbols that are live in the library as of the current time.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
List[str]

Symbols in the library.

list_versions

list_versions(
    symbol: Optional[str] = None,
    snapshot: Optional[str] = None,
    latest_only: bool = False,
    skip_snapshots: bool = False,
) -> Dict[SymbolVersion, VersionInfo]

Get the versions in this library, filtered by the passed in parameters.

PARAMETER DESCRIPTION
symbol

Symbol to return versions for. If None returns versions across all symbols in the library.

TYPE: Optional[str] DEFAULT: None

snapshot

Only return the versions contained in the named snapshot.

TYPE: Optional[str] DEFAULT: None

latest_only

Only include the latest version for each returned symbol.

TYPE: bool DEFAULT: False

skip_snapshots

Don't populate version list with snapshot information. Can improve performance significantly if there are many snapshots.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Dict[SymbolVersion, VersionInfo]

Dictionary describing the version for each symbol-version pair in the library. Since the symbol version is a (named) tuple, you can index into the dictionary as shown in the examples below.

Examples:

>>> df = pd.DataFrame()
>>> lib.write("symbol", df, metadata=10)
>>> lib.write("symbol", df, metadata=11, prune_previous_versions=False)
>>> lib.snapshot("snapshot")
>>> lib.write("symbol", df, metadata=12, prune_previous_versions=False)
>>> lib.delete("symbol", versions=(1, 2))
>>> versions = lib.list_versions("symbol")
>>> versions["symbol", 1].deleted
True
>>> versions["symbol", 1].snapshots
["snapshot"]
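
Because the result is keyed by (symbol, version) pairs, it is easy to filter. The sketch below picks out the versions of a symbol that are not marked deleted. live_versions is a hypothetical helper, and the dictionary is a hand-built stand-in shaped like a list_versions result, not real library output.

```python
from types import SimpleNamespace


def live_versions(versions, symbol):
    """Return the sorted version numbers of `symbol` that are not marked deleted.

    `versions` is a mapping shaped like the result of list_versions:
    keys are (symbol, version) pairs, values expose a `deleted` attribute.
    """
    return sorted(
        version
        for (name, version), info in versions.items()
        if name == symbol and not info.deleted
    )


# Hand-built stand-in for a list_versions result.
example = {
    ("symbol", 0): SimpleNamespace(deleted=False, snapshots=[]),
    ("symbol", 1): SimpleNamespace(deleted=True, snapshots=["snapshot"]),
    ("other", 0): SimpleNamespace(deleted=False, snapshots=[]),
}
print(live_versions(example, "symbol"))  # [0]
```

Unpacking the key as a pair works for named tuples as well as plain tuples, so the helper does not depend on the key's field names.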

prune_previous_versions

prune_previous_versions(symbol)

Removes all (non-snapshotted) versions from the database for the given symbol, except the latest.

PARAMETER DESCRIPTION
symbol

Symbol name to prune.

TYPE: `str`

read

read(
    symbol: str,
    as_of: Optional[AsOf] = None,
    date_range: Optional[
        Tuple[Optional[Timestamp], Optional[Timestamp]]
    ] = None,
    row_range: Optional[Tuple[int, int]] = None,
    columns: Optional[List[str]] = None,
    query_builder: Optional[QueryBuilder] = None,
) -> VersionedItem

Read data for the named symbol. Returns a VersionedItem object with a data and metadata element (as passed into write).

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

as_of

Return the data as it was as of the point in time. None means that the latest version should be read. The various types of this parameter mean:

- int: specific version number. Negative indexing is supported, with -1 representing the latest version, -2 the version before that, etc.
- str: snapshot name which contains the version
- datetime.datetime: the version of the data that existed as_of the requested point in time

TYPE: AsOf DEFAULT: None

date_range

DateRange to restrict read data to.

Applicable only for time-indexed Pandas dataframes or series. Returns only the part of the data that falls within the given range (inclusive). None on either end leaves that part of the range open-ended. Hence specifying (None, datetime(2025, 1, 1)) declares that you wish to read all data up to and including 2025-01-01. The same effect can be achieved by using the date_range clause of the QueryBuilder class, which will be slower but will return data with a smaller memory footprint. See the QueryBuilder.date_range docstring for more details.

Only one of date_range or row_range can be provided.

TYPE: Optional[Tuple[Optional[Timestamp], Optional[Timestamp]]] DEFAULT: None

row_range

Row range to read data for. Inclusive of the lower bound, exclusive of the upper bound. lib.read(symbol, row_range=(start, end)).data should behave the same as df.iloc[start:end], including in the handling of negative start/end values.

Only one of date_range or row_range can be provided.

TYPE: Optional[Tuple[int, int]] DEFAULT: None

columns

Applicable only for Pandas data. Determines which columns to return data for.

TYPE: Optional[List[str]] DEFAULT: None

query_builder

A QueryBuilder object to apply to the dataframe before it is returned. For more information see the documentation for the QueryBuilder class (from arcticdb import QueryBuilder; help(QueryBuilder)).

TYPE: Optional[QueryBuilder] DEFAULT: None

RETURNS DESCRIPTION
VersionedItem object that contains a .data and .metadata element

Examples:

>>> df = pd.DataFrame({'column': [5,6,7]})
>>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
>>> lib.read("symbol").data
   column
0       5
1       6
2       7

The default read behaviour is also available through subscripting:

>>> lib["symbol"].data
   column
0       5
1       6
2       7
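
The two range parameters select rows differently: date_range is inclusive at both ends and treats None as open-ended, while row_range is positional and half-open. The sketch below models both rules on a plain list of (timestamp, value) rows. It only illustrates the documented semantics and does not call ArcticDB; both function names are hypothetical. Python list slicing follows the same start/end rules as df.iloc[start:end] for integer bounds.

```python
from datetime import datetime


def rows_in_date_range(rows, date_range):
    """Keep (timestamp, value) rows inside an inclusive range; None leaves an end open."""
    start, end = date_range
    return [
        (ts, value)
        for ts, value in rows
        if (start is None or ts >= start) and (end is None or ts <= end)
    ]


def rows_in_row_range(rows, row_range):
    """Positional selection: lower bound inclusive, upper bound exclusive."""
    start, end = row_range
    return rows[start:end]


rows = [
    (datetime(2024, 12, 30), 1),
    (datetime(2025, 1, 1), 2),
    (datetime(2025, 1, 2), 3),
]
print(rows_in_date_range(rows, (None, datetime(2025, 1, 1))))  # first two rows
print(rows_in_row_range(rows, (1, 3)))   # last two rows
print(rows_in_row_range(rows, (-1, 3)))  # negative start counts from the end
```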

read_batch

read_batch(
    symbols: List[Union[str, ReadRequest]],
    query_builder: Optional[QueryBuilder] = None,
) -> List[Union[VersionedItem, DataError]]

Reads multiple symbols.

PARAMETER DESCRIPTION
symbols

List of symbols to read.

TYPE: List[Union[str, ReadRequest]]

query_builder

A single QueryBuilder to apply to all the dataframes before they are returned. If this argument is passed then none of the symbols may have their own query_builder specified in their request.

TYPE: Optional[QueryBuilder] DEFAULT: None

RETURNS DESCRIPTION
List[Union[VersionedItem, DataError]]

A list of the read results, whose i-th element corresponds to the i-th element of the symbols parameter. If the specified version does not exist, a DataError object is returned, with symbol, version_request_type, version_request_data, error_code, error_category, and exception_string properties. If a key error or any other internal exception occurs, a DataError object is returned in the same way.

RAISES DESCRIPTION
ArcticInvalidApiUsageException

If the query_builder kwarg and per-symbol query builders are both used.

Examples:

>>> lib.write("s1", pd.DataFrame())
>>> lib.write("s2", pd.DataFrame({"col": [1, 2, 3]}))
>>> lib.write("s2", pd.DataFrame(), prune_previous_versions=False)
>>> lib.write("s3", pd.DataFrame())
>>> batch = lib.read_batch(["s1", ReadRequest("s2", as_of=0), "s3", ReadRequest("s2", as_of=1000)])
>>> batch[0].data.empty
True
>>> batch[1].data.empty
False
>>> batch[2].data.empty
True
>>> batch[3].symbol
"s2"
>>> from arcticdb import DataError
>>> isinstance(batch[3], DataError)
True
>>> batch[3].version_request_type
VersionRequestType.SPECIFIC
>>> batch[3].version_request_data
1000
>>> batch[3].error_code
ErrorCode.E_NO_SUCH_VERSION
>>> batch[3].error_category
ErrorCategory.MISSING_DATA

See Also

read

read_metadata

read_metadata(
    symbol: str, as_of: Optional[AsOf] = None
) -> VersionedItem

Return the metadata saved for a symbol. This method is faster than read as it only loads the metadata, not the data itself.

PARAMETER DESCRIPTION
symbol

Symbol name

TYPE: str

as_of

Return the metadata as it was as of the point in time. See the documentation on read for the different forms this parameter can take.

TYPE: AsOf DEFAULT: None

RETURNS DESCRIPTION
VersionedItem

Structure containing metadata and version number of the affected symbol in the store. The data attribute will be None.

read_metadata_batch

read_metadata_batch(
    symbols: List[Union[str, ReadInfoRequest]]
) -> List[Union[VersionedItem, DataError]]

Reads the metadata of multiple symbols.

PARAMETER DESCRIPTION
symbols

List of symbols to read metadata.

TYPE: List[Union[str, ReadInfoRequest]]

RETURNS DESCRIPTION
List[Union[VersionedItem, DataError]]

A list of the read metadata results, whose i-th element corresponds to the i-th element of the symbols parameter. A VersionedItem object with the metadata field set to None is returned if the requested version of the symbol exists but has no metadata. If the specified version does not exist, a DataError object is returned, with symbol, version_request_type, version_request_data, error_code, error_category, and exception_string properties. If a key error or any other internal exception occurs, a DataError object is returned in the same way.

See Also

read_metadata

reload_symbol_list

reload_symbol_list()

Forces the symbol list cache to be reloaded.

This can take a long time on large libraries or certain S3 implementations, and once started, it cannot be safely interrupted. If the call is interrupted somehow (exception/process killed), please call this again ASAP.

snapshot

snapshot(
    snapshot_name: str,
    metadata: Any = None,
    skip_symbols: Optional[List[str]] = None,
    versions: Optional[Dict[str, int]] = None,
) -> None

Creates a named snapshot of the data within a library.

By default, the latest version of every symbol that has not been deleted will be contained within the snapshot. You can change this behaviour with either versions (an allow-list) or with skip_symbols (a deny-list). Concurrent writes with prune previous versions set while the snapshot is being taken can potentially lead to corruption of the affected symbols in the snapshot.

The symbols and versions contained within the snapshot will persist regardless of new symbols and versions being written to the library afterwards. If a version or symbol referenced in a snapshot is deleted then the underlying data will be preserved to ensure the snapshot is still accessible. Only once all referencing snapshots have been removed will the underlying data be removed as well.

At most one of skip_symbols and versions may be truthy.

PARAMETER DESCRIPTION
snapshot_name

Name of the snapshot.

TYPE: str

metadata

Optional metadata to persist along with the snapshot.

TYPE: Any DEFAULT: None

skip_symbols

Optional symbols to be excluded from the snapshot.

TYPE: List[str] DEFAULT: None

versions

Optional dictionary of versions of symbols to snapshot. For example versions={"a": 2, "b": 3} will snapshot version 2 of symbol "a" and version 3 of symbol "b".

TYPE: Optional[Dict[str, int]] DEFAULT: None

RAISES DESCRIPTION
InternalException

If a snapshot already exists with snapshot_name. You must explicitly delete the pre-existing snapshot.
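
Since snapshot raises when the name is already taken, a caller that wants to refresh a snapshot has to delete the old one first. The sketch below shows one way to combine the documented calls. replace_snapshot is a hypothetical helper and StubLibrary is a stand-in so the example runs without a library. Note that deleting a snapshot can remove underlying data if it was the last reference to it, and that the two steps are not atomic.

```python
def replace_snapshot(lib, snapshot_name, **snapshot_kwargs):
    """Create `snapshot_name`, deleting an existing snapshot of that name first.

    Not atomic: another process could create or delete the snapshot between the steps.
    """
    if snapshot_name in lib.list_snapshots():
        lib.delete_snapshot(snapshot_name)
    lib.snapshot(snapshot_name, **snapshot_kwargs)


class StubLibrary:
    """Stand-in used only to make this sketch runnable; not an ArcticDB class."""

    def __init__(self):
        self.snapshots = {}

    def list_snapshots(self):
        return dict(self.snapshots)

    def delete_snapshot(self, snapshot_name):
        del self.snapshots[snapshot_name]

    def snapshot(self, snapshot_name, metadata=None, skip_symbols=None, versions=None):
        if snapshot_name in self.snapshots:
            raise RuntimeError("snapshot already exists")
        self.snapshots[snapshot_name] = metadata


lib = StubLibrary()
replace_snapshot(lib, "nightly", metadata={"run": 1})
replace_snapshot(lib, "nightly", metadata={"run": 2})  # would raise without the delete
print(lib.list_snapshots())  # {'nightly': {'run': 2}}
```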

sort_and_finalize_staged_data

sort_and_finalize_staged_data(
    symbol: str,
    mode: Optional[
        StagedDataFinalizeMethod
    ] = StagedDataFinalizeMethod.WRITE,
)

Sorts and finalizes staged data. This differs from finalize_staged_data in that it can support staged segments with interleaved time periods: the end result will be ordered. This requires performing a full sort in memory, so it can be time-consuming.

PARAMETER DESCRIPTION
symbol

Symbol to finalize data for.

TYPE: `str`

mode

Finalise mode. Valid options are WRITE or APPEND. Write collects the staged data and writes them to a new timeseries. Append collects the staged data and appends them to the latest version.

TYPE: `StagedDataFinalizeMethod` DEFAULT: StagedDataFinalizeMethod.WRITE

See Also

write Documentation on the staged parameter explains the concept of staged data in more detail.

tail

tail(
    symbol: str,
    n: int = 5,
    as_of: Optional[Union[int, str]] = None,
    columns: List[str] = None,
) -> VersionedItem

Read the last n rows of data for the named symbol. If n is negative, return all rows except the first n rows.

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

n

Number of rows to select if non-negative, otherwise number of rows to exclude.

TYPE: int DEFAULT: 5

as_of

See documentation on read.

TYPE: AsOf DEFAULT: None

columns

See documentation on read.

TYPE: List[str] DEFAULT: None

RETURNS DESCRIPTION
VersionedItem object that contains a .data and .metadata element.
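
The meaning of a negative n for head and tail can be modelled with list slicing. This sketch only illustrates the documented row-selection rule on a plain list; head_rows and tail_rows are hypothetical names and nothing here calls ArcticDB.

```python
def head_rows(rows, n=5):
    """First n rows; for negative n, everything except the last |n| rows."""
    return rows[:n]


def tail_rows(rows, n=5):
    """Last n rows; for negative n, everything except the first |n| rows."""
    if n >= 0:
        # Clamp so that asking for more rows than exist returns everything.
        return rows[len(rows) - min(n, len(rows)):]
    return rows[-n:]


rows = list(range(10))
print(head_rows(rows, 3))   # [0, 1, 2]
print(head_rows(rows, -3))  # [0, 1, 2, 3, 4, 5, 6]
print(tail_rows(rows, 3))   # [7, 8, 9]
print(tail_rows(rows, -3))  # [3, 4, 5, 6, 7, 8, 9]
```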

update

update(
    symbol: str,
    data: Union[DataFrame, Series],
    metadata: Any = None,
    upsert: bool = False,
    date_range: Optional[
        Tuple[Optional[Timestamp], Optional[Timestamp]]
    ] = None,
    prune_previous_versions=False,
) -> VersionedItem

Overwrites existing symbol data with the contents of data. The entire range between the first and last index entry in data is replaced with the contents of data, adding additional index entries if required. update only operates over the outermost index level, which means secondary index rows will be removed if they are not contained in data.

Both the existing symbol version and data must be timeseries-indexed.

Note that update is not designed for multiple concurrent writers over a single symbol.

PARAMETER DESCRIPTION
symbol

Symbol name.

TYPE: str

data

Timeseries indexed data to use for the update.

TYPE: Union[DataFrame, Series]

metadata

Metadata to persist along with the new symbol version.

TYPE: Any DEFAULT: None

upsert

If True, will write the data even if the symbol does not exist.

TYPE: bool DEFAULT: False

date_range

If a range is specified, it will delete the stored value within the range and overwrite it with the data in data. This allows the user to update with data that might only be a subset of the stored value. Leaving any part of the tuple as None leaves that part of the range open ended. Only data within date_range will be modified, even if data covers a wider date range.

TYPE: Optional[Tuple[Optional[Timestamp], Optional[Timestamp]]] DEFAULT: None

prune_previous_versions

Removes previous (non-snapshotted) versions from the database when True.

DEFAULT: False

Examples:

>>> df = pd.DataFrame(
...    {'column': [1,2,3,4]},
...    index=pd.date_range(start='1/1/2018', end='1/4/2018')
... )
>>> df
            column
2018-01-01       1
2018-01-02       2
2018-01-03       3
2018-01-04       4
>>> lib.write("symbol", df)
>>> update_df = pd.DataFrame(
...    {'column': [400, 40]},
...    index=pd.date_range(start='1/1/2018', end='1/3/2018', freq='2D')
... )
>>> update_df
            column
2018-01-01     400
2018-01-03      40
>>> lib.update("symbol", update_df)
>>> # Note that 2018-01-02 is gone despite not being in update_df
>>> lib.read("symbol").data
            column
2018-01-01     400
2018-01-03      40
2018-01-04       4
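
The replacement rule in the example above can be modelled on plain dictionaries. This is a sketch of the documented behaviour when no date_range is given, not ArcticDB's implementation; apply_update is a hypothetical name, and treating an empty update as a no-op is an assumption made for the sketch.

```python
from datetime import date


def apply_update(stored, update):
    """Model of update on {index: value} dicts.

    Every stored entry between the first and last index of `update` (inclusive)
    is dropped, then the rows of `update` are written in.
    """
    if not update:
        # Assumption for this sketch: an empty update changes nothing.
        return dict(stored)
    first, last = min(update), max(update)
    kept = {ts: v for ts, v in stored.items() if ts < first or ts > last}
    kept.update(update)
    return dict(sorted(kept.items()))


stored = {date(2018, 1, d): d for d in (1, 2, 3, 4)}
update = {date(2018, 1, 1): 400, date(2018, 1, 3): 40}
result = apply_update(stored, update)
for ts, value in result.items():
    print(ts, value)
```

As in the example, 2018-01-02 disappears because it lies between the first and last index entries of the update, even though the update itself has no row for that date.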

write

write(
    symbol: str,
    data: NormalizableType,
    metadata: Any = None,
    prune_previous_versions: bool = False,
    staged=False,
    validate_index=True,
) -> VersionedItem

Write data to the specified symbol. If symbol already exists then a new version will be created to reference the newly written data. For more information on versions see the documentation for the read primitive.

data must be of a format that can be normalised into Arctic's internal storage structure. Pandas DataFrames, Pandas Series and Numpy NDArrays can all be normalised. Normalised data will be split along both the columns and rows into segments. By default, a segment will contain 100,000 rows and 127 columns.
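
As a rough illustration of the default slicing, the number of segments can be estimated from the shape of the data. This is back-of-the-envelope arithmetic under the assumption that rows and columns are sliced independently into a grid; it ignores details such as how index columns are counted, and estimate_segment_count is a hypothetical helper rather than part of the library.

```python
import math


def estimate_segment_count(n_rows, n_cols, rows_per_segment=100_000, cols_per_segment=127):
    """Estimate segments as (row slices) x (column slices) using the default sizes."""
    row_slices = math.ceil(n_rows / rows_per_segment)
    col_slices = math.ceil(n_cols / cols_per_segment)
    return row_slices * col_slices


print(estimate_segment_count(250_000, 300))  # 3 row slices x 3 column slices = 9
print(estimate_segment_count(100_000, 127))  # fits in a single segment: 1
```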

If this library has write_deduplication enabled then segments will be deduplicated against storage prior to write to reduce required IO operations and storage requirements. Data will be effectively deduplicated for all segments up until the first differing row when compared to storage. As a result, modifying the beginning of data with respect to previously written versions may significantly reduce the effectiveness of deduplication.

Note that write is not designed for multiple concurrent writers over a single symbol unless the staged keyword argument is set to True. If staged is True, written segments will be staged and left in an "incomplete" state, unable to be read until they are finalized. This enables multiple writers to a single symbol - all writing staged data at the same time - with one process able to later finalise all staged data, rendering the data readable by clients. To finalise staged data, see finalize_staged_data.

Note: ArcticDB will use the 0-th level index of the Pandas DataFrame for its on-disk index.

Any non-DatetimeIndex will be converted into an internal RowCount index. That is, ArcticDB will assign each row a monotonically increasing integer identifier and that will be used for the index.

PARAMETER DESCRIPTION
symbol

Symbol name. Limited to 255 characters. The following characters are not supported in symbols: "*", "&", "<", ">"

TYPE: str

data

Data to be written. To write non-normalizable data, use write_pickle.

TYPE: NormalizableType

metadata

Optional metadata to persist along with the symbol.

TYPE: Any DEFAULT: None

prune_previous_versions

Removes previous (non-snapshotted) versions from the database.

TYPE: bool DEFAULT: False

staged

Whether to write to a staging area rather than immediately to the library.

TYPE: bool DEFAULT: False

validate_index

If True, verifies that the index of data supports date range searches and update operations. In effect, this tests that the data is sorted in ascending order. ArcticDB relies on Pandas to detect whether data is sorted: you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see if Pandas believes the data to be sorted.

Note that each unit of staged data must a) be datetime indexed and b) not overlap with any other unit of staged data. Note that this will create symbols with Dynamic Schema enabled.

DEFAULT: True

RETURNS DESCRIPTION
VersionedItem

Structure containing metadata and version number of the written symbol in the store.

RAISES DESCRIPTION
ArcticUnsupportedDataTypeException

If data is not of NormalizableType.

UnsortedDataException

If data is unsorted, when validate_index is set to True.

Examples:

>>> df = pd.DataFrame({'column': [5,6,7]})
>>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
>>> lib.read("symbol").data
   column
0       5
1       6
2       7

Staging data for later finalisation (enables concurrent writes):

>>> df = pd.DataFrame({'column': [5,6,7]}, index=pd.date_range(start='1/1/2000', periods=3))
>>> lib.write("staged", df, staged=True)  # Multiple staged writes can occur in parallel
>>> lib.finalize_staged_data("staged", StagedDataFinalizeMethod.WRITE)  # Must be run after all staged writes have completed
>>> lib.read("staged").data  # Would return error if run before finalization
            column
2000-01-01       5
2000-01-02       6
2000-01-03       7
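Since each unit of staged data must not overlap with any other unit, it can help to check the date ranges of the chunks before staging them. The following is a minimal sketch using only the standard library. `first_overlap` is a hypothetical helper, each tuple stands for the first and last index value of one chunk, and treating a shared boundary timestamp as an overlap is a conservative assumption rather than documented ArcticDB behaviour.

```python
from datetime import datetime


def first_overlap(ranges):
    """Return the first pair of overlapping (start, end) ranges, or None.

    After sorting by start, any overlap implies that some adjacent pair
    overlaps, so comparing neighbours is sufficient to detect one.
    """
    ordered = sorted(ranges)
    for previous, current in zip(ordered, ordered[1:]):
        if current[0] <= previous[1]:
            return previous, current
    return None


chunks = [
    (datetime(2000, 1, 1), datetime(2000, 1, 3)),
    (datetime(2000, 1, 4), datetime(2000, 1, 6)),
]
print(first_overlap(chunks))  # None

# A third chunk spanning 2 Jan to 5 Jan collides with the first chunk.
chunks.append((datetime(2000, 1, 2), datetime(2000, 1, 5)))
print(first_overlap(chunks) is not None)  # True
```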

WritePayload objects can be unpacked and used as parameters:

>>> w = WritePayload("symbol", df, metadata={'the': 'metadata'})
>>> lib.write(*w, staged=True)
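
The constraints listed for the symbol parameter (at most 255 characters, none of "*", "&", "<", ">") can be pre-checked on the client before calling write. This is a sketch only: `check_symbol_name` is a hypothetical helper, it does not replace ArcticDB's own validation, and the library may enforce rules beyond those listed on this page.

```python
# Constraints quoted in the `symbol` parameter description of write.
MAX_SYMBOL_LENGTH = 255
UNSUPPORTED_CHARACTERS = frozenset('*&<>')


def check_symbol_name(symbol):
    """Hypothetical pre-check; return a list of problems (empty if none found)."""
    problems = []
    if len(symbol) > MAX_SYMBOL_LENGTH:
        problems.append(f"longer than {MAX_SYMBOL_LENGTH} characters")
    bad = sorted(set(symbol) & UNSUPPORTED_CHARACTERS)
    if bad:
        problems.append(f"contains unsupported characters: {bad}")
    return problems


print(check_symbol_name("prices_2024"))  # []
print(len(check_symbol_name("a*b")))     # 1
```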

write_batch

write_batch(
    payloads: List[WritePayload],
    prune_previous_versions: bool = False,
    validate_index=True,
) -> List[Union[VersionedItem, DataError]]

Write a batch of multiple symbols.

PARAMETER DESCRIPTION
payloads

Symbols and their corresponding data. There must not be any duplicate symbols in payloads.

TYPE: `List[WritePayload]`

prune_previous_versions

See write.

TYPE: bool DEFAULT: False

validate_index

If set to True, verifies for each entry in the batch whether the index of the data supports date range searches and update operations. In effect, this tests that the data is sorted in ascending order. ArcticDB relies on Pandas to detect whether data is sorted: you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see if Pandas believes the data to be sorted.

DEFAULT: True

RETURNS DESCRIPTION
List[Union[VersionedItem, DataError]]

List of versioned items. The data attribute will be None for each versioned item. The i-th entry corresponds to the i-th element of payloads. Each result is a structure containing the metadata and version number of the written symbol in the store. If a key error or any other internal exception is raised for a symbol, a DataError object is returned in its place, with symbol, error_code, error_category, and exception_string properties.

RAISES DESCRIPTION
ArcticDuplicateSymbolsInBatchException

When duplicate symbols appear in payloads.

ArcticUnsupportedDataTypeException

If data that is not of NormalizableType appears in any of the payloads.

See Also

write: For more detailed documentation.

Examples:

Writing a simple batch:

>>> df_1 = pd.DataFrame({'column': [1,2,3]})
>>> df_2 = pd.DataFrame({'column': [4,5,6]})
>>> payload_1 = WritePayload("symbol_1", df_1, metadata={'the': 'metadata'})
>>> payload_2 = WritePayload("symbol_2", df_2)
>>> items = lib.write_batch([payload_1, payload_2])
>>> lib.read("symbol_1").data
   column
0       1
1       2
2       3
>>> lib.read("symbol_2").data
   column
0       4
1       5
2       6
>>> items[0].symbol, items[1].symbol
('symbol_1', 'symbol_2')
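
Because the returned list can mix versioned items with DataError objects, it can be useful to separate the two before using the results. The sketch below does this by looking for the `error_code` attribute that the return description lists for DataError. That duck-typed test, the helper name `split_batch_results`, and the stand-in objects are assumptions made so the example runs without a library; an isinstance check against the DataError type would be the more direct test in real code.

```python
from types import SimpleNamespace


def split_batch_results(results):
    """Separate successful items from errors in a batch result list.

    Assumption: error entries are recognised by their `error_code` attribute.
    """
    written, failed = [], []
    for item in results:
        (failed if hasattr(item, "error_code") else written).append(item)
    return written, failed


# Stand-in objects; real results would come from lib.write_batch(...).
fake_results = [
    SimpleNamespace(symbol="symbol_1", version=0),
    SimpleNamespace(symbol="symbol_2", error_code=None, error_category=None,
                    exception_string="example failure"),
]
written, failed = split_batch_results(fake_results)
print([item.symbol for item in written])  # ['symbol_1']
print([item.symbol for item in failed])   # ['symbol_2']
```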

write_metadata

write_metadata(symbol: str, metadata: Any) -> VersionedItem

Write metadata under the specified symbol name to this library. The data will remain unchanged. A new version will be created.

If the symbol does not exist, a new symbol is written with empty data (None, stored pickled, so it cannot be appended to) and the supplied metadata.

This method should be faster than write as it involves no data segment read/write operations.

PARAMETER DESCRIPTION
symbol

Symbol name for the item

TYPE: str

metadata

Metadata to persist along with the symbol

TYPE: Any

RETURNS DESCRIPTION
VersionedItem

Structure containing metadata and version number of the affected symbol in the store.

write_metadata_batch

write_metadata_batch(
    write_metadata_payloads: List[WriteMetadataPayload],
    prune_previous_versions=None,
) -> List[Union[VersionedItem, DataError]]

Write metadata to multiple symbols in a batch fashion. This is more efficient than making multiple write_metadata calls in succession as some constant-time operations can be executed only once rather than once for each element of write_metadata_payloads. Note that this isn't an atomic operation - it's possible for the metadata for one symbol to be fully written and readable before another symbol.

PARAMETER DESCRIPTION
write_metadata_payloads

Symbols and their corresponding metadata. There must not be any duplicate symbols in write_metadata_payloads.

TYPE: `List[WriteMetadataPayload]`

prune_previous_versions

Remove previous versions from version list. Uses library default if left as None.

TYPE: `Optional[bool]` DEFAULT: None

RETURNS DESCRIPTION
List[Union[VersionedItem, DataError]]

List of versioned items. The data attribute will be None for each versioned item. The i-th entry corresponds to the i-th element of write_metadata_payloads. Each result is a structure containing the metadata and version number of the affected symbol in the store. If any internal exception is raised for a symbol, a DataError object is returned in its place, with symbol, error_code, error_category, and exception_string properties.

RAISES DESCRIPTION
ArcticDuplicateSymbolsInBatchException

When duplicate symbols appear in write_metadata_payloads.

Examples:

Writing a simple batch:

>>> payload_1 = WriteMetadataPayload("symbol_1", {'the': 'metadata_1'})
>>> payload_2 = WriteMetadataPayload("symbol_2", {'the': 'metadata_2'})
>>> items = lib.write_metadata_batch([payload_1, payload_2])
>>> lib.read_metadata("symbol_1").metadata
{'the': 'metadata_1'}
>>> lib.read_metadata("symbol_2").metadata
{'the': 'metadata_2'}
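
The batch methods reject inputs that contain the same symbol more than once, so a quick client-side check can report the offending names before the call is made. This is a minimal sketch: `duplicate_symbols` is a hypothetical helper, it assumes each payload exposes a `symbol` attribute, and the stand-in objects are used only so the example runs without a library.

```python
from collections import Counter
from types import SimpleNamespace


def duplicate_symbols(payloads):
    """Return symbols that occur more than once, in first-seen order."""
    counts = Counter(payload.symbol for payload in payloads)
    return [symbol for symbol, count in counts.items() if count > 1]


# Stand-ins for payload objects; only the `symbol` attribute is used.
payloads = [SimpleNamespace(symbol=name) for name in ["a", "b", "a", "c", "b"]]
print(duplicate_symbols(payloads))  # ['a', 'b']
```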

write_pickle

write_pickle(
    symbol: str,
    data: Any,
    metadata: Any = None,
    prune_previous_versions: bool = False,
    staged=False,
) -> VersionedItem

See write. This method differs from write only in that data can be of any type that is serialisable via the Pickle library. There are significant downsides to storing data in this way:

  • Retrieval can only be done in bulk. Calls to read will not support date_range, query_builder or columns.
  • The data cannot be updated or appended to via the update and append methods.
  • Writes cannot be deduplicated in any way.
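
The first limitation follows from how pickling works in general: the object becomes a single opaque byte string with no row or column structure to select from. The sketch below shows this with the standard pickle module alone. It illustrates the general behaviour of pickled data and is not ArcticDB's storage code.

```python
import pickle

values = list(range(1_000))

# Pickling turns the whole object into a single opaque byte string.
blob = pickle.dumps(values)
print(type(blob).__name__)  # bytes

# To obtain even a small slice, the complete object has to be deserialised first.
restored = pickle.loads(blob)
print(restored[10:13])  # [10, 11, 12]
```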

PARAMETER DESCRIPTION
symbol

See documentation on write.

TYPE: str

data

Data to be written.

TYPE: `Any`

metadata

See documentation on write.

TYPE: Any DEFAULT: None

prune_previous_versions

See documentation on write.

TYPE: bool DEFAULT: False

staged

See documentation on write.

DEFAULT: False

RETURNS DESCRIPTION
VersionedItem

See documentation on write.

Examples:

>>> lib.write_pickle("symbol", [1,2,3])
>>> lib.read("symbol").data
[1, 2, 3]

See Also

write: For more detailed documentation.

write_pickle_batch

write_pickle_batch(
    payloads: List[WritePayload],
    prune_previous_versions: bool = False,
    staged=False,
) -> List[VersionedItem]

Write a batch of multiple symbols, pickling their data if necessary.

PARAMETER DESCRIPTION
payloads

Symbols and their corresponding data. There must not be any duplicate symbols in payloads.

TYPE: `List[WritePayload]`

prune_previous_versions

See write.

TYPE: bool DEFAULT: False

staged

See write.

DEFAULT: False

RETURNS DESCRIPTION
List[VersionedItem]

Structures containing metadata and version number of the written symbols in the store, in the same order as payloads.

RAISES DESCRIPTION
ArcticDuplicateSymbolsInBatchException

When duplicate symbols appear in payloads.

See Also

write: For more detailed documentation.

write_pickle: For information on the implications of providing data that needs to be pickled.