Library API
This page documents the arcticdb.version_store.library
module. This module is the main interface
exposing read/write functionality within a given Arctic instance.
The key functionality is exposed through arcticdb.version_store.library.Library
instances. See the
Arctic API section for notes on how to create these. The other types exposed in this module are less
important and are used as part of the signature of arcticdb.version_store.library.Library
instance
methods.
- Library: The main interface exposing read/write functionality within a given Arctic instance.
- NormalizableType: Types that can be normalised into Arctic's internal storage structure.
- ArcticInvalidApiUsageException: Exception indicating an invalid call made to the Arctic API.
- ArcticDuplicateSymbolsInBatchException: Exception indicating that duplicate symbols were passed to a batch method of this module.
- ArcticUnsupportedDataTypeException: Exception indicating that a method does not support the type of data provided.
- SymbolVersion, VersionInfo, SymbolDescription: Named tuples used in the signatures and return types of Library methods.
- WritePayload: Designed to enable batching of multiple operations with an API that mirrors the singular write.
- ReadRequest: Designed to enable batching of read operations with an API that mirrors the singular read.
- ReadInfoRequest: Useful for batch methods like read_metadata_batch and get_description_batch, where only the symbol and the version information need to be specified.
- class arcticdb.version_store.library.Library(arctic_instance_description: str, nvs: NativeVersionStore)[source]
The main interface exposing read/write functionality within a given Arctic instance.
Arctic libraries contain named symbols which are the atomic unit of data storage within Arctic. Symbols contain data that in most cases resembles a DataFrame and are versioned such that all modifying operations can be tracked and reverted.
Instances of this class provide a number of primitives to write, modify and remove symbols, as well as also providing methods to manage library snapshots. For more information on snapshots please see the snapshot method.
Arctic libraries support concurrent writes and reads to multiple symbols as well as concurrent reads to a single symbol. However, concurrent writers to a single symbol are not supported other than for primitives that explicitly state support for single-symbol concurrent writes.
- __init__(arctic_instance_description: str, nvs: NativeVersionStore)[source]
- Parameters:
arctic_instance_description – Human readable description of the Arctic instance to which this library belongs. Used for informational purposes only.
nvs – The native version store that backs this library.
- write(symbol: str, data: DataFrame | Series | ndarray, metadata: Any | None = None, prune_previous_versions: bool = True, staged=False, validate_index=True) VersionedItem [source]
Write data to the specified symbol. If symbol already exists then a new version will be created to reference the newly written data. For more information on versions see the documentation for the read primitive.
data must be of a format that can be normalised into Arctic's internal storage structure. Pandas DataFrames, Pandas Series and Numpy NDArrays can all be normalised. Normalised data will be split along both the columns and rows into segments. By default, a segment will contain 100,000 rows and 127 columns.
If this library has write_deduplication enabled then segments will be deduplicated against storage prior to write to reduce required IO operations and storage requirements. Data will be effectively deduplicated for all segments up until the first differing row when compared to storage. As a result, modifying the beginning of data with respect to previously written versions may significantly reduce the effectiveness of deduplication.
Note that write is not designed for multiple concurrent writers over a single symbol unless the staged keyword argument is set to True. If staged is True, written segments will be staged and left in an "incomplete" stage, unable to be read until they are finalized. This enables multiple writers to a single symbol - all writing staged data at the same time - with one process able to later finalise all staged data, rendering the data readable by clients. To finalise staged data, see finalize_staged_data.
Note: ArcticDB will use the 0-th level index of the Pandas DataFrame for its on-disk index.
Any non-DatetimeIndex will be converted into an internal RowCount index. That is, ArcticDB will assign each row a monotonically increasing integer identifier and that will be used for the index.
- Parameters:
symbol (str) – Symbol name. Limited to 255 characters. The following characters are not supported in symbols:
"*", "&", "<", ">"
data (NormalizableType) – Data to be written. To write non-normalizable data, use write_pickle.
metadata (Any, default=None) – Optional metadata to persist along with the symbol.
prune_previous_versions (bool, default=True) – Removes previous (non-snapshotted) versions from the database.
staged (bool, default=False) – Whether to write to a staging area rather than immediately to the library.
validate_index (bool, default=True) – If True, will verify that the index of data supports date range searches and update operations. This in effect tests that the data is sorted in ascending order. ArcticDB relies on Pandas to detect if data is sorted - you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see if Pandas believes the data to be sorted.
Note that each unit of staged data must a) be datetime indexed and b) not overlap with any other unit of staged data. Note that this will create symbols with Dynamic Schema enabled.
- Returns:
Structure containing metadata and version number of the written symbol in the store.
- Return type:
VersionedItem
- Raises:
ArcticUnsupportedDataTypeException – If data is not of NormalizableType.
UnsortedDataException – If data is unsorted, when validate_index is set to True.
Examples
>>> df = pd.DataFrame({'column': [5,6,7]})
>>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
>>> lib.read("symbol").data
   column
0       5
1       6
2       7
Staging data for later finalisation (enables concurrent writes):
>>> df = pd.DataFrame({'column': [5,6,7]}, index=pd.date_range(start='1/1/2000', periods=3))
>>> lib.write("staged", df, staged=True)  # Multiple staged writes can occur in parallel
>>> lib.finalize_staged_data("staged", StagedDataFinalizeMethod.WRITE)  # Must be run after all staged writes have completed
>>> lib.read("staged").data  # Would return error if run before finalization
            column
2000-01-01       5
2000-01-02       6
2000-01-03       7
WritePayload objects can be unpacked and used as parameters:
>>> w = WritePayload("symbol", df, metadata={'the': 'metadata'})
>>> lib.write(*w, staged=True)
- write_pickle(symbol: str, data: Any, metadata: Any | None = None, prune_previous_versions: bool = True, staged=False) VersionedItem [source]
See write. This method differs from write only in that data can be of any type that is serialisable via the Pickle library. There are significant downsides to storing data in this way:
Retrieval can only be done in bulk. Calls to read will not support date_range, query_builder or columns.
The data cannot be updated or appended to via the update and append methods.
Writes cannot be deduplicated in any way.
- Parameters:
symbol – See documentation on write.
data (Any) – Data to be written.
metadata – See documentation on write.
prune_previous_versions – See documentation on write.
staged – See documentation on write.
- Returns:
See documentation on write.
- Return type:
VersionedItem
Examples
>>> lib.write_pickle("symbol", [1,2,3])
>>> lib.read("symbol").data
[1, 2, 3]
See also
write
For more detailed documentation.
- write_batch(payloads: List[WritePayload], prune_previous_versions: bool = True, staged=False, validate_index=True) List[VersionedItem] [source]
Write a batch of multiple symbols.
- Parameters:
payloads (List[WritePayload]) – Symbols and their corresponding data. There must not be any duplicate symbols in payload.
prune_previous_versions (bool, default=True) – See write.
staged (bool, default=False) – See write.
validate_index (bool, default=True) – If True, will verify for each entry in the batch that the index of data supports date range searches and update operations. This in effect tests that the data is sorted in ascending order. ArcticDB relies on Pandas to detect if data is sorted - you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see if Pandas believes the data to be sorted.
- Returns:
Structure containing metadata and version number of the written symbols in the store, in the same order as payload.
- Return type:
List[VersionedItem]
- Raises:
ArcticDuplicateSymbolsInBatchException – When duplicate symbols appear in payload.
ArcticUnsupportedDataTypeException – If data that is not of NormalizableType appears in any of the payloads.
UnsortedDataException – If data is unsorted, when validate_index is set to True.
See also
write
For more detailed documentation.
Examples
Writing a simple batch:
>>> df_1 = pd.DataFrame({'column': [1,2,3]})
>>> df_2 = pd.DataFrame({'column': [4,5,6]})
>>> payload_1 = WritePayload("symbol_1", df_1, metadata={'the': 'metadata'})
>>> payload_2 = WritePayload("symbol_2", df_2)
>>> items = lib.write_batch([payload_1, payload_2])
>>> lib.read("symbol_1").data
   column
0       1
1       2
2       3
>>> lib.read("symbol_2").data
   column
0       4
1       5
2       6
>>> items[0].symbol, items[1].symbol
('symbol_1', 'symbol_2')
- write_batch_pickle(payloads: List[WritePayload], prune_previous_versions: bool = True, staged=False) List[VersionedItem] [source]
Write a batch of multiple symbols, pickling their data if necessary.
- Parameters:
payloads (List[WritePayload]) – Symbols and their corresponding data. There must not be any duplicate symbols in payload.
prune_previous_versions (bool, default=True) – See write.
staged (bool, default=False) – See write.
- Returns:
Structures containing metadata and version number of the written symbols in the store, in the same order as payload.
- Return type:
List[VersionedItem]
- Raises:
ArcticDuplicateSymbolsInBatchException – When duplicate symbols appear in payload.
See also
write
For more detailed documentation.
write_pickle
For information on the implications of providing data that needs to be pickled.
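Examples
An illustrative sketch rather than verified output; the symbol names and payload data are assumptions, chosen to show non-normalizable types being pickled.
>>> payloads = [
...     WritePayload("symbol_1", [1, 2, 3]),                # a plain list: will be pickled
...     WritePayload("symbol_2", {"nested": {"n": 1}}),     # a dict: will be pickled
... ]
>>> lib.write_batch_pickle(payloads)
>>> lib.read("symbol_1").data
[1, 2, 3]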
- append(symbol: str, data: DataFrame | Series | ndarray, metadata: Any | None = None, prune_previous_versions: bool = False, validate_index: bool = True) VersionedItem | None [source]
Appends the given data to the existing, stored data. Append always appends along the index. A new version will be created to reference the newly-appended data. Append only accepts data for which the index of the first row is equal to or greater than the index of the last row in the existing data.
Appends containing differing column sets to the existing data are only possible if the library has been configured to support dynamic schemas.
Note that append is not designed for multiple concurrent writers over a single symbol.
- Parameters:
symbol – Symbol name.
data – Data to be written.
metadata – Optional metadata to persist along with the new symbol version. Note that the metadata is not combined in any way with the metadata stored in the previous version.
prune_previous_versions – Removes previous (non-snapshotted) versions from the database when True.
validate_index (bool, default=True) – If True, will verify that the resulting symbol will support date range searches and update operations. This in effect tests that the previous version of the data and data are both sorted in ascending order. ArcticDB relies on Pandas to detect if data is sorted - you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see if Pandas believes the data to be sorted.
- Returns:
Structure containing metadata and version number of the written symbol in the store.
- Return type:
VersionedItem
- Raises:
UnsortedDataException – If data is unsorted, when validate_index is set to True.
Examples
>>> df = pd.DataFrame(
...     {'column': [1,2,3]},
...     index=pd.date_range(start='1/1/2018', end='1/03/2018')
... )
>>> df
            column
2018-01-01       1
2018-01-02       2
2018-01-03       3
>>> lib.write("symbol", df)
>>> to_append_df = pd.DataFrame(
...     {'column': [4,5,6]},
...     index=pd.date_range(start='1/4/2018', end='1/06/2018')
... )
>>> to_append_df
            column
2018-01-04       4
2018-01-05       5
2018-01-06       6
>>> lib.append("symbol", to_append_df)
>>> lib.read("symbol").data
            column
2018-01-01       1
2018-01-02       2
2018-01-03       3
2018-01-04       4
2018-01-05       5
2018-01-06       6
- update(symbol: str, data: DataFrame | Series, metadata: Any | None = None, upsert: bool = False, date_range: Tuple[Timestamp | datetime | date | None, Timestamp | datetime | date | None] | None = None, prune_previous_versions=False) VersionedItem [source]
Overwrites existing symbol data with the contents of data. The entire range between the first and last index entry in data is replaced in its entirety with the contents of data, adding additional index entries if required. update only operates over the outermost index level - this means secondary index rows will be removed if not contained in data.
Both the existing symbol version and data must be timeseries-indexed.
Note that update is not designed for multiple concurrent writers over a single symbol.
- Parameters:
symbol – Symbol name.
data – Timeseries indexed data to use for the update.
metadata – Metadata to persist along with the new symbol version.
upsert (bool, default=False) – If True, will write the data even if the symbol does not exist.
date_range (Tuple[Optional[Timestamp], Optional[Timestamp]], default=None) – If a range is specified, it will delete the stored value within the range and overwrite it with the data in data. This allows the user to update with data that might only be a subset of the stored value. Leaving any part of the tuple as None leaves that part of the range open ended. Only data within date_range will be modified, even if data covers a wider date range.
prune_previous_versions (bool, default=False) – Removes previous (non-snapshotted) versions from the database when True.
Examples
>>> df = pd.DataFrame(
...     {'column': [1,2,3,4]},
...     index=pd.date_range(start='1/1/2018', end='1/4/2018')
... )
>>> df
            column
2018-01-01       1
2018-01-02       2
2018-01-03       3
2018-01-04       4
>>> lib.write("symbol", df)
>>> update_df = pd.DataFrame(
...     {'column': [400, 40]},
...     index=pd.date_range(start='1/1/2018', end='1/3/2018', freq='2D')
... )
>>> update_df
            column
2018-01-01     400
2018-01-03      40
>>> lib.update("symbol", update_df)
>>> # Note that 2018-01-02 is gone despite not being in update_df
>>> lib.read("symbol").data
            column
2018-01-01     400
2018-01-03      40
2018-01-04       4
- finalize_staged_data(symbol: str, mode: StagedDataFinalizeMethod | None = StagedDataFinalizeMethod.WRITE)[source]
Finalises staged data, making it available for reads.
- Parameters:
symbol (str) – Symbol to finalize data for.
mode (StagedDataFinalizeMethod, default=StagedDataFinalizeMethod.WRITE) – Finalise mode. Valid options are WRITE or APPEND. Write collects the staged data and writes them to a new timeseries. Append collects the staged data and appends them to the latest version.
See also
write
Documentation on the staged parameter explains the concept of staged data in more detail.
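Examples
An illustrative sketch of finalising in APPEND mode; df_day1 and df_day2 are assumed to be non-overlapping, datetime-indexed DataFrames, and the shown flow is indicative rather than verified output.
>>> lib.write("symbol", df_day1)
>>> lib.write("symbol", df_day2, staged=True)   # staged data is not yet readable
>>> lib.finalize_staged_data("symbol", StagedDataFinalizeMethod.APPEND)
>>> lib.read("symbol").data                     # now contains both days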
- sort_and_finalize_staged_data(symbol: str, mode: StagedDataFinalizeMethod | None = StagedDataFinalizeMethod.WRITE)[source]
Sorts and finalizes staged data. This differs from finalize_staged_data in that it can support staged segments with interleaved time periods - the end result will be ordered. This requires performing a full sort in memory, so it can be time consuming.
- Parameters:
symbol (str) – Symbol to finalize data for.
mode (StagedDataFinalizeMethod, default=StagedDataFinalizeMethod.WRITE) – Finalise mode. Valid options are WRITE or APPEND. Write collects the staged data and writes them to a new timeseries. Append collects the staged data and appends them to the latest version.
See also
write
Documentation on the staged parameter explains the concept of staged data in more detail.
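Examples
An illustrative sketch; df_afternoon and df_morning are assumed to be datetime-indexed DataFrames staged out of chronological order, and the flow shown is indicative rather than verified output.
>>> lib.write("symbol", df_afternoon, staged=True)
>>> lib.write("symbol", df_morning, staged=True)
>>> lib.sort_and_finalize_staged_data("symbol")   # sorts across the staged segments, then finalizes
>>> lib.read("symbol").data                       # ordered result combining both staged segments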
- get_staged_symbols() List[str] [source]
Returns all symbols with staged, unfinalized data.
- Returns:
Symbol names.
- Return type:
List[str]
See also
write
Documentation on the staged parameter explains the concept of staged data in more detail.
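Examples
A minimal, illustrative sketch; the symbol names and df are assumptions, and outputs are indicative rather than verified.
>>> lib.write("s1", df, staged=True)
>>> lib.write("s2", df, staged=True)
>>> sorted(lib.get_staged_symbols())
['s1', 's2']
>>> lib.finalize_staged_data("s1")
>>> lib.get_staged_symbols()
['s2']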
- read(symbol: str, as_of: int | str | datetime | None = None, date_range: Tuple[Timestamp | datetime | date | None, Timestamp | datetime | date | None] | None = None, columns: List[str] | None = None, query_builder: QueryBuilder | None = None) VersionedItem [source]
Read data for the named symbol. Returns a VersionedItem object with a data and metadata element (as passed into write).
- Parameters:
symbol (str) – Symbol name.
as_of (AsOf, default=None) – Return the data as it was as of the point in time. None means that the latest version should be read. The various types of this parameter mean:
- int: specific version number
- str: snapshot name which contains the version
- datetime.datetime: the version of the data that existed as of the requested point in time
date_range (Tuple[Optional[Timestamp], Optional[Timestamp]], default=None) – DateRange to restrict read data to. Applicable only for time-indexed Pandas dataframes or series. Returns only the part of the data that falls within the given range (inclusive). None on either end leaves that part of the range open-ended. Hence specifying (None, datetime(2025, 1, 1)) declares that you wish to read all data up to and including 2025-01-01.
columns (List[str], default=None) – Applicable only for Pandas data. Determines which columns to return data for.
query_builder (Optional[QueryBuilder], default=None) – A QueryBuilder object to apply to the dataframe before it is returned. For more information see the documentation for the QueryBuilder class (from arcticdb import QueryBuilder; help(QueryBuilder)).
- Return type:
VersionedItem object that contains a .data and .metadata element
Examples
>>> df = pd.DataFrame({'column': [5,6,7]})
>>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
>>> lib.read("symbol").data
   column
0       5
1       6
2       7
The default read behaviour is also available through subscripting:
>>> lib["symbol"].data column 0 5 1 6 2 7
- read_batch(symbols: List[str | ReadRequest], query_builder: QueryBuilder | None = None) List[VersionedItem] [source]
Reads multiple symbols.
- Parameters:
symbols (List[Union[str, ReadRequest]]) – List of symbols to read.
query_builder (Optional[QueryBuilder], default=None) – A single QueryBuilder to apply to all the dataframes before they are returned. If this argument is passed then none of the symbols may have their own query_builder specified in their request.
- Returns:
A list of the read results, whose i-th element corresponds to the i-th element of the symbols parameter.
- Return type:
List[VersionedItem]
- Raises:
ArcticInvalidApiUsageException – If the kwarg query_builder and per-symbol query builders are both used.
Examples
>>> lib.write("s1", pd.DataFrame()) >>> lib.write("s2", pd.DataFrame({"col": [1, 2, 3]})) >>> lib.write("s2", pd.DataFrame(), prune_previous_versions=False) >>> lib.write("s3", pd.DataFrame()) >>> batch = lib.read_batch(["s1", ReadRequest("s2", as_of=0), "s3"]) >>> batch[0].data.empty True >>> batch[1].data.empty False
See also
read
For more detailed documentation.
- read_metadata(symbol: str, as_of: int | str | datetime | None = None) VersionedItem [source]
Return the metadata saved for a symbol. This method is faster than read as it only loads the metadata, not the data itself.
- Parameters:
symbol – Symbol name
as_of (AsOf, default=None) – Return the metadata as it was as of the point in time. See documentation on read for documentation on the different forms this parameter can take.
- Returns:
Structure containing metadata and version number of the affected symbol in the store. The data attribute will be None.
- Return type:
VersionedItem
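Examples
A minimal, illustrative sketch; the symbol name and metadata dictionary are assumptions, and outputs are indicative rather than verified.
>>> lib.write("symbol", pd.DataFrame({'column': [1]}), metadata={'source': 'demo'})
>>> item = lib.read_metadata("symbol")
>>> item.metadata
{'source': 'demo'}
>>> item.data is None
True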
- read_metadata_batch(symbols: List[str | ReadInfoRequest]) List[VersionedItem] [source]
Reads the metadata of multiple symbols.
- Parameters:
symbols (List[Union[str, ReadInfoRequest]]) – List of symbols to read.
- Returns:
A list of the read results, whose i-th element corresponds to the i-th element of the symbols parameter. A VersionedItem object with the metadata field set to None will be returned if the requested version of the symbol exists but has no metadata. A None object will be returned if the requested version of the symbol does not exist.
- Return type:
List[VersionedItem]
See also
read_metadata
For more detailed documentation.
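Examples
An illustrative sketch; df is any normalisable DataFrame, and passing a ReadInfoRequest that names only the symbol is assumed to request its latest version. Outputs are indicative rather than verified.
>>> lib.write("s1", df, metadata={'a': 1})
>>> lib.write("s2", df)                       # written without metadata
>>> results = lib.read_metadata_batch(["s1", ReadInfoRequest("s2")])
>>> results[0].metadata
{'a': 1}
>>> results[1].metadata is None
True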
- write_metadata(symbol: str, metadata: Any) VersionedItem [source]
Write metadata under the specified symbol name to this library. The data will remain unchanged. A new version will be created.
If the symbol does not exist, this results in a write with empty data (None, pickled, so it cannot be appended to) and the supplied metadata.
This method should be faster than write as it involves no data segment read/write operations.
- Parameters:
symbol – Symbol name for the item
metadata – Metadata to persist along with the symbol
- Returns:
Structure containing metadata and version number of the affected symbol in the store.
- Return type:
VersionedItem
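Examples
A minimal, illustrative sketch; the symbol name and metadata dictionary are assumptions, and outputs are indicative rather than verified.
>>> lib.write("symbol", pd.DataFrame({'column': [1]}))
>>> lib.write_metadata("symbol", {'environment': 'production'})   # creates a new version
>>> lib.read_metadata("symbol").metadata
{'environment': 'production'}
>>> lib.read("symbol").data   # the data itself is unchanged
   column
0        1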
- snapshot(snapshot_name: str, metadata: Any | None = None, skip_symbols: List[str] | None = None, versions: Dict[str, int] | None = None) None [source]
Creates a named snapshot of the data within a library.
By default, the latest version of every symbol that has not been deleted will be contained within the snapshot. You can change this behaviour with either versions (an allow-list) or with skip_symbols (a deny-list). Concurrent writes with prune previous versions set while the snapshot is being taken can potentially lead to corruption of the affected symbols in the snapshot.
The symbols and versions contained within the snapshot will persist regardless of new symbols and versions being written to the library afterwards. If a version or symbol referenced in a snapshot is deleted then the underlying data will be preserved to ensure the snapshot is still accessible. Only once all referencing snapshots have been removed will the underlying data be removed as well.
At most one of skip_symbols and versions may be truthy.
- Parameters:
snapshot_name – Name of the snapshot.
metadata (Any, default=None) – Optional metadata to persist along with the snapshot.
skip_symbols (List[str], default=None) – Optional symbols to be excluded from the snapshot.
versions (Dict[str, int], default=None) – Optional dictionary of versions of symbols to snapshot. For example versions={“a”: 2, “b”: 3} will snapshot version 2 of symbol “a” and version 3 of symbol “b”.
- Raises:
InternalException – If a snapshot already exists with snapshot_name. You must explicitly delete the pre-existing snapshot.
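Examples
An illustrative sketch; df_a, df_b and the snapshot name are placeholders, and the flow is indicative rather than verified output.
>>> lib.write("a", df_a)
>>> lib.write("b", df_b)
>>> lib.snapshot("eod_snapshot", metadata={'reason': 'end of day'}, skip_symbols=["b"])
>>> lib.read("a", as_of="eod_snapshot").data   # "a" as captured by the snapshot; "b" was excluded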
- delete(symbol: str, versions: int | Iterable[int] | None = None)[source]
Delete all versions of the symbol from the library, unless versions is specified, in which case only those versions are deleted.
This may not actually delete the underlying data if a snapshot still references the version. See snapshot for more detail.
Note that this may require data to be removed from the underlying storage which can be slow.
If no symbol called symbol exists then this is a no-op. In particular this method does not raise in this case.
- Parameters:
symbol – Symbol to delete.
versions – Version or versions of symbol to delete. If None then all versions will be deleted.
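Examples
An illustrative sketch; df_v0 and df_v1 are placeholders for any normalisable data.
>>> lib.write("symbol", df_v0)
>>> lib.write("symbol", df_v1, prune_previous_versions=False)
>>> lib.delete("symbol", versions=0)   # delete only version 0
>>> lib.delete("symbol")               # delete every remaining version of the symbol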
- prune_previous_versions(symbol)[source]
Removes all (non-snapshotted) versions from the database for the given symbol, except the latest.
- Parameters:
symbol (str) – Symbol name to prune.
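Examples
A minimal, illustrative sketch, assuming df is any normalisable DataFrame.
>>> lib.write("symbol", df, metadata=1)
>>> lib.write("symbol", df, metadata=2, prune_previous_versions=False)
>>> lib.prune_previous_versions("symbol")   # only the latest version (metadata=2) remains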
- delete_data_in_range(symbol: str, date_range: Tuple[Timestamp | datetime | date | None, Timestamp | datetime | date | None])[source]
Delete data within the given date range, creating a new version of symbol.
The existing symbol version must be timeseries-indexed.
- Parameters:
- Parameters:
symbol – Symbol name.
date_range – The date range in which to delete data. Leaving any part of the tuple as None leaves that part of the range open ended.
Examples
>>> df = pd.DataFrame({"column": [5, 6, 7, 8]}, index=pd.date_range(start="1/1/2018", end="1/4/2018")) >>> lib.write("symbol", df) >>> lib.delete_data_in_range("symbol", date_range=(datetime.datetime(2018, 1, 1), datetime.datetime(2018, 1, 2))) >>> lib["symbol"].version 1 >>> lib["symbol"].data column 2018-01-03 7 2018-01-04 8
- delete_snapshot(snapshot_name: str) None [source]
Delete a named snapshot. This may take time if the given snapshot is the last reference to the underlying symbol(s) as the underlying data will be removed as well.
- Parameters:
snapshot_name – The snapshot name to delete.
- Raises:
Exception – If the named snapshot does not exist.
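Examples
A minimal, illustrative sketch; the symbol and snapshot names are assumptions, and the shown output is indicative rather than verified.
>>> lib.write("symbol", df)
>>> lib.snapshot("my_snap")
>>> lib.delete_snapshot("my_snap")
>>> "my_snap" in lib.list_snapshots()
False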
- list_symbols(snapshot_name: str | None = None) List[str] [source]
Return the symbols in this library.
- Parameters:
snapshot_name – Return the symbols available under the snapshot. If None then considers symbols that are live in the library as of the current time.
- Returns:
Symbols in the library.
- Return type:
List[str]
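Examples
An illustrative sketch; symbol and snapshot names are assumptions, and outputs are indicative rather than verified.
>>> lib.write("a", df)
>>> lib.write("b", df)
>>> lib.snapshot("snap")
>>> lib.write("c", df)
>>> sorted(lib.list_symbols())
['a', 'b', 'c']
>>> sorted(lib.list_symbols(snapshot_name="snap"))   # "c" was written after the snapshot was taken
['a', 'b']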
- has_symbol(symbol: str, as_of: int | str | datetime | None = None) bool [source]
Whether this library contains the given symbol.
- Parameters:
symbol – Symbol name for the item
as_of (AsOf, default=None) – Return the data as it was as_of the point in time. See read for more documentation. If absent then considers symbols that are live in the library as of the current time.
- Returns:
True if the symbol is in the library, False otherwise.
- Return type:
bool
Examples
>>> lib.write("symbol", pd.DataFrame()) >>> lib.has_symbol("symbol") True >>> lib.has_symbol("another_symbol") False
The __contains__ operator also checks whether a symbol exists in this library as of now:
>>> "symbol" in lib True >>> "another_symbol" in lib False
- list_snapshots() Dict[str, Any] [source]
List the snapshots in the library.
- Returns:
Snapshots in the library. Keys are snapshot names, values are metadata associated with that snapshot.
- Return type:
Dict[str, Any]
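Examples
An illustrative sketch; the snapshot names are assumptions, and the value shown for a snapshot created without metadata (None) is an assumption rather than verified output.
>>> lib.snapshot("snap_1", metadata={'reason': 'end of day'})
>>> lib.snapshot("snap_2")
>>> lib.list_snapshots()
{'snap_1': {'reason': 'end of day'}, 'snap_2': None}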
- list_versions(symbol: str | None = None, snapshot: str | None = None, latest_only: bool = False, skip_snapshots: bool = False) Dict[SymbolVersion, VersionInfo] [source]
Get the versions in this library, filtered by the passed in parameters.
- Parameters:
symbol – Symbol to return versions for. If None returns versions across all symbols in the library.
snapshot – Only return the versions contained in the named snapshot.
latest_only (bool, default=False) – Only include the latest version for each returned symbol.
skip_snapshots (bool, default=False) – Don’t populate version list with snapshot information. Can improve performance significantly if there are many snapshots.
- Returns:
Dictionary describing the version for each symbol-version pair in the library. Since symbol version is a (named) tuple you can index in to the dictionary simply as shown in the examples below.
- Return type:
Dict[SymbolVersion, VersionInfo]
Examples
>>> df = pd.DataFrame()
>>> lib.write("symbol", df, metadata=10)
>>> lib.write("symbol", df, metadata=11, prune_previous_versions=False)
>>> lib.snapshot("my_snap")
>>> lib.write("symbol", df, metadata=12, prune_previous_versions=False)
>>> lib.delete("symbol", versions=(1, 2))
>>> versions = lib.list_versions("symbol")
>>> versions["symbol", 1].deleted
True
>>> versions["symbol", 1].snapshots
["my_snap"]
- head(symbol: str, n: int = 5, as_of: int | str | datetime | None = None, columns: List[str] | None = None) VersionedItem [source]
Read the first n rows of data for the named symbol. If n is negative, return all rows except the last n rows.
- Parameters:
symbol – Symbol name.
n (int, default=5) – Number of rows to select if non-negative, otherwise number of rows to exclude.
as_of (AsOf, default=None) – See documentation on read.
columns – See documentation on read.
- Return type:
VersionedItem object that contains a .data and .metadata element.
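Examples
A minimal, illustrative sketch; the symbol name and sample data are assumptions, and outputs are indicative rather than verified.
>>> df = pd.DataFrame({'column': range(10)})
>>> lib.write("symbol", df)
>>> lib.head("symbol", n=2).data
   column
0       0
1       1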
- tail(symbol: str, n: int = 5, as_of: str | int | None = None, columns: List[str] | None = None) VersionedItem [source]
Read the last n rows of data for the named symbol. If n is negative, return all rows except the first n rows.
- Parameters:
symbol – Symbol name.
n (int, default=5) – Number of rows to select if non-negative, otherwise number of rows to exclude.
as_of (AsOf, default=None) – See documentation on read.
columns – See documentation on read.
- Return type:
VersionedItem object that contains a .data and .metadata element.
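Examples
A minimal, illustrative sketch; the symbol name and sample data are assumptions, and outputs are indicative rather than verified.
>>> df = pd.DataFrame({'column': range(10)})
>>> lib.write("symbol", df)
>>> lib.tail("symbol", n=2).data
   column
8       8
9       9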
- get_description(symbol: str, as_of: int | str | datetime | None = None) SymbolDescription [source]
Returns descriptive data for symbol.
- Parameters:
symbol – Symbol name.
as_of (AsOf, default=None) – See documentation on read.
- Returns:
Named tuple containing the descriptive data.
- Return type:
SymbolDescription
See also
SymbolDescription
For documentation on each field.
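Examples
An illustrative sketch; the symbol name and data are assumptions, and row_count is assumed to be one of the SymbolDescription fields. Outputs are indicative rather than verified.
>>> df = pd.DataFrame({'column': [1, 2, 3]}, index=pd.date_range('2018-01-01', periods=3))
>>> lib.write("symbol", df)
>>> info = lib.get_description("symbol")
>>> info.row_count
3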
- get_description_batch(symbols: List[str | ReadInfoRequest]) List[SymbolDescription] [source]
Returns descriptive data for a list of symbols.
- Parameters:
symbols (List[Union[str, ReadInfoRequest]]) – List of symbols to read. Params columns, date_range and query_builder from ReadInfoRequest are not used
- Returns:
A list of the descriptive data, whose i-th element corresponds to the i-th element of the symbols parameter.
- Return type:
List[SymbolDescription]
See also
SymbolDescription
For documentation on each field.
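Examples
An illustrative sketch; symbol names and data are assumptions, row_count is assumed to be a SymbolDescription field, and passing a ReadInfoRequest with only a symbol name is assumed to request its latest version. Outputs are indicative rather than verified.
>>> lib.write("s1", pd.DataFrame({'column': [1, 2, 3]}, index=pd.date_range('2018-01-01', periods=3)))
>>> lib.write("s2", pd.DataFrame({'column': [1]}, index=pd.date_range('2018-01-01', periods=1)))
>>> infos = lib.get_description_batch(["s1", ReadInfoRequest("s2")])
>>> [info.row_count for info in infos]
[3, 1]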
- is_symbol_fragmented(symbol: str, segment_size: int | None = None) bool [source]
Check whether the number of segments that would be reduced by compaction is more than or equal to the value specified by the configuration option “SymbolDataCompact.SegmentCount” (defaults to 100).
- Parameters:
symbol (str) – Symbol name.
segment_size (int) – Target for the maximum number of rows per segment after compaction. If this parameter is not provided, the library option for the segment's maximum row size will be used.
Notes
Config map setting - SymbolDataCompact.SegmentCount will be replaced by a library setting in the future. This API will allow overriding the setting as well.
- Return type:
bool
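Examples
An illustrative sketch of the intended check-then-compact pattern; the symbol name and data are assumptions, and whether the check returns True depends on the configured segment-count threshold.
>>> lib.write("symbol", pd.DataFrame({"A": [0]}, index=[pd.Timestamp(0)]))
>>> for i in range(1, 10):
...     lib.append("symbol", pd.DataFrame({"A": [i]}, index=[pd.Timestamp(i)]))
>>> if lib.is_symbol_fragmented("symbol"):     # True only once enough fragmented segments accumulate
...     lib.defragment_symbol_data("symbol")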
- defragment_symbol_data(symbol: str, segment_size: int | None = None) VersionedItem [source]
Compacts fragmented segments by merging row-sliced segments (https://docs.arcticdb.io/technical/on_disk_storage/#data-layer). This method calls is_symbol_fragmented to determine whether to proceed with the defragmentation operation.
CAUTION - Please note that a major restriction of this method at present is that any column slicing present on the data will be removed in the new version created as a result of this method. As a result, if the impacted symbol has more than 127 columns (default value), the performance of selecting individual columns of the symbol (by using the columns parameter) may be negatively impacted in the defragmented version. If your symbol has less than 127 columns this caveat does not apply. For more information, please see columns_per_segment here:
https://docs.arcticdb.io/api/arcticdb/arcticdb.LibraryOptions
- Parameters:
symbol (str) – Symbol name.
segment_size (int) – Target for the maximum number of rows per segment after compaction. If this parameter is not provided, the library option "segment_row_size" will be used. Note that the number of rows per segment after compaction may exceed the target; the aim is to achieve the smallest number of segments after compaction. Please refer to the example below for further explanation.
- Returns:
Structure containing metadata and version number of the defragmented symbol in the store.
- Return type:
VersionedItem
- Raises:
1002 ErrorCategory.INTERNAL:E_ASSERTION_FAILURE – If is_symbol_fragmented returns false.
2001 ErrorCategory.NORMALIZATION:E_UNIMPLEMENTED_INPUT_TYPE – If library option - “bucketize_dynamic” is ON
Examples
>>> lib.write("symbol", pd.DataFrame({"A": [0]}, index=[pd.Timestamp(0)])) >>> lib.append("symbol", pd.DataFrame({"A": [1, 2]}, index=[pd.Timestamp(1), pd.Timestamp(2)])) >>> lib.append("symbol", pd.DataFrame({"A": [3]}, index=[pd.Timestamp(3)])) >>> lib.read_index(sym) start_index end_index version_id stream_id creation_ts content_hash index_type key_type start_col end_col start_row end_row 1970-01-01 00:00:00.000000000 1970-01-01 00:00:00.000000001 20 b'sym' 1678974096622685727 6872717287607530038 84 2 1 2 0 1 1970-01-01 00:00:00.000000001 1970-01-01 00:00:00.000000003 21 b'sym' 1678974096931527858 12345256156783683504 84 2 1 2 1 3 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000004 22 b'sym' 1678974096970045987 7952936283266921920 84 2 1 2 3 4 >>> lib.version_store.defragment_symbol_data("symbol", 2) >>> lib.read_index(sym) # Returns two segments rather than three as a result of the defragmentation operation start_index end_index version_id stream_id creation_ts content_hash index_type key_type start_col end_col start_row end_row 1970-01-01 00:00:00.000000000 1970-01-01 00:00:00.000000003 23 b'sym' 1678974097067271451 5576804837479525884 84 2 1 2 0 3 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000004 23 b'sym' 1678974097067427062 7952936283266921920 84 2 1 2 3 4
Notes
Config map setting - SymbolDataCompact.SegmentCount will be replaced by a library setting in the future. This API will allow overriding the setting as well.
- property name
The name of this library.