Contribution Licensing

Since this project is distributed under the terms of the BSL, contributions that you make are licensed under the same terms. To accept your contributions, we need explicit confirmation from you that you are able and willing to provide them under these terms. The mechanism we use for this is the ArcticDB Individual Contributor License Agreement.

Individuals - To participate under these terms, please include the following line as the last line of the commit message for each commit in your contribution. You must use your real name (no pseudonyms, and no anonymous contributions).

Signed-Off By: Random J. Developer <[email protected]>. By including this sign-off line I agree to the terms of the Contributor License Agreement.

Corporations - For corporations who wish to make contributions to ArcticDB, please contact [email protected] and we will arrange for the CLA to be sent to the signing authority within your corporation.
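
For individual contributors, the sign-off requirement above can be checked locally before pushing. The following is a minimal sketch, not part of ArcticDB's tooling: has_sign_off and SIGN_OFF_PATTERN are hypothetical names, and the check assumes the sign-off line uses exactly the wording shown above. Trailing blank lines after the sign-off are tolerated.

```python
import re

# Fixed sentence that follows the name and email address in the sign-off line.
AGREEMENT = (
    "By including this sign-off line I agree to the terms of the "
    "Contributor License Agreement."
)

# Expected shape: "Signed-Off By: <name> <<email>>. <agreement sentence>"
SIGN_OFF_PATTERN = re.compile(
    r"^Signed-Off By: [^<>]+ <[^<>]+>\. " + re.escape(AGREEMENT) + r"$"
)


def has_sign_off(commit_message: str) -> bool:
    """Return True if the last non-empty line of the message is the sign-off line."""
    lines = [line.strip() for line in commit_message.splitlines() if line.strip()]
    return bool(lines) and SIGN_OFF_PATTERN.match(lines[-1]) is not None
```

The message of the most recent commit can be obtained with git log -1 --format=%B and passed to has_sign_off as a string.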

Docker Quickstart

This quickstart builds a release using build dependencies from vcpkg. ArcticDB releases on PyPI use vcpkg dependencies in the manner described below.

Note that the instructions below build a Linux x86_64 release.

1) Start the ArcticDB build docker image

Run in a Linux terminal:

docker pull ghcr.io/man-group/cibuildwheel_manylinux:2.12.1-3a897
docker run -it ghcr.io/man-group/cibuildwheel_manylinux:2.12.1-3a897

⚠️ The instructions below do not have to be run in the provided docker image. They can be run against any Python installation on any Linux distribution as long as the basic build dependencies are available.

If running outside the provided docker image, please change /opt/python/cp39-cp39/bin/python3 in the examples below to an appropriate path for Python.

2) Check out ArcticDB including submodules

cd
git clone https://github.com/man-group/ArcticDB.git
cd ArcticDB
git submodule init && git submodule update

3) Kick off the build

MY_PYTHON=/opt/python/cp39-cp39/bin/python3  # change if outside docker container/building against a different python
$MY_PYTHON -m pip install -U pip setuptools wheel grpcio-tools
ARCTIC_CMAKE_PRESET=skip $MY_PYTHON setup.py develop
# Change the below Python_EXECUTABLE value to build against a different Python version
cmake -DPython_EXECUTABLE=$MY_PYTHON -DTEST=off --preset linux-debug cpp
pushd cpp
cmake --build --preset linux-debug
popd

4) Run ArcticDB

Ensure the below is run in the Git project root:

# PYTHONPATH first part = Python module, second part compiled C++ binary
PYTHONPATH=`pwd`/python:`pwd`/cpp/out/linux-debug-build/arcticdb/ $MY_PYTHON

Now, inside the Python shell:

from arcticdb import Arctic

Rather than setting the PYTHONPATH environment variable, you could install the appropriate paths into your Python environment by running (note that this will invoke the build tooling so will compile any changed files since the last compilation):

$MY_PYTHON -m pip install -ve .

Because this copies the compiled binary into your Python installation, you must re-run it after every change to a C++ file.
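
Whichever route is used (PYTHONPATH or pip install), it can be useful to confirm which copy of arcticdb the interpreter would actually import, for example to rule out an older copy in site-packages. The sketch below uses only the standard library. locate_module is a hypothetical helper name, and the check does not import the module itself.

```python
import importlib.util


def locate_module(name: str):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints a path inside your checkout or site-packages, or None if arcticdb is not importable.
print(locate_module("arcticdb"))
```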

mamba and conda-forge Quickstart

This quickstart uses build dependencies from conda-forge. It is a pre-requisite for releasing ArcticDB on conda-forge.

⚠️ At the time of writing, installing ArcticDB with this setup under Windows is not possible because no distribution of folly for Windows is available on conda-forge. To track progress on packaging folly for Windows on conda-forge, see: conda-forge/folly-feedstock#98

  • Install mamba
  • Create the arcticdb environment from its specification (environment_unix.yml):
mamba env create -f environment_unix.yml
  • Activate the arcticdb environment (you will need to do this for every new shell session):
mamba activate arcticdb
  • Build and install ArcticDB in the arcticdb environment using the dependencies installed in this environment. We recommend using the editable installation for development:
ARCTICDB_USING_CONDA=1 python -m pip install --verbose --editable .
  • Use ArcticDB from Python:
from arcticdb import Arctic

FAQ

How do I build against different Python versions?

Run cmake (configure, not build) with either:

  1. A different version of Python as the first Python on your PATH, or
  2. The Python_EXECUTABLE CMake variable pointing to a different Python binary

Note that to build the ArcticDB C++ tests you must have the Python static library available in your installation!

How do I run the Python tests?

See Running Python tests below.

How do I run the C++ tests?

See Running C++ tests below.

How do I specify how many cores to build using?

CMake determines this automatically at build time, but you can set it manually by passing --parallel <num cores> to the build command.

Detailed Build Information

Docker Image Construction

The above docker image is built from ManyLinux. Build script is located here.

GitHub output here.

We recommend you use this image for compilation and testing!

Setting up Linux

The codebase and build system can work with any reasonably recent Linux distribution with at least GCC 8 (10+ recommended) and CMake 3.12 (these instructions assume 3.21+).

A development install of Python 3.6+ (with libpython.a or .so and full headers) is also necessary. See pybind11 configuration.
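
To see whether a given interpreter looks like a development install, its build configuration can be inspected from Python itself. The sketch below is a heuristic using only the standard sysconfig module. python_dev_files is a hypothetical helper name, and the directories searched (LIBDIR and LIBPL) are an assumption about where distributions usually place libpython. Run it with the interpreter you intend to build against.

```python
import os
import sysconfig


def python_dev_files() -> dict:
    """Report whether this interpreter ships Python.h and libpython files."""
    include_dir = sysconfig.get_paths()["include"]
    # Candidate directories for libpython; either may be unset on some platforms.
    lib_dirs = [
        d
        for d in (sysconfig.get_config_var("LIBDIR"), sysconfig.get_config_var("LIBPL"))
        if d
    ]

    def found(var_name: str) -> bool:
        # var_name is a sysconfig variable holding a library file name.
        file_name = sysconfig.get_config_var(var_name)
        return bool(file_name) and any(
            os.path.isfile(os.path.join(d, file_name)) for d in lib_dirs
        )

    return {
        "headers": os.path.isfile(os.path.join(include_dir, "Python.h")),
        # LIBRARY names the static archive, e.g. libpython3.X.a
        "static_library": found("LIBRARY"),
        # LDLIBRARY names the library the interpreter was configured to link
        # against; it may be a shared object or the static archive again.
        "link_library": found("LDLIBRARY"),
    }


print(python_dev_files())
```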

We require a Mongo executable for a couple of Python tests on Linux. You can check whether you have it with mongod --version.

Search the internet for "mongo installation Linux" for instructions for your distro if you do not already have mongod available.
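
The same mongod check can be done from Python, which may be convenient in a setup script. This is a small sketch using only the standard library. mongod_version is a hypothetical helper name, and it simply returns the first line printed by mongod --version.

```python
import shutil
import subprocess


def mongod_version():
    """Return the first line of `mongod --version`, or None if mongod is unavailable."""
    mongod = shutil.which("mongod")
    if mongod is None:
        return None
    result = subprocess.run(
        [mongod, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(mongod_version() or "mongod not found on PATH")
```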

Dependencies by distro

Distro | Versions reported to work | Packages
Ubuntu | 20.04, 22.04 | build-essential g++-10 libpcre3-dev libsasl2-dev libsodium-dev libkrb5-dev libcurl4-openssl-dev python3-dev
CentOS | 7 | devtoolset-10-gcc-c++ openssl-devel cyrus-sasl-devel devtoolset-10-libatomic-devel libcurl-devel python3-devel

Setting up Windows

We recommend using Visual Studio 2022 (or later) to install the compiler (MSVC v142 or newer) and tools (Windows SDK, CMake, Python).

The Python that comes with Visual Studio is sufficient for creating release builds, but for debug builds you will have to download Python separately from Python.org.

Running Python tests

With python pointing to a Python interpreter with ArcticDB installed or on the PYTHONPATH:

python -m pip install arcticdb[Testing]
python -m pytest python/tests

Running C++ tests

Configure ArcticDB with TEST=on (default):

cmake -DPython_EXECUTABLE=<path to python> --preset linux-debug cpp

Note that <path to python> must point to a Python that is compatible with Development.Embed. This will probably be the result of installing python3-devel from your dependency manager.

Inside the provided docker image, python3-devel resolves to Python 3.6 installed at /usr/bin/python3, so the resulting command will be:

cmake -DPython_EXECUTABLE=/usr/bin/python3 -DTEST=ON --preset linux-debug cpp

Then invoke the CMake build as normal and run the compiled test binary.

CIBuildWheel

Our source repo works with CIBuildWheel which runs the compilation and tests against all supported Python versions in isolated environments. Please follow their documentation.

Configurations

CMake presets

To make it easier to set and share all the environment variables, config and commands, we recommend using the CMake presets feature.

Recent versions of some popular C++ IDEs support reading/importing these presets:

  • Visual Studio & Code
  • CLion

And it's equally easy to use on the command line (see example).

We already ship a CMakePresets.json in the cpp directory, which is used by our builds. You can add a CMakeUserPresets.json in the same directory for local overrides. Inheritance is supported.

If you're working on Linux but not using our Docker image, you may want to create a preset with these cacheVariables:

  • CMAKE_MAKE_PROGRAM - make or ninja should work
  • CMAKE_C_COMPILER and CMAKE_CXX_COMPILER - If your preferred compiler is not cc and cxx
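
A local Linux override along these lines could be generated with a short script. This is only a sketch under stated assumptions: the preset name linux-debug-local is hypothetical, the default values (ninja, gcc-10, g++-10) are placeholders to replace with your own tools, and the preset is assumed to inherit from the shipped linux-debug preset. make_user_presets and write_user_presets are illustrative helper names.

```python
import json
from pathlib import Path

# Hypothetical preset name; "linux-debug" is the shipped preset it inherits from.
PRESET_NAME = "linux-debug-local"


def make_user_presets(make_program="ninja", c_compiler="gcc-10", cxx_compiler="g++-10"):
    """Build the contents of a CMakeUserPresets.json as a plain dict."""
    return {
        "version": 3,
        "configurePresets": [
            {
                "name": PRESET_NAME,
                "inherits": "linux-debug",
                "cacheVariables": {
                    "CMAKE_MAKE_PROGRAM": make_program,
                    "CMAKE_C_COMPILER": c_compiler,
                    "CMAKE_CXX_COMPILER": cxx_compiler,
                },
            }
        ],
        "buildPresets": [
            {"name": PRESET_NAME, "configurePreset": PRESET_NAME}
        ],
    }


def write_user_presets(cpp_dir, presets):
    """Write the presets next to CMakePresets.json and return the file path."""
    path = Path(cpp_dir) / "CMakeUserPresets.json"
    path.write_text(json.dumps(presets, indent=2) + "\n")
    return path


# Print the JSON; call write_user_presets("cpp", make_user_presets()) to save it.
print(json.dumps(make_user_presets(), indent=2))
```

Once the file is in the cpp directory, the new preset name can be passed to --preset in place of linux-debug in the configure and build commands.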

More examples:

Windows Preset to specify a Python version
{
  "version": 3,
  "configurePresets": [
    {
      "name": "alt-vcpkg-debug:py3.10",
      "inherits": "windows-cl-debug",
      "cacheVariables": {
        "Python_ROOT_DIR": "C:\\Program Files\\Python310"
      },
      "environment": {
        "PATH": "C:\\Users\\me\\AppData\\Roaming\\Python\\Python310\\Scripts;C:\\Program Files\\Python310;$penv{PATH}",
        "PYTHONPATH": "C:\\Program Files\\Python310\\Lib;C:\\Users\\me\\AppData\\Roaming\\Python\\Python310\\site-packages"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "alt-vcpkg-debug:py3.10",
      "configurePreset": "alt-vcpkg-debug:py3.10",
      "inheritConfigureEnvironment": true
    }
  ]
}

vcpkg caching

We use vcpkg to manage the C++ dependencies.

Compiling the dependencies uses a lot of disk space. Once CMake configuration is done, you can remove the cpp\vcpkg\buildtrees folder.
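
The cleanup can also be scripted. The sketch below deletes the buildtrees folder under a given checkout and reports roughly how much space was reclaimed. remove_vcpkg_buildtrees is a hypothetical helper name, and it assumes the repository layout described above (cpp/vcpkg/buildtrees relative to the project root). Only run it once CMake configuration has finished.

```python
import shutil
from pathlib import Path


def remove_vcpkg_buildtrees(repo_root) -> int:
    """Delete <repo_root>/cpp/vcpkg/buildtrees and return the number of bytes freed."""
    buildtrees = Path(repo_root) / "cpp" / "vcpkg" / "buildtrees"
    if not buildtrees.is_dir():
        return 0  # nothing to clean up
    # Sum file sizes before deleting so the caller can report the saving.
    freed = sum(p.stat().st_size for p in buildtrees.rglob("*") if p.is_file())
    shutil.rmtree(buildtrees)
    return freed


# Example (run from the Git project root):
# print(remove_vcpkg_buildtrees("."), "bytes freed")
```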

You may also want to configure some caches:

  • Binary caching
  • Asset caching

pybind11 configuration

We augmented pybind11's Python discovery with our own PythonUtils to improve diagnostics. Please pay attention to warning messages from PythonUtils.cmake in the CMake output, which highlight any configuration issues with Python.

We compile against the first Python on the PATH by default.

To override that, use one of the following CMake variables*:

  • Python_ROOT_DIR - The common path "prefix" for a particular Python install. Usually, the python executable is in the same directory or the bin subdirectory. This directory should also contain the include and lib(rary) subdirectories.
    E.g. /usr for a system-wide install on *nix; /opt/pythonXX for a locally-managed Python install; /home/user/my_virtualenv for a virtual env; C:\Program Files\PythonXX for a Windows Python install

  • Python_EXECUTABLE - The path to the Python executable. (CMake 3.15+) CMake will try to extract the include and library paths by running this program. This differs from the default behaviour of FindPython.

(* Note CMake variables are set with -D on the CMake command line or with the cacheVariables key in CMake*Presets.json. The names are case-sensitive.

(Only) Python_ROOT_DIR can also be set as an environment variable. Setting the others in the environment might have no effect.)
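
If you are unsure which values to pass, the interpreter you want to build against can report them itself. The sketch below prints a ready-made -D flag for the running interpreter. cmake_python_flag is a hypothetical helper name, and using sys.prefix as Python_ROOT_DIR is an assumption that matches the "prefix" description above for typical installs and virtual environments.

```python
import sys


def cmake_python_flag(use_root_dir: bool = False) -> str:
    """Return a -D flag that points CMake at the running interpreter."""
    if use_root_dir:
        # sys.prefix is the install (or virtual environment) prefix.
        return "-DPython_ROOT_DIR=" + sys.prefix
    # sys.executable is the absolute path of the running interpreter.
    return "-DPython_EXECUTABLE=" + sys.executable


print(cmake_python_flag())
print(cmake_python_flag(use_root_dir=True))
```

Run it with the Python you intend to target, for example $MY_PYTHON in the Docker quickstart, and paste the printed flag into the cmake configure command.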