liboxen 0.46.3

Oxen is a fast, unstructured data version control, to help version large machine learning datasets written in Rust.
Documentation

🐂 Oxen

Create a world where everyone can contribute to an Artificial General Intelligence, starting with the data.

🌾 What is Oxen?

Oxen at its core is a data version control library, written in Rust. Its goals are to be fast, reliable, and easy to use. It's designed to be used in a variety of ways, from a simple command line tool, to a remote server to sync to, to integrations into other ecosystems such as python.

📚 Documentation

The documentation for the Oxen.ai tool chain can be found here.

✅ TODO

  • Configurable storage backends
    • Local filesystem
    • S3
    • GCS
    • Azure
    • Backblaze
  • Block level deduplication

🔨 Build & Run

Install Prerequisites

See the prerequisites section of the main readme to install the needed prerequisites.

Build

Build the entire workspace (CLI, server, and library):

cargo build --workspace

Or build a specific crate:

cargo build -p oxen-server
cargo build -p oxen-cli
cargo build -p liboxen

If on intel mac, you may need to build with the following

rustup target install x86_64-apple-darwin
cargo build --workspace --target x86_64-apple-darwin

If on Windows, you may need to add the following directories to the 'INCLUDE' environment variable

"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include"

"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.27023\include"

"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\lib\clang\12.0.0\include"

These are example paths and will vary between machines. If you install 'C++ Clang tools for Windows' through Microsoft Visual Studio Build Tools, the directories can be located from the Visual Studio installation under 'BuildTools\VC\Tools'

Speed up the build process

Linux

Note: Rust 1.90+ on x86_64-unknown-linux-gnu already uses an optimized linker by default. For other variants of Linux, the instructions below may speed things up.

On Linux, you can use the mold linker to speed up builds.

Then create .cargo/config.toml in your Oxen repo root with the following content:

[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=/usr/local/bin/ld64.mold"]

macOS with Apple Silicon

For macOS with Apple Silicon, you can use the lld linker.

brew install llvm

Then create .cargo/config.toml in your Oxen repo root with the following:

[target.aarch64-apple-darwin]
rustflags = [ "-C", "link-arg=-fuse-ld=/opt/homebrew/opt/llvm/bin/ld64.lld", ]

Run

CLI

To run Oxen from the command line, add the oxen-rust/target/debug directory to the 'PATH' environment variable

export PATH="$PATH:/path/to/Oxen/oxen-rust/target/debug"

On Windows, you can use

$env:PATH += ";/path/to/Oxen/oxen-rust/target/debug"

Initialize a new repository or clone an existing one

oxen init
oxen clone https://hub.oxen.ai/namespace/repository

This will create the .oxen dir in your current directory and allow you to run Oxen CLI commands

oxen status
oxen add images/
oxen commit -m "added images"
oxen push origin main

Oxen Server

To run a local Oxen Server, generate a config file and token to authenticate the user

./target/debug/oxen-server add-user --email ox@oxen.ai --name Ox --output user_config.toml

Copy the config to the default locations

mkdir ~/.oxen
mv user_config.toml ~/.oxen/user_config.toml
cp ~/.oxen/user_config.toml data/test/config/user_config.toml

Set where you want the data to be synced to. The default sync directory is ./data/ to change it set the SYNC_DIR environment variable to a path.

export SYNC_DIR=/path/to/sync/dir

You can also create a .env.local file in the /crates/server directory which can contain the SYNC_DIR variable to avoid setting it every time you run the server.

Run the server

cargo run -p oxen-server -- start

Or run the compiled binary directly:

./target/debug/oxen-server start

To run the server with live reload, use bacon:

cargo install --locked bacon

Then run the server like this

bacon server

Nix Flake

If you have Nix installed you can use the flake to build and run the server. This will automatically install and configure the required build toolchain dependencies for Linux & macOS.

nix build .#oxen-server
nix build .#oxen-cli
nix build .#liboxen
nix run .#oxen-server -- start
nix run .#oxen-cli -- init

To develop with the standard rust toolchain in a Nix dev shell:

nix develop -c $SHELL
cargo build --workspace
cargo run -p oxen-server -- start
cargo run -p oxen-cli -- init

The flake also provides derivations to build OCI (Docker) images with the minimal set of dependencies required to build and run oxen & oxen-server.

nix build .#oci-oxen-server
nix build .#oci-oxen-cli

This will export the OCI image and can be loaded with:

docker load -i result

Unit & Integration Tests

Manual Test Setup

Here are the steps to manually configure and run tests (see also the Automatic Test Setup section). Make sure your user is configured and server is running on the default port and host, by following these setup steps:

# Configure a user
mkdir -p data/test/{runs,config}
./target/debug/oxen-server add-user --email ox@oxen.ai --name Ox --output data/test/config/user_config.toml
# Start the oxen-server
./target/debug/oxen-server start

Note: tests open up a lot of file handles, so limit num test threads if running everything.

You can also increase the number of open files your system allows ulimit before running tests:

ulimit -n 10240

Then you can run the tests with the cargo test or cargo nextest (preferred) directly. To run all tests with the default number of threads:

cargo test --workspace -- --test-threads=$(getconf _NPROCESSORS_ONLN)

Automatic Test Setup

You can use the following script to run tests. It will set up config files, build and run an oxen-server, run the tests against it, and shutdown the server. Any arguments passed to the script will be passed to cargo nextest run, so you can use it to run specific tests or set test threads.

scripts/test-rust.sh

It can be faster (in terms of compilation and runtime) to run a specific test. To run a specific library test:

scripts/test-rust.sh --lib test_get_metadata_text_readme

To run with all debug output and run a specific test

env RUST_LOG=warn,liboxen=debug,integration_test=debug scripts/test-rust.sh --no-capture test_command_push_clone_pull_push

To explicitly set the port for the oxen-server used in tests, set OXEN_PORT:

env OXEN_PORT=4000 scripts/test-rust.sh

Oxen Server

Structure

Remote repositories have the same internal structure as local ones, with the caveat that all the data is in the .oxen dir and not duplicated into a "local workspace".

APIs

Server defaults to localhost 3000

set SERVER 0.0.0.0:3000

You can grab your auth token from the config file above (~/.oxen/user_config.toml)

set TOKEN <YOUR_TOKEN>

List Repositories

curl -H "Authorization: Bearer $TOKEN" "http://$SERVER/api/repos"

Create Repository

curl -H "Authorization: Bearer $TOKEN" -X POST -d '{"name": "MyRepo"}' "http://$SERVER/api/repos"

Docker

Create the docker image

docker build -t oxen/server:0.46.3 .

Run a container on port 3000 with a local filesystem mounted from /var/oxen/data on the host to /var/oxen/data in the container.

docker run -d -v /var/oxen/data:/var/oxen/data -p 3000:3001 --name oxen oxen/server:0.46.3

Or use docker compose

docker-compose up -d reverse-proxy
docker-compose up -d --scale oxen=4 --no-recreate

Run Benchmark

we use criterion to handle benchmarks.

cargo bench --workspace

To save baseline you can run

cargo bench --workspace -- --save-baseline bench

Which would then store the benchmark under target/criterion/add

Enable ffmpeg

To enable thumbnailing for videos, you will have to build with ffmpeg enabled

brew install ffmpeg@7
cargo build --workspace --all-features

Or for a specific crate:

cargo build -p oxen-server --features liboxen/ffmpeg