atelier_data 0.0.15

Data Artifacts and I/O for the atelier-rs engine

# atelier-data

Market data infrastructure for the atelier-rs trading engine.

This crate provides everything needed to connect to cryptocurrency exchanges, normalise their heterogeneous WebSocket feeds into a common data model, synchronise events onto a uniform time grid, and persist the result to Apache Parquet files.

## Core Data Types

Off-chain activity (market microstructure):

| Type | Description |
|------|-------------|
| `Orderbook` | Full-depth limit order book snapshot (bid/ask levels) |
| `OrderbookDelta` | Incremental order book maintained via `NormalizedDelta` updates |
| `Trade` | Public trade execution (price, size, side, timestamp) |
| `Liquidation` | Forced liquidation event |
| `FundingRate` | Perpetual futures funding rate observation |
| `OpenInterest` | Aggregate open interest snapshot |

Composed types:

| Type | Description |
|------|-------------|
| `MarketSnapshot` | Time-aligned bundle of all market data for one grid period |
| `MarketAggregate` | 15-scalar feature vector derived from a `MarketSnapshot` |

## Exchange Sources

| Source | Kind | API | Order Books | Public Trades | Liquidations | Funding Rates | Open Interest |
|--------|------|-----|-------------|---------------|--------------|---------------|---------------|
| Bybit | CEX | WSS | YES / YES | YES / YES | YES / YES | YES / YES | YES / YES |
| Coinbase | CEX | WSS | YES / YES | YES / YES | — | — | — |
| Kraken | CEX | WSS | YES / YES | YES / YES | — | — | — |

Format: Implemented / Tested. Dashes indicate the exchange does not expose the data type on its spot/linear WebSocket API.

## Workers

Two worker types handle end-to-end data collection:

`DataWorker` — raw event ingestion without synchronisation. Connects to a live exchange WebSocket feed, decodes events, and delivers them through a pluggable `OutputSink` pipeline. Configuration is driven by a TOML manifest (`DataWorkerManifest`). Reconnection, backoff, health monitoring, and gap detection are handled automatically.
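A manifest for such a worker might look roughly like the sketch below. The section and field names here are illustrative assumptions, not the crate's actual schema — consult `DataWorkerManifest` for the real fields:

```toml
# Hypothetical DataWorker manifest — field names are illustrative only.
[worker]
exchange = "bybit"
symbols  = ["BTCUSDT", "ETHUSDT"]

[streams]
orderbooks = true
trades     = true

[reconnect]
max_retries = 10
backoff_ms  = 500
```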

`MarketWorker` — synchronised market snapshots. Extends `DataWorker`'s ingestion with a `MarketSynchronizer` that bins heterogeneous events onto a uniform nanosecond grid, producing `MarketSnapshot` objects at each tick. Multiple `ClockMode` strategies are supported: `OrderbookDriven`, `TradeDriven`, `LiquidationDriven`, and `ExternalClock`. Snapshots are delivered through the same `OutputSink` pipeline and can be flushed to Parquet automatically.
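The grid alignment described above can be sketched as follows. This is a minimal illustration of the binning idea, not the crate's `MarketSynchronizer` implementation:

```rust
// Minimal sketch: events carrying nanosecond timestamps are assigned to
// bins on a uniform grid, so each snapshot covers exactly one grid period.

/// Index of the grid period that a timestamp falls into.
fn grid_bin(ts_ns: u64, grid_start_ns: u64, period_ns: u64) -> u64 {
    (ts_ns - grid_start_ns) / period_ns
}

fn main() {
    let start = 1_700_000_000_000_000_000u64; // grid origin (ns)
    let period = 1_000_000_000u64;            // 1-second grid

    // Three events: two inside the first period, one in the third.
    let events = [start + 10, start + 999_999_999, start + 2_500_000_000];
    let bins: Vec<u64> = events.iter().map(|&t| grid_bin(t, start, period)).collect();
    assert_eq!(bins, vec![0, 0, 2]);
    println!("bins = {:?}", bins);
}
```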

## Output Sinks

The `OutputSink` trait defines where worker output goes. Multiple sinks can run simultaneously via `OutputSinkSet` (fan-out):

| Sink | Status | Description |
|------|--------|-------------|
| `ChannelSink` | Working | Wraps `TopicRegistry` broadcast channels for pub/sub |
| `TerminalSink` | Working | Debug/tracing terminal output |
| `ParquetSink` | Working | Buffers `MarketSnapshot`s, decomposes and flushes to per-datatype Parquet files |
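The fan-out idea can be illustrated with a simplified sketch: one event is delivered to every registered sink. The trait signature below is an assumption for illustration, not the crate's actual API:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simplified stand-in for the crate's OutputSink trait (signature assumed).
trait OutputSink {
    fn deliver(&mut self, event: &str);
}

// Prints events, like a debug/terminal sink.
struct TerminalSink;
impl OutputSink for TerminalSink {
    fn deliver(&mut self, event: &str) {
        println!("[terminal] {event}");
    }
}

// Records events into a shared buffer, like a buffering sink.
struct BufferSink {
    buffer: Rc<RefCell<Vec<String>>>,
}
impl OutputSink for BufferSink {
    fn deliver(&mut self, event: &str) {
        self.buffer.borrow_mut().push(event.to_string());
    }
}

// Fan-out: forwards each event to all member sinks.
struct OutputSinkSet {
    sinks: Vec<Box<dyn OutputSink>>,
}
impl OutputSinkSet {
    fn deliver(&mut self, event: &str) {
        for sink in &mut self.sinks {
            sink.deliver(event);
        }
    }
}

fn main() {
    let captured = Rc::new(RefCell::new(Vec::new()));
    let mut set = OutputSinkSet {
        sinks: vec![
            Box::new(TerminalSink),
            Box::new(BufferSink { buffer: Rc::clone(&captured) }),
        ],
    };
    set.deliver("trade BTCUSDT 42000.5");
    // Both sinks received the event; the buffer sink recorded it.
    assert_eq!(captured.borrow().len(), 1);
    assert_eq!(captured.borrow()[0], "trade BTCUSDT 42000.5");
}
```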

## Parquet Persistence

Requires the `parquet` feature (`--features parquet`). All five data types support both read and write:

| Data Type | Writer | Reader |
|-----------|--------|--------|
| Orderbooks | `write_ob_parquet` | `read_ob_parquet` |
| Trades | `write_trades_parquet_timestamped` | `read_trades_parquet` |
| Liquidations | `write_liquidations_parquet_timestamped` | `read_liquidations_parquet` |
| Funding Rates | `write_funding_parquet_timestamped` | `read_funding_parquet` |
| Open Interest | `write_oi_parquet_timestamped` | `read_oi_parquet` |

## Filename Convention

All timestamped writers produce files following this pattern:

`{SYMBOL}_{DATATYPE}_{MODE}_{TIMESTAMP}.parquet`

where `MODE` is `sync` for grid-aligned data or `raw` for unprocessed captures. Symbols containing `/` (e.g. Kraken's `BTC/USDT`) are sanitised to `-` in the filename (`BTC-USDT`), while the Parquet data retains the original symbol string. Examples:

BTCUSDT_ob_sync_20260226_153000.123.parquet
ETHUSDT_trades_raw_20260226_160000.456.parquet
BTC-USDT_ob_sync_20260226_153000.123.parquet

Files are organised into subdirectories per data type: orderbooks/, trades/, liquidations/, fundings/, open_interests/.
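The convention above can be sketched in a few lines. The helper name here is illustrative, not one of the crate's actual functions:

```rust
// Sketch of the filename convention: sanitise '/' in the symbol, then
// format {SYMBOL}_{DATATYPE}_{MODE}_{TIMESTAMP}.parquet.

fn parquet_filename(symbol: &str, datatype: &str, mode: &str, timestamp: &str) -> String {
    let safe_symbol = symbol.replace('/', "-"); // Kraken-style BTC/USDT -> BTC-USDT
    format!("{safe_symbol}_{datatype}_{mode}_{timestamp}.parquet")
}

fn main() {
    assert_eq!(
        parquet_filename("BTC/USDT", "ob", "sync", "20260226_153000.123"),
        "BTC-USDT_ob_sync_20260226_153000.123.parquet"
    );
    assert_eq!(
        parquet_filename("ETHUSDT", "trades", "raw", "20260226_160000.456"),
        "ETHUSDT_trades_raw_20260226_160000.456.parquet"
    );
    println!("filenames ok");
}
```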

## Feature Flags

| Flag | Effect |
|------|--------|
| `parquet` | Enables Apache Parquet I/O (adds `arrow` + `parquet` dependencies) |
| `torch` | Enables `tch`-based tensor conversion in the `datasets` module |
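To enable Parquet I/O from a downstream crate, the dependency would be declared with the feature turned on (the version shown matches this release):

```toml
[dependencies]
atelier_data = { version = "0.0.15", features = ["parquet"] }
```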

## Examples

| Example | Description | Command |
|---------|-------------|---------|
| `run_data_worker` | Raw event ingestion via `DataWorker` | `cargo run -p atelier_data --example run_data_worker -- --config <path>` |
| `run_market_worker` | Synchronised snapshots to Parquet via `MarketWorker` | `cargo run -p atelier_data --example run_market_worker --features parquet -- --config <path>` |
| `read_market_worker` | Read Parquet files and print per-symbol stats | `cargo run -p atelier_data --example read_market_worker --features parquet -- --dir <path>` |
| `bybit_markets` | Bybit market snapshot collection (standalone) | `cargo run -p atelier_data --example bybit_markets --features parquet -- --config <path>` |
| `coinbase_markets` | Coinbase market snapshot collection | `cargo run -p atelier_data --example coinbase_markets --features parquet -- --config <path>` |
| `kraken_markets` | Kraken market snapshot collection | `cargo run -p atelier_data --example kraken_markets --features parquet -- --config <path>` |
| `market_load` | Load and verify most recent Parquet files | `cargo run -p atelier_data --example market_load --features parquet -- --config <path>` |
| `market_fetch` | Multi-exchange raw stream collector (Bybit/Coinbase/Kraken) | `cargo run -p atelier_data --example market_fetch --features parquet` |
| `multi_sync_workers` | Multi-worker manifest parser (stub) | `cargo run -p atelier_data --example multi_sync_workers -- --config <path>` |

atelier-data is a member of the atelier-rs workspace.
