atelier_data 0.0.15

Data Artifacts and I/O for the atelier-rs engine
# atelier-data

Market data infrastructure for the **atelier-rs** trading engine.

This crate provides everything needed to connect to cryptocurrency exchanges,
normalise their heterogeneous WebSocket feeds into a common data model,
synchronise events onto a uniform time grid, and persist the result to
Apache Parquet files.

## Core Data Types

**Off-chain activity** (market microstructure):

| Type | Description |
|------|-------------|
| `Orderbook` | Full-depth limit order book snapshot (bid/ask levels) |
| `OrderbookDelta` | Incremental order book maintained via `NormalizedDelta` updates |
| `Trade` | Public trade execution (price, size, side, timestamp) |
| `Liquidation` | Forced liquidation event |
| `FundingRate` | Perpetual futures funding rate observation |
| `OpenInterest` | Aggregate open interest snapshot |
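
To illustrate what "incremental order book" means for `OrderbookDelta`, here is a minimal, std-only sketch of applying per-level updates to a sorted book. The names `Side`, `LevelUpdate`, and `Book` are hypothetical stand-ins and are not the crate's API. The convention that a zero size removes a level is an assumption borrowed from common exchange feeds, not something this README states about `NormalizedDelta`.

```rust
use std::collections::BTreeMap;

/// Side of the book an update applies to (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Bid,
    Ask,
}

/// Hypothetical stand-in for one normalised level update.
/// Prices and sizes are integer ticks/lots so they can serve as map keys.
#[derive(Debug, Clone, Copy)]
pub struct LevelUpdate {
    pub side: Side,
    pub price_ticks: u64,
    /// Assumed convention: a size of 0 removes the level.
    pub size_lots: u64,
}

/// Price-sorted book: price -> size for each side.
#[derive(Debug, Default)]
pub struct Book {
    bids: BTreeMap<u64, u64>,
    asks: BTreeMap<u64, u64>,
}

impl Book {
    /// Insert, overwrite, or remove a single price level.
    pub fn apply(&mut self, update: LevelUpdate) {
        let levels = match update.side {
            Side::Bid => &mut self.bids,
            Side::Ask => &mut self.asks,
        };
        if update.size_lots == 0 {
            levels.remove(&update.price_ticks);
        } else {
            levels.insert(update.price_ticks, update.size_lots);
        }
    }

    /// Highest-priced bid as (price, size), if any.
    pub fn best_bid(&self) -> Option<(u64, u64)> {
        self.bids.iter().next_back().map(|(p, s)| (*p, *s))
    }

    /// Lowest-priced ask as (price, size), if any.
    pub fn best_ask(&self) -> Option<(u64, u64)> {
        self.asks.iter().next().map(|(p, s)| (*p, *s))
    }
}
```

A `BTreeMap` keeps levels ordered by price, so the best bid and ask are simply the last and first entries of their respective maps.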

**Composed types:**

| Type | Description |
|------|-------------|
| `MarketSnapshot` | Time-aligned bundle of all market data for one grid period |
| `MarketAggregate` | 15-scalar feature vector derived from a `MarketSnapshot` |

## Exchange Sources

| Source | Kind | API | Order Books | Public Trades | Liquidations | Funding Rates | Open Interest |
|--------|------|-----|-------------|---------------|--------------|---------------|---------------|
| Bybit | CEX | WSS | YES / YES | YES / YES | YES / YES | YES / YES | YES / YES |
| Coinbase | CEX | WSS | YES / YES | YES / YES | - | - | - |
| Kraken | CEX | WSS | YES / YES | YES / YES | - | - | - |

*Format: Implemented / Tested. Dashes indicate the exchange does not expose
the data type on its spot/linear WebSocket API.*

## Workers

Two worker types handle end-to-end data collection:

**DataWorker** — raw event ingestion without synchronisation. Connects to a
live exchange WebSocket feed, decodes events, and delivers them through a
pluggable `OutputSink` pipeline. Configuration is driven by a TOML manifest
(`DataWorkerManifest`). Handles reconnection, backoff, health monitoring,
and gap detection automatically.
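
The README does not specify the reconnection policy, so the following is only a sketch of one common approach: exponential backoff with an upper bound. The function name `backoff_delay` and its parameters are hypothetical and do not reflect `DataWorker`'s actual configuration.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based):
/// `base * 2^attempt`, never exceeding `max`.
/// Hypothetical helper, not part of the crate's API.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing on large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, successive attempts wait 500 ms, 1 s, 2 s, 4 s, and so on until the cap is reached. Production code would usually add random jitter so that many workers do not reconnect in lockstep.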

**MarketWorker** — synchronised market snapshots. Extends `DataWorker`'s
ingestion with a `MarketSynchronizer` that bins heterogeneous events onto
a uniform nanosecond grid, producing `MarketSnapshot` objects at each tick.
Multiple `ClockMode` strategies are supported: `OrderbookDriven`,
`TradeDriven`, `LiquidationDriven`, and `ExternalClock`. Snapshots are
delivered through the same `OutputSink` pipeline and can be flushed to
Parquet automatically.
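
As a rough illustration of binning events onto a uniform nanosecond grid, the sketch below maps each timestamp to the index of the grid period that contains it. The functions `grid_index` and `bin_events` are hypothetical. The actual `MarketSynchronizer` handles several event types and clock modes, and may treat late or out-of-range events differently.

```rust
use std::collections::BTreeMap;

/// Index of the grid period containing `ts_ns`, for a grid that starts at
/// `origin_ns` and has a fixed `period_ns`. Returns `None` when the period
/// is zero or the event predates the grid origin.
pub fn grid_index(ts_ns: u64, origin_ns: u64, period_ns: u64) -> Option<u64> {
    if period_ns == 0 || ts_ns < origin_ns {
        return None;
    }
    Some((ts_ns - origin_ns) / period_ns)
}

/// Group event timestamps by grid period. The `BTreeMap` keeps periods in
/// chronological order; periods without events are simply absent.
pub fn bin_events(
    timestamps_ns: &[u64],
    origin_ns: u64,
    period_ns: u64,
) -> BTreeMap<u64, Vec<u64>> {
    let mut bins = BTreeMap::new();
    for &ts in timestamps_ns {
        if let Some(idx) = grid_index(ts, origin_ns, period_ns) {
            bins.entry(idx).or_insert_with(Vec::new).push(ts);
        }
    }
    bins
}
```

In this sketch each period is the half-open interval `[origin + i * period, origin + (i + 1) * period)`, so an event that falls exactly on a boundary belongs to the period that starts there.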

## Output Sinks

The `OutputSink` trait defines where worker output goes. Multiple sinks
run simultaneously via `OutputSinkSet` (fan-out):

| Sink | Status | Description |
|------|--------|-------------|
| `ChannelSink` | Working | Wraps `TopicRegistry` broadcast channels for pub/sub |
| `TerminalSink` | Working | Debug/tracing terminal output |
| `ParquetSink` | Working | Buffers `MarketSnapshot`s, decomposes and flushes to per-datatype Parquet files |
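
The fan-out idea can be sketched with a simplified, synchronous trait. Everything below (`Sink`, `MemorySink`, `FailingSink`, `SinkSet`) is a hypothetical illustration. The crate's real `OutputSink` and `OutputSinkSet` may be async, carry typed payloads, and handle errors differently.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified sink interface (string payloads, synchronous).
pub trait Sink {
    fn name(&self) -> &str;
    fn deliver(&mut self, payload: &str) -> Result<(), String>;
}

/// Sink that records payloads in shared memory so callers can inspect them.
pub struct MemorySink {
    pub log: Arc<Mutex<Vec<String>>>,
}

impl Sink for MemorySink {
    fn name(&self) -> &str {
        "memory"
    }

    fn deliver(&mut self, payload: &str) -> Result<(), String> {
        self.log
            .lock()
            .map_err(|e| e.to_string())?
            .push(payload.to_owned());
        Ok(())
    }
}

/// Sink that always fails, used to show that the fan-out keeps going.
pub struct FailingSink;

impl Sink for FailingSink {
    fn name(&self) -> &str {
        "failing"
    }

    fn deliver(&mut self, _payload: &str) -> Result<(), String> {
        Err("sink unavailable".to_owned())
    }
}

/// Fan-out: every payload is offered to every sink.
pub struct SinkSet {
    sinks: Vec<Box<dyn Sink>>,
}

impl SinkSet {
    pub fn new(sinks: Vec<Box<dyn Sink>>) -> Self {
        Self { sinks }
    }

    /// Deliver to all sinks; one failure does not stop the others.
    /// Returns the names of the sinks that reported an error.
    pub fn deliver(&mut self, payload: &str) -> Vec<String> {
        let mut failed = Vec::new();
        for sink in self.sinks.iter_mut() {
            if sink.deliver(payload).is_err() {
                failed.push(sink.name().to_owned());
            }
        }
        failed
    }
}
```

Isolating failures per sink is one possible design choice: a slow or broken file writer should not prevent a terminal or channel sink from seeing the same event.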

## Parquet Persistence

Requires `--features parquet`. All five data types support read and write:

| Data Type | Writer | Reader |
|-----------|--------|--------|
| Orderbooks | `write_ob_parquet` | `read_ob_parquet` |
| Trades | `write_trades_parquet_timestamped` | `read_trades_parquet` |
| Liquidations | `write_liquidations_parquet_timestamped` | `read_liquidations_parquet` |
| Funding Rates | `write_funding_parquet_timestamped` | `read_funding_parquet` |
| Open Interest | `write_oi_parquet_timestamped` | `read_oi_parquet` |

### Filename Convention

All timestamped writers produce files following this pattern:

```
{SYMBOL}_{DATATYPE}_{MODE}_{TIMESTAMP}.parquet
```

Where `MODE` is `sync` for grid-aligned data or `raw` for unprocessed
captures, and `TIMESTAMP` has the form `YYYYMMDD_HHMMSS.mmm` as shown in
the examples below. In symbols containing `/` (e.g. Kraken's `BTC/USDT`),
the `/` is replaced with `-` in the filename (`BTC-USDT`), while the Parquet
data retains the original symbol string. Examples:

```
BTCUSDT_ob_sync_20260226_153000.123.parquet
ETHUSDT_trades_raw_20260226_160000.456.parquet
BTC-USDT_ob_sync_20260226_153000.123.parquet
```

Files are organised into subdirectories per data type: `orderbooks/`,
`trades/`, `liquidations/`, `fundings/`, `open_interests/`.
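
The naming rule above can be expressed as a small helper. This is a sketch only: `Mode` and `parquet_filename` are hypothetical names, and the timestamp is passed in preformatted rather than derived from a clock.

```rust
/// Capture mode encoded in the filename (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Sync,
    Raw,
}

impl Mode {
    /// Token used in the `{MODE}` position of the filename.
    pub fn as_str(self) -> &'static str {
        match self {
            Mode::Sync => "sync",
            Mode::Raw => "raw",
        }
    }
}

/// Build `{SYMBOL}_{DATATYPE}_{MODE}_{TIMESTAMP}.parquet`,
/// replacing `/` in the symbol with `-` so the name is path-safe.
pub fn parquet_filename(symbol: &str, datatype: &str, mode: Mode, timestamp: &str) -> String {
    let safe_symbol = symbol.replace('/', "-");
    format!(
        "{}_{}_{}_{}.parquet",
        safe_symbol,
        datatype,
        mode.as_str(),
        timestamp
    )
}
```

Calling `parquet_filename("BTC/USDT", "ob", Mode::Sync, "20260226_153000.123")` yields `BTC-USDT_ob_sync_20260226_153000.123.parquet`, matching the third example above.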

## Feature Flags

| Flag | Effect |
|------|--------|
| `parquet` | Enables Apache Parquet I/O (adds `arrow` + `parquet` deps) |
| `torch` | Enables `tch`-based tensor conversion in the `datasets` module |

## Examples

| Example | Description | Command |
|---------|-------------|---------|
| `run_data_worker` | Raw event ingestion via DataWorker | `cargo run -p atelier_data --example run_data_worker -- --config <path>` |
| `run_market_worker` | Synchronised snapshots to Parquet via MarketWorker | `cargo run -p atelier_data --example run_market_worker --features parquet -- --config <path>` |
| `read_market_worker` | Read Parquet files and print per-symbol stats | `cargo run -p atelier_data --example read_market_worker --features parquet -- --dir <path>` |
| `bybit_markets` | Bybit market snapshot collection (standalone) | `cargo run -p atelier_data --example bybit_markets --features parquet -- --config <path>` |
| `coinbase_markets` | Coinbase market snapshot collection | `cargo run -p atelier_data --example coinbase_markets --features parquet -- --config <path>` |
| `kraken_markets` | Kraken market snapshot collection | `cargo run -p atelier_data --example kraken_markets --features parquet -- --config <path>` |
| `market_load` | Load and verify most recent Parquet files | `cargo run -p atelier_data --example market_load --features parquet -- --config <path>` |
| `market_fetch` | Multi-exchange raw stream collector (Bybit/Coinbase/Kraken) | `cargo run -p atelier_data --example market_fetch --features parquet` |
| `multi_sync_workers` | Multi-worker manifest parser (stub) | `cargo run -p atelier_data --example multi_sync_workers -- --config <path>` |

---

**`atelier-data`** is a member of the [atelier-rs](https://github.com/iteralabs/atelier-rs) workspace:

- [atelier-engine](https://crates.io/crates/atelier-engine)
- [atelier-quant](https://crates.io/crates/atelier-quant)
- [atelier-retro](https://crates.io/crates/atelier-retro)
- [atelier-rs](https://crates.io/crates/atelier-rs)

Development resources:

- [examples](https://github.com/IteraLabs/atelier-rs/tree/main/atelier-data/examples)
- [tests](https://github.com/IteraLabs/atelier-rs/tree/main/atelier-data/tests)
- [benches](https://github.com/IteraLabs/atelier-rs/tree/main/benches)
- [datasets](https://github.com/IteraLabs/atelier-rs/tree/main/datasets)