qs-data-preprocess 0.1.0

Historical market data storage and preprocessing CLI

data-preprocess

Historical market data storage and preprocessing CLI. It imports tick and OHLCV bar data from tab-delimited CSV files into a local DuckDB database, and supports multiple exchanges, deduplication on import, querying, and deletion of stored data.

Features

  • Import tick (bid/ask/last) and bar (OHLCV) data from tab-delimited CSV files
  • DuckDB embedded storage — single-file, no server required
  • Exchange-partitioned data — same symbol on different exchanges stored independently
  • Automatic deduplication on import (idempotent re-imports)
  • Timezone conversion from source offset to UTC
  • Symbol auto-extraction from filenames
  • Query and display stored data with filtering, pagination, and sort options
  • Delete by exchange, symbol, timeframe, or date range

Quick Start

# Build
cargo build -p qs-data-preprocess

# Import tick data (symbol extracted from filename, UTC+2 default)
data-preprocess input tick --exchange ctrader BTCUSD_202602161900_202602210954.csv

# Import bar data (timeframe required)
data-preprocess input bar --exchange ctrader --timeframe 1m BTCUSD_M1_202602210045_202602211009.csv

# View statistics
data-preprocess stats

# Query ticks
data-preprocess view tick --exchange ctrader --symbol BTCUSD --limit 20 --tail

# Query bars
data-preprocess view bar --exchange ctrader --symbol BTCUSD --timeframe 1m --limit 20

# Remove data
data-preprocess remove tick --exchange ctrader --symbol BTCUSD --from 2026-02-16 --to 2026-02-18
data-preprocess remove symbol --exchange ctrader BTCUSD
data-preprocess remove exchange binance

# Run tests
cargo test -p qs-data-preprocess

CLI Reference

data-preprocess [--db <PATH>] <COMMAND>

Global options:
  --db <PATH>    Path to DuckDB file [default: market_data.duckdb]
                 Also reads DATA_PREPROCESS_DB env var

Commands:
  input          Import market data from CSV file(s)
  remove         Remove data by exchange / symbol / type / date range
  stats          Show summary statistics
  view           Query and display stored data
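
The global options above name three sources for the database path: the --db flag, the DATA_PREPROCESS_DB environment variable, and the default market_data.duckdb. The following is a minimal sketch of how such a lookup could be resolved. The precedence (flag first, then environment, then default) and the helper name resolve_db_path are assumptions for illustration, not taken from the crate.

```rust
use std::path::PathBuf;

/// Default file name, as shown in the CLI help.
const DEFAULT_DB: &str = "market_data.duckdb";

/// Hypothetical helper: choose the database path from the --db flag,
/// the environment value, or the default. The precedence is an assumption.
fn resolve_db_path(flag: Option<&str>, env_value: Option<&str>) -> PathBuf {
    PathBuf::from(flag.or(env_value).unwrap_or(DEFAULT_DB))
}
```

In a real program the second argument would typically come from std::env::var("DATA_PREPROCESS_DB").ok(); passing it in explicitly keeps the helper easy to test.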

input tick

data-preprocess input tick [OPTIONS] <FILES>...

  -e, --exchange <EX>      Exchange name (REQUIRED)
      --symbol <SYM>       Override symbol (default: from filename)
      --tz-offset <TZ>     Source timezone offset [default: +02:00]
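
The --tz-offset value describes the offset of the source timestamps, which are converted to UTC on import. As a rough sketch of that arithmetic, the code below parses an offset such as +02:00 and subtracts it from a wall-clock timestamp. The function names and the strict ±HH:MM syntax are assumptions inferred from the default value, not the crate's actual parser.

```rust
/// Parse exactly two ASCII digits.
fn two_digits(s: &str) -> Option<i64> {
    if s.len() == 2 && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Parse an offset such as "+02:00" or "-05:30" into seconds east of UTC.
/// Hypothetical helper; the accepted syntax is assumed from the default.
fn parse_tz_offset(s: &str) -> Option<i64> {
    let sign = match *s.as_bytes().first()? {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    // The first byte is ASCII, so slicing at index 1 is safe.
    let (hh, mm) = s[1..].split_once(':')?;
    let (hours, minutes) = (two_digits(hh)?, two_digits(mm)?);
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3600 + minutes * 60))
}

/// Shift a wall-clock timestamp in milliseconds to UTC by subtracting
/// the source offset.
fn local_ms_to_utc_ms(local_ms: i64, offset_secs: i64) -> i64 {
    local_ms - offset_secs * 1000
}
```

With the default +02:00, a source time of 19:00:00.083 corresponds to 17:00:00.083 UTC.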

input bar

data-preprocess input bar [OPTIONS] <FILES>...

  -e, --exchange <EX>      Exchange name (REQUIRED)
  -t, --timeframe <TF>     Timeframe: 1m, 5m, 15m, 30m, 1h, 4h, 1d, 1w, 1M (REQUIRED)
      --symbol <SYM>       Override symbol (default: from filename)
      --tz-offset <TZ>     Source timezone offset [default: +02:00]

stats

data-preprocess stats [--exchange <EX>] [--symbol <SYM>]

view tick / view bar

data-preprocess view tick -e <EX> --symbol <SYM> [--from <DT>] [--to <DT>] [--limit N] [--tail] [--desc]
data-preprocess view bar  -e <EX> --symbol <SYM> -t <TF> [--from <DT>] [--to <DT>] [--limit N] [--tail] [--desc]
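
The README does not spell out how --limit, --tail, and --desc combine. One plausible reading, sketched below, is that rows are ordered by timestamp, --limit keeps the first N (or the last N with --tail), and --desc reverses the order of the selected rows. This interpretation and the select_page helper are assumptions, not the crate's documented behavior.

```rust
/// Hypothetical helper: pick a page from rows already sorted ascending.
/// Assumed semantics: `tail` keeps the last `limit` rows instead of the
/// first, and `descending` reverses the selected rows for display.
fn select_page<T: Clone>(rows: &[T], limit: usize, tail: bool, descending: bool) -> Vec<T> {
    let n = limit.min(rows.len());
    let window = if tail {
        &rows[rows.len() - n..]
    } else {
        &rows[..n]
    };
    let mut out = window.to_vec();
    if descending {
        out.reverse();
    }
    out
}
```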

remove

data-preprocess remove tick     -e <EX> --symbol <SYM> [--from <DT>] [--to <DT>]
data-preprocess remove bar      -e <EX> --symbol <SYM> -t <TF> [--from <DT>] [--to <DT>]
data-preprocess remove symbol   -e <EX> <SYMBOL>
data-preprocess remove exchange <EXCHANGE>

Input CSV Formats

Tick CSV

Tab-delimited, with header. Filename convention: {SYMBOL}_*.csv

<DATE>	<TIME>	<BID>	<ASK>	<LAST>	<VOLUME>	<FLAGS>
2026.02.16	19:00:00.083	67849.69	67861.69			6
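
In the sample row above, the LAST and VOLUME columns are empty. A minimal sketch of parsing such a line, treating empty price and volume fields as missing values, could look like this. The TickRow struct, the field types, and the rule that empty fields map to None are assumptions for illustration.

```rust
/// Hypothetical in-memory form of one tick line.
#[derive(Debug, PartialEq)]
struct TickRow {
    date: String,
    time: String,
    bid: Option<f64>,
    ask: Option<f64>,
    last: Option<f64>,
    volume: Option<f64>,
    flags: u32,
}

/// Empty field -> Ok(None); otherwise parse as a float.
fn opt_f64(field: &str) -> Result<Option<f64>, std::num::ParseFloatError> {
    if field.is_empty() {
        Ok(None)
    } else {
        field.parse().map(Some)
    }
}

/// Split one data line on tabs and parse its seven columns.
/// Returns None if the column count or any field is invalid.
fn parse_tick_row(line: &str) -> Option<TickRow> {
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    let f: Vec<&str> = line.split('\t').collect();
    if f.len() != 7 {
        return None;
    }
    Some(TickRow {
        date: f[0].to_string(),
        time: f[1].to_string(),
        bid: opt_f64(f[2]).ok()?,
        ask: opt_f64(f[3]).ok()?,
        last: opt_f64(f[4]).ok()?,
        volume: opt_f64(f[5]).ok()?,
        flags: f[6].parse().ok()?,
    })
}
```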

Bar CSV

Tab-delimited, with header. Filename convention: {SYMBOL}_*.csv

<DATE>	<TIME>	<OPEN>	<HIGH>	<LOW>	<CLOSE>	<TICKVOL>	<VOL>	<SPREAD>
2026.02.21	00:45:00	67932.44	67934.19	67888.89	67910.24	184	0	1200
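
Both formats use the {SYMBOL}_*.csv filename convention, which is what allows the symbol to be extracted when --symbol is not given. A rough sketch of that extraction, taking everything before the first underscore and uppercasing it to match the storage convention, is shown below. The function name and the exact rules (for example, rejecting names without an underscore) are assumptions.

```rust
use std::path::Path;

/// Hypothetical helper: derive the symbol from a path that follows the
/// {SYMBOL}_*.csv convention. Returns None if there is no underscore
/// or the part before it is empty.
fn symbol_from_filename(path: &str) -> Option<String> {
    let stem = Path::new(path).file_stem()?.to_str()?;
    let (prefix, _rest) = stem.split_once('_')?;
    if prefix.is_empty() {
        return None;
    }
    Some(prefix.to_ascii_uppercase())
}
```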

Data Conventions

  • Exchanges are always stored lowercase (ctrader, binance)
  • Symbols are always stored uppercase (BTCUSD, EURUSD)
  • Timestamps are stored in UTC — source timezone is converted on import
  • Deduplication uses INSERT OR IGNORE on (exchange, symbol, ts) for ticks and (exchange, symbol, timeframe, ts) for bars

For full schema details, query examples, and client integration guides (Python, Rust), see db-details.md.
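
To illustrate why re-imports are idempotent, the sketch below models the tick conventions in memory: keys are normalized (lowercase exchange, uppercase symbol) and a row is ignored when its (exchange, symbol, ts) key already exists. It uses a BTreeMap as a stand-in for the DuckDB table; TickStore and its method are hypothetical and not part of the crate.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// (exchange, symbol, UTC timestamp in ms): the tick deduplication key.
type TickKey = (String, String, i64);

/// In-memory stand-in for the tick table, storing (bid, ask) per key.
#[derive(Default)]
struct TickStore {
    rows: BTreeMap<TickKey, (f64, f64)>,
}

impl TickStore {
    /// Mimics INSERT OR IGNORE: returns true if the row was inserted,
    /// false if a row with the same key already existed (first one wins).
    fn insert_or_ignore(
        &mut self,
        exchange: &str,
        symbol: &str,
        ts_utc_ms: i64,
        bid: f64,
        ask: f64,
    ) -> bool {
        let key = (exchange.to_lowercase(), symbol.to_uppercase(), ts_utc_ms);
        match self.rows.entry(key) {
            Entry::Vacant(slot) => {
                slot.insert((bid, ask));
                true
            }
            Entry::Occupied(_) => false,
        }
    }
}
```

Importing the same file twice leaves the store unchanged on the second pass, while the same symbol and timestamp on a different exchange is kept as a separate row.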

Library Usage

The crate also exposes a library for programmatic access:

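use data_preprocess::{Database, db::QueryOpts};

// Open the database file (same default path as the CLI).
// The `?` operator requires an enclosing function that returns a Result.
let db = Database::open("market_data.duckdb".as_ref())?;

// Query BTCUSD ticks on ctrader; the fields mirror the `view tick` options.
let (ticks, total) = db.query_ticks(&QueryOpts {
    exchange: "ctrader".into(),
    symbol: "BTCUSD".into(),
    from: None,        // no lower time bound
    to: None,          // no upper time bound
    limit: 1000,       // corresponds to --limit
    tail: false,       // corresponds to --tail
    descending: false, // corresponds to --desc
})?;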

License

Licensed under either of