tickparse 0.1.0

Blazing-fast streaming CSV parser for financial tick data. Built for backtesting engines that need to chew through gigabytes of market data without breaking a sweat.

327 million rows · 12 GB · 34 seconds · 5 MB RAM

Why

Loading a 12 GB tick-data CSV should not take minutes or eat all your RAM. tickparse memory-maps the file and parses raw bytes in parallel across all CPU cores — no serde, no UTF-8 validation, no per-row heap allocation.

Benchmarks

Tested on 12 GB of XAUUSD tick data (327M rows) on Apple Silicon (M1, base model):

Parser                         Time     Throughput   Rows/sec   Memory
V1 — BufReader                 49.08s   0.25 GB/s    6.7M       ~7 MB
V2 — mmap                      64.64s   0.19 GB/s    5.1M       ~0 MB
V3 — parallel → Vec            27.00s   0.45 GB/s    12.1M      900 MB
Stream — parallel + callback   34.25s   0.36 GB/s    9.5M       5 MB

Run benchmarks yourself:

cargo run --release --bin bench

Installation

Add to your Cargo.toml:

[dependencies]
tickparse = "0.1"

Or install the CLI:

cargo install tickparse

CSV Format

Headerless, 3 columns: timestamp,bid,ask

2020.01.02 01:00:04.735,1518.77,1519.59
2020.01.02 01:00:04.836,1518.86,1519.74
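A row in this format can be decoded with nothing but the standard library. A minimal sketch for illustration — the `Tick` struct and `parse_line` here are hypothetical, not the crate's zero-alloc internals (which avoid `str::parse` entirely):

```rust
// Illustrative decoder for the headerless timestamp,bid,ask format.
// Not the crate's internal parser — a plain std-only sketch.

#[derive(Debug, PartialEq)]
struct Tick {
    timestamp: String, // raw "YYYY.MM.DD HH:MM:SS.mmm" field
    bid: f64,
    ask: f64,
}

fn parse_line(line: &str) -> Option<Tick> {
    let mut fields = line.splitn(3, ',');
    Some(Tick {
        timestamp: fields.next()?.to_string(),
        bid: fields.next()?.parse().ok()?,
        ask: fields.next()?.parse().ok()?,
    })
}

fn main() {
    let t = parse_line("2020.01.02 01:00:04.735,1518.77,1519.59").unwrap();
    println!("{} bid={} ask={}", t.timestamp, t.bid, t.ask);
}
```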

Usage

Streaming (recommended for backtesting)

Zero-allocation parallel processing — each tick is parsed and passed to your callback across all CPU cores:

use tickparse::parallel::parse_parallel_streaming;
use std::path::Path;

parse_parallel_streaming(Path::new("data.csv"), |tick| {
    // tick.timestamp_us — microseconds since Unix epoch
    // tick.bid          — bid price (f64)
    // tick.ask          — ask price (f64)
    engine.on_tick(tick); // `engine` is your backtesting engine (placeholder)
}).unwrap();

Iterator (single-threaded, zero-alloc)

Sequential access without storing all ticks:

use tickparse::mmap_parser::MmapTickIterator;
use std::path::Path;

for tick in MmapTickIterator::new(Path::new("data.csv")).unwrap() {
    println!("{} bid={} ask={}", tick.timestamp_us, tick.bid, tick.ask);
}
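The zero-alloc iteration idea can be sketched over any byte buffer; the real iterator presumably does something similar over the memory-mapped file. A simplified, std-only illustration (the mmap step is omitted and `LineIter` is a made-up name, not the crate's type):

```rust
// Simplified sketch of zero-alloc line iteration over a byte buffer.
// Illustrative only — not the crate's MmapTickIterator.
struct LineIter<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for LineIter<'a> {
    type Item = &'a [u8];
    fn next(&mut self) -> Option<&'a [u8]> {
        if self.pos >= self.data.len() {
            return None;
        }
        let start = self.pos;
        // find the end of the current line (or end of buffer)
        let end = self.data[start..]
            .iter()
            .position(|&b| b == b'\n')
            .map(|i| start + i)
            .unwrap_or(self.data.len());
        self.pos = end + 1;
        Some(&self.data[start..end]) // borrowed slice — no per-row allocation
    }
}

fn main() {
    let data: &[u8] = b"1,2,3\n4,5,6\n";
    let rows = LineIter { data, pos: 0 }.count();
    println!("{} rows", rows);
}
```

Because each item borrows from the underlying buffer, nothing is copied per row — the same property the crate claims for its mmap-backed iterator.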

Load into Vec

When you need random access to all ticks:

use tickparse::parallel::parse_parallel;
use std::path::Path;

let ticks = parse_parallel(Path::new("data.csv")).unwrap();
println!("Loaded {} ticks", ticks.len());

CLI

# Benchmark all parser modes on your data
tickparse --file data.csv --version all

# Run only the streaming parser
tickparse --file data.csv --version stream

How It Works

  1. Memory-map the file — the OS handles paging, no userspace buffering
  2. Split into N chunks (one per CPU core), aligned to newline boundaries
  3. Parse each chunk in parallel using hand-rolled byte-level parsers:
    • Timestamps parsed digit-by-digit, converted to epoch microseconds
    • Floats parsed without str::parse or serde — direct byte arithmetic
  4. Stream results via callback — no Vec<Tick> materialization needed
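Steps 2 and 3 can be sketched in safe, std-only Rust. The function names below are illustrative, not the crate's internals, and the float parser is a simplified version of the "direct byte arithmetic" idea (no sign or exponent handling):

```rust
// Sketch of newline-aligned chunk splitting and a hand-rolled
// decimal parser. Illustrative only — not the crate's actual code.

/// Split `data` into roughly `n` chunks whose boundaries fall on '\n',
/// so each chunk contains only whole rows.
fn newline_aligned_chunks(data: &[u8], n: usize) -> Vec<&[u8]> {
    let approx = data.len() / n.max(1) + 1;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + approx).min(data.len());
        // advance to the next newline so no row is cut in half
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Parse an ASCII decimal like b"1518.77" without str::parse:
/// accumulate digits as integers, then scale the fractional part.
fn parse_f64_bytes(bytes: &[u8]) -> f64 {
    let (mut int_part, mut frac_part): (u64, u64) = (0, 0);
    let mut frac_scale = 1.0_f64;
    let mut seen_dot = false;
    for &b in bytes {
        match b {
            b'.' => seen_dot = true,
            b'0'..=b'9' => {
                let d = (b - b'0') as u64;
                if seen_dot {
                    frac_part = frac_part * 10 + d;
                    frac_scale *= 10.0;
                } else {
                    int_part = int_part * 10 + d;
                }
            }
            _ => break,
        }
    }
    int_part as f64 + frac_part as f64 / frac_scale
}

fn main() {
    let chunks = newline_aligned_chunks(b"a,1,2\nb,3,4\nc,5,6\n", 2);
    println!("{} chunks, price = {}", chunks.len(), parse_f64_bytes(b"1518.77"));
}
```

Parsing this way skips UTF-8 validation and the general-purpose float grammar, which is where much of the speedup over `BufReader` + `str::parse` comes from.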

Project Structure

src/
├── lib.rs          # Tick struct, zero-alloc byte parsers
├── main.rs         # CLI benchmark runner
├── naive.rs        # V1: BufReader baseline
├── mmap_parser.rs  # V2: mmap + streaming iterator
└── parallel.rs     # V3: parallel chunks + streaming callback
bench/
└── bench.rs        # Automated benchmark suite
examples/
└── example.rs      # Usage patterns for backtesting integration

License

BSD 3-Clause — see LICENSE