tickparse 0.1.0

Blazing-fast streaming CSV parser for financial tick data — mmap, zero-copy, parallel
Documentation
# tickparse

Blazing-fast streaming CSV parser for financial tick data. Built for backtesting engines that need to chew through gigabytes of market data without breaking a sweat.

```
327 million rows · 12 GB · 34 seconds · 5 MB RAM
```

## Why

Loading a 12GB tick data CSV should not take minutes or eat all your RAM. `tickparse` memory-maps the file and parses raw bytes in parallel across all CPU cores — no serde, no UTF-8 validation, no heap allocation per row.

## Benchmarks

Tested on 12GB XAUUSD tick data (327M rows), Apple Silicon(M1 base):

| Parser | Time | Throughput | Rows/sec | Memory |
|--------|------|-----------|----------|--------|
| V1 — BufReader | 49.08s | 0.25 GB/s | 6.7M | ~7 MB |
| V2 — mmap | 64.64s | 0.19 GB/s | 5.1M | ~0 MB |
| V3 — parallel → Vec | 27s | 0.45 GB/s | 12.1M | 900 MB |
| **Stream — parallel + callback** | **34.25s** | **0.36 GB/s** | **9.5M** | **5 MB** |

Run benchmarks yourself:

```bash
cargo run --release --bin bench
```

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
tickparse = "0.1"
```

Or install the CLI:

```bash
cargo install tickparse
```

## CSV Format

Headerless, 3 columns: `timestamp,bid,ask`

```
2020.01.02 01:00:04.735,1518.77,1519.59
2020.01.02 01:00:04.836,1518.86,1519.74
```

## Usage

### Streaming (recommended for backtesting)

Zero-allocation parallel processing — each tick is parsed and passed to your callback across all CPU cores:

```rust
use tickparse::parallel::parse_parallel_streaming;
use std::path::Path;

parse_parallel_streaming(Path::new("data.csv"), |tick| {
    // tick.timestamp_us — microseconds since Unix epoch
    // tick.bid          — bid price (f64)
    // tick.ask          — ask price (f64)
    engine.on_tick(tick);
}).unwrap();
```

### Iterator (single-threaded, zero-alloc)

Sequential access without storing all ticks:

```rust
use tickparse::mmap_parser::MmapTickIterator;
use std::path::Path;

for tick in MmapTickIterator::new(Path::new("data.csv")).unwrap() {
    println!("{} bid={} ask={}", tick.timestamp_us, tick.bid, tick.ask);
}
```

### Load into Vec

When you need random access to all ticks:

```rust
use tickparse::parallel::parse_parallel;
use std::path::Path;

let ticks = parse_parallel(Path::new("data.csv")).unwrap();
println!("Loaded {} ticks", ticks.len());
```

### CLI

```bash
# Benchmark all parser modes on your data
tickparse --file data.csv --version all

# Run only the streaming parser
tickparse --file data.csv --version stream
```

## How It Works

1. **Memory-map** the file — the OS handles paging, no userspace buffering
2. **Split** into N chunks (one per CPU core), aligned to newline boundaries
3. **Parse** each chunk in parallel using hand-rolled byte-level parsers:
   - Timestamps parsed digit-by-digit, converted to epoch microseconds
   - Floats parsed without `str::parse` or serde — direct byte arithmetic
4. **Stream** results via callback — no `Vec<Tick>` materialization needed

## Project Structure

```
src/
├── lib.rs          # Tick struct, zero-alloc byte parsers
├── main.rs         # CLI benchmark runner
├── naive.rs        # V1: BufReader baseline
├── mmap_parser.rs  # V2: mmap + streaming iterator
└── parallel.rs     # V3: parallel chunks + streaming callback
bench/
└── bench.rs        # Automated benchmark suite
examples/
└── example.rs      # Usage patterns for backtesting integration
```

## License

BSD 3-Clause — see [LICENSE](LICENSE)