# tickparse
Blazing-fast streaming CSV parser for financial tick data. Built for backtesting engines that need to chew through gigabytes of market data without breaking a sweat.
```
327 million rows · 12 GB · 34 seconds · 5 MB RAM
```
## Why
Loading a 12GB tick data CSV should not take minutes or eat all your RAM. `tickparse` memory-maps the file and parses raw bytes in parallel across all CPU cores — no serde, no UTF-8 validation, no heap allocation per row.
## Benchmarks
Tested on 12GB XAUUSD tick data (327M rows), Apple Silicon(M1 base):
| V1 — BufReader | 49.08s | 0.25 GB/s | 6.7M | ~7 MB |
| V2 — mmap | 64.64s | 0.19 GB/s | 5.1M | ~0 MB |
| V3 — parallel → Vec | 27s | 0.45 GB/s | 12.1M | 900 MB |
| **Stream — parallel + callback** | **34.25s** | **0.36 GB/s** | **9.5M** | **5 MB** |
Run benchmarks yourself:
```bash
cargo run --release --bin bench
```
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
tickparse = "0.1"
```
Or install the CLI:
```bash
cargo install tickparse
```
## CSV Format
Headerless, 3 columns: `timestamp,bid,ask`
```
2020.01.02 01:00:04.735,1518.77,1519.59
2020.01.02 01:00:04.836,1518.86,1519.74
```
## Usage
### Streaming (recommended for backtesting)
Zero-allocation parallel processing — each tick is parsed and passed to your callback across all CPU cores:
```rust
use tickparse::parallel::parse_parallel_streaming;
use std::path::Path;
// tick.bid — bid price (f64)
// tick.ask — ask price (f64)
engine.on_tick(tick);
}).unwrap();
```
### Iterator (single-threaded, zero-alloc)
Sequential access without storing all ticks:
```rust
use tickparse::mmap_parser::MmapTickIterator;
use std::path::Path;
for tick in MmapTickIterator::new(Path::new("data.csv")).unwrap() {
println!("{} bid={} ask={}", tick.timestamp_us, tick.bid, tick.ask);
}
```
### Load into Vec
When you need random access to all ticks:
```rust
use tickparse::parallel::parse_parallel;
use std::path::Path;
let ticks = parse_parallel(Path::new("data.csv")).unwrap();
println!("Loaded {} ticks", ticks.len());
```
### CLI
```bash
# Benchmark all parser modes on your data
tickparse --file data.csv --version all
# Run only the streaming parser
tickparse --file data.csv --version stream
```
## How It Works
1. **Memory-map** the file — the OS handles paging, no userspace buffering
2. **Split** into N chunks (one per CPU core), aligned to newline boundaries
3. **Parse** each chunk in parallel using hand-rolled byte-level parsers:
- Timestamps parsed digit-by-digit, converted to epoch microseconds
- Floats parsed without `str::parse` or serde — direct byte arithmetic
4. **Stream** results via callback — no `Vec<Tick>` materialization needed
## Project Structure
```
src/
├── lib.rs # Tick struct, zero-alloc byte parsers
├── main.rs # CLI benchmark runner
├── naive.rs # V1: BufReader baseline
├── mmap_parser.rs # V2: mmap + streaming iterator
└── parallel.rs # V3: parallel chunks + streaming callback
bench/
└── bench.rs # Automated benchmark suite
examples/
└── example.rs # Usage patterns for backtesting integration
```
## License
BSD 3-Clause — see [LICENSE](LICENSE)