π Oak CSV Parser
High-Performance Tabular Data Processing β A high-performance, incremental CSV parser built on the Oak framework. Optimized for large-scale data analysis, real-time streaming, and robust handling of tabular data formats.
π― Project Vision
Comma-Separated Values (CSV) are the universal language of data exchange, used across every industry from finance to scientific research. oak-csv provides a modern, Rust-powered infrastructure for analyzing and processing tabular data with extreme efficiency. By utilizing Oak's incremental parsing capabilities, it enables the creation of highly responsive data tools that can handle gigabyte-scale files with sub-millisecond updates. Whether you are building data validation engines, automated ETL pipelines, or sophisticated spreadsheet-like editors, oak-csv provides the robust, efficient foundation you need for high-fidelity data extraction.
β¨ Core Features
- β‘ Blazing Fast: Leverages Rust's zero-cost abstractions to parse massive CSV datasets with maximum throughput and minimal memory overhead.
- π Incremental by Nature: Built-in support for partial updatesβre-parse only modified rows or columns. Ideal for real-time data monitoring and large-scale dataset editing.
- π³ High-Fidelity AST: Generates a clean and easy-to-traverse Abstract Syntax Tree capturing:
- Rows & Records: Precise mapping of records, including support for complex quoting and escaping rules.
- Headers & Fields: Automatic identification of header rows and field-level metadata.
- Delimiters: Robust handling of various delimiters (comma, tab, semicolon) and line endings.
- π‘οΈ Industrial-Grade Fault Tolerance: Engineered to handle malformed data gracefully, providing precise diagnostics for unclosed quotes, mismatched column counts, or invalid encoding.
- π§© Deep Ecosystem Integration: Seamlessly works with
oak-lspfor full LSP support andoak-mcpfor intelligent data discovery and structured analysis.
ποΈ Architecture
The parser follows the Green/Red Tree architecture (inspired by Roslyn), which allows for:
- Efficient Immutability: Share nodes across different versions of the tree without copying.
- Lossless Syntax Trees: Retains all trivia (whitespace and comments), enabling faithful code formatting and refactoring.
- Type Safety: Strongly-typed "Red" nodes provide a convenient and safe API for tree traversal and analysis.
π Getting Started
Check out the src/readme.md for a quick start guide and API usage examples. For information on our testing strategy and how to run tests, see tests/readme.md.
π€ Contributing
We welcome contributions of all kinds! If you find a bug, have a feature request, or want to contribute code, please check our issues or submit a pull request.