§SSTable Binary Format Parser Module
This module provides parsing functionality for Apache Cassandra SSTable binary formats. It handles deserialization of binary data structures from SSTable files (Data.db, Index.db, Statistics.db, etc.) produced by Cassandra 5.0+.
§Architecture Overview
This is one of four parsing subsystems in cqlite-core:
| Module | Purpose |
|---|---|
| cql/ | Full CQL text → AST parsing |
| parser/ | SSTable binary format parsing (this module) |
| schema/cql_parser.rs | CREATE TABLE → TableSchema |
| query/parser.rs | Lightweight DML → ParsedQuery |
See docs/architecture/parser-overview.md for the complete architecture overview.
§Module Architecture
parser/ (SSTable Binary Format Parsing)
│
├── Core Binary Parsing
│ ├── vint.rs - Variable-length integer (VInt) encoding
│ ├── vint_fixed.rs - Fixed-size integer alternatives
│ └── header.rs - SSTable header parsing (magic numbers, version detection)
│
├── Statistics Parsing
│ ├── statistics.rs - Statistics.db basic format
│ └── enhanced_statistics_parser.rs - Statistics.db enhanced format (nb/oa)
│
├── CQL Type Deserialization
│ ├── types.rs - All CQL primitive types (int, text, uuid, etc.)
│ ├── complex_types.rs - Collections, UDTs, tuples, frozen types
│ └── optimized_complex_types.rs - M3 performance-optimized parsing
│
├── Performance Utilities
│ └── zero_copy_parser.rs - String interning, zero-copy buffers
│
└── High-Level Interface
  └── binary.rs - SSTableParser facade
Note: Test modules (*_test.rs, *_tests.rs) and benchmarks (*_benchmarks.rs)
are omitted from the diagram. See the benchmarks feature flag for performance testing.
§Key Distinction: parser/ vs cql/
| Module | Purpose | Input | Output |
|---|---|---|---|
| parser/ | SSTable binary parsing | Raw bytes from .db files | Structured Rust values |
| cql/ | CQL text parsing | Query strings (“SELECT…”) | Abstract Syntax Trees |
This module (parser/) handles binary deserialization:
- Reading bytes from SSTable files (Data.db, Statistics.db, etc.)
- Decoding VInt-encoded integers per Cassandra’s wire format
- Deserializing CQL values (int → i32, text → String, uuid → Uuid)
For CQL text parsing (CREATE TABLE, SELECT, etc.), see the crate::cql module.
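As a rough illustration of the value deserialization listed above, the following sketch decodes three primitives from raw bytes. It is not this module's implementation: the function names are hypothetical, and the uuid is kept as a plain `u128` rather than a `Uuid` type so the example has no external dependencies. It relies only on the standard CQL encodings (a 4-byte big-endian `int`, UTF-8 `text`, a 16-byte `uuid`).

```rust
/// Decode a CQL `int`: exactly 4 bytes, big-endian two's complement.
/// (Hypothetical helper, not part of this crate's API.)
fn decode_int(bytes: &[u8]) -> Option<i32> {
    if bytes.len() != 4 {
        return None;
    }
    Some(i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Decode a CQL `text`: the bytes must be valid UTF-8.
fn decode_text(bytes: &[u8]) -> Option<String> {
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}

/// Decode a CQL `uuid`: exactly 16 bytes, represented here as a big-endian u128.
fn decode_uuid(bytes: &[u8]) -> Option<u128> {
    if bytes.len() != 16 {
        return None;
    }
    let mut raw = [0u8; 16];
    raw.copy_from_slice(bytes);
    Some(u128::from_be_bytes(raw))
}
```

Each helper returns `None` on a length or encoding mismatch instead of panicking, which is usually the safer choice when the input may come from a truncated or corrupted file.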
§Sub-module Reference
§Variable-Length Integer Encoding
- vint - VInt encoding/decoding per Cassandra specification, with corruption detection
- vint_fixed - Fixed-size integer parsing for when VInt isn’t used
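For readers unfamiliar with the encoding, here is a minimal sketch of Cassandra-style VInt decoding. It is not the `vint` module's code, and the function names are hypothetical. It assumes the usual scheme: the number of leading 1 bits in the first byte gives the count of extra bytes, the remaining bits of the first byte are the high-order value bits, and signed values are ZigZag-encoded on top of the unsigned form.

```rust
/// Decode an unsigned VInt from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is empty or truncated. (Hypothetical helper for illustration.)
fn decode_unsigned_vint(input: &[u8]) -> Option<(u64, usize)> {
    let first = *input.first()?;
    // Leading 1 bits in the first byte = number of extra bytes (0..=8).
    let extra = first.leading_ones() as usize;
    if input.len() < 1 + extra {
        return None;
    }
    // Bits of the first byte after the length prefix are value bits.
    let mut value = u64::from(first) & (0xFFu64 >> extra);
    for &byte in &input[1..1 + extra] {
        value = (value << 8) | u64::from(byte);
    }
    Some((value, 1 + extra))
}

/// Undo ZigZag encoding to recover a signed value.
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}
```

Under these assumptions, the two bytes `[0x8A, 0x01]` decode to the unsigned value 2561 (one extra byte, `0x0A` followed by `0x01`), which ZigZag-decodes to -1281.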
§SSTable Headers
- header - SSTable header parsing with version detection (oa/nb/legacy formats)
§Statistics Files
- statistics - Statistics.db parsing for row counts, timestamps, and min/max metadata
- enhanced_statistics_parser - Enhanced Statistics.db parsing for the nb and oa SSTable format versions handled by Cassandra 5.0
§CQL Type Deserialization
- types - All 20+ CQL primitive types: int, bigint, text, blob, uuid, timestamp, date, time, inet, varint, decimal, duration, boolean, float, double, ascii, timeuuid
- complex_types - Collections (list, set, map), UDTs, tuples, with depth tracking for nested types
- optimized_complex_types - M3 milestone performance optimizations for complex types
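The depth tracking mentioned for nested types can be pictured with a small sketch. This is an assumption about the general technique, not the `complex_types` implementation: `TypeSketch` and `nesting_depth` are hypothetical names, and the enum covers only a few types for brevity.

```rust
/// Hypothetical, simplified type descriptor (not the crate's `CqlType`).
#[derive(Debug)]
enum TypeSketch {
    Int,
    Text,
    List(Box<TypeSketch>),
    Map(Box<TypeSketch>, Box<TypeSketch>),
}

/// Walk a type descriptor and return the deepest nesting level reached,
/// or an error as soon as `max_depth` is exceeded.
fn nesting_depth(ty: &TypeSketch, current: usize, max_depth: usize) -> Result<usize, String> {
    if current > max_depth {
        return Err(format!("type nesting exceeds limit of {}", max_depth));
    }
    match ty {
        // Scalars add no further nesting.
        TypeSketch::Int | TypeSketch::Text => Ok(current),
        TypeSketch::List(inner) => nesting_depth(inner, current + 1, max_depth),
        TypeSketch::Map(key, value) => {
            let key_depth = nesting_depth(key, current + 1, max_depth)?;
            let value_depth = nesting_depth(value, current + 1, max_depth)?;
            Ok(key_depth.max(value_depth))
        }
    }
}
```

Passing the current depth down the recursion and checking it on entry bounds stack usage for hostile or corrupted type descriptors, which is the usual motivation for tracking depth at all.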
§Performance
- zero_copy_parser - Memory optimization utilities: string interning and zero-copy buffer management to stay under the 128MB memory target
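String interning, one of the techniques named above, can be sketched as follows. This is a generic illustration under assumed names (`Interner`, `intern`), not the API of `zero_copy_parser`: repeated strings such as column or keyspace names are stored once and shared through reference-counted handles.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical string interner: each distinct string is allocated once.
#[derive(Default)]
struct Interner {
    pool: HashSet<Arc<str>>,
}

impl Interner {
    /// Return a shared handle for `s`, allocating only on first sight.
    fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(existing) = self.pool.get(s) {
            return Arc::clone(existing);
        }
        let shared: Arc<str> = Arc::from(s);
        self.pool.insert(Arc::clone(&shared));
        shared
    }

    /// Number of distinct strings currently stored.
    fn len(&self) -> usize {
        self.pool.len()
    }
}
```

Interning the same text twice yields handles that point at the same allocation, so memory use grows with the number of distinct strings rather than the number of occurrences.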
§High-Level Interface
- binary - SSTableParser facade providing unified access to parsing functionality
§Usage Examples
use cqlite_core::parser::{parse_vint, SSTableHeader, CqlType};
// Parse variable-length integer from raw bytes
let bytes = [0x8A, 0x01]; // VInt-encoded value
let (remaining, value) = parse_vint(&bytes)?;
// Parse SSTable header to detect format version
let header = SSTableHeader::parse(&file_bytes)?;
println!("SSTable format: {:?}", header.format_type);
§Backward Compatibility
The deprecated parse_cql_schema function is maintained for backward compatibility with
existing code. New code should use crate::cql::parse_cql_schema_enhanced,
which provides better error handling and configuration options.
§Related Documentation
- SSTable format specification: docs/sstables-definitive-guide/
- Known limitations: docs/sstables-definitive-guide/chapters/appendix-f-known-limitations.md
Re-exports§
pub use binary::CQLiteParseError;
pub use binary::ParseResult;
pub use binary::SSTableParser;
pub use crate::error::Result as CqlResult;
pub use complex_types::*;
pub use enhanced_statistics_parser::*;
pub use header::*;
pub use statistics::*;
pub use types::*;
pub use vint::*;
Modules§
- binary - Binary format parsing for backward compatibility
- complex_types - Complex Type Parsing for Cassandra Data Types
- enhanced_statistics_parser - Enhanced Statistics.db parser for Cassandra 5.0 ‘nb’ format
- header - SSTable header parsing for Cassandra 5+ ‘oa’ format
- optimized_complex_types - High-performance optimized complex type parsing for M3
- statistics - Statistics.db parser for Cassandra 5+ SSTable format
- types - CQL type system parsing and serialization
- vint - Variable-length integer encoding/decoding for Cassandra SSTable format
- vint_fixed - Fixed VInt implementation for Cassandra compatibility
- zero_copy_parser - Zero-copy parsing optimizations
Functions§
- parse_cql_schema (Deprecated) - Parse CQL CREATE TABLE statement (backward compatibility function)