Module parser

§SSTable Binary Format Parser Module

This module provides parsing functionality for Apache Cassandra SSTable binary formats. It handles deserialization of binary data structures from SSTable files (Data.db, Index.db, Statistics.db, etc.) produced by Cassandra 5.0+.

§Architecture Overview

This is one of four parsing subsystems in cqlite-core:

| Module | Purpose |
|---|---|
| cql/ | Full CQL text → AST parsing |
| parser/ | SSTable binary format parsing (this module) |
| schema/cql_parser.rs | CREATE TABLE → TableSchema |
| query/parser.rs | Lightweight DML → ParsedQuery |

See docs/architecture/parser-overview.md for the complete architecture overview.

§Module Architecture

parser/ (SSTable Binary Format Parsing)
│
├── Core Binary Parsing
│   ├── vint.rs              - Variable-length integer (VInt) encoding
│   ├── vint_fixed.rs        - Fixed-size integer alternatives
│   └── header.rs            - SSTable header parsing (magic numbers, version detection)
│
├── Statistics Parsing
│   ├── statistics.rs        - Statistics.db basic format
│   └── enhanced_statistics_parser.rs - Statistics.db enhanced format (nb/oa)
│
├── CQL Type Deserialization
│   ├── types.rs             - All CQL primitive types (int, text, uuid, etc.)
│   ├── complex_types.rs     - Collections, UDTs, tuples, frozen types
│   └── optimized_complex_types.rs - M3 performance-optimized parsing
│
├── Performance Utilities
│   └── zero_copy_parser.rs  - String interning, zero-copy buffers
│
└── High-Level Interface
    └── binary.rs            - SSTableParser facade

Note: Test modules (*_test.rs, *_tests.rs) and benchmarks (*_benchmarks.rs) are omitted from the diagram. Performance tests are gated behind the benchmarks feature flag.

§Key Distinction: parser/ vs cql/

| Module | Purpose | Input | Output |
|---|---|---|---|
| parser/ | SSTable binary parsing | Raw bytes from .db files | Structured Rust values |
| cql/ | CQL text parsing | Query strings (“SELECT…”) | Abstract Syntax Trees |

This module (parser/) handles binary deserialization:

  • Reading bytes from SSTable files (Data.db, Statistics.db, etc.)
  • Decoding VInt-encoded integers per Cassandra’s wire format
  • Deserializing CQL values (int → i32, text → String, uuid → Uuid)

For CQL text parsing (CREATE TABLE, SELECT, etc.), see the crate::cql module.
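
To make the binary side of this distinction concrete, the following is a minimal sketch of turning raw value bytes into Rust values. It assumes the standard CQL encodings (a 4-byte big-endian `int`, UTF-8 `text`, a 16-byte `uuid`). The helper names `decode_int`, `decode_text`, and `decode_uuid_bytes` are hypothetical and are not part of this module's API; the real functions in `types` may differ in signature and error handling.

```rust
use std::convert::TryInto;

/// Hypothetical helper: decode a CQL `int` (4 bytes, big-endian) into an i32.
/// Returns None when the slice is not exactly 4 bytes long.
fn decode_int(bytes: &[u8]) -> Option<i32> {
    let arr: [u8; 4] = bytes.try_into().ok()?;
    Some(i32::from_be_bytes(arr))
}

/// Hypothetical helper: decode a CQL `text` value (UTF-8) into a String.
/// Returns None when the bytes are not valid UTF-8.
fn decode_text(bytes: &[u8]) -> Option<String> {
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}

/// Hypothetical helper: a CQL `uuid` is 16 raw bytes. A fixed-size array is
/// used here to avoid depending on an external UUID crate.
fn decode_uuid_bytes(bytes: &[u8]) -> Option<[u8; 16]> {
    bytes.try_into().ok()
}
```

Each helper rejects input of the wrong length or encoding instead of panicking, which is the behavior a parser of untrusted file contents generally wants.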

§Sub-module Reference

§Variable-Length Integer Encoding

  • vint - VInt encoding/decoding per Cassandra specification, with corruption detection
  • vint_fixed - Fixed-size integer parsing for when VInt isn’t used
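
As a rough illustration of what VInt decoding involves, here is a minimal sketch based on Cassandra's VInt layout: the number of leading 1 bits in the first byte gives the count of extra bytes that follow, and signed values are ZigZag-encoded. The function names `decode_unsigned_vint` and `decode_signed_vint` are hypothetical; they are not this crate's API, and `parse_vint` may differ in signature, error type, and corruption checks.

```rust
/// Hypothetical sketch: decode one unsigned VInt from the front of `input`.
/// Returns the value and the remaining bytes, or None if the input is truncated.
fn decode_unsigned_vint(input: &[u8]) -> Option<(u64, &[u8])> {
    let (&first, rest) = input.split_first()?;
    // The count of leading 1 bits in the first byte is the number of extra bytes.
    let extra = first.leading_ones() as usize;
    if rest.len() < extra {
        return None; // truncated input
    }
    // The remaining low bits of the first byte are the most significant value bits.
    // With 8 extra bytes the first byte carries no value bits at all.
    let mut value = if extra >= 8 {
        0
    } else {
        u64::from(first & (0xFF >> extra))
    };
    // Extra bytes follow in big-endian order.
    for &b in &rest[..extra] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, &rest[extra..]))
}

/// Hypothetical sketch: decode a signed VInt by undoing ZigZag encoding.
fn decode_signed_vint(input: &[u8]) -> Option<(i64, &[u8])> {
    let (raw, rest) = decode_unsigned_vint(input)?;
    let value = ((raw >> 1) as i64) ^ -((raw & 1) as i64);
    Some((value, rest))
}
```

Returning the unconsumed tail alongside the value lets a caller chain several decodes over one buffer without copying.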

§SSTable Headers

  • header - SSTable header parsing with version detection (oa/nb/legacy formats)

§Statistics Files

  • statistics - Statistics.db parsing for row counts, timestamps, min/max metadata
  • enhanced_statistics_parser - Enhanced Statistics.db format parsing for the nb and oa SSTable format versions

§CQL Type Deserialization

  • types - CQL primitive types (20+), including int, bigint, text, blob, uuid, timestamp, date, time, inet, varint, decimal, duration, boolean, float, double, ascii, timeuuid
  • complex_types - Collections (list, set, map), UDTs, tuples, with depth tracking for nested types
  • optimized_complex_types - M3 milestone performance optimizations for complex types
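
The depth tracking mentioned for nested types can be pictured with a short sketch. Everything here is an assumption made for illustration: the `TypeSpec` enum, the `check_depth` function, and the `MAX_DEPTH` limit of 8 are hypothetical and do not come from `complex_types`.

```rust
/// Hypothetical type descriptor covering a few CQL types.
#[derive(Debug)]
enum TypeSpec {
    Int,
    Text,
    List(Box<TypeSpec>),
    Map(Box<TypeSpec>, Box<TypeSpec>),
    Tuple(Vec<TypeSpec>),
}

/// Assumed nesting limit; the real module may use a different value.
const MAX_DEPTH: usize = 8;

/// Walk a type descriptor, failing once nesting exceeds MAX_DEPTH.
/// On success, returns the deepest nesting level that was reached.
fn check_depth(spec: &TypeSpec, depth: usize) -> Result<usize, String> {
    if depth > MAX_DEPTH {
        return Err(format!("type nesting exceeds {} levels", MAX_DEPTH));
    }
    match spec {
        TypeSpec::Int | TypeSpec::Text => Ok(depth),
        TypeSpec::List(inner) => check_depth(inner, depth + 1),
        TypeSpec::Map(key, value) => {
            let k = check_depth(key, depth + 1)?;
            let v = check_depth(value, depth + 1)?;
            Ok(k.max(v))
        }
        TypeSpec::Tuple(fields) => {
            let mut deepest = depth;
            for field in fields {
                deepest = deepest.max(check_depth(field, depth + 1)?);
            }
            Ok(deepest)
        }
    }
}
```

Passing the current depth down the recursion bounds stack use on malformed or hostile input, since a corrupted type descriptor cannot force unbounded nesting.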

§Performance

  • zero_copy_parser - Memory optimization utilities: string interning and zero-copy buffer management, used to stay under the 128 MB memory target
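
The two techniques named above can be sketched in a few lines. This is an illustration under stated assumptions, not the module's implementation: `Interner` and `take_prefix` are hypothetical names, and the real utilities may use different data structures.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical string interner: equal strings share one allocation.
#[derive(Default)]
struct Interner {
    pool: HashSet<Rc<str>>,
}

impl Interner {
    /// Return a shared handle for `s`, allocating only on first sight.
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.pool.get(s) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.pool.insert(Rc::clone(&fresh));
        fresh
    }

    /// Number of distinct strings stored.
    fn len(&self) -> usize {
        self.pool.len()
    }
}

/// Hypothetical zero-copy helper: split off the first `n` bytes as a borrowed
/// slice without copying. Returns None if the buffer is too short.
fn take_prefix(buf: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if n > buf.len() {
        return None;
    }
    Some(buf.split_at(n))
}
```

Interning pays off when the same column or keyspace names repeat across many rows, and borrowed slices avoid allocating a new buffer for every field read from the file.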

§High-Level Interface

  • binary - SSTableParser facade providing unified access to parsing functionality

§Usage Examples

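use cqlite_core::parser::{parse_vint, SSTableHeader, CqlType};

// Both calls below use `?`, so this snippet belongs inside a function that returns a Result.

// Parse a variable-length integer from raw bytes
let bytes = [0x8A, 0x01]; // VInt-encoded value
let (remaining, value) = parse_vint(&bytes)?;

// Parse the SSTable header to detect the format version.
// `file_bytes` is assumed to hold the raw contents of an SSTable file read earlier.
let header = SSTableHeader::parse(&file_bytes)?;
println!("SSTable format: {:?}", header.format_type);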

§Backward Compatibility

The parse_cql_schema function is maintained for backward compatibility with existing code. New code should use crate::cql::parse_cql_schema_enhanced, which provides better error handling and configuration options.

Related documentation:

  • SSTable format specification: docs/sstables-definitive-guide/
  • Known limitations: docs/sstables-definitive-guide/chapters/appendix-f-known-limitations.md

§Re-exports

pub use binary::CQLiteParseError;
pub use binary::ParseResult;
pub use binary::SSTableParser;
pub use crate::error::Result as CqlResult;
pub use complex_types::*;
pub use enhanced_statistics_parser::*;
pub use header::*;
pub use statistics::*;
pub use types::*;
pub use vint::*;

§Modules

  • binary - Binary format parsing for backward compatibility
  • complex_types - Complex Type Parsing for Cassandra Data Types
  • enhanced_statistics_parser - Enhanced Statistics.db parser for Cassandra 5.0 ‘nb’ format
  • header - SSTable header parsing for Cassandra 5+ ‘oa’ format
  • optimized_complex_types - High-performance optimized complex type parsing for M3
  • statistics - Statistics.db parser for Cassandra 5+ SSTable format
  • types - CQL type system parsing and serialization
  • vint - Variable-length integer encoding/decoding for Cassandra SSTable format
  • vint_fixed - Fixed VInt implementation for Cassandra compatibility
  • zero_copy_parser - Zero-copy parsing optimizations

§Functions

  • parse_cql_schema (deprecated) - Parse CQL CREATE TABLE statement (backward compatibility function)