cqlite_core/parser/mod.rs
//! # SSTable Binary Format Parser Module
//!
//! This module provides parsing functionality for Apache Cassandra SSTable binary formats.
//! It handles deserialization of binary data structures from SSTable files (Data.db,
//! Index.db, Statistics.db, etc.) produced by Cassandra 5.0+.
//!
//! ## Architecture Overview
//!
//! This is one of four parsing subsystems in cqlite-core:
//!
//! | Module | Purpose |
//! |--------|---------|
//! | `cql/` | Full CQL text → AST parsing |
//! | **`parser/`** | SSTable binary format parsing (this module) |
//! | `schema/cql_parser.rs` | CREATE TABLE → TableSchema |
//! | `query/parser.rs` | Lightweight DML → ParsedQuery |
//!
//! See `docs/architecture/parser-overview.md` for the complete architecture overview.
//!
//! ## Module Architecture
//!
//! ```text
//! parser/ (SSTable Binary Format Parsing)
//! │
//! ├── Core Binary Parsing
//! │   ├── vint.rs        - Variable-length integer (VInt) encoding
//! │   ├── vint_fixed.rs  - Fixed-size integer alternatives
//! │   └── header.rs      - SSTable header parsing (magic numbers, version detection)
//! │
//! ├── Statistics Parsing
//! │   ├── statistics.rs                  - Statistics.db basic format
//! │   └── enhanced_statistics_parser.rs  - Statistics.db enhanced format (nb/oa)
//! │
//! ├── CQL Type Deserialization
//! │   ├── types.rs                    - All CQL primitive types (int, text, uuid, etc.)
//! │   ├── complex_types.rs            - Collections, UDTs, tuples, frozen types
//! │   └── optimized_complex_types.rs  - M3 performance-optimized parsing
//! │
//! ├── Performance Utilities
//! │   └── zero_copy_parser.rs  - String interning, zero-copy buffers
//! │
//! └── High-Level Interface
//!     └── binary.rs  - SSTableParser facade
//! ```
//!
//! Note: Test modules (`*_test.rs`, `*_tests.rs`) and benchmarks (`*_benchmarks.rs`)
//! are omitted from the diagram. See feature flag `benchmarks` for performance testing.
//!
//! ## Key Distinction: parser/ vs cql/
//!
//! | Module | Purpose | Input | Output |
//! |--------|---------|-------|--------|
//! | **parser/** | SSTable binary parsing | Raw bytes from .db files | Structured Rust values |
//! | **cql/** | CQL text parsing | Query strings ("SELECT...") | Abstract Syntax Trees |
//!
//! This module (`parser/`) handles **binary deserialization**:
//! - Reading bytes from SSTable files (Data.db, Statistics.db, etc.)
//! - Decoding VInt-encoded integers per Cassandra's serialization format
//! - Deserializing CQL values (int → i32, text → String, uuid → Uuid)
//!
//! For **CQL text parsing** (CREATE TABLE, SELECT, etc.), see the [`crate::cql`] module.
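//!
//! As a rough illustration of the value mappings above, the following sketch decodes an
//! `int` and a `text` value from raw bytes. It relies on Cassandra serializing `int` as a
//! 4-byte big-endian two's-complement integer and `text` as UTF-8. The names `decode_int`
//! and `decode_text` are hypothetical and are not part of this module's API; see
//! [`types`] for the actual implementations.
//!
//! ```rust
//! /// Decode a CQL `int` from exactly four big-endian bytes (hypothetical helper).
//! fn decode_int(bytes: &[u8]) -> Option<i32> {
//!     if bytes.len() != 4 {
//!         return None; // a CQL `int` is always exactly 4 bytes
//!     }
//!     Some(i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
//! }
//!
//! /// Decode a CQL `text` value by validating the bytes as UTF-8 (hypothetical helper).
//! fn decode_text(bytes: &[u8]) -> Option<&str> {
//!     std::str::from_utf8(bytes).ok()
//! }
//! ```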
//!
//! ## Sub-module Reference
//!
//! ### Variable-Length Integer Encoding
//! - [`vint`] - VInt encoding/decoding per Cassandra specification, with corruption detection
//! - [`vint_fixed`] - Fixed-size integer parsing for when VInt isn't used
//!
//! ### SSTable Headers
//! - [`header`] - SSTable header parsing with version detection (oa/nb/legacy formats)
//!
//! ### Statistics Files
//! - [`statistics`] - Statistics.db parsing for row counts, timestamps, min/max metadata
//! - [`enhanced_statistics_parser`] - Enhanced Statistics.db parsing for the `nb`
//!   (Cassandra 4.x) and `oa` (Cassandra 5.0) SSTable format versions
//!
//! ### CQL Type Deserialization
//! - [`types`] - CQL primitive types (20+), including int, bigint, text, blob, uuid,
//!   timestamp, date, time, inet, varint, decimal, duration, boolean, float, double,
//!   ascii, timeuuid
//! - [`complex_types`] - Collections (list, set, map), UDTs, tuples, with depth tracking
//!   for nested types
//! - [`optimized_complex_types`] - M3 milestone performance optimizations for complex types
//!
//! ### Performance
//! - [`zero_copy_parser`] - Memory optimization utilities: string interning and zero-copy
//!   buffer management, used to stay under the 128 MB memory target
//!
//! ### High-Level Interface
//! - [`binary`] - `SSTableParser` facade providing unified access to parsing functionality
//!
//! ## Usage Examples
//!
//! ```rust,ignore
//! use cqlite_core::parser::{parse_vint, SSTableHeader};
//!
//! // Parse variable-length integer from raw bytes
//! let bytes = [0x8A, 0x01]; // VInt-encoded value
//! let (remaining, value) = parse_vint(&bytes)?;
//!
//! // Parse SSTable header to detect format version
//! // (`file_bytes` is assumed to hold the raw contents of an SSTable component file)
//! let header = SSTableHeader::parse(&file_bytes)?;
//! println!("SSTable format: {:?}", header.format_type);
//! ```
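//!
//! For readers unfamiliar with the encoding, the sketch below shows how an unsigned VInt
//! of the kind Cassandra writes can be decoded: the number of leading 1-bits in the first
//! byte gives the number of extra bytes that follow, the remaining bits of the first byte
//! are the most significant bits of the value, and signed values are zigzag-encoded on
//! top of that. This is a self-contained illustration only. The names
//! `decode_unsigned_vint` and `zigzag_decode` are hypothetical, and the real [`vint`]
//! implementation may differ in signature, error handling, and corruption checks.
//!
//! ```rust
//! /// Decode one unsigned VInt from the front of `input` (hypothetical helper).
//! /// Returns the value and the unread remainder, or `None` if `input` is too short.
//! fn decode_unsigned_vint(input: &[u8]) -> Option<(u64, &[u8])> {
//!     let (&first, rest) = input.split_first()?;
//!     // The count of leading 1-bits is the number of extra bytes (0 to 8).
//!     let extra = first.leading_ones() as usize;
//!     if rest.len() < extra {
//!         return None;
//!     }
//!     // Bits after the length prefix are the high-order bits of the value.
//!     // With 8 extra bytes the first byte carries no value bits at all.
//!     let mask: u8 = if extra >= 8 { 0 } else { 0xFF >> extra };
//!     let mut value = u64::from(first & mask);
//!     // Extra bytes follow in big-endian order.
//!     for &byte in &rest[..extra] {
//!         value = (value << 8) | u64::from(byte);
//!     }
//!     Some((value, &rest[extra..]))
//! }
//!
//! /// Map a zigzag-encoded unsigned value back to a signed one (hypothetical helper).
//! fn zigzag_decode(n: u64) -> i64 {
//!     ((n >> 1) as i64) ^ -((n & 1) as i64)
//! }
//! ```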
//!
//! ## Backward Compatibility
//!
//! The [`parse_cql_schema`] function is maintained for backward compatibility with
//! existing code. **New code should use [`crate::cql::parse_cql_schema_enhanced`]**
//! which provides better error handling and configuration options.
//!
//! ## Related Documentation
//!
//! - SSTable format specification: `docs/sstables-definitive-guide/`
//! - Known limitations: `docs/sstables-definitive-guide/chapters/appendix-f-known-limitations.md`

// Binary format parsing (SSTable components)
pub mod binary;

// Re-export existing modules for backward compatibility
#[cfg(feature = "benchmarks")]
pub mod benchmarks;
pub mod collection_benchmarks;
#[cfg(test)]
pub mod collection_tests;
// pub mod collection_udt_tests; // Commented out due to missing methods
#[cfg(test)]
pub mod collection_correctness_tests; // Property tests for Issue #61
#[cfg(test)]
pub mod collection_validation_tests;
pub mod complex_types;
pub mod enhanced_statistics_parser;
#[cfg(test)]
pub mod enhanced_statistics_test;
pub mod header;
pub mod statistics;
#[cfg(test)]
pub mod statistics_test;
pub mod types;
#[cfg(test)]
pub mod udt_tests;
pub mod vint;
pub mod vint_fixed;

// M3 Performance Optimization Modules
pub mod optimized_complex_types;
pub mod zero_copy_parser;

// Re-export binary format parser
pub use binary::{CQLiteParseError, ParseResult, SSTableParser};

// Re-export binary format parsers for backward compatibility
#[cfg(feature = "benchmarks")]
pub use benchmarks::*;
pub use complex_types::*;
pub use enhanced_statistics_parser::*;
pub use header::*;
pub use statistics::*;
pub use types::*;
pub use vint::*;

// Re-export M3 performance modules
#[cfg(feature = "benchmarks")]
pub use optimized_complex_types::OptimizedComplexTypeParser;

/// Re-export common result types
pub use crate::error::Result as CqlResult;

/// Parse CQL CREATE TABLE statement (backward compatibility function)
///
/// **DEPRECATED**: This function maintains backward compatibility with existing code.
/// For new code, use `cqlite_core::schema::parse_cql_schema()` which is synchronous
/// and returns `Result<TableSchema>` instead of `nom::IResult`.
///
/// # Arguments
/// * `input` - The CQL CREATE TABLE statement to parse
///
/// # Returns
/// * `nom::IResult<&str, crate::schema::TableSchema>` - Parsed schema or error
#[deprecated(
    since = "0.2.0",
    note = "Use cqlite_core::cql::parse_cql_schema_enhanced() instead for better error handling"
)]
pub fn parse_cql_schema(input: &str) -> nom::IResult<&str, crate::schema::TableSchema> {
    // Delegate to the cql module (which now uses synchronous parsing)
    #[allow(deprecated)]
    crate::cql::schema_integration::parse_cql_schema_compat(input)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[allow(deprecated)] // Testing deprecated API for backward compatibility
    fn test_parse_cql_schema_backward_compat() {
        // Test that the backward compatibility function still works
        let schema = "CREATE TABLE test_keyspace.test_table (id int PRIMARY KEY)";
        let result = parse_cql_schema(schema);

        // The result should delegate to cql module and parse successfully
        assert!(
            result.is_ok(),
            "Valid schema should parse successfully via backward-compat function"
        );
    }
}