Expand description
Enhanced Statistics.db parser for Cassandra 5.0 ‘nb’ format
§Implementation Status (Issue #162)
This module provides MINIMAL PARSING of nb-format Statistics.db files to support delta-coded timestamp decoding in V5CompressedLegacy parser.
§Current Implementation
Parses ONLY the EncodingStats fields required for delta decoding:
- Header (32 bytes): version, data_length, checksum, metadata
- EncodingStats section: partitioner, minTimestamp, minLocalDeletionTime, minTTL
All other statistics (row counts, histograms, column stats, etc.) are populated with placeholder values. This is sufficient for V5CompressedLegacy parser baseline values.
§Previous Implementation (REMOVED)
The previous implementation violated the no-heuristics mandate (Issue #28) by fabricating statistics from header metadata. It was removed and replaced with this minimal real-data parser that extracts only what’s needed from the actual binary format.
§Deferred to Future Milestones
Complete Statistics.db parsing including:
- Row count statistics and distribution histograms
- Column-level statistics and cardinality estimates
- Partition size histograms and percentiles
- Compression ratio and performance metrics
- Checksum validation (header.checksum field not yet validated)
§References
- Issue #162: Fix Statistics reader for Cassandra 5 nb format
- Issue #28: No-heuristics mandate for modern Cassandra 5.0 paths
- Issue #105: Remove heuristic estimation from enhanced_statistics_parser.rs
docs/development/rust_developer_guide.md: Architecture decisions
Functions§
- parse_
enhanced_ statistics_ file - Main enhanced parser for real Statistics.db files (minimal implementation for Issue #162)
- parse_
nb_ format_ header - Enhanced Statistics.db header parser for real ‘nb’ format
- parse_
nb_ format_ statistics_ data - Parse minimal nb-format statistics data for delta-coding baseline (Issue #162)
- parse_
statistics_ with_ fallback - Enhanced statistics reader with fallback (minimal implementation for Issue #162)