Module enhanced_statistics_parser

Enhanced Statistics.db parser for Cassandra 5.0 ‘nb’ format

§Implementation Status (Issue #162)

This module provides MINIMAL PARSING of nb-format Statistics.db files. Its purpose is to support delta-coded timestamp decoding in the V5CompressedLegacy parser.
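For context, the sketch below shows what delta-coded timestamp decoding generally looks like. It makes two assumptions. First, it assumes Cassandra's unsigned variable-length integer (vint) encoding, where the number of leading 1 bits in the first byte gives the number of extra bytes that follow. Second, it assumes each stored timestamp is a delta from EncodingStats.minTimestamp. The names `read_unsigned_vint` and `decode_delta_timestamp` are hypothetical and are not this module's API.

```rust
/// Decode one unsigned vint (assumed Cassandra-style encoding).
/// The count of leading 1 bits in the first byte is the number of extra
/// bytes that follow. Returns (value, bytes consumed), or None if truncated.
fn read_unsigned_vint(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let extra = first.leading_ones() as usize;
    if buf.len() < 1 + extra {
        return None; // not enough bytes for the announced length
    }
    // Mask off the length-prefix bits; with 8 extra bytes the first byte
    // carries no value bits at all.
    let mut value: u64 = if extra >= 8 {
        0
    } else {
        u64::from(first & (0xFF >> extra))
    };
    // Remaining bytes are appended big-endian.
    for &b in &buf[1..1 + extra] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, 1 + extra))
}

/// Hypothetical helper: rebuild an absolute timestamp from a delta that is
/// stored relative to EncodingStats.minTimestamp (the baseline).
fn decode_delta_timestamp(buf: &[u8], min_timestamp: i64) -> Option<(i64, usize)> {
    let (delta, used) = read_unsigned_vint(buf)?;
    Some((min_timestamp.wrapping_add(delta as i64), used))
}
```

Without a correct `min_timestamp`, every decoded timestamp would be shifted by an unknown offset. This is why the baseline must come from the real file rather than from an estimate.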

§Current Implementation

Parses ONLY the EncodingStats fields required for delta decoding:

  • Header (32 bytes): version, data_length, checksum, metadata
  • EncodingStats section: partitioner, minTimestamp, minLocalDeletionTime, minTTL
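The following is a minimal sketch of reading the 32-byte header. The list above names the four header fields but not their widths or byte order. The layout used here is an assumption for illustration only: a 4-byte version, an 8-byte data_length, a 4-byte checksum, and 16 bytes of metadata, all big-endian. `NbHeader` and `parse_header` are hypothetical names, and the authoritative layout is whatever `parse_nb_format_header` implements.

```rust
use std::convert::TryInto;

/// Hypothetical header layout: field widths and byte order are assumptions.
#[derive(Debug, PartialEq)]
struct NbHeader {
    version: u32,
    data_length: u64,
    checksum: u32,       // read but, as in the module, not validated here
    metadata: [u8; 16],
}

const HEADER_LEN: usize = 32;

fn parse_header(buf: &[u8]) -> Result<NbHeader, String> {
    if buf.len() < HEADER_LEN {
        return Err(format!("need {} header bytes, got {}", HEADER_LEN, buf.len()));
    }
    // Slices of the exact length always convert, so unwrap cannot fail here.
    let version = u32::from_be_bytes(buf[0..4].try_into().unwrap());
    let data_length = u64::from_be_bytes(buf[4..12].try_into().unwrap());
    let checksum = u32::from_be_bytes(buf[12..16].try_into().unwrap());
    let mut metadata = [0u8; 16];
    metadata.copy_from_slice(&buf[16..32]);
    Ok(NbHeader { version, data_length, checksum, metadata })
}
```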

All other statistics (row counts, histograms, column stats, etc.) are populated with placeholder values. This is sufficient because the V5CompressedLegacy parser needs only the EncodingStats baseline values.
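One way to express "real EncodingStats values, placeholders elsewhere" in Rust is struct-update syntax over `Default`. The sketch below is hypothetical. `MinimalStatistics`, its fields, and `from_encoding_stats` are illustrative names. The two placeholder fields stand in for the full set of row, histogram, and column statistics.

```rust
/// Hypothetical result type: only the EncodingStats-derived fields are real.
#[derive(Debug, Default, Clone, PartialEq)]
struct MinimalStatistics {
    // Parsed from the file.
    partitioner: String,
    min_timestamp: i64,
    min_local_deletion_time: i64,
    min_ttl: i64,
    // Placeholders: never parsed, always left at their Default values.
    row_count: u64,
    partition_size_histogram: Vec<u64>,
}

fn from_encoding_stats(
    partitioner: &str,
    min_timestamp: i64,
    min_local_deletion_time: i64,
    min_ttl: i64,
) -> MinimalStatistics {
    MinimalStatistics {
        partitioner: partitioner.to_string(),
        min_timestamp,
        min_local_deletion_time,
        min_ttl,
        // Every remaining field takes its Default (zero / empty) placeholder.
        ..Default::default()
    }
}
```

Using `Default` keeps the placeholders obviously empty (zero or an empty vector) rather than estimated. This is in line with the no-heuristics mandate.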

§Previous Implementation (REMOVED)

The previous implementation violated the no-heuristics mandate (Issue #28) by fabricating statistics from header metadata. It was removed and replaced with this minimal real-data parser that extracts only what’s needed from the actual binary format.

§Deferred to Future Milestones

Complete Statistics.db parsing including:

  • Row count statistics and distribution histograms
  • Column-level statistics and cardinality estimates
  • Partition size histograms and percentiles
  • Compression ratio and performance metrics
  • Checksum validation (the header.checksum field is read but not yet validated)

§References

  • Issue #162: Fix Statistics reader for Cassandra 5 nb format
  • Issue #28: No-heuristics mandate for modern Cassandra 5.0 paths
  • Issue #105: Remove heuristic estimation from enhanced_statistics_parser.rs
  • docs/development/rust_developer_guide.md: Architecture decisions

Functions§

parse_enhanced_statistics_file
Main enhanced parser for real Statistics.db files (minimal implementation for Issue #162)
parse_nb_format_header
Enhanced Statistics.db header parser for real ‘nb’ format
parse_nb_format_statistics_data
Parse minimal nb-format statistics data for delta-coding baseline (Issue #162)
parse_statistics_with_fallback
Enhanced statistics reader with fallback (minimal implementation for Issue #162)
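This page does not describe what the fallback in `parse_statistics_with_fallback` does. The sketch below shows one possible shape, under the assumption that the nb-format parse is tried first and a secondary reader runs only if it fails. Both paths return `Result`, so when both fail the error is surfaced instead of being replaced with fabricated values. `parse_with_fallback` and `StatsSource` are hypothetical names.

```rust
/// Which path produced the statistics (hypothetical).
#[derive(Debug, PartialEq)]
enum StatsSource {
    NbFormat,
    Fallback,
}

/// Try the primary parser; on error, try the fallback. If both fail,
/// the fallback's error is returned rather than a made-up value.
fn parse_with_fallback<T, E>(
    primary: impl FnOnce() -> Result<T, E>,
    fallback: impl FnOnce() -> Result<T, E>,
) -> Result<(T, StatsSource), E> {
    match primary() {
        Ok(v) => Ok((v, StatsSource::NbFormat)),
        Err(_) => fallback().map(|v| (v, StatsSource::Fallback)),
    }
}
```

Returning the source alongside the value means callers can see which reader actually produced the statistics.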