Struct SSTableReader


pub struct SSTableReader {
    pub generation: u64,
    pub compression_info: Option<Arc<CompressionInfo>>,
    /* private fields */
}


SSTable reader for efficient data access

Fields§

§generation: u64

SSTable generation number (for multi-generation merging)

§compression_info: Option<Arc<CompressionInfo>>

CompressionInfo metadata for chunked decompression; None when the SSTable is not compressed

Implementations§


impl SSTableReader


pub fn get_cache_stats(&self) -> (u64, u64, f64)

Get cache statistics for reporting


impl SSTableReader


pub async fn get( &self, table_id: &TableId, key: &RowKey, ) -> Result<Option<Value>>

Get a value by key from the SSTable


pub async fn scan( &self, table_id: &TableId, start_key: Option<&RowKey>, end_key: Option<&RowKey>, limit: Option<usize>, schema: Option<&TableSchema>, ) -> Result<Vec<(RowKey, Value)>>

Scan a range of keys

§Arguments

table_id - The table to scan
start_key - Optional start key for range scan
end_key - Optional end key for range scan
limit - Optional limit on number of results
schema - Optional table schema for schema-aware parsing. When provided, enables accurate type detection and avoids heuristic-based parsing. Strongly recommended for Cassandra 5.0+ formats.
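
To illustrate how the optional bounds and limit combine, here is a minimal in-memory sketch. It is not the reader's implementation: `scan_range` is a hypothetical helper, rows are modelled as a `BTreeMap` of byte keys to strings, and both bounds are assumed to be inclusive, which the documentation above does not specify.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical in-memory model of a bounded scan.
/// Assumption: both `start_key` and `end_key` are inclusive.
fn scan_range(
    rows: &BTreeMap<Vec<u8>, String>,
    start_key: Option<&[u8]>,
    end_key: Option<&[u8]>,
    limit: Option<usize>,
) -> Vec<(Vec<u8>, String)> {
    // An inverted range matches nothing; BTreeMap::range would panic on it.
    if let (Some(s), Some(e)) = (start_key, end_key) {
        if s > e {
            return Vec::new();
        }
    }
    // A missing bound means "unbounded on that side".
    let lower = start_key.map_or(Bound::Unbounded, Bound::Included);
    let upper = end_key.map_or(Bound::Unbounded, Bound::Included);
    rows.range::<[u8], _>((lower, upper))
        // A missing limit means "return everything in range".
        .take(limit.unwrap_or(usize::MAX))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Passing `None` for every optional argument returns all rows in key order, which mirrors how the optional parameters of scan are described.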


pub async fn get_all_entries(&self) -> Result<Vec<(TableId, RowKey, Value)>>

Get all entries in the SSTable.

§Tombstone contract (Issue #505)

This is a user-facing accessor: row tombstones are filtered out via Self::filter_tombstone and never appear in the returned entries. The underlying parse_block path emits Value::Tombstone(RowTombstone) for deleted rows, but those are suppressed here so callers see exactly the live rows (matching the previous Value::Null suppression behaviour).

The compaction k-way merger must instead use Self::iterate_all_partitions_for_compaction, which preserves Value::Tombstone entries (with their authoritative deletion timestamps) so that tombstone-shadowing semantics can be applied during the merge.
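
The difference between the two views can be sketched with a simplified stand-in for the crate's types. The `Value` enum below is a hypothetical reduction (the real type has more variants), and `live_rows` only approximates what Self::filter_tombstone is described as doing.

```rust
/// Simplified stand-in for the crate's `Value` type (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    /// Row tombstone carrying its deletion timestamp in microseconds.
    Tombstone { deletion_time_micros: i64 },
}

/// User-facing view: drop row tombstones so callers see only live rows.
/// A compaction-oriented view would return `entries` unchanged instead.
fn live_rows(entries: Vec<(Vec<u8>, Value)>) -> Vec<(Vec<u8>, Value)> {
    entries
        .into_iter()
        .filter(|(_, value)| !matches!(value, Value::Tombstone { .. }))
        .collect()
}
```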


pub async fn iterate_all_partitions_for_compaction( &self, schema: Option<&TableSchema>, ) -> Result<Vec<(RowKey, Value, i64)>>

Iterate all partitions with per-row timestamps, for use by the compaction merger.

Returns (RowKey, Value, row_timestamp_micros) for every row in the SSTable. Unlike iterate_all_partitions:

Row tombstones are returned as Value::Tombstone(RowTombstone) carrying the actual deletion timestamp extracted from the on-disk row header.
Cell tombstones within live rows are stored as Value::Tombstone(CellTombstone) inside the Value::Map, also carrying the actual cell-level deletion timestamp.
The third tuple element is the decoded row-level write timestamp, so the merger can perform timestamp-accurate last-write-wins comparisons.

Normal user-facing reads use scan / get / iterate_all_partitions, which apply tombstone filtering. Do NOT use this method for user-visible queries.

(Issue #505)
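
As a rough illustration of why the merger needs tombstones and timestamps, the sketch below applies last-write-wins across several inputs of (key, value, timestamp) triples. `merge_last_write_wins` and the reduced `Value` enum are hypothetical, and tie-breaking on equal timestamps is simplified to "first seen wins", which may differ from the real merger.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the crate's `Value` type (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Tombstone,
}

/// Merge rows from several sources, keeping the entry with the highest
/// timestamp per key. Tombstones take part in the comparison, so a newer
/// deletion shadows older live data instead of silently disappearing.
fn merge_last_write_wins(
    sources: Vec<Vec<(Vec<u8>, Value, i64)>>,
) -> BTreeMap<Vec<u8>, (Value, i64)> {
    let mut merged: BTreeMap<Vec<u8>, (Value, i64)> = BTreeMap::new();
    for rows in sources {
        for (key, value, ts) in rows {
            // Replace only when this entry is strictly newer than the stored one.
            let is_newer = merged
                .get(&key)
                .map_or(true, |(_, existing_ts)| ts > *existing_ts);
            if is_newer {
                merged.insert(key, (value, ts));
            }
        }
    }
    merged
}
```

If the tombstone were filtered out before the merge, the older live value would survive compaction, which is the failure the tombstone contract is meant to prevent.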


pub async fn read_value_at_offset( &self, offset: u64, size: u32, ) -> Result<Option<Value>>

Read value at a specific offset with caching


impl SSTableReader


pub async fn get_health_metrics(&self) -> Result<SSTableReaderHealthMetrics>

Get comprehensive reader health and performance metrics


pub async fn perform_integrity_check(&self) -> Result<IntegrityCheckResult>

Perform integrity check on the SSTable file


impl SSTableReader


pub async fn lookup_partition_with_index( &self, partition_key: &[u8], ) -> Result<Option<(u64, u32)>>

Enhanced partition lookup using Index.db reader with promoted index support.

partition_key must be the raw partition-key bytes as produced by PartitionKey::to_bytes:

Single-component keys — raw value bytes (UUID = 16 bytes, int = 4 BE bytes, etc.).
Multi-component (composite) keys — [len: u16 BE][value bytes][0x00] per component, including a trailing 0x00 after the final component.

The Index.db key_lookup map is keyed on these exact raw bytes (set when the BIG-format parser was fixed in Issue #552). The old digest-based path (which caused every lookup to miss) has been removed. On a miss the function returns Ok(None) so callers can fall through to their existing sequential-scan fallback.
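
The byte layouts described above can be sketched as follows. These helpers are hypothetical illustrations of the encoding, not the crate's PartitionKey::to_bytes; the names `encode_composite_key` and `encode_int_key` are assumptions.

```rust
/// Encode a multi-component key as [len: u16 BE][value bytes][0x00] per
/// component, including the trailing 0x00 after the final component.
/// Returns None if a component is too long for a u16 length prefix.
fn encode_composite_key(components: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for component in components {
        if component.len() > u16::MAX as usize {
            return None;
        }
        let len = component.len() as u16;
        out.extend_from_slice(&len.to_be_bytes()); // 2-byte big-endian length
        out.extend_from_slice(component);          // raw value bytes
        out.push(0x00);                            // end-of-component marker
    }
    Some(out)
}

/// Single-component int key: the raw value as 4 big-endian bytes.
fn encode_int_key(value: i32) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}
```

For example, the two components "ab" and "c" encode to the nine bytes 00 02 61 62 00 00 01 63 00, while the single int key 1 encodes to 00 00 00 01 with no length prefix or terminator.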


pub async fn lookup_partition_with_schema_context( &self, partition_key: &[u8], parsing_context: &ParsingContext, ) -> Result<Option<(u64, u32)>>

Enhanced partition lookup using schema-driven key digest computation


pub async fn iterate_all_partitions(&self) -> Result<Vec<(RowKey, Value)>>

Enhanced partition iteration using Summary.db reader

Note: Token-based range queries are not directly supported because Summary.db does not store token values (Issue #218). Instead, this iterates all summary entries and returns all partition data.

For token-based filtering, compute tokens from partition keys after retrieval.
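
A minimal sketch of that post-retrieval filtering, assuming the caller supplies the partitioner as a closure. `filter_by_token_range` and `token_of` are hypothetical names, the range is treated as inclusive on both ends, and wrap-around ranges are not handled.

```rust
/// Keep only rows whose partition-key token lies in
/// [start_token, end_token] (inclusive; an assumption of this sketch).
/// `token_of` stands in for the partitioner's token function.
fn filter_by_token_range<V, F>(
    rows: Vec<(Vec<u8>, V)>,
    start_token: i64,
    end_token: i64,
    token_of: F,
) -> Vec<(Vec<u8>, V)>
where
    F: Fn(&[u8]) -> i64,
{
    rows.into_iter()
        .filter(|(key, _)| {
            let token = token_of(key.as_slice());
            token >= start_token && token <= end_token
        })
        .collect()
}
```

In practice `token_of` would wrap whatever partitioner the cluster uses; the filter itself is independent of that choice.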

§Issue #500: Sequential-scan fallback for writer-produced SSTables

The Summary.db → Index.db → Data.db lookup path depends on Index.db format compatibility between writer and reader (digest format vs. raw-key format). Locally written SSTables emit raw-key Index.db entries that the reader’s digest-based parser cannot resolve, so the lookup loop returns 0 entries even though Summary.db enumerates the partitions correctly.

When that happens we fall back to sequential_scan, which walks Data.db directly. For V5CompressedLegacy NB SSTables (the format the writer emits), sequential_scan uses the chunk-stitching path and returns every partition.
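
The control flow of the fallback can be sketched generically. `iterate_with_fallback` is a hypothetical helper; the two closures stand in for the index-driven lookup loop and sequential_scan, and the trigger is simplified to "the index path returned no entries".

```rust
/// Try the index-driven path first. If it succeeds but yields no entries,
/// fall back to a sequential scan. Errors from either path propagate.
fn iterate_with_fallback<T, E>(
    index_lookup: impl FnOnce() -> Result<Vec<T>, E>,
    sequential_scan: impl FnOnce() -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E> {
    let entries = index_lookup()?;
    if entries.is_empty() {
        // The index resolved nothing; walk the data file directly.
        sequential_scan()
    } else {
        Ok(entries)
    }
}
```

Because the fallback runs only on an empty result, SSTables whose index resolves normally never pay the cost of the sequential scan.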


pub async fn iterate_token_range( &self, _start_token: i64, _end_token: i64, ) -> Result<Vec<(RowKey, Value)>>

Deprecated since 0.1.0: Summary.db does not store tokens. Use iterate_all_partitions() and filter by computed tokens.

Token range iteration (deprecated - tokens not stored in Summary.db)

This method is kept for API compatibility, but it ignores both token arguments and simply delegates to iterate_all_partitions(), since Summary.db does not store token values. Callers should filter by token after retrieval.


pub async fn get_timestamp_range(&self) -> Result<Option<(i64, i64)>>

Get min/max timestamps from Statistics.db reader


pub async fn get_token_coverage(&self) -> Result<Option<(i64, i64)>>

Deprecated since 0.1.0: Summary.db does not store tokens. Compute tokens from partition keys using the partitioner.

Get token coverage (deprecated - tokens not stored in Summary.db)

Note: As of Issue #218, Summary.db does not store token values. This method now returns None since token coverage cannot be determined from Summary.db alone. Token computation requires partition keys and the partitioner algorithm.
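
Since the method returns None, a caller that still needs coverage could derive it from the partition keys. The sketch below assumes a caller-supplied `token_of` closure in place of the real partitioner; `token_coverage` is a hypothetical helper.

```rust
/// Compute (min_token, max_token) over a set of partition keys.
/// Returns None when there are no keys.
fn token_coverage<'a, I, F>(partition_keys: I, token_of: F) -> Option<(i64, i64)>
where
    I: IntoIterator<Item = &'a [u8]>,
    F: Fn(&[u8]) -> i64,
{
    let mut tokens = partition_keys.into_iter().map(|key| token_of(key));
    // Seed both bounds with the first token, then widen over the rest.
    let first = tokens.next()?;
    Some(tokens.fold((first, first), |(lo, hi), t| (lo.min(t), hi.max(t))))
}
```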
