pub struct SSTableReader {
    pub generation: u64,
    pub compression_info: Option<Arc<CompressionInfo>>,
    /* private fields */
}
SSTable reader for efficient data access
Fields§
generation: u64
SSTable generation number (for multi-generation merging)
compression_info: Option<Arc<CompressionInfo>>
CompressionInfo metadata for chunked decompression (if compressed)
Implementations§
impl SSTableReader
pub fn get_cache_stats(&self) -> (u64, u64, f64)
Get cache statistics for reporting
impl SSTableReader
pub async fn get(
    &self,
    table_id: &TableId,
    key: &RowKey,
) -> Result<Option<Value>>
Get a value by key from the SSTable
pub async fn scan(
    &self,
    table_id: &TableId,
    start_key: Option<&RowKey>,
    end_key: Option<&RowKey>,
    limit: Option<usize>,
    schema: Option<&TableSchema>,
) -> Result<Vec<(RowKey, Value)>>
Scan a range of keys
§Arguments
- table_id - The table to scan
- start_key - Optional start key for range scan
- end_key - Optional end key for range scan
- limit - Optional limit on number of results
- schema - Optional table schema for schema-aware parsing. When provided, enables accurate type detection and avoids heuristic-based parsing. Strongly recommended for Cassandra 5.0+ formats.
pub async fn get_all_entries(&self) -> Result<Vec<(TableId, RowKey, Value)>>
Get all entries in the SSTable.
§Tombstone contract (Issue #505)
This is a user-facing accessor: row tombstones are filtered out via
Self::filter_tombstone and never appear in the returned entries. The
underlying parse_block path emits Value::Tombstone(RowTombstone) for
deleted rows, but those are suppressed here so callers see exactly the live
rows (matching the previous Value::Null suppression behaviour).
The compaction k-way merger must instead use
Self::iterate_all_partitions_for_compaction, which preserves
Value::Tombstone entries (with their authoritative deletion timestamps)
so that tombstone-shadowing semantics can be applied during the merge.
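The user-facing half of this contract can be sketched as a plain filter over parsed entries. This is a minimal illustration, not the crate's implementation: the `Value` enum, the `Vec<u8>` row key, and the `live_rows` helper below are simplified stand-ins invented for the example.

```rust
// Stand-in types for illustration only; the crate's real `Value` and
// `RowKey` types carry more variants and metadata than shown here.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    /// A deleted row, with its deletion timestamp in microseconds.
    RowTombstone { deleted_at_micros: i64 },
}

/// Hypothetical counterpart of `Self::filter_tombstone`: keep live rows only,
/// so callers of a user-facing accessor never observe deleted rows.
fn live_rows(entries: Vec<(Vec<u8>, Value)>) -> Vec<(Vec<u8>, Value)> {
    entries
        .into_iter()
        .filter(|(_, value)| !matches!(value, Value::RowTombstone { .. }))
        .collect()
}
```

A compaction path would skip this filter entirely, since it needs the tombstones and their timestamps to decide which older rows are shadowed.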
pub async fn iterate_all_partitions_for_compaction(
    &self,
    schema: Option<&TableSchema>,
) -> Result<Vec<(RowKey, Value, i64)>>
Iterate all partitions with per-row timestamps, for use by the compaction merger.
Returns (RowKey, Value, row_timestamp_micros) for every row in the SSTable.
Unlike [iterate_all_partitions]:
- Row tombstones are returned as Value::Tombstone(RowTombstone) carrying the actual deletion timestamp extracted from the on-disk row header.
- Cell tombstones within live rows are stored as Value::Tombstone(CellTombstone) inside the Value::Map, also carrying the actual cell-level deletion timestamp.
- The third tuple element is the decoded row-level write timestamp, so the merger can perform timestamp-accurate last-write-wins comparisons.
Normal user-facing reads use [scan] / [get] / [iterate_all_partitions],
which apply tombstone filtering. Do NOT use this method for user-visible queries.
(Issue #505)
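To make the last-write-wins comparison concrete, here is a rough sketch of how a merger might consume the (key, value, timestamp) triples. It is a simple map-based merge rather than a streaming k-way merge, and everything in it is an assumption for illustration: the `Value` enum, the `Entry` alias, the `merge_last_write_wins` function, and the tie-break rule (a tombstone wins when timestamps are equal).

```rust
use std::collections::BTreeMap;

// Stand-in for the crate's `Value`; only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Live(String),
    RowTombstone { deleted_at_micros: i64 },
}

// (row key, value, row-level write timestamp in microseconds)
type Entry = (Vec<u8>, Value, i64);

/// Timestamp used for comparison: the deletion time for a tombstone,
/// otherwise the row-level write timestamp.
fn effective_ts(value: &Value, row_ts: i64) -> i64 {
    match value {
        Value::RowTombstone { deleted_at_micros } => *deleted_at_micros,
        Value::Live(_) => row_ts,
    }
}

/// Merge entries from several SSTables, keeping the newest version per key.
/// Tombstones are kept in the output so they can keep shadowing older data.
fn merge_last_write_wins(inputs: Vec<Vec<Entry>>) -> Vec<Entry> {
    let mut winners: BTreeMap<Vec<u8>, (Value, i64)> = BTreeMap::new();
    for (key, value, ts) in inputs.into_iter().flatten() {
        let candidate = effective_ts(&value, ts);
        let replace = match winners.get(&key) {
            None => true,
            Some((cur_value, cur_ts)) => {
                let current = effective_ts(cur_value, *cur_ts);
                // Assumed tie-break: on equal timestamps the deletion wins.
                candidate > current
                    || (candidate == current
                        && matches!(value, Value::RowTombstone { .. }))
            }
        };
        if replace {
            winners.insert(key, (value, ts));
        }
    }
    // BTreeMap iteration yields keys in sorted byte order.
    winners.into_iter().map(|(k, (v, ts))| (k, v, ts)).collect()
}
```

If the tombstones had been filtered out before the merge, as the user-facing accessors do, the older live row for a deleted key would incorrectly survive.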
impl SSTableReader
pub async fn get_health_metrics(&self) -> Result<SSTableReaderHealthMetrics>
Get comprehensive reader health and performance metrics
pub async fn perform_integrity_check(&self) -> Result<IntegrityCheckResult>
Perform integrity check on the SSTable file
impl SSTableReader
pub async fn lookup_partition_with_index(
    &self,
    partition_key: &[u8],
) -> Result<Option<(u64, u32)>>
Enhanced partition lookup using Index.db reader with promoted index support.
partition_key must be the raw partition-key bytes as produced by
PartitionKey::to_bytes:
- Single-component keys: raw value bytes (UUID = 16 bytes, int = 4 BE bytes, etc.).
- Multi-component (composite) keys: [len: u16 BE][value bytes][0x00] per component, including a trailing 0x00 after the final component.
The Index.db key_lookup map is keyed on these exact raw bytes (set when the BIG-format
parser was fixed in Issue #552). The old digest-based path (which caused every lookup
to miss) has been removed. On a miss the function returns Ok(None) so callers can
fall through to their existing sequential-scan fallback.
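The byte layout described above can be illustrated with a small encoder. This is a sketch under stated assumptions: `encode_partition_key` is a hypothetical helper written for this example, not the crate's `PartitionKey::to_bytes`, and it takes already-serialized component values as byte slices.

```rust
/// Encode partition-key components into the raw layout the index is keyed on.
/// Hypothetical helper for illustration; in the crate the bytes come from
/// `PartitionKey::to_bytes`.
fn encode_partition_key(components: &[&[u8]]) -> Result<Vec<u8>, String> {
    match components {
        [] => Err("partition key needs at least one component".to_string()),
        // Single-component key: the raw value bytes, with no framing.
        [single] => Ok(single.to_vec()),
        // Composite key: [len: u16 BE][value bytes][0x00] per component.
        many => {
            let mut out = Vec::new();
            for component in many.iter() {
                if component.len() > u16::MAX as usize {
                    return Err("component longer than u16::MAX bytes".to_string());
                }
                let len = component.len() as u16;
                out.extend_from_slice(&len.to_be_bytes());
                out.extend_from_slice(component);
                // End-of-component marker, written after the last one too.
                out.push(0x00);
            }
            Ok(out)
        }
    }
}
```

For a composite key made of the int 42 and the text "us", this yields `00 04 00 00 00 2A 00 00 02 75 73 00`: each component is length-prefixed and terminated, including the last.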
pub async fn lookup_partition_with_schema_context(
    &self,
    partition_key: &[u8],
    parsing_context: &ParsingContext,
) -> Result<Option<(u64, u32)>>
Enhanced partition lookup using schema-driven key digest computation
pub async fn iterate_all_partitions(&self) -> Result<Vec<(RowKey, Value)>>
Enhanced partition iteration using Summary.db reader
Note: Token-based range queries are not directly supported because Summary.db does not store token values (Issue #218). Instead, this iterates all summary entries and returns all partition data.
For token-based filtering, compute tokens from partition keys after retrieval.
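Caller-side token filtering might look like the sketch below. The names are assumptions for illustration: `filter_by_token_range` is a hypothetical helper, row keys are modelled as `Vec<u8>`, and `token_of` stands in for whatever partitioner function the caller uses to compute a token from partition-key bytes. The inclusive range is also an assumption; adjust the bounds to match the range convention you need.

```rust
use std::ops::RangeInclusive;

/// Keep only the entries whose computed token falls inside `tokens`.
/// `token_of` is a caller-supplied stand-in for the partitioner.
fn filter_by_token_range<V, F>(
    entries: Vec<(Vec<u8>, V)>,
    tokens: RangeInclusive<i64>,
    token_of: F,
) -> Vec<(Vec<u8>, V)>
where
    F: Fn(&[u8]) -> i64,
{
    entries
        .into_iter()
        .filter(|(key, _)| tokens.contains(&token_of(key.as_slice())))
        .collect()
}
```

The entries would come from iterate_all_partitions(), with the filter applied afterwards in memory.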
§Issue #500: Sequential-scan fallback for writer-produced SSTables
The Summary.db → Index.db → Data.db lookup path depends on Index.db format compatibility between writer and reader (digest format vs. raw-key format). Locally written SSTables emit raw-key Index.db entries that the reader’s digest-based parser cannot resolve, so the lookup loop returns 0 entries even though Summary.db enumerates the partitions correctly.
When that happens we fall back to sequential_scan, which walks Data.db
directly. For V5CompressedLegacy NB SSTables (the format the writer emits),
sequential_scan uses the chunk-stitching path and returns every partition.
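The control flow of this fallback can be sketched generically. This is a simplified, synchronous illustration with hypothetical names (`scan_with_fallback` and its two closure parameters); the real methods are async and operate on the reader's own state.

```rust
/// Try the indexed lookup path first; if it succeeds but yields zero
/// entries, fall back to a sequential scan of the data file.
/// Errors from the indexed path are propagated, not masked by the fallback.
fn scan_with_fallback<T, E>(
    indexed: impl FnOnce() -> Result<Vec<T>, E>,
    sequential: impl FnOnce() -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E> {
    let rows = indexed()?;
    if rows.is_empty() {
        sequential()
    } else {
        Ok(rows)
    }
}
```

Whether an indexed-path error should also trigger the fallback is a design choice this sketch does not take from the source; here only an empty result does.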
pub async fn iterate_token_range(
    &self,
    _start_token: i64,
    _end_token: i64,
) -> Result<Vec<(RowKey, Value)>>
Deprecated since 0.1.0: Summary.db does not store tokens. Use iterate_all_partitions() and filter by computed tokens.
Token range iteration (deprecated - tokens not stored in Summary.db)
This method is kept for API compatibility but simply delegates to
iterate_all_partitions() since Summary.db does not store token values.
Token filtering should be done by the caller after retrieval.
pub async fn get_timestamp_range(&self) -> Result<Option<(i64, i64)>>
Get min/max timestamps from Statistics.db reader
pub async fn get_token_coverage(&self) -> Result<Option<(i64, i64)>>
Deprecated since 0.1.0: Summary.db does not store tokens. Compute tokens from partition keys using the partitioner.
Get token coverage (deprecated - tokens not stored in Summary.db)
Note: As of Issue #218, Summary.db does not store token values. This method now returns None since token coverage cannot be determined from Summary.db alone. Token computation requires partition keys and the partitioner algorithm.
pub async fn get_with_spec_readers(
    &self,
    table_id: &TableId,
    key: &RowKey,
) -> Result<Option<Value>>
Enhanced get method using spec readers for efficient lookup
pub async fn get_with_schema_context(
    &self,
    table_id: &TableId,
    key: &RowKey,
    parsing_context: &ParsingContext,
) -> Result<Option<Value>>
Enhanced get method using spec readers with schema-driven key digest computation
impl SSTableReader
pub async fn open(
    path: &Path,
    config: &Config,
    platform: Arc<Platform>,
) -> Result<Self>
Open an SSTable file for reading
pub fn set_schema_registry(
    &mut self,
    schema_registry: Arc<RwLock<SchemaRegistry>>,
)
Set the schema registry for schema-driven operations
pub fn set_udt_registry(&mut self, registry: UdtRegistry)
Set the UDT registry for UDT-aware parsing in collections
This enables proper parsing of UDTs inside collections (List, Set, Map) by providing the UDT field definitions needed for nested type resolution.
pub async fn stats(&self) -> Result<&SSTableReaderStats>
Get reader statistics
pub fn calculate_header_size(&self) -> usize
Calculate header size based on format and actual header content
pub fn cassandra_version(&self) -> CassandraVersion
Get the Cassandra version from the SSTable header
pub fn format_version(&self) -> Result<String>
Get the SSTable format version string
pub fn header(&self) -> &SSTableHeader
Get a reference to the SSTable header
pub fn schema(&self) -> Option<&TableSchema>
Get the table schema extracted from the SSTable header
Returns None for legacy formats or if schema extraction failed.
pub fn extract_write_time_from_entry(&self, _key: &RowKey, value: &Value) -> i64
Extract write time from entry metadata