SSTableReader

Struct SSTableReader 

Source
pub struct SSTableReader {
    pub generation: u64,
    pub compression_info: Option<Arc<CompressionInfo>>,
    /* private fields */
}
SSTable reader for efficient data access

Fields§

§generation: u64

SSTable generation number (for multi-generation merging)

§compression_info: Option<Arc<CompressionInfo>>

CompressionInfo metadata for chunked decompression (if compressed)

Implementations§

Source§

impl SSTableReader

Source

pub fn get_cache_stats(&self) -> (u64, u64, f64)

Get cache statistics for reporting

Source§

impl SSTableReader

Source

pub async fn get(&self, table_id: &TableId, key: &RowKey) -> Result<Option<Value>>

Get a value by key from the SSTable

Source

pub async fn scan(&self, table_id: &TableId, start_key: Option<&RowKey>, end_key: Option<&RowKey>, limit: Option<usize>, schema: Option<&TableSchema>) -> Result<Vec<(RowKey, Value)>>

Scan a range of keys

§Arguments
  • table_id - The table to scan
  • start_key - Optional start key for range scan
  • end_key - Optional end key for range scan
  • limit - Optional limit on number of results
  • schema - Optional table schema for schema-aware parsing. When provided, enables accurate type detection and avoids heuristic-based parsing. Strongly recommended for Cassandra 5.0+ formats.
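
How the optional bounds and the limit combine can be illustrated with a small stand-in. This is a sketch, not the crate's implementation: `scan_sketch` is a hypothetical helper over an in-memory `BTreeMap`, it compares keys as raw bytes, and it assumes `start_key` is inclusive and `end_key` is exclusive, which the documentation above does not specify.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical stand-in for a bounded range scan over byte-ordered keys.
/// Assumption: start_key is inclusive, end_key is exclusive.
pub fn scan_sketch(
    rows: &BTreeMap<Vec<u8>, String>,
    start_key: Option<&[u8]>,
    end_key: Option<&[u8]>,
    limit: Option<usize>,
) -> Vec<(Vec<u8>, String)> {
    // An inverted range yields nothing (BTreeMap::range would panic on it).
    if let (Some(s), Some(e)) = (start_key, end_key) {
        if s > e {
            return Vec::new();
        }
    }
    // A missing bound means "unbounded on that side".
    let lower = match start_key {
        Some(k) => Bound::Included(k),
        None => Bound::Unbounded,
    };
    let upper = match end_key {
        Some(k) => Bound::Excluded(k),
        None => Bound::Unbounded,
    };
    rows.range::<[u8], _>((lower, upper))
        // A missing limit means "return everything in range".
        .take(limit.unwrap_or(usize::MAX))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Passing `None` for all three optional arguments degenerates into a full scan of the stand-in map.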
Source

pub async fn get_all_entries(&self) -> Result<Vec<(TableId, RowKey, Value)>>

Get all entries in the SSTable.

§Tombstone contract (Issue #505)

This is a user-facing accessor: row tombstones are filtered out via Self::filter_tombstone and never appear in the returned entries. The underlying parse_block path emits Value::Tombstone(RowTombstone) for deleted rows, but those are suppressed here so callers see exactly the live rows (matching the previous Value::Null suppression behaviour).

The compaction k-way merger must instead use Self::iterate_all_partitions_for_compaction, which preserves Value::Tombstone entries (with their authoritative deletion timestamps) so that tombstone-shadowing semantics can be applied during the merge.
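
The user-facing half of this contract amounts to dropping row tombstones before returning entries. A minimal sketch, assuming a simplified stand-in enum: `ValueSketch` and `live_rows` are hypothetical names, not the crate's `Value` type or its `filter_tombstone` helper.

```rust
/// Hypothetical, simplified stand-in for the crate's Value type.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueSketch {
    Text(String),
    /// A deleted row, carrying its deletion timestamp in microseconds.
    RowTombstone { deleted_at_micros: i64 },
}

/// Keep only live rows, as a user-facing accessor would.
pub fn live_rows(entries: Vec<(Vec<u8>, ValueSketch)>) -> Vec<(Vec<u8>, ValueSketch)> {
    entries
        .into_iter()
        .filter(|(_, value)| !matches!(value, ValueSketch::RowTombstone { .. }))
        .collect()
}
```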

Source

pub async fn iterate_all_partitions_for_compaction(&self, schema: Option<&TableSchema>) -> Result<Vec<(RowKey, Value, i64)>>

Iterate all partitions with per-row timestamps, for use by the compaction merger.

Returns (RowKey, Value, row_timestamp_micros) for every row in the SSTable. Unlike iterate_all_partitions:

  • Row tombstones are returned as Value::Tombstone(RowTombstone) carrying the actual deletion timestamp extracted from the on-disk row header.
  • Cell tombstones within live rows are stored as Value::Tombstone(CellTombstone) inside the Value::Map, also carrying the actual cell-level deletion timestamp.
  • The third tuple element is the decoded row-level write timestamp, so the merger can perform timestamp-accurate last-write-wins comparisons.

Normal user-facing reads use scan, get, or iterate_all_partitions, which apply tombstone filtering. Do NOT use this method for user-visible queries.

(Issue #505)
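
To show why the merger needs tombstones and timestamps preserved, here is a sketch of a last-write-wins merge over several generations. It is not the crate's merger: `MergeValue` and `merge_last_write_wins` are hypothetical names, the value type is simplified, and the tie-break (a tombstone wins over a live row with the same timestamp) is an assumption borrowed from Cassandra's usual reconciliation rule.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the crate's Value type.
#[derive(Debug, Clone, PartialEq)]
pub enum MergeValue {
    Live(String),
    Tombstone { deleted_at_micros: i64 },
}

/// Timestamp used for comparison: the deletion time for tombstones,
/// the row-level write time for live rows.
fn effective_ts(value: &MergeValue, row_ts: i64) -> i64 {
    match value {
        MergeValue::Tombstone { deleted_at_micros } => *deleted_at_micros,
        MergeValue::Live(_) => row_ts,
    }
}

/// Merge (key, value, row_timestamp_micros) triples from several SSTables,
/// keeping the newest entry per key. Tombstones stay in the output so they
/// can keep shadowing older data in later merges.
pub fn merge_last_write_wins(
    inputs: Vec<Vec<(Vec<u8>, MergeValue, i64)>>,
) -> BTreeMap<Vec<u8>, (MergeValue, i64)> {
    let mut merged: BTreeMap<Vec<u8>, (MergeValue, i64)> = BTreeMap::new();
    for (key, value, row_ts) in inputs.into_iter().flatten() {
        let ts = effective_ts(&value, row_ts);
        let replace = match merged.get(&key) {
            None => true,
            Some((_, old_ts)) if ts > *old_ts => true,
            // Assumed tie-break: on equal timestamps the deletion wins.
            Some((old, old_ts)) if ts == *old_ts => {
                matches!(value, MergeValue::Tombstone { .. })
                    && !matches!(old, MergeValue::Tombstone { .. })
            }
            _ => false,
        };
        if replace {
            merged.insert(key, (value, ts));
        }
    }
    merged
}
```

If the inputs had already been tombstone-filtered, a deletion in a newer generation would be invisible here and the older live row would incorrectly survive the merge.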

Source

pub async fn read_value_at_offset(&self, offset: u64, size: u32) -> Result<Option<Value>>

Read value at a specific offset with caching

Source§

impl SSTableReader

Source

pub async fn get_health_metrics(&self) -> Result<SSTableReaderHealthMetrics>

Get comprehensive reader health and performance metrics

Source

pub async fn perform_integrity_check(&self) -> Result<IntegrityCheckResult>

Perform integrity check on the SSTable file

Source§

impl SSTableReader

Source

pub async fn lookup_partition_with_index(&self, partition_key: &[u8]) -> Result<Option<(u64, u32)>>

Enhanced partition lookup using Index.db reader with promoted index support.

partition_key must be the raw partition-key bytes as produced by PartitionKey::to_bytes:

  • Single-component keys: raw value bytes (UUID = 16 bytes, int = 4 BE bytes, etc.).
  • Multi-component (composite) keys: [len: u16 BE][value bytes][0x00] per component, including a trailing 0x00 after the final component.

The Index.db key_lookup map is keyed on these exact raw bytes (set when the BIG-format parser was fixed in Issue #552). The old digest-based path (which caused every lookup to miss) has been removed. On a miss the function returns Ok(None) so callers can fall through to their existing sequential-scan fallback.
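
The two key layouts described above can be sketched as plain byte encoders. These helpers are illustrative only: `encode_single_int_key` and `encode_composite_key` are hypothetical names, not part of the crate, and real callers would obtain the bytes from PartitionKey::to_bytes.

```rust
use std::convert::TryFrom;

/// Single-component int key: the raw 4-byte big-endian value.
pub fn encode_single_int_key(value: i32) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Composite key: [len: u16 BE][value bytes][0x00] for every component,
/// including the trailing 0x00 after the final one.
/// Returns None if a component is too long for a u16 length prefix.
pub fn encode_composite_key(components: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for component in components {
        let len = u16::try_from(component.len()).ok()?;
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(component);
        out.push(0x00);
    }
    Some(out)
}
```

For example, the components `ab` and `c` encode to `00 02 61 62 00 00 01 63 00`.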

Source

pub async fn lookup_partition_with_schema_context(&self, partition_key: &[u8], parsing_context: &ParsingContext) -> Result<Option<(u64, u32)>>

Enhanced partition lookup using schema-driven key digest computation

Source

pub async fn iterate_all_partitions(&self) -> Result<Vec<(RowKey, Value)>>

Enhanced partition iteration using Summary.db reader

Note: Token-based range queries are not directly supported because Summary.db does not store token values (Issue #218). Instead, this iterates all summary entries and returns all partition data.

For token-based filtering, compute tokens from partition keys after retrieval.
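
Caller-side token filtering might look like the sketch below. `filter_by_token_range` is a hypothetical helper, and the `token_of` closure stands in for whatever partitioner the caller uses (for example Murmur3); this page does not specify one. The range is assumed to be start-exclusive and end-inclusive with wrap-around, following Cassandra's usual token-range convention.

```rust
/// Keep only the entries whose partition-key token falls in (start_token, end_token].
/// When start_token >= end_token the range is treated as wrapping around the ring.
pub fn filter_by_token_range<V>(
    entries: Vec<(Vec<u8>, V)>,
    token_of: impl Fn(&[u8]) -> i64,
    start_token: i64,
    end_token: i64,
) -> Vec<(Vec<u8>, V)> {
    entries
        .into_iter()
        .filter(|(key, _)| {
            let token = token_of(key);
            if start_token < end_token {
                token > start_token && token <= end_token
            } else {
                // Wrapping range: everything after start or up to end.
                token > start_token || token <= end_token
            }
        })
        .collect()
}
```

Because the filter runs after retrieval, every partition is still read from disk; the sketch only narrows what the caller keeps.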

§Issue #500: Sequential-scan fallback for writer-produced SSTables

The Summary.db → Index.db → Data.db lookup path depends on Index.db format compatibility between writer and reader (digest format vs. raw-key format). Locally written SSTables emit raw-key Index.db entries that the reader’s digest-based parser cannot resolve, so the lookup loop returns 0 entries even though Summary.db enumerates the partitions correctly.

When that happens we fall back to sequential_scan, which walks Data.db directly. For V5CompressedLegacy NB SSTables (the format the writer emits), sequential_scan uses the chunk-stitching path and returns every partition.
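
The fallback decision described above can be sketched as follows. `iterate_with_fallback` is a hypothetical helper, and the two closures stand in for the indexed lookup loop and for sequential_scan; this is a model of the control flow, not the crate's code.

```rust
/// Try the indexed path first. If Summary.db lists partitions but the indexed
/// lookup resolves none of them, fall back to walking Data.db sequentially.
pub fn iterate_with_fallback<T>(
    summary_entry_count: usize,
    indexed_lookup: impl FnOnce() -> Vec<T>,
    sequential_scan: impl FnOnce() -> Vec<T>,
) -> Vec<T> {
    let rows = indexed_lookup();
    if rows.is_empty() && summary_entry_count > 0 {
        return sequential_scan();
    }
    rows
}
```

Taking the scan as a closure means it runs only when the indexed path comes back empty.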

Source

pub async fn iterate_token_range(&self, _start_token: i64, _end_token: i64) -> Result<Vec<(RowKey, Value)>>

Deprecated since 0.1.0: Summary.db does not store tokens. Use iterate_all_partitions() and filter by computed tokens.

Token range iteration (deprecated - tokens not stored in Summary.db)

This method is kept for API compatibility but simply delegates to iterate_all_partitions() since Summary.db does not store token values. Token filtering should be done by the caller after retrieval.

Source

pub async fn get_timestamp_range(&self) -> Result<Option<(i64, i64)>>

Get min/max timestamps from Statistics.db reader

Source

pub async fn get_token_coverage(&self) -> Result<Option<(i64, i64)>>

Deprecated since 0.1.0: Summary.db does not store tokens. Compute tokens from partition keys using the partitioner.

Get token coverage (deprecated - tokens not stored in Summary.db)

Note: As of Issue #218, Summary.db does not store token values. This method now returns None since token coverage cannot be determined from Summary.db alone. Token computation requires partition keys and the partitioner algorithm.
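
A caller that still needs a coverage range can derive it from the partition keys. A minimal sketch, assuming the caller supplies the partitioner as a closure: `token_coverage` and `token_of` are hypothetical names, not crate APIs.

```rust
/// Compute the (min, max) token over a set of partition keys,
/// or None when there are no keys.
pub fn token_coverage<'a>(
    keys: impl IntoIterator<Item = &'a [u8]>,
    token_of: impl Fn(&[u8]) -> i64,
) -> Option<(i64, i64)> {
    keys.into_iter()
        .map(|key| token_of(key))
        .fold(None, |acc, token| match acc {
            None => Some((token, token)),
            Some((lo, hi)) => Some((lo.min(token), hi.max(token))),
        })
}
```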

Source

pub async fn get_with_spec_readers(&self, table_id: &TableId, key: &RowKey) -> Result<Option<Value>>

Enhanced get method using spec readers for efficient lookup

Source

pub async fn get_with_schema_context(&self, table_id: &TableId, key: &RowKey, parsing_context: &ParsingContext) -> Result<Option<Value>>

Enhanced get method using spec readers with schema-driven key digest computation

Source§

impl SSTableReader

Source

pub async fn open(path: &Path, config: &Config, platform: Arc<Platform>) -> Result<Self>

Open an SSTable file for reading

Source

pub fn set_schema_registry(&mut self, schema_registry: Arc<RwLock<SchemaRegistry>>)

Set the schema registry for schema-driven operations

Source

pub fn set_udt_registry(&mut self, registry: UdtRegistry)

Set the UDT registry for UDT-aware parsing in collections

This enables proper parsing of UDTs inside collections (List, Set, Map) by providing the UDT field definitions needed for nested type resolution.

Source

pub async fn stats(&self) -> Result<&SSTableReaderStats>

Get reader statistics

Source

pub async fn close(self) -> Result<()>

Close the reader and release resources

Source

pub fn calculate_header_size(&self) -> usize

Calculate header size based on format and actual header content

Source

pub fn cassandra_version(&self) -> CassandraVersion

Get the Cassandra version from the SSTable header

Source

pub fn format_version(&self) -> Result<String>

Get the SSTable format version string

Source

pub fn header(&self) -> &SSTableHeader

Get a reference to the SSTable header

Source

pub fn schema(&self) -> Option<&TableSchema>

Get the table schema extracted from the SSTable header

Returns None for legacy formats or if schema extraction failed.

Source

pub fn extract_write_time_from_entry(&self, _key: &RowKey, value: &Value) -> i64

Extract write time from entry metadata

Trait Implementations§

Source§

impl Debug for SSTableReader

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.