pub struct ColumnStore<P: Pager> { /* private fields */ }Implementations§
Source§impl<P> ColumnStore<P>
impl<P> ColumnStore<P>
pub fn open(pager: Arc<P>) -> Result<Self>
Sourcepub fn register_index(
&self,
field_id: LogicalFieldId,
kind: IndexKind,
) -> Result<()>
pub fn register_index( &self, field_id: LogicalFieldId, kind: IndexKind, ) -> Result<()>
Registers an index for a given column, building it for existing data atomically and with low memory usage.
Sourcepub fn unregister_index(
&self,
field_id: LogicalFieldId,
kind: IndexKind,
) -> Result<()>
pub fn unregister_index( &self, field_id: LogicalFieldId, kind: IndexKind, ) -> Result<()>
Unregisters a persisted index from a given column atomically.
Sourcepub fn data_type(&self, field_id: LogicalFieldId) -> Result<DataType>
pub fn data_type(&self, field_id: LogicalFieldId) -> Result<DataType>
Returns the Arrow data type of the given field, loading it if not cached.
Sourcepub fn filter_row_ids<T>(
&self,
field_id: LogicalFieldId,
predicate: &Predicate<T::Value>,
) -> Result<Vec<u64>>where
T: FilterDispatch,
pub fn filter_row_ids<T>(
&self,
field_id: LogicalFieldId,
predicate: &Predicate<T::Value>,
) -> Result<Vec<u64>>where
T: FilterDispatch,
Collects the row ids whose values satisfy the provided predicate.
pub fn filter_matches<T, F>( &self, field_id: LogicalFieldId, predicate: F, ) -> Result<FilterResult>
Sourcepub fn list_persisted_indexes(
&self,
field_id: LogicalFieldId,
) -> Result<Vec<IndexKind>>
pub fn list_persisted_indexes( &self, field_id: LogicalFieldId, ) -> Result<Vec<IndexKind>>
Lists the names of all persisted indexes for a given column.
Sourcepub fn total_rows_for_field(&self, field_id: LogicalFieldId) -> Result<u64>
pub fn total_rows_for_field(&self, field_id: LogicalFieldId) -> Result<u64>
Return the total number of rows recorded for the given logical field.
This reads the ColumnDescriptor.total_row_count stored in the catalog and descriptor pages and returns it as a u64. The value is updated by append / delete code paths and represents the persisted row count for that column.
Sourcepub fn total_rows_for_table(&self, table_id: TableId) -> Result<u64>
pub fn total_rows_for_table(&self, table_id: TableId) -> Result<u64>
Return the total number of rows for a table by inspecting any persisted
user-data column descriptor belonging to table_id.
This method picks the first user-data column found in the in-memory
catalog for the table and returns its descriptor.total_row_count. If the
table has no persisted columns, Ok(0) is returned.
Sourcepub fn user_field_ids_for_table(&self, table_id: TableId) -> Vec<LogicalFieldId>
pub fn user_field_ids_for_table(&self, table_id: TableId) -> Vec<LogicalFieldId>
Return the logical field identifiers for all user columns in table_id.
Sourcepub fn has_row_id(&self, field_id: LogicalFieldId, row_id: u64) -> Result<bool>
pub fn has_row_id(&self, field_id: LogicalFieldId, row_id: u64) -> Result<bool>
Fast presence check using the presence index (row-id permutation) if available.
Returns true if row_id exists in the column; false otherwise.
Sourcepub fn append(&self, batch: &RecordBatch) -> Result<()>
pub fn append(&self, batch: &RecordBatch) -> Result<()>
NOTE (logical table separation):
The ColumnStore stores data per logical field (a LogicalFieldId) rather than per ‘table’ directly. A LogicalFieldId encodes a namespace and a table id together (see crate::types::LogicalFieldId). This design lets multiple tables share the same underlying storage layer without collisions because every persisted column descriptor, chunk chain and presence/permutation index is keyed by the full LogicalFieldId.
Concretely:
catalog.mapmaps each LogicalFieldId -> descriptor physical key .- Each ColumnDescriptor contains chunk metadata and a persisted
total_row_countfor that logical field only. - Row-id shadow columns (presence/permutation indices) are created per logical field (they are derived from the LogicalFieldId) and are used for fast existence checks and indexed gathers.
The append implementation below expects incoming RecordBatches to be
namespaced: each field’s metadata contains a “field_id” that, when
combined with the current table id, maps to the LogicalFieldId the
ColumnStore will append into. This ensures that appends for different
tables never overwrite each other’s descriptors or chunks because the
LogicalFieldId namespace keeps them separate.
Constraints and recommended usage:
- A single
RecordBatchcarries exactly oneROW_IDcolumn which is shared by all other columns in that batch. Because of this, a batch is implicitly tied to a single table’s row-id space and should not mix columns from different tables. - To append data for multiple tables, create separate
RecordBatchinstances (one per table) and callappendfor each. These calls may be executed concurrently for throughput; the ColumnStore persists data per-logical-field so physical writes won’t collide. - If you need an atomic multi-table append, the current API does not provide that; you’d need a higher-level coordinator that issues the per-table appends within a single transaction boundary (or extend the API to accept per-column row-id arrays). Such a change is more invasive and requires reworking LWW/commit logic.
Sourcepub fn delete_rows<I>(
&self,
field_id: LogicalFieldId,
rows_to_delete: I,
) -> Result<()>where
I: IntoIterator<Item = u64>,
pub fn delete_rows<I>(
&self,
field_id: LogicalFieldId,
rows_to_delete: I,
) -> Result<()>where
I: IntoIterator<Item = u64>,
Explicit delete by global row positions (0-based, ascending, unique) -> in-place chunk rewrite.
Precondition: rows_to_delete yields strictly increasing positions.
Explicit delete by global row positions (0-based, ascending,
unique) -> in-place chunk rewrite.
Sourcepub fn verify_integrity(&self) -> Result<()>
pub fn verify_integrity(&self) -> Result<()>
Verifies the integrity of the column store’s metadata.
This check is useful for tests and debugging. It verifies:
- The catalog can be read.
- All descriptor chains are walkable.
- The row and chunk counts in the descriptors match the sum of the chunk metadata.
Returns Ok(()) if consistent, otherwise returns an Error.
Sourcepub fn get_layout_stats(&self) -> Result<Vec<ColumnLayoutStats>>
pub fn get_layout_stats(&self) -> Result<Vec<ColumnLayoutStats>>
Gathers detailed statistics about the storage layout.
This method is designed for low-level analysis and debugging, allowing you to check for under- or over-utilization of descriptor pages.
Source§impl<P> ColumnStore<P>where
P: Pager<Blob = EntryHandle>,
impl<P> ColumnStore<P>where
P: Pager<Blob = EntryHandle>,
Sourcepub fn scan(
&self,
field_id: LogicalFieldId,
opts: ScanOptions,
visitor: &mut dyn PrimitiveFullVisitor,
) -> Result<()>
pub fn scan( &self, field_id: LogicalFieldId, opts: ScanOptions, visitor: &mut dyn PrimitiveFullVisitor, ) -> Result<()>
Unified scan entrypoint configured by ScanOptions.
Requires V to implement both unsorted and sorted visitor traits; methods are no-ops by default.
Source§impl<P> ColumnStore<P>
impl<P> ColumnStore<P>
Sourcepub fn gather_rows(
&self,
field_ids: &[LogicalFieldId],
row_ids: &[u64],
policy: GatherNullPolicy,
) -> Result<RecordBatch>
pub fn gather_rows( &self, field_ids: &[LogicalFieldId], row_ids: &[u64], policy: GatherNullPolicy, ) -> Result<RecordBatch>
Gathers multiple columns using a configurable null-handling policy.
When GatherNullPolicy::DropNulls is selected, rows where all
projected columns are null or missing are removed from the
resulting batch.
pub fn prepare_gather_context( &self, field_ids: &[LogicalFieldId], ) -> Result<MultiGatherContext>
Sourcepub fn gather_rows_with_reusable_context(
&self,
ctx: &mut MultiGatherContext,
row_ids: &[u64],
policy: GatherNullPolicy,
) -> Result<RecordBatch>
pub fn gather_rows_with_reusable_context( &self, ctx: &mut MultiGatherContext, row_ids: &[u64], policy: GatherNullPolicy, ) -> Result<RecordBatch>
Gathers rows while reusing chunk caches and scratch buffers stored in the context.
This path amortizes chunk fetch and decode costs across multiple calls by retaining Arrow arrays and scratch state inside the provided context.
Trait Implementations§
Source§impl<P: Pager> ColumnStoreDebug for ColumnStore<P>
impl<P: Pager> ColumnStoreDebug for ColumnStore<P>
Source§fn render_storage_as_formatted_string(&self) -> String
fn render_storage_as_formatted_string(&self) -> String
Source§fn render_storage_as_dot(
&self,
batch_colors: &HashMap<PhysicalKey, usize>,
) -> String
fn render_storage_as_dot( &self, batch_colors: &HashMap<PhysicalKey, usize>, ) -> String
.dot file string.
The caller provides a map to color nodes based on an external category, like batch number.Auto Trait Implementations§
impl<P> !Freeze for ColumnStore<P>
impl<P> RefUnwindSafe for ColumnStore<P>where
P: RefUnwindSafe,
impl<P> Send for ColumnStore<P>
impl<P> Sync for ColumnStore<P>
impl<P> Unpin for ColumnStore<P>
impl<P> UnwindSafe for ColumnStore<P>where
P: RefUnwindSafe,
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more