pub struct Table {
pub schema: TableSchema,
/* private fields */
}
In-memory table - stores rows with optimized indexing and validation
§Architecture
The Table struct acts as an orchestration layer, delegating specialized
operations to dedicated components:
- Row Storage: Direct Vec storage for sequential access (table scans)
- Deletion Bitmap: O(1) deletion via bitmap marking instead of Vec::remove()
- Columnar Storage: Native columnar storage for OLAP-optimized tables
- Indexing: IndexManager maintains hash indexes for constraint checks
- Normalization: RowNormalizer handles value transformation and validation
- Optimization: Append mode tracking for sequential insert performance
§Storage Formats
Tables support two storage formats:
- Row-oriented (default): Traditional row storage, optimized for OLTP
- Columnar: Native column storage, optimized for OLAP with zero conversion overhead
§Columnar Storage Limitations
IMPORTANT: Columnar tables are optimized for read-heavy analytical workloads. Each INSERT/UPDATE/DELETE operation triggers a full rebuild of the columnar representation (O(n) cost). This makes columnar tables unsuitable for:
- High-frequency INSERT workloads
- OLTP use cases with frequent writes
- Streaming inserts
Recommended use cases for columnar tables:
- Bulk-loaded analytical data (load once, query many times)
- Reporting tables with infrequent updates
- Data warehouse fact tables
For mixed workloads, use row-oriented storage and convert on demand via
scan_columnar() (or the cached Database::get_columnar()), which still provides
SIMD acceleration for analytical queries.
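For the bulk-load-then-query use case recommended above, a minimal sketch (assuming `schema` was created with the columnar storage format and `rows` already holds the prepared data):
let mut table = Table::new(schema);      // native columnar, per the schema's storage format
table.insert_batch(rows)?;               // single columnar rebuild for the whole load
let columnar = table.scan_columnar()?;   // zero-conversion columnar view for queries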
§Performance Characteristics
- INSERT: O(1) amortized for row append + O(1) for index updates
- UPDATE: O(1) for row update + O(k) for k affected indexes (selective mode)
- DELETE: O(1) per row via bitmap marking (amortized O(n) for compaction)
- SCAN: O(n) direct vector iteration (skipping deleted rows)
- COLUMNAR SCAN: O(n) with SIMD acceleration (no conversion overhead for native columnar)
- PK/UNIQUE lookup: O(1) via hash indexes
§Example
use vibesql_catalog::TableSchema;
use vibesql_storage::Table;
// `columns` is a Vec of column definitions for the schema, built elsewhere
let schema = TableSchema::new("users", columns);
let mut table = Table::new(schema);
// Insert automatically validates and indexes (`row` is a Row matching the schema)
table.insert(row)?;
// Scan returns all rows
for row in table.scan() {
    // Process row...
}
Fields§
§schema: TableSchema
Table schema defining structure and constraints
Implementations§
impl Table
pub fn new(schema: TableSchema) -> Self
Create a new empty table with given schema
The storage format is determined by the schema’s storage_format field:
- Row: Traditional row-oriented storage (default)
- Columnar: Native columnar storage for analytical workloads
pub fn is_native_columnar(&self) -> bool
Check if this table uses native columnar storage
pub fn insert(&mut self, row: Row) -> Result<(), StorageError>
Insert a row into the table
For row-oriented tables, rows are stored directly in a Vec. For columnar tables, rows are buffered and the columnar data is rebuilt.
pub fn insert_batch(&mut self, rows: Vec<Row>) -> Result<usize, StorageError>
Insert multiple rows into the table in a single batch operation
This method is optimized for bulk data loading and provides significant performance improvements over repeated single-row inserts:
- Pre-allocation: Vector capacity is reserved upfront
- Batch normalization: Rows are validated/normalized together
- Deferred index updates: Indexes are rebuilt once after all inserts
- Single cache invalidation: Columnar cache invalidated once at end
- Statistics update once: Stats marked stale only at completion
§Arguments
rows - Vector of rows to insert
§Returns
Ok(usize) - Number of rows successfully inserted
Err(StorageError) - If any row fails validation (no rows inserted on error)
§Performance
For large batches (1000+ rows), this method is typically 10-50x faster than equivalent single-row inserts due to reduced per-row overhead.
§Example
let rows = vec![
    Row::new(vec![SqlValue::Integer(1), SqlValue::Varchar(arcstr::ArcStr::from("Alice"))]),
    Row::new(vec![SqlValue::Integer(2), SqlValue::Varchar(arcstr::ArcStr::from("Bob"))]),
    Row::new(vec![SqlValue::Integer(3), SqlValue::Varchar(arcstr::ArcStr::from("Charlie"))]),
];
let count = table.insert_batch(rows)?;
assert_eq!(count, 3);
pub fn insert_from_iter<I>(
    &mut self,
    rows: I,
    batch_size: usize,
) -> Result<usize, StorageError>
Insert rows from an iterator in a streaming fashion
This method is optimized for very large datasets that may not fit in memory all at once. Rows are processed in configurable batch sizes.
§Arguments
rows - Iterator yielding rows to insert
batch_size - Number of rows to process per batch (default: 1000)
§Returns
Ok(usize) - Total number of rows successfully inserted
Err(StorageError) - If any row fails validation
§Note
Unlike insert_batch, this method commits rows in batches, so a failure
partway through will leave previously committed batches in the table.
Use insert_batch if you need all-or-nothing semantics.
§Example
// Stream rows from a file reader
let rows_iter = csv_reader.rows().map(|r| Row::from_csv_record(r));
let count = table.insert_from_iter(rows_iter, 1000)?;
pub fn scan(&self) -> &[Row]
Get all rows for scanning
Returns a slice of all rows in the table. For tables with a deletion bitmap, this returns the raw storage which may include deleted rows.
Important: For operations that need to skip deleted rows, use scan_live()
which filters deleted rows automatically.
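A brief sketch of the difference, assuming the table already holds data:
// scan() exposes raw storage, so deleted slots may still be present
for (idx, row) in table.scan().iter().enumerate() {
    if table.is_row_deleted(idx) {
        continue; // skip tombstoned rows manually
    }
    // process row...
}
// scan_live() performs the same filtering automatically
for (idx, row) in table.scan_live() {
    // process only live rows...
}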
pub fn is_row_deleted(&self, idx: usize) -> bool
Check if a row at the given index is deleted
pub fn scan_live(&self) -> impl Iterator<Item = (usize, &Row)>
Iterate over live (non-deleted) rows with their physical indices
This is the preferred way to scan table data, as it automatically skips rows that have been deleted but not yet compacted.
§Returns
An iterator yielding (physical_index, &Row) pairs for all live rows.
§Example
for (idx, row) in table.scan_live() {
    // idx is the physical index; it can be used with get_row() or delete_by_indices()
    process_row(idx, row);
}
pub fn scan_live_vec(&self) -> Vec<Row>
Scan only live (non-deleted) rows, returning an owned Vec.
This method provides an efficient way to get all live rows as an owned Vec. Unlike scan(), which returns
all rows including deleted ones, this method filters out deleted rows.
§Performance
O(n) time and space where n is the number of live rows.
Pre-allocates the exact capacity needed based on row_count().
§Returns
A Vec containing clones of all non-deleted rows.
§Example
// For SELECT queries that need a Vec<Row>
let rows = table.scan_live_vec();
pub fn get_row(&self, idx: usize) -> Option<&Row>
Get a single row by index position (O(1) access)
Returns None if the row is deleted or index is out of bounds.
§Arguments
idx - The row index position (physical index)
§Returns
Some(&Row) - The row at the given index if it exists and is not deleted
None - If the index is out of bounds or the row is deleted
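A small usage sketch (the row index is illustrative):
match table.get_row(5) {
    Some(row) => { /* row exists and is live; use it */ }
    None => { /* out of bounds, or the row was deleted */ }
}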
pub fn scan_columnar(&self) -> Result<ColumnarTable, StorageError>
Scan table data in columnar format for SIMD-accelerated processing
This method returns columnar data suitable for high-performance analytical queries.
Unlike scan() which returns row-oriented data, this method returns column-oriented
data that enables:
- SIMD vectorization: Process 4-8 values per CPU instruction
- Cache efficiency: Contiguous column data improves memory access patterns
- Type specialization: Avoid SqlValue enum matching overhead
§Performance
For native columnar tables: Zero conversion overhead - returns data directly. For row tables: O(n * m) conversion cost per call.
§Caching
This method does not cache results. For cached columnar access with LRU eviction,
use Database::get_columnar() which provides Arc-based sharing across queries.
§Returns
Ok(ColumnarTable) - Columnar representation of the table data
Err(StorageError) - If conversion fails due to type mismatches
§Example
let columnar = table.scan_columnar()?;
// Process with SIMD-accelerated operations
if let Some(ColumnData::Int64 { values, nulls }) = columnar.get_column("quantity") {
    // SIMD filtering on values slice
}
pub fn physical_row_count(&self) -> usize
Get total number of rows including deleted ones (physical storage size)
pub fn deleted_count(&self) -> usize
Get count of deleted (logically removed) rows
This is used for DML cost estimation, as tables with many deleted rows may have degraded performance for UPDATE/DELETE operations.
pub fn statistics(&mut self) -> &TableStatistics
Get table statistics, computing if necessary
Statistics are computed lazily on first access and cached. They are marked stale after significant data changes (> 10% of rows).
pub fn get_statistics(&self) -> Option<&TableStatistics>
Get cached table statistics without computing
Returns None if statistics have never been computed or are stale.
Use statistics() if you want to compute/refresh statistics.
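A short sketch contrasting the two accessors (field access on TableStatistics is omitted since its layout is not shown here):
// First access computes and caches the statistics
let stats = table.statistics();
// Later: read the cache without forcing recomputation
if let Some(cached) = table.get_statistics() {
    // use cached statistics, e.g. for cost estimation
}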
pub fn is_in_append_mode(&self) -> bool
Check if the table is in append mode (sequential inserts detected). When true, constraint checks can skip duplicate lookups as an optimization.
pub fn update_row(&mut self, index: usize, row: Row) -> Result<(), StorageError>
Update a row at the specified index
pub fn update_row_selective(
    &mut self,
    index: usize,
    row: Row,
    changed_columns: &HashSet<usize>,
) -> Result<(), StorageError>
Update a row with selective index maintenance
Only updates indexes that reference changed columns, providing significant performance improvement for tables with many indexes when updating non-indexed columns.
§Arguments
index - Row index to update
row - New row data
changed_columns - Set of column indices that were modified
§Returns
Ok(()) - on success
Err(StorageError) - if index is out of bounds or column count mismatches
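A minimal sketch, assuming `new_row` is already valid for the schema, only column 2 changed, and the row index is illustrative:
use std::collections::HashSet;
let mut changed_columns = HashSet::new();
changed_columns.insert(2); // only column 2 was modified
table.update_row_selective(0, new_row, &changed_columns)?;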
pub fn update_row_unchecked(
    &mut self,
    index: usize,
    new_row: Row,
    old_row: &Row,
    changed_columns: &HashSet<usize>,
)
Fast path update for pre-validated rows
This variant skips normalization/validation, assuming the caller has already validated the row data. Use for performance-critical UPDATE paths where validation was done at the executor level.
§Arguments
index - Row index to update
new_row - Pre-validated new row data (ownership transferred)
old_row - Reference to old row for index updates
changed_columns - Set of column indices that were modified
§Safety
Caller must ensure row data is valid (correct column count, types, constraints)
pub fn update_column_inplace(
    &mut self,
    row_index: usize,
    col_index: usize,
    new_value: SqlValue,
)
Update a single column value in-place without cloning the row
This is the fastest possible update path for non-indexed columns:
- No row cloning (direct in-place modification)
- No index updates (caller must verify column is not indexed)
- No validation (caller must pre-validate the value)
§Arguments
row_index - Index of the row to update
col_index - Index of the column to update
new_value - The new value for the column
§Safety
Caller must ensure:
- The column is NOT indexed (no internal or user-defined indexes)
- The value satisfies all constraints (NOT NULL, type, etc.)
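A sketch of this fast path, assuming column 3 carries no index and the value was validated by the caller (the row index is illustrative):
// Direct in-place write: no clone, no index maintenance, no validation
table.update_column_inplace(row_idx, 3, SqlValue::Integer(42));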
pub fn delete_where<F>(&mut self, predicate: F) -> DeleteResult
Delete rows matching a predicate
Uses O(1) bitmap marking for each deleted row instead of O(n) Vec::remove().
§Returns
DeleteResult containing the count of deleted rows and whether compaction occurred.
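A sketch using a hypothetical `is_expired` predicate over &Row:
// `is_expired` stands in for any user-supplied row predicate
let result = table.delete_where(|row| is_expired(row));
println!("deleted {} rows", result.deleted_count);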
pub fn remove_row(&mut self, target_row: &Row) -> Result<(), StorageError>
Remove a specific row (used for transaction undo). Returns an error if the row is not found.
Uses O(1) bitmap marking instead of O(n) Vec::remove().
Note: This method does not return compaction status since it’s used internally for transaction rollback where index consistency is handled at a higher level.
pub fn delete_by_indices(&mut self, indices: &[usize]) -> DeleteResult
Delete rows by known indices (fast path - no scanning required)
Uses O(1) bitmap marking instead of O(n) Vec::remove(). Rows are marked as deleted but remain in the vector until compaction is triggered.
§Arguments
indices - Indices of rows to delete; they need not be sorted
§Returns
DeleteResult containing:
deleted_count: Number of rows deleted
compacted: Whether compaction occurred (row indices changed)
§Important
When compacted is true, all row indices in the table have changed.
User-defined indexes (B-tree indexes managed at the Database level)
must be rebuilt after compaction to maintain correctness.
§Performance
O(d) where d = number of rows to delete, compared to O(d * n) for Vec::remove()
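A sketch showing how a caller might react to compaction (the indices are illustrative):
let result = table.delete_by_indices(&[3, 7, 11]);
if result.compacted {
    // Physical row indices shifted: rebuild database-level indexes here
}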
pub fn delete_by_indices_batch(&mut self, indices: &[usize]) -> DeleteResult
Delete rows by known indices with batch-optimized internal index updates
This is an optimized version of delete_by_indices that pre-computes
schema lookups for internal hash indexes, reducing overhead for multi-row
deletes by ~30-40%.
§Arguments
indices - Indices of rows to delete; they need not be sorted
§Returns
DeleteResult containing:
deleted_count: Number of rows deleted
compacted: Whether compaction occurred (row indices changed)
§Performance
- Pre-computes PK/unique column indices once (O(1) vs O(d) schema lookups)
- Uses batch index updates for internal hash indexes
- Best for multi-row deletes; single-row deletes should use delete_by_indices
pub fn is_deleted(&self, idx: usize) -> bool
Check if a row at the given index is deleted
pub fn schema_mut(&mut self) -> &mut TableSchema
Get mutable reference to schema
pub fn primary_key_index(&self) -> Option<&HashMap<Vec<SqlValue>, usize>>
Get reference to primary key index
pub fn unique_indexes(&self) -> &[HashMap<Vec<SqlValue>, usize>]
Get reference to unique constraint indexes
pub fn rebuild_indexes(&mut self)
Rebuild all hash indexes from scratch. Used after schema changes that add constraints (e.g., ALTER TABLE ADD PRIMARY KEY).