pub struct Table {
pub schema: TableSchema,
/* private fields */
}
In-memory table - stores rows with optimized indexing and validation
§Architecture
The Table struct acts as an orchestration layer, delegating specialized
operations to dedicated components:
- Row Storage: Direct Vec storage for sequential access (table scans)
- Deletion Bitmap: O(1) deletion via bitmap marking instead of Vec::remove()
- Columnar Storage: Native columnar storage for OLAP-optimized tables
- Indexing: IndexManager maintains hash indexes for constraint checks
- Normalization: RowNormalizer handles value transformation and validation
- Optimization: Append mode tracking for sequential insert performance
§Storage Formats
Tables support two storage formats:
- Row-oriented (default): Traditional row storage, optimized for OLTP
- Columnar: Native column storage, optimized for OLAP with zero conversion overhead
§Columnar Storage Limitations
IMPORTANT: Columnar tables are optimized for read-heavy analytical workloads. Each INSERT/UPDATE/DELETE operation triggers a full rebuild of the columnar representation (O(n) cost). This makes columnar tables unsuitable for:
- High-frequency INSERT workloads
- OLTP use cases with frequent writes
- Streaming inserts
Recommended use cases for columnar tables:
- Bulk-loaded analytical data (load once, query many times)
- Reporting tables with infrequent updates
- Data warehouse fact tables
For mixed workloads, use row-oriented storage and convert on demand via
scan_columnar() (or the cached Database::get_columnar()), which still provides
SIMD acceleration for analytical queries.
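For the bulk-load-then-query use case recommended above, a minimal sketch (assuming `schema` was created with the columnar storage format and `rows` already holds the prepared data):
let mut table = Table::new(schema);      // native columnar, per the schema's storage format
table.insert_batch(rows)?;               // single columnar rebuild for the whole load
let columnar = table.scan_columnar()?;   // zero-conversion columnar view for queries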
§Performance Characteristics
- INSERT: O(1) amortized for row append + O(1) for index updates
- UPDATE: O(1) for row update + O(k) for k affected indexes (selective mode)
- DELETE: O(1) per row via bitmap marking (amortized O(n) for compaction)
- SCAN: O(n) direct vector iteration (skipping deleted rows)
- COLUMNAR SCAN: O(n) with SIMD acceleration (no conversion overhead for native columnar)
- PK/UNIQUE lookup: O(1) via hash indexes
§Example
use vibesql_catalog::TableSchema;
use vibesql_storage::Table;
// `columns` is a Vec of column definitions for the schema, built elsewhere
let schema = TableSchema::new("users", columns);
let mut table = Table::new(schema);
// Insert automatically validates and indexes (`row` is a Row matching the schema)
table.insert(row)?;
// Scan returns all rows
for row in table.scan() {
    // Process row...
}
Fields§
§schema: TableSchema
Table schema defining structure and constraints
Implementations§
impl Table
pub fn new(schema: TableSchema) -> Self
Create a new empty table with given schema
The storage format is determined by the schema’s storage_format field:
- Row: Traditional row-oriented storage (default)
- Columnar: Native columnar storage for analytical workloads
pub fn is_native_columnar(&self) -> bool
Check if this table uses native columnar storage
pub fn insert(&mut self, row: Row) -> Result<(), StorageError>
Insert a row into the table
For row-oriented tables, rows are stored directly in a Vec. For columnar tables, rows are buffered and the columnar data is rebuilt.
pub fn insert_batch(&mut self, rows: Vec<Row>) -> Result<usize, StorageError>
Insert multiple rows into the table in a single batch operation
This method is optimized for bulk data loading and provides significant performance improvements over repeated single-row inserts:
- Pre-allocation: Vector capacity is reserved upfront
- Batch normalization: Rows are validated/normalized together
- Deferred index updates: Indexes are rebuilt once after all inserts
- Single cache invalidation: Columnar cache invalidated once at end
- Statistics update once: Stats marked stale only at completion
§Arguments
rows - Vector of rows to insert
§Returns
Ok(usize) - Number of rows successfully inserted
Err(StorageError) - If any row fails validation (no rows inserted on error)
§Performance
For large batches (1000+ rows), this method is typically 10-50x faster than equivalent single-row inserts due to reduced per-row overhead.
§Example
let rows = vec![
    Row::new(vec![SqlValue::Integer(1), SqlValue::Varchar(arcstr::ArcStr::from("Alice"))]),
    Row::new(vec![SqlValue::Integer(2), SqlValue::Varchar(arcstr::ArcStr::from("Bob"))]),
    Row::new(vec![SqlValue::Integer(3), SqlValue::Varchar(arcstr::ArcStr::from("Charlie"))]),
];
let count = table.insert_batch(rows)?;
assert_eq!(count, 3);
pub fn insert_from_iter<I>(
    &mut self,
    rows: I,
    batch_size: usize,
) -> Result<usize, StorageError>
Insert rows from an iterator in a streaming fashion
This method is optimized for very large datasets that may not fit in memory all at once. Rows are processed in configurable batch sizes.
§Arguments
rows - Iterator yielding rows to insert
batch_size - Number of rows to process per batch (default: 1000)
§Returns
Ok(usize) - Total number of rows successfully inserted
Err(StorageError) - If any row fails validation
§Note
Unlike insert_batch, this method commits rows in batches, so a failure
partway through will leave previously committed batches in the table.
Use insert_batch if you need all-or-nothing semantics.
§Example
// Stream rows from a file reader
let rows_iter = csv_reader.rows().map(|r| Row::from_csv_record(r));
let count = table.insert_from_iter(rows_iter, 1000)?;
pub fn scan(&self) -> &[Row]
Get all rows for scanning
Returns a slice of all rows in the table. For tables with a deletion bitmap, this returns the raw storage which may include deleted rows.
Important: For operations that need to skip deleted rows, use scan_live()
which filters deleted rows automatically.
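A brief sketch of the difference, assuming the table already holds data:
// scan() exposes raw storage, so deleted slots may still be present
for (idx, row) in table.scan().iter().enumerate() {
    if table.is_row_deleted(idx) {
        continue; // skip tombstoned rows manually
    }
    // process row...
}
// scan_live() performs the same filtering automatically
for (idx, row) in table.scan_live() {
    // process only live rows...
}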
pub fn is_row_deleted(&self, idx: usize) -> bool
Check if a row at the given index is deleted
pub fn scan_live(&self) -> impl Iterator<Item = (usize, &Row)>
Iterate over live (non-deleted) rows with their physical indices
This is the preferred way to scan table data, as it automatically skips rows that have been deleted but not yet compacted.
§Returns
An iterator yielding (physical_index, &Row) pairs for all live rows.
§Example
for (idx, row) in table.scan_live() {
    // idx is the physical index; it can be used with get_row() or delete_by_indices()
    process_row(idx, row);
}
pub fn scan_live_vec(&self) -> Vec<Row>
Scan only live (non-deleted) rows, returning an owned Vec.
This method provides an efficient way to get all live rows as an owned Vec. Unlike scan(), which returns
all rows including deleted ones, this method filters out deleted rows.
§Performance
O(n) time and space where n is the number of live rows.
Pre-allocates the exact capacity needed based on row_count().
§Returns
A Vec containing clones of all non-deleted rows.
§Example
// For SELECT queries that need a Vec<Row>
let rows = table.scan_live_vec();
pub fn get_row(&self, idx: usize) -> Option<&Row>
Get a single row by index position (O(1) access)
Returns None if the row is deleted or index is out of bounds.
§Arguments
idx - The row index position (physical index)
§Returns
Some(&Row) - The row at the given index if it exists and is not deleted
None - If the index is out of bounds or the row is deleted
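A small usage sketch (the row index is illustrative):
match table.get_row(5) {
    Some(row) => { /* row exists and is live; use it */ }
    None => { /* out of bounds, or the row was deleted */ }
}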
pub fn scan_columnar(&self) -> Result<ColumnarTable, StorageError>
Scan table data in columnar format for SIMD-accelerated processing
This method returns columnar data suitable for high-performance analytical queries.
Unlike scan() which returns row-oriented data, this method returns column-oriented
data that enables:
- SIMD vectorization: Process 4-8 values per CPU instruction
- Cache efficiency: Contiguous column data improves memory access patterns
- Type specialization: Avoid SqlValue enum matching overhead
§Performance
For native columnar tables: Zero conversion overhead - returns data directly. For row tables: O(n * m) conversion cost per call.
§Caching
This method does not cache results. For cached columnar access with LRU eviction,
use Database::get_columnar() which provides Arc-based sharing across queries.
§Returns
Ok(ColumnarTable) - Columnar representation of the table data
Err(StorageError) - If conversion fails due to type mismatches
§Example
let columnar = table.scan_columnar()?;
// Process with SIMD-accelerated operations
if let Some(ColumnData::Int64 { values, nulls }) = columnar.get_column("quantity") {
    // SIMD filtering on values slice
}
pub fn physical_row_count(&self) -> usize
Get total number of rows including deleted ones (physical storage size)
pub fn deleted_count(&self) -> usize
Get count of deleted (logically removed) rows
This is used for DML cost estimation, as tables with many deleted rows may have degraded performance for UPDATE/DELETE operations.
pub fn statistics(&mut self) -> &TableStatistics
Get table statistics, computing if necessary
Statistics are computed lazily on first access and cached. They are marked stale after significant data changes (> 10% of rows).
pub fn get_statistics(&self) -> Option<&TableStatistics>
Get cached table statistics without computing
Returns None if statistics have never been computed or are stale.
Use statistics() if you want to compute/refresh statistics.
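A short sketch contrasting the two accessors (field access on TableStatistics is omitted since its layout is not shown here):
// First access computes and caches the statistics
let stats = table.statistics();
// Later: read the cache without forcing recomputation
if let Some(cached) = table.get_statistics() {
    // use cached statistics, e.g. for cost estimation
}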
pub fn is_in_append_mode(&self) -> bool
Check if the table is in append mode (sequential inserts detected). When true, constraint checks can skip duplicate lookups as an optimization.
pub fn update_row(&mut self, index: usize, row: Row) -> Result<(), StorageError>
Update a row at the specified index
pub fn update_row_selective(
    &mut self,
    index: usize,
    row: Row,
    changed_columns: &HashSet<usize>,
) -> Result<(), StorageError>
Update a row with selective index maintenance
Only updates indexes that reference changed columns, providing significant performance improvement for tables with many indexes when updating non-indexed columns.
§Arguments
index - Row index to update
row - New row data
changed_columns - Set of column indices that were modified
§Returns
Ok(()) - on success
Err(StorageError) - if index is out of bounds or column count mismatches
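A minimal sketch, assuming `new_row` is already valid for the schema, only column 2 changed, and the row index is illustrative:
use std::collections::HashSet;
let mut changed_columns = HashSet::new();
changed_columns.insert(2); // only column 2 was modified
table.update_row_selective(0, new_row, &changed_columns)?;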
pub fn update_row_unchecked(
    &mut self,
    index: usize,
    new_row: Row,
    old_row: &Row,
    changed_columns: &HashSet<usize>,
)
Fast path update for pre-validated rows
This variant skips normalization/validation, assuming the caller has already validated the row data. Use for performance-critical UPDATE paths where validation was done at the executor level.
§Arguments
index - Row index to update
new_row - Pre-validated new row data (ownership transferred)
old_row - Reference to old row for index updates
changed_columns - Set of column indices that were modified
§Safety
Caller must ensure row data is valid (correct column count, types, constraints)
pub fn update_column_inplace(
    &mut self,
    row_index: usize,
    col_index: usize,
    new_value: SqlValue,
)
Update a single column value in-place without cloning the row
This is the fastest possible update path for non-indexed columns:
- No row cloning (direct in-place modification)
- No index updates (caller must verify column is not indexed)
- No validation (caller must pre-validate the value)
§Arguments
row_index - Index of the row to update
col_index - Index of the column to update
new_value - The new value for the column
§Safety
Caller must ensure:
- The column is NOT indexed (no internal or user-defined indexes)
- The value satisfies all constraints (NOT NULL, type, etc.)
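A sketch of this fast path, assuming column 3 carries no index and the value was validated by the caller (the row index is illustrative):
// Direct in-place write: no clone, no index maintenance, no validation
table.update_column_inplace(row_idx, 3, SqlValue::Integer(42));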
pub fn delete_where<F>(&mut self, predicate: F) -> DeleteResult
Delete rows matching a predicate
Uses O(1) bitmap marking for each deleted row instead of O(n) Vec::remove().
§Returns
DeleteResult containing the count of deleted rows and whether compaction occurred.
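A sketch using a hypothetical `is_expired` predicate over &Row:
// `is_expired` stands in for any user-supplied row predicate
let result = table.delete_where(|row| is_expired(row));
println!("deleted {} rows", result.deleted_count);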
pub fn remove_row(&mut self, target_row: &Row) -> Result<(), StorageError>
Remove a specific row (used for transaction undo). Returns an error if the row is not found.
Uses O(1) bitmap marking instead of O(n) Vec::remove().
Note: This method does not return compaction status since it’s used internally for transaction rollback where index consistency is handled at a higher level.
pub fn delete_by_indices(&mut self, indices: &[usize]) -> DeleteResult
Delete rows by known indices (fast path - no scanning required)
Uses O(1) bitmap marking instead of O(n) Vec::remove(). Rows are marked as deleted but remain in the vector until compaction is triggered.
§Arguments
indices - Indices of rows to delete; they need not be sorted
§Returns
DeleteResult containing:
deleted_count: Number of rows deleted
compacted: Whether compaction occurred (row indices changed)
§Important
When compacted is true, all row indices in the table have changed.
User-defined indexes (B-tree indexes managed at the Database level)
must be rebuilt after compaction to maintain correctness.
§Performance
O(d) where d = number of rows to delete, compared to O(d * n) for Vec::remove()
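A sketch showing how a caller might react to compaction (the indices are illustrative):
let result = table.delete_by_indices(&[3, 7, 11]);
if result.compacted {
    // Physical row indices shifted: rebuild database-level indexes here
}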
pub fn delete_by_indices_batch(&mut self, indices: &[usize]) -> DeleteResult
Delete rows by known indices with batch-optimized internal index updates
This is an optimized version of delete_by_indices that pre-computes
schema lookups for internal hash indexes, reducing overhead for multi-row
deletes by ~30-40%.
§Arguments
indices - Indices of rows to delete; they need not be sorted
§Returns
DeleteResult containing:
deleted_count: Number of rows deleted
compacted: Whether compaction occurred (row indices changed)
§Performance
- Pre-computes PK/unique column indices once (O(1) vs O(d) schema lookups)
- Uses batch index updates for internal hash indexes
- Best for multi-row deletes; single-row deletes should use delete_by_indices
pub fn is_deleted(&self, idx: usize) -> bool
Check if a row at the given index is deleted
pub fn schema_mut(&mut self) -> &mut TableSchema
Get mutable reference to schema
pub fn primary_key_index(&self) -> Option<&HashMap<Vec<SqlValue>, usize>>
Get reference to primary key index
pub fn unique_indexes(&self) -> &[HashMap<Vec<SqlValue>, usize>]
Get reference to unique constraint indexes
pub fn rebuild_indexes(&mut self)
Rebuild all hash indexes from scratch. Used after schema changes that add constraints (e.g., ALTER TABLE ADD PRIMARY KEY).