Struct Table

pub struct Table {
    pub schema: TableSchema,
    /* private fields */
}

In-memory table that stores rows with optimized indexing and validation

§Architecture

The Table struct acts as an orchestration layer, delegating specialized operations to dedicated components:

  • Row Storage: Direct Vec storage for sequential access (table scans)
  • Deletion Bitmap: O(1) deletion via bitmap marking instead of Vec::remove()
  • Columnar Storage: Native columnar storage for OLAP-optimized tables
  • Indexing: IndexManager maintains hash indexes for constraint checks
  • Normalization: RowNormalizer handles value transformation and validation
  • Optimization: Append mode tracking for sequential insert performance

§Storage Formats

Tables support two storage formats:

  • Row-oriented (default): Traditional row storage, optimized for OLTP
  • Columnar: Native column storage, optimized for OLAP with zero conversion overhead

§Columnar Storage Limitations

IMPORTANT: Columnar tables are optimized for read-heavy analytical workloads. Each INSERT/UPDATE/DELETE operation triggers a full rebuild of the columnar representation (O(n) cost). This makes columnar tables unsuitable for:

  • High-frequency INSERT workloads
  • OLTP use cases with frequent writes
  • Streaming inserts

Recommended use cases for columnar tables:

  • Bulk-loaded analytical data (load once, query many times)
  • Reporting tables with infrequent updates
  • Data warehouse fact tables

For mixed workloads, use row-oriented storage with the columnar cache (via Database::get_columnar()), which provides SIMD acceleration with caching; scan_columnar() performs the conversion on each call without caching.

§Performance Characteristics

  • INSERT: O(1) amortized for row append + O(1) for index updates
  • UPDATE: O(1) for row update + O(k) for k affected indexes (selective mode)
  • DELETE: O(1) per row via bitmap marking (amortized O(n) for compaction)
  • SCAN: O(n) direct vector iteration (skipping deleted rows)
  • COLUMNAR SCAN: O(n) with SIMD acceleration (no conversion overhead for native columnar)
  • PK/UNIQUE lookup: O(1) via hash indexes

§Example

use vibesql_catalog::TableSchema;
use vibesql_storage::Table;

// `columns` is a previously built list of column definitions and `row` a
// matching Row; their construction is elided here.
let schema = TableSchema::new("users", columns);
let mut table = Table::new(schema);

// Insert automatically validates and indexes
table.insert(row)?;

// Scan returns all rows (use scan_live() to skip deleted rows)
for row in table.scan() {
    // Process row...
}

Fields§

§schema: TableSchema

Table schema defining structure and constraints

Implementations§

impl Table

pub fn new(schema: TableSchema) -> Self

Create a new empty table with given schema

The storage format is determined by the schema’s storage_format field:

  • Row: Traditional row-oriented storage (default)
  • Columnar: Native columnar storage for analytical workloads
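
§Example

A minimal sketch of selecting the columnar format; the StorageFormat enum path and direct field assignment are assumptions based on the description above, not confirmed API:

// `columns` is a previously built column list (construction elided).
let mut schema = TableSchema::new("sales_facts", columns);
schema.storage_format = StorageFormat::Columnar; // hypothetical enum name
let table = Table::new(schema);
assert!(table.is_native_columnar());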

pub fn is_native_columnar(&self) -> bool

Check if this table uses native columnar storage

pub fn insert(&mut self, row: Row) -> Result<(), StorageError>

Insert a row into the table

For row-oriented tables, rows are stored directly in a Vec. For columnar tables, rows are buffered and the columnar data is rebuilt.

pub fn insert_batch(&mut self, rows: Vec<Row>) -> Result<usize, StorageError>

Insert multiple rows into the table in a single batch operation

This method is optimized for bulk data loading and provides significant performance improvements over repeated single-row inserts:

  • Pre-allocation: Vector capacity is reserved upfront
  • Batch normalization: Rows are validated/normalized together
  • Deferred index updates: Indexes are rebuilt once after all inserts
  • Single cache invalidation: Columnar cache invalidated once at end
  • Single statistics update: Stats marked stale only at completion
§Arguments
  • rows - Vector of rows to insert
§Returns
  • Ok(usize) - Number of rows successfully inserted
  • Err(StorageError) - If any row fails validation (no rows inserted on error)
§Performance

For large batches (1000+ rows), this method is typically 10-50x faster than equivalent single-row inserts due to reduced per-row overhead.

§Example
let rows = vec![
    Row::new(vec![SqlValue::Integer(1), SqlValue::Varchar(arcstr::ArcStr::from("Alice"))]),
    Row::new(vec![SqlValue::Integer(2), SqlValue::Varchar(arcstr::ArcStr::from("Bob"))]),
    Row::new(vec![SqlValue::Integer(3), SqlValue::Varchar(arcstr::ArcStr::from("Charlie"))]),
];
let count = table.insert_batch(rows)?;
assert_eq!(count, 3);

pub fn insert_from_iter<I>(&mut self, rows: I, batch_size: usize) -> Result<usize, StorageError>
where
    I: Iterator<Item = Row>,

Insert rows from an iterator in a streaming fashion

This method is optimized for very large datasets that may not fit in memory all at once. Rows are processed in configurable batch sizes.

§Arguments
  • rows - Iterator yielding rows to insert
  • batch_size - Number of rows to process per batch (default: 1000)
§Returns
  • Ok(usize) - Total number of rows successfully inserted
  • Err(StorageError) - If any row fails validation
§Note

Unlike insert_batch, this method commits rows in batches, so a failure partway through will leave previously committed batches in the table. Use insert_batch if you need all-or-nothing semantics.

§Example
// Stream rows from a file reader
let rows_iter = csv_reader.rows().map(|r| Row::from_csv_record(r));
let count = table.insert_from_iter(rows_iter, 1000)?;

pub fn scan(&self) -> &[Row]

Get all rows for scanning

Returns a slice of all rows in the table. For tables with a deletion bitmap, this returns the raw storage which may include deleted rows.

Important: For operations that need to skip deleted rows, use scan_live() which filters deleted rows automatically.
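
§Example

A short sketch contrasting the two scan paths (assuming a populated table):

// Raw storage: may still contain logically deleted rows.
let raw = table.scan();
// Filtered iteration: yields only live rows with their physical indices.
let live = table.scan_live().count();
assert!(live <= raw.len());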

pub fn is_row_deleted(&self, idx: usize) -> bool

Check if a row at the given index is deleted

pub fn scan_live(&self) -> impl Iterator<Item = (usize, &Row)>

Iterate over live (non-deleted) rows with their physical indices

This is the preferred way to scan table data, as it automatically skips rows that have been deleted but not yet compacted.

§Returns

An iterator yielding (physical_index, &Row) pairs for all live rows.

§Example
for (idx, row) in table.scan_live() {
    // idx is the physical index, can be used with get_row() or delete_by_indices()
    process_row(idx, row);
}

pub fn scan_live_vec(&self) -> Vec<Row>

Scan only live (non-deleted) rows, returning an owned Vec.

This method provides an efficient way to get all live rows as a Vec for executor paths that need owned data. Unlike scan() which returns all rows including deleted ones, this method filters out deleted rows.

§Performance

O(n) time and space where n is the number of live rows. Pre-allocates the exact capacity needed based on row_count().

§Returns

A Vec containing clones of all non-deleted rows.

§Example
// For SELECT queries that need a Vec<Row>
let rows = table.scan_live_vec();

pub fn get_row(&self, idx: usize) -> Option<&Row>

Get a single row by index position (O(1) access)

Returns None if the row is deleted or index is out of bounds.

§Arguments
  • idx - The row index position (physical index)
§Returns
  • Some(&Row) - The row at the given index if it exists and is not deleted
  • None - If the index is out of bounds or row is deleted
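
§Example

A sketch reusing the hypothetical process_row handler from scan_live’s example:

match table.get_row(5) {
    Some(row) => process_row(5, row),        // live row at physical index 5
    None => { /* deleted or out of bounds */ }
}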

pub fn scan_columnar(&self) -> Result<ColumnarTable, StorageError>

Scan table data in columnar format for SIMD-accelerated processing

This method returns columnar data suitable for high-performance analytical queries. Unlike scan() which returns row-oriented data, this method returns column-oriented data that enables:

  • SIMD vectorization: Process 4-8 values per CPU instruction
  • Cache efficiency: Contiguous column data improves memory access patterns
  • Type specialization: Avoid SqlValue enum matching overhead
§Performance

For native columnar tables: zero conversion overhead; data is returned directly. For row-oriented tables: O(n * m) conversion cost per call (n rows, m columns).

§Caching

This method does not cache results. For cached columnar access with LRU eviction, use Database::get_columnar() which provides Arc-based sharing across queries.

§Returns
  • Ok(ColumnarTable) - Columnar representation of the table data
  • Err(StorageError) - If conversion fails due to type mismatches
§Example
let columnar = table.scan_columnar()?;
// Process with SIMD-accelerated operations
if let Some(ColumnData::Int64 { values, nulls }) = columnar.get_column("quantity") {
    // SIMD filtering on values slice
}

pub fn row_count(&self) -> usize

Get number of live (non-deleted) rows

pub fn physical_row_count(&self) -> usize

Get total number of rows including deleted ones (physical storage size)

pub fn deleted_count(&self) -> usize

Get count of deleted (logically removed) rows

This is used for DML cost estimation, as tables with many deleted rows may have degraded performance for UPDATE/DELETE operations.

pub fn statistics(&mut self) -> &TableStatistics

Get table statistics, computing if necessary

Statistics are computed lazily on first access and cached. They are marked stale after significant data changes (> 10% of rows).

pub fn get_statistics(&self) -> Option<&TableStatistics>

Get cached table statistics without computing

Returns None if statistics have never been computed or are stale. Use statistics() if you want to compute/refresh statistics.

pub fn analyze(&mut self)

Force recomputation of statistics (ANALYZE command)
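
§Example

A sketch tying the three statistics methods together (assuming a freshly created table):

// Cached read: None until statistics are first computed.
assert!(table.get_statistics().is_none());
// Lazy compute-and-cache on first access.
let _stats = table.statistics();
// Explicit refresh, as for an ANALYZE command.
table.analyze();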

pub fn is_in_append_mode(&self) -> bool

Check if the table is in append mode (sequential inserts detected). When true, constraint checks can skip duplicate lookups as an optimization.

pub fn clear(&mut self)

Clear all rows

pub fn update_row(&mut self, index: usize, row: Row) -> Result<(), StorageError>

Update a row at the specified index

pub fn update_row_selective(&mut self, index: usize, row: Row, changed_columns: &HashSet<usize>) -> Result<(), StorageError>

Update a row with selective index maintenance

Only updates indexes that reference changed columns, providing significant performance improvement for tables with many indexes when updating non-indexed columns.

§Arguments
  • index - Row index to update
  • row - New row data
  • changed_columns - Set of column indices that were modified
§Returns
  • Ok(()) on success
  • Err(StorageError) if index out of bounds or column count mismatch
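
§Example

A sketch of a selective update; new_row is a pre-built replacement row in which only column 2 differs from the stored row:

use std::collections::HashSet;

// Only indexes covering column 2 are maintained; the rest are untouched.
let changed: HashSet<usize> = HashSet::from([2]);
table.update_row_selective(0, new_row, &changed)?;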

pub fn update_row_unchecked(&mut self, index: usize, new_row: Row, old_row: &Row, changed_columns: &HashSet<usize>)

Fast path update for pre-validated rows

This variant skips normalization/validation, assuming the caller has already validated the row data. Use for performance-critical UPDATE paths where validation was done at the executor level.

§Arguments
  • index - Row index to update
  • new_row - Pre-validated new row data (ownership transferred)
  • old_row - Reference to old row for index updates
  • changed_columns - Set of column indices that were modified
§Safety

Caller must ensure row data is valid (correct column count, types, constraints)
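
§Example

A sketch of the intended call pattern, with validation assumed done upstream (and assuming Row implements Clone); idx, new_row, and changed_columns are placeholders:

// The old row is needed for index maintenance, so clone it first.
let old_row = table.get_row(idx).cloned().expect("row must be live");
table.update_row_unchecked(idx, new_row, &old_row, &changed_columns);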

pub fn update_column_inplace(&mut self, row_index: usize, col_index: usize, new_value: SqlValue)

Update a single column value in-place without cloning the row

This is the fastest possible update path for non-indexed columns:

  • No row cloning (direct in-place modification)
  • No index updates (caller must verify column is not indexed)
  • No validation (caller must pre-validate the value)
§Arguments
  • row_index - Index of the row to update
  • col_index - Index of the column to update
  • new_value - The new value for the column
§Safety

Caller must ensure:

  • The column is NOT indexed (no internal or user-defined indexes)
  • The value satisfies all constraints (NOT NULL, type, etc.)
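
§Example

For instance (a sketch; column 3 is assumed non-indexed and the value pre-validated):

// Fastest update path: no row clone, no index work, no validation.
table.update_column_inplace(0, 3, SqlValue::Integer(42));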

pub fn delete_where<F>(&mut self, predicate: F) -> DeleteResult
where F: FnMut(&Row) -> bool,

Delete rows matching a predicate

Uses O(1) bitmap marking for each deleted row instead of O(n) Vec::remove().

§Returns

DeleteResult containing the count of deleted rows and whether compaction occurred.
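
§Example

A sketch deleting by predicate; the row.get(0) accessor is a hypothetical stand-in for the real Row API:

// Delete every row whose first column equals Integer(0).
let result = table.delete_where(|row| row.get(0) == Some(&SqlValue::Integer(0)));
println!("deleted {} rows (compacted: {})", result.deleted_count, result.compacted);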

pub fn remove_row(&mut self, target_row: &Row) -> Result<(), StorageError>

Remove a specific row (used for transaction undo). Returns an error if the row is not found.

Uses O(1) bitmap marking instead of O(n) Vec::remove().

Note: This method does not return compaction status since it’s used internally for transaction rollback where index consistency is handled at a higher level.

pub fn delete_by_indices(&mut self, indices: &[usize]) -> DeleteResult

Delete rows by known indices (fast path - no scanning required)

Uses O(1) bitmap marking instead of O(n) Vec::remove(). Rows are marked as deleted but remain in the vector until compaction is triggered.

§Arguments
  • indices - Indices of rows to delete (need not be sorted)
§Returns

DeleteResult containing:

  • deleted_count: Number of rows deleted
  • compacted: Whether compaction occurred (row indices changed)
§Important

When compacted is true, all row indices in the table have changed. User-defined indexes (B-tree indexes managed at the Database level) must be rebuilt after compaction to maintain correctness.

§Performance

O(d) where d = number of rows to delete, compared to O(d * n) for Vec::remove()
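
§Example

A sketch of the rebuild-after-compaction contract described above:

let result = table.delete_by_indices(&[3, 7, 42]);
assert!(result.deleted_count <= 3);
if result.compacted {
    // Physical indices shifted: rebuild any Database-level B-tree indexes
    // that store row positions before the next lookup.
}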

pub fn delete_by_indices_batch(&mut self, indices: &[usize]) -> DeleteResult

Delete rows by known indices with batch-optimized internal index updates

This is an optimized version of delete_by_indices that pre-computes schema lookups for internal hash indexes, reducing overhead for multi-row deletes by ~30-40%.

§Arguments
  • indices - Indices of rows to delete (need not be sorted)
§Returns

DeleteResult containing:

  • deleted_count: Number of rows deleted
  • compacted: Whether compaction occurred (row indices changed)
§Performance
  • Pre-computes PK/unique column indices once (O(1) vs O(d) schema lookups)
  • Uses batch index updates for internal hash indexes
  • Best for multi-row deletes; single-row deletes use delete_by_indices

pub fn is_deleted(&self, idx: usize) -> bool

Check if a row at the given index is deleted

pub fn rows_mut(&mut self) -> &mut Vec<Row>

Get mutable reference to rows

pub fn schema_mut(&mut self) -> &mut TableSchema

Get mutable reference to schema

pub fn primary_key_index(&self) -> Option<&HashMap<Vec<SqlValue>, usize>>

Get reference to primary key index

pub fn unique_indexes(&self) -> &[HashMap<Vec<SqlValue>, usize>]

Get reference to unique constraint indexes

pub fn rebuild_indexes(&mut self)

Rebuild all hash indexes from scratch. Used after schema changes that add constraints (e.g., ALTER TABLE ADD PRIMARY KEY).
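
§Example

A sketch of the intended sequence; the schema mutation itself is elided since the exact constraint API is not shown here:

{
    let schema = table.schema_mut();
    // ... add a PRIMARY KEY / UNIQUE constraint to `schema` (API elided) ...
}
table.rebuild_indexes(); // hash indexes now reflect the new constraint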

Trait Implementations§

impl Clone for Table

fn clone(&self) -> Self

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Table

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl Freeze for Table

impl RefUnwindSafe for Table

impl Send for Table

impl Sync for Table

impl Unpin for Table

impl UnwindSafe for Table
