pub struct ColumnarBatch { /* private fields */ }
A columnar batch stores data in column-oriented format for efficient SIMD processing.
Unlike row-oriented storage (Vec<Row>), columnar storage enables:
- SIMD vectorization (process 4-8 values per instruction)
- Better cache locality (columns accessed together are stored together)
- Type-specialized code paths (no SqlValue enum matching)
- Efficient NULL handling with separate bitmasks
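To make the benefits listed above concrete, here is a minimal sketch of a single column stored as contiguous values plus a separate NULL mask. `Int64Column` and `sum_non_null` are hypothetical names used only for illustration; they are not part of this crate, and the real `ColumnArray` layout may differ (for example, it may use a packed bitmask rather than `Vec<bool>`).

```rust
/// Hypothetical stand-in for one Int64 column: values plus a NULL mask.
struct Int64Column {
    /// Contiguous values; the value stored at a NULL slot is ignored.
    values: Vec<i64>,
    /// `true` marks a NULL slot (assumption: one bool per row).
    nulls: Vec<bool>,
}

/// Sum the non-NULL values without matching on an enum per cell.
fn sum_non_null(col: &Int64Column) -> i64 {
    col.values
        .iter()
        .zip(col.nulls.iter())
        // Multiply by 0 for NULL slots instead of branching per value.
        .map(|(&v, &is_null)| v * i64::from(!is_null))
        .sum()
}
```

Because the loop body is branch-free and runs over contiguous slices, it has the shape that compilers can often auto-vectorize, although whether that happens depends on the target and optimization level.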
§Example
// Convert rows to columnar batch
let batch = ColumnarBatch::from_rows(&rows)?;
// Access columns with zero-copy
if let Some(ColumnArray::Int64(values, _nulls)) = batch.column(0) {
    // Process with SIMD operations
    let sum = simd_sum_i64(values);
}
Implementations§
impl ColumnarBatch
pub fn from_arrow_batch(batch: &RecordBatch) -> Result<Self, ExecutorError>
Convert from Arrow RecordBatch to ColumnarBatch (zero-copy when possible)
This provides integration with Arrow-based storage engines, enabling zero-copy columnar query execution. Arrow’s columnar format maps directly to our ColumnarBatch structure.
§Performance
- Zero-copy: Numeric types (Int64, Float64) are converted with minimal overhead
- < 1ms overhead: Conversion time negligible compared to query execution
- Memory efficient: Reuses Arrow’s allocated memory where possible
§Arguments
- batch: Arrow RecordBatch from the storage layer
§Returns
A ColumnarBatch ready for SIMD-accelerated query execution
impl ColumnarBatch
pub fn with_capacity(_row_count: usize, column_count: usize) -> Self
Create a columnar batch with specified capacity
pub fn empty(column_count: usize) -> Result<Self, ExecutorError>
Create an empty batch with the specified number of columns
pub fn from_columns(columns: Vec<ColumnArray>, column_names: Option<Vec<String>>) -> Result<Self, ExecutorError>
Create a batch from a list of columns
pub fn from_rows(rows: &[Row]) -> Result<Self, ExecutorError>
Convert from row-oriented storage to columnar batch
This analyzes the first row to infer column types, then materializes all values into type-specialized column arrays.
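The first-row inference described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `Value`, `Column`, and `rows_to_columns` are hypothetical stand-ins for `SqlValue`, `ColumnArray`, and `from_rows`, and rejecting a NULL in the first row is a simplification chosen here, since the source does not say how that case is handled.

```rust
/// Hypothetical cell type standing in for SqlValue.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Null,
}

/// Hypothetical type-specialized column standing in for ColumnArray.
#[derive(Debug, PartialEq)]
enum Column {
    Int { values: Vec<i64>, nulls: Vec<bool> },
    Float { values: Vec<f64>, nulls: Vec<bool> },
}

/// Infer each column's type from the first row, then materialize all rows.
fn rows_to_columns(rows: &[Vec<Value>]) -> Result<Vec<Column>, String> {
    let first = match rows.first() {
        Some(row) => row,
        None => return Ok(Vec::new()), // no rows: nothing to infer
    };
    let mut columns = Vec::with_capacity(first.len());
    for (c, sample) in first.iter().enumerate() {
        let mut nulls = Vec::with_capacity(rows.len());
        let column = match sample {
            Value::Int(_) => {
                let mut values = Vec::with_capacity(rows.len());
                for row in rows {
                    match row.get(c) {
                        Some(Value::Int(v)) => { values.push(*v); nulls.push(false); }
                        Some(Value::Null) => { values.push(0); nulls.push(true); }
                        _ => return Err(format!("column {c}: expected Int")),
                    }
                }
                Column::Int { values, nulls }
            }
            Value::Float(_) => {
                let mut values = Vec::with_capacity(rows.len());
                for row in rows {
                    match row.get(c) {
                        Some(Value::Float(v)) => { values.push(*v); nulls.push(false); }
                        Some(Value::Null) => { values.push(0.0); nulls.push(true); }
                        _ => return Err(format!("column {c}: expected Float")),
                    }
                }
                Column::Float { values, nulls }
            }
            // Assumption of this sketch: a NULL sample carries no type.
            Value::Null => return Err(format!("column {c}: cannot infer type from NULL")),
        };
        columns.push(column);
    }
    Ok(columns)
}
```

NULL slots keep a placeholder value (0 or 0.0) so that the value vector stays dense and aligned with the mask.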
pub fn from_rows_selective(rows: &[Row], column_indices: &[usize]) -> Result<Self, ExecutorError>
Convert selected columns from row-oriented storage to columnar batch
This is an optimized version of from_rows that only extracts the
specified columns. This is critical for predicate evaluation on wide
tables where only a few columns are referenced by the WHERE clause.
§Arguments
- rows: The rows to convert
- column_indices: Which column indices to extract (must be sorted)
§Returns
A sparse columnar batch where column(i) returns the data for
column_indices[i]. The caller must map original column indices
to batch positions using the column_indices array.
§Performance
For a table with 16 columns where only 1 column is needed:
- from_rows: extracts all 16 columns (100% work)
- from_rows_selective: extracts only 1 column (about 6% work)
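The index mapping that callers are responsible for can be sketched like this. `extract_selected` and `batch_position` are hypothetical helpers over plain `i64` rows, written only to illustrate the idea; they are not functions of this crate. Because `column_indices` must be sorted, a binary search is enough to translate an original column index into a batch position.

```rust
/// Extract only the requested columns from row-oriented data.
/// Output column `i` holds the data of original column `column_indices[i]`.
fn extract_selected(rows: &[Vec<i64>], column_indices: &[usize]) -> Vec<Vec<i64>> {
    column_indices
        .iter()
        .map(|&c| rows.iter().map(|row| row[c]).collect())
        .collect()
}

/// Map an original column index to its position in the selective batch.
/// Relies on `column_indices` being sorted, as the arguments require.
fn batch_position(column_indices: &[usize], original: usize) -> Option<usize> {
    column_indices.binary_search(&original).ok()
}
```

For a WHERE clause that references original columns 1 and 3, the caller would look up positions 0 and 1 in the batch, and any column that was not extracted maps to `None`.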
impl ColumnarBatch
pub fn column_count(&self) -> usize
Get the number of columns in this batch
pub fn column(&self, index: usize) -> Option<&ColumnArray>
Get a reference to a column array
pub fn column_mut(&mut self, index: usize) -> Option<&mut ColumnArray>
Get a mutable reference to a column array
pub fn add_column(&mut self, column: ColumnArray) -> Result<(), ExecutorError>
Add a column to the batch
pub fn set_column_names(&mut self, names: Vec<String>)
Set column names (for debugging)
pub fn column_names(&self) -> Option<&[String]>
Get column names
pub fn column_index_by_name(&self, name: &str) -> Option<usize>
Get column index by name
pub fn get_value(&self, row_idx: usize, col_idx: usize) -> Result<SqlValue, ExecutorError>
Get a value at a specific (row, column) position
pub fn to_rows(&self) -> Result<Vec<Row>, ExecutorError>
Convert columnar batch back to row-oriented storage
This implementation uses column-outer, row-inner iteration for better
cache locality. Instead of calling get_value() for each cell (O(n*m)
enum matches), we match on each column once and iterate through all
its values sequentially.
§Performance
For a 60,000 row × 15 column batch:
- Old approach: 900,000 get_value() calls with enum matching per cell
- New approach: 15 enum matches total, sequential memory access per column
Expected 2-3x speedup due to:
- Single enum match per column (not per cell)
- Sequential memory access within each column array
- Better CPU cache utilization
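The column-outer, row-inner iteration described above can be sketched as follows. `Value`, `Column`, and `columns_to_rows` are hypothetical stand-ins for `SqlValue`, `ColumnArray`, and `to_rows`; NULL handling is left out for brevity, and the sketch assumes every column holds exactly `row_count` values.

```rust
/// Hypothetical cell type standing in for SqlValue.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

/// Hypothetical typed column standing in for ColumnArray.
enum Column {
    Int(Vec<i64>),
    Text(Vec<String>),
}

/// Rebuild rows by visiting each column once and scanning it sequentially.
fn columns_to_rows(columns: &[Column], row_count: usize) -> Vec<Vec<Value>> {
    // Pre-size every row so pushes inside the loops do not reallocate.
    let mut rows: Vec<Vec<Value>> = (0..row_count)
        .map(|_| Vec::with_capacity(columns.len()))
        .collect();
    for column in columns {
        // One enum match per column, not one per cell.
        match column {
            Column::Int(values) => {
                for (row, v) in rows.iter_mut().zip(values) {
                    row.push(Value::Int(*v));
                }
            }
            Column::Text(values) => {
                for (row, v) in rows.iter_mut().zip(values) {
                    row.push(Value::Text(v.clone()));
                }
            }
        }
    }
    rows
}
```

With 15 columns the `match` runs 15 times regardless of row count, which is the difference from the per-cell approach that the performance notes describe.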
pub fn deduplicate(&self) -> Result<Self, ExecutorError>
Deduplicate rows in the batch, returning a new batch with unique rows only
Uses hash-based deduplication on all columns, preserving insertion order. This implements DISTINCT semantics: NULL == NULL for uniqueness purposes.
§Performance
- O(n) time complexity where n is the number of rows
- Uses AHashSet for efficient duplicate detection
- Preserves the first occurrence of each unique row combination
§Example
// Original batch:
// [1, "A"], [2, "B"], [1, "A"], [3, "C"]
// After deduplicate():
// [1, "A"], [2, "B"], [3, "C"]
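A minimal sketch of order-preserving, hash-based deduplication is shown below. It swaps in the standard library's `HashSet` for the `AHashSet` mentioned above so that the example has no dependencies, and it models a cell as `Option<i64>` with `None` standing for NULL; `Cell` and `distinct_row_indices` are hypothetical names, not part of this crate.

```rust
use std::collections::HashSet;

/// Hypothetical cell type: `None` models SQL NULL.
type Cell = Option<i64>;

/// Return the indices of the first occurrence of each distinct row,
/// in insertion order. `None == None` holds for `Option`, which gives
/// the DISTINCT rule that NULLs compare equal for uniqueness.
fn distinct_row_indices(rows: &[Vec<Cell>]) -> Vec<usize> {
    let mut seen: HashSet<&[Cell]> = HashSet::with_capacity(rows.len());
    let mut keep = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        // `insert` returns true only the first time a row is seen.
        if seen.insert(row.as_slice()) {
            keep.push(i);
        }
    }
    keep
}
```

Each row is hashed once and looked up once, so the pass is O(n) in the number of rows on average. The sketch returns row indices rather than a new batch; how the real method assembles its result is not stated in the source.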
pub fn select_rows(&self, indices: &[usize]) -> Result<Self, ExecutorError>
impl ColumnarBatch
pub fn from_storage_columnar(storage_columnar: &ColumnarTable) -> Result<Self, ExecutorError>
Convert from storage layer ColumnarTable to executor ColumnarBatch
This method provides true zero-copy conversion from the storage layer’s columnar format to the executor’s columnar format. This is the key integration point for native columnar table scans.
§Performance
- O(1) for numeric/string columns: Arc::clone is just a reference count bump
- < 1 microsecond for millions of rows (vs O(n) with data copy)
- Directly shares storage ColumnData with executor ColumnArray
- Critical path for TPC-H Q6 and other analytical queries
§Zero-Copy Design
Both vibesql_storage::ColumnData and executor ColumnArray use Arc<Vec<T>>
for column data. Calling Arc::clone() only increments a reference count,
avoiding any data copying:
Storage:  Arc<Vec<i64>> ─┬─> [1, 2, 3, 4, ...] (shared memory)
                         │
Executor: Arc<Vec<i64>> ─┘
§Arguments
- storage_columnar: ColumnarTable from storage layer (vibesql-storage)
§Returns
- Ok(ColumnarBatch): Executor-ready columnar batch with shared Arc references
- Err(ExecutorError): If type conversion fails
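The zero-copy design described above rests on standard `Arc` behavior, which the following sketch demonstrates. `StorageColumn`, `ExecutorColumn`, and `share_column` are hypothetical stand-ins for the storage and executor column types; only `Arc::clone` itself is taken from the source.

```rust
use std::sync::Arc;

/// Hypothetical storage-side column holding shared data.
struct StorageColumn {
    data: Arc<Vec<i64>>,
}

/// Hypothetical executor-side column holding the same shared data.
struct ExecutorColumn {
    data: Arc<Vec<i64>>,
}

/// Hand a column to the executor without copying its values.
fn share_column(storage: &StorageColumn) -> ExecutorColumn {
    ExecutorColumn {
        // Only the reference count changes; the Vec is not duplicated.
        data: Arc::clone(&storage.data),
    }
}
```

After the call, `Arc::ptr_eq` on the two `data` fields returns true and the strong count is 2, so the cost of sharing is independent of the number of rows.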
Trait Implementations§
impl Clone for ColumnarBatch
fn clone(&self) -> ColumnarBatch
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for ColumnarBatch
impl RefUnwindSafe for ColumnarBatch
impl Send for ColumnarBatch
impl Sync for ColumnarBatch
impl Unpin for ColumnarBatch
impl UnwindSafe for ColumnarBatch
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.