Module vectorized_scan

Expand description

SIMD-Accelerated Vectorized Scan Engine (Recommendation 2)

§Problem

Current scan implementation is row-at-a-time:

for entry in self.data.iter() {  // DashMap iteration - pointer chasing
    if let Some(v) = entry.value().read_at(snapshot_ts, current_txn_id)
        && let Some(value) = &v.value  // Option unwrapping
    {
        results.push((key.clone(), value.clone())); // Heap allocs
    }
}

This has:

DashMap iterator overhead (~20ns per entry)
MVCC version chain traversal per row
Clone allocations in hot path

§Solution

Vectorized execution: process 1024+ rows in a batch, amortizing overhead.

§Performance Analysis

Vectorized execution model:

Batch size B = 1024 rows
Per-batch overhead: O(1) iterator setup + O(B) SIMD operations
Amortized cost: (setup + B×simd_op) / B ≈ simd_op for large B

SIMD comparison (AVX-512):

Scalar: 100 cycles × 1000 rows = 100,000 cycles
SIMD: 100 cycles × (1000/16) = 6,250 cycles (16x speedup)

Memory bandwidth calculation:

Current: 300 bytes/row × 1M rows = 300 MB, random access
Proposed: 60 bytes/row × 1M rows = 60 MB, sequential scan
RAM bandwidth: ~50 GB/s → theoretical 833M rows/sec

§Expected Improvement

10-20x scan throughput improvement

Structs§

Int64Comparison: Comparison predicate for i64 columns
SimdVisibilityFilter: SIMD-accelerated MVCC visibility filter
SoaBatch: Structure of Arrays (SoA) batch for optimal SIMD visibility filtering
SoaScanIterator: High-performance SoA scan iterator with SIMD + late materialization
SoaScanStats: Statistics for SoA scan performance monitoring
StreamingScanIterator: Streaming scan iterator with SIMD visibility filtering
VectorBatch: A batch of rows for vectorized processing
VectorizedScanConfig: Configuration for vectorized scans
VectorizedScanStats: Statistics for vectorized scan operations
VersionedSlice: Versioned slice for zero-copy access

Enums§

ColumnVector: Typed column vector for SIMD-friendly access
ComparisonOp: Comparison operators
ValueHandle: Handle for late value materialization

Constants§

DEFAULT_BATCH_SIZE: Default batch size for vectorized operations 1024 rows fits comfortably in L2 cache (~256KB with 256-byte rows)
MAX_BATCH_SIZE: Maximum batch size (above this, memory pressure increases)
MIN_BATCH_SIZE: Minimum batch size (below this, scalar is faster)

Traits§

SoaSource: Trait for sources that provide SoA batches
VectorPredicate: Predicate for vectorized filtering

Functions§

simd_compare_i64_gt

Module vectorized_scan

Module vectorized_scan Copy item path

§Problem

§Solution

§Performance Analysis

§Expected Improvement

Structs§

Enums§

Constants§

Traits§

Functions§

Module vectorized_scan