Expand description
SIMD-Accelerated Vectorized Scan Engine (Recommendation 2)
§Problem
Current scan implementation is row-at-a-time:
ⓘ
for entry in self.data.iter() { // DashMap iteration - pointer chasing
if let Some(v) = entry.value().read_at(snapshot_ts, current_txn_id)
&& let Some(value) = &v.value // Option unwrapping
{
results.push((key.clone(), value.clone())); // Heap allocs
}
}This has:
- DashMap iterator overhead (~20ns per entry)
- MVCC version chain traversal per row
- Clone allocations in hot path
§Solution
Vectorized execution: process 1024+ rows in a batch, amortizing overhead.
§Performance Analysis
Vectorized execution model:
- Batch size B = 1024 rows
- Per-batch overhead: O(1) iterator setup + O(B) SIMD operations
- Amortized cost:
(setup + B×simd_op) / B ≈ simd_opfor large B
SIMD comparison (AVX-512):
Scalar: 100 cycles × 1000 rows = 100,000 cycles
SIMD: 100 cycles × (1000/16) = 6,250 cycles (16x speedup)Memory bandwidth calculation:
- Current: 300 bytes/row × 1M rows = 300 MB, random access
- Proposed: 60 bytes/row × 1M rows = 60 MB, sequential scan
- RAM bandwidth: ~50 GB/s → theoretical 833M rows/sec
§Expected Improvement
10-20x scan throughput improvement
Structs§
- Int64
Comparison - Comparison predicate for i64 columns
- Simd
Visibility Filter - SIMD-accelerated MVCC visibility filter
- SoaBatch
- Structure of Arrays (SoA) batch for optimal SIMD visibility filtering
- SoaScan
Iterator - High-performance SoA scan iterator with SIMD + late materialization
- SoaScan
Stats - Statistics for SoA scan performance monitoring
- Streaming
Scan Iterator - Streaming scan iterator with SIMD visibility filtering
- Vector
Batch - A batch of rows for vectorized processing
- Vectorized
Scan Config - Configuration for vectorized scans
- Vectorized
Scan Stats - Statistics for vectorized scan operations
- Versioned
Slice - Versioned slice for zero-copy access
Enums§
- Column
Vector - Typed column vector for SIMD-friendly access
- Comparison
Op - Comparison operators
- Value
Handle - Handle for late value materialization
Constants§
- DEFAULT_
BATCH_ SIZE - Default batch size for vectorized operations 1024 rows fits comfortably in L2 cache (~256KB with 256-byte rows)
- MAX_
BATCH_ SIZE - Maximum batch size (above this, memory pressure increases)
- MIN_
BATCH_ SIZE - Minimum batch size (below this, scalar is faster)
Traits§
- SoaSource
- Trait for sources that provide SoA batches
- Vector
Predicate - Predicate for vectorized filtering