Skip to main content

Module vectorized_scan

Module vectorized_scan 

Source
Expand description

SIMD-Accelerated Vectorized Scan Engine (Recommendation 2)

§Problem

Current scan implementation is row-at-a-time:

for entry in self.data.iter() {  // DashMap iteration - pointer chasing
    if let Some(v) = entry.value().read_at(snapshot_ts, current_txn_id)
        && let Some(value) = &v.value  // Option unwrapping
    {
        results.push((key.clone(), value.clone())); // Heap allocs
    }
}

This has:

  • DashMap iterator overhead (~20ns per entry)
  • MVCC version chain traversal per row
  • Clone allocations in hot path

§Solution

Vectorized execution: process 1024+ rows in a batch, amortizing overhead.

§Performance Analysis

Vectorized execution model:

  • Batch size B = 1024 rows
  • Per-batch overhead: O(1) iterator setup + O(B) SIMD operations
  • Amortized cost: (setup + B×simd_op) / B ≈ simd_op for large B

SIMD comparison (AVX-512):

Scalar: 100 cycles × 1000 rows = 100,000 cycles
SIMD: 100 cycles × (1000/16) = 6,250 cycles (16x speedup)

Memory bandwidth calculation:

  • Current: 300 bytes/row × 1M rows = 300 MB, random access
  • Proposed: 60 bytes/row × 1M rows = 60 MB, sequential scan
  • RAM bandwidth: ~50 GB/s → theoretical 833M rows/sec

§Expected Improvement

10-20x scan throughput improvement

Structs§

Int64Comparison
Comparison predicate for i64 columns
SimdVisibilityFilter
SIMD-accelerated MVCC visibility filter
SoaBatch
Structure of Arrays (SoA) batch for optimal SIMD visibility filtering
SoaScanIterator
High-performance SoA scan iterator with SIMD + late materialization
SoaScanStats
Statistics for SoA scan performance monitoring
StreamingScanIterator
Streaming scan iterator with SIMD visibility filtering
VectorBatch
A batch of rows for vectorized processing
VectorizedScanConfig
Configuration for vectorized scans
VectorizedScanStats
Statistics for vectorized scan operations
VersionedSlice
Versioned slice for zero-copy access

Enums§

ColumnVector
Typed column vector for SIMD-friendly access
ComparisonOp
Comparison operators
ValueHandle
Handle for late value materialization

Constants§

DEFAULT_BATCH_SIZE
Default batch size for vectorized operations 1024 rows fits comfortably in L2 cache (~256KB with 256-byte rows)
MAX_BATCH_SIZE
Maximum batch size (above this, memory pressure increases)
MIN_BATCH_SIZE
Minimum batch size (below this, scalar is faster)

Traits§

SoaSource
Trait for sources that provide SoA batches
VectorPredicate
Predicate for vectorized filtering

Functions§

simd_compare_i64_gt