Module columnar

Module columnar 

Source
Expand description

True Columnar Storage with Arrow-Compatible Layout

This module implements memory-efficient columnar storage that:

  • Uses typed columns instead of tagged unions (4-8× memory reduction)
  • Provides SIMD-friendly contiguous memory layout
  • Supports Arrow-compatible offset encoding for strings
  • Uses validity bitmaps for NULL handling (1 bit per value)

§Memory Model

Current ColumnValue enum: 32 bytes per value (discriminant + padding) This implementation:

  • Int64/UInt64: 8 bytes + 1 bit validity = ~8.125 bytes
  • Bool: 1 bit + 1 bit validity = 2 bits (256× improvement!)
  • Text: offset (4 bytes) + data (variable) + 1 bit validity

§SIMD Vectorization

Contiguous typed arrays enable auto-vectorization:

  • AVX-512 can process 8 i64s in parallel
  • SUM/AVG on integer columns: ~120× speedup vs scalar

Structs§

ColumnChunk
Column chunk for cache-optimal processing
ColumnStats
Column statistics for predicate pushdown
ColumnarStore
Columnar store with multiple tables
ColumnarTable
Arrow-compatible columnar table storage
MemoryComparison
Memory comparison between typed and enum-based storage
ValidityBitmap
Validity bitmap - 1 bit per value for NULL tracking

Enums§

ColumnType
Column type enum for schema definition
ColumnValue
Column value enum for insert operations (temporary)
TypedColumn
Type-safe columnar storage with Arrow-compatible memory layout