Columnar storage engine: the immutable, on-disk representation of materialized columns plus the read-time machinery (compute kernels, predicates, scans, selection vectors, snapshots) the engine uses to query them. This crate owns the bucket layout, the per-column compression and encoding schemes, and the registry that tracks which columns are present and at what version.
Read paths enter through this crate: they obtain a column reader and stream values through compute kernels that operate directly on the encoded bytes where possible, decoding only when a kernel cannot run on the encoded form. The snapshot type is what the subscription tier hands to consumers so they can iterate over a stable view of a column without racing against ongoing writes.
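As a rough illustration of "run on the encoded form, decode only as a fallback", the sketch below sums a column without materializing run-length runs. The `Encoded` enum, its variants, and the `sum` function are hypothetical names invented for this example; they are not this crate's actual types or API.

```rust
/// Hypothetical encoded column used only for this sketch.
#[derive(Debug, Clone)]
enum Encoded {
    /// Dense, unencoded values.
    Canonical(Vec<i64>),
    /// (value, run length) pairs.
    RunLength(Vec<(i64, u32)>),
    /// First value plus successive differences.
    Delta { first: i64, deltas: Vec<i64> },
}

impl Encoded {
    /// Materialize the column as plain values (the slow path).
    fn decode(&self) -> Vec<i64> {
        match self {
            Encoded::Canonical(values) => values.clone(),
            Encoded::RunLength(runs) => runs
                .iter()
                .flat_map(|&(value, len)| std::iter::repeat(value).take(len as usize))
                .collect(),
            Encoded::Delta { first, deltas } => {
                let mut out = Vec::with_capacity(deltas.len() + 1);
                let mut current = *first;
                out.push(current);
                for delta in deltas {
                    current += delta;
                    out.push(current);
                }
                out
            }
        }
    }
}

/// Sum kernel: works on the encoded form where it can, decodes otherwise.
fn sum(column: &Encoded) -> i64 {
    match column {
        Encoded::Canonical(values) => values.iter().sum(),
        // One multiplication per run instead of one addition per value.
        Encoded::RunLength(runs) => runs
            .iter()
            .map(|&(value, len)| value * i64::from(len))
            .sum(),
        // No specialized path in this sketch: fall back to decoding.
        other => other.decode().iter().sum(),
    }
}
```

Calling `sum` on a run-length column touches one entry per run, whereas the fallback arm pays for a full decode first. A real kernel would likely add a specialized path for each encoding it can handle cheaply.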
Invariant: a column’s encoded bytes, its stats, and its bitmap are produced together and never updated piecewise. Tearing them apart (rewriting just the values, just the bitmap, or just the stats) lets readers observe a column whose statistics no longer describe its contents, which silently corrupts every kernel that reads stats to skip work.
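One way to make this invariant hard to violate is to expose a single constructor that derives the values, bitmap, and stats in one pass, and to offer no mutating accessors afterwards. The following is a minimal sketch under that assumption; `Column`, `Stats`, `build`, and `can_skip_greater_than` are hypothetical names, not the crate's real API.

```rust
mod column {
    /// Summary statistics derived from the values at build time.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Stats {
        pub min: Option<i64>,
        pub max: Option<i64>,
        pub null_count: usize,
    }

    /// Hypothetical immutable column. Fields are private to this module,
    /// so callers cannot rewrite one part without the others.
    #[derive(Debug)]
    pub struct Column {
        values: Vec<i64>,
        validity: Vec<bool>,
        stats: Stats,
    }

    impl Column {
        /// Produce values, validity bitmap, and stats together in one pass.
        pub fn build(input: &[Option<i64>]) -> Column {
            let mut values = Vec::with_capacity(input.len());
            let mut validity = Vec::with_capacity(input.len());
            let mut stats = Stats { min: None, max: None, null_count: 0 };
            for item in input {
                match *item {
                    Some(v) => {
                        values.push(v);
                        validity.push(true);
                        stats.min = Some(stats.min.map_or(v, |m| m.min(v)));
                        stats.max = Some(stats.max.map_or(v, |m| m.max(v)));
                    }
                    None => {
                        values.push(0); // placeholder slot for a null
                        validity.push(false);
                        stats.null_count += 1;
                    }
                }
            }
            Column { values, validity, stats }
        }

        /// Stats are returned by value; there is no way to edit them in place.
        pub fn stats(&self) -> Stats {
            self.stats
        }

        /// Number of slots, including nulls.
        pub fn len(&self) -> usize {
            debug_assert_eq!(self.values.len(), self.validity.len());
            self.values.len()
        }

        /// A `value > threshold` scan can skip the column when the max rules it out.
        /// This is only sound because the stats always describe the values.
        pub fn can_skip_greater_than(&self, threshold: i64) -> bool {
            self.stats.max.map_or(true, |max| max <= threshold)
        }
    }
}
```

The skip check is where a torn column would hurt: if the values were rewritten without recomputing `max`, `can_skip_greater_than` could return `true` for a column that does contain matching rows, and the scan would silently drop them.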
Modules
- bucket
- compress
- compute
- Compute kernels that operate on encoded columns. Compare, take, slice, filter, sum, search-sorted, and min/max are the primitives the engine VM dispatches to when it executes the per-instruction work of a query. Kernels prefer to run directly on the encoded bytes (canonical layout, dictionary indices, run-length runs) and only decode when they cannot.
- encoding
- Per-column encoding implementations. Canonical is the dense unencoded layout; the compressed family covers all-none, bit-packed, constant, delta, delta-RLE, dictionary, frame-of-reference, run-length, and sparse forms. Each encoding produces and consumes the same encoded-bytes contract so compute kernels can be written once and work across encodings.
- error
- predicate
- reader
- registry
- scan
- selection
- snapshot
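The encoding module's description says every encoding honors one shared contract so that kernels can be written once. A minimal sketch of that idea, assuming the contract can be modeled as a trait: `EncodedColumn`, `Dictionary`, `FrameOfReference`, and `min_max` below are hypothetical names chosen for illustration and do not reflect the crate's real definitions.

```rust
/// Hypothetical shared contract: every encoding can report its length
/// and produce the logical value at a position.
trait EncodedColumn {
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> i64;
}

/// Dictionary encoding: each slot stores an index into a table of distinct values.
struct Dictionary {
    dict: Vec<i64>,
    indices: Vec<u32>,
}

/// Frame-of-reference encoding: each slot stores a small offset from a base value.
struct FrameOfReference {
    reference: i64,
    offsets: Vec<u16>,
}

impl EncodedColumn for Dictionary {
    fn len(&self) -> usize {
        self.indices.len()
    }
    fn get(&self, index: usize) -> i64 {
        self.dict[self.indices[index] as usize]
    }
}

impl EncodedColumn for FrameOfReference {
    fn len(&self) -> usize {
        self.offsets.len()
    }
    fn get(&self, index: usize) -> i64 {
        self.reference + i64::from(self.offsets[index])
    }
}

/// A kernel written once against the contract; it works for any encoding.
/// Returns `None` for an empty column.
fn min_max<C: EncodedColumn>(column: &C) -> Option<(i64, i64)> {
    (0..column.len())
        .map(|i| column.get(i))
        .fold(None, |acc, v| match acc {
            None => Some((v, v)),
            Some((lo, hi)) => Some((lo.min(v), hi.max(v))),
        })
}
```

The generic version is the portable baseline. An encoding-aware kernel could still do better in specific cases, for example by scanning only the dictionary table rather than every index.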