Crate llkv_column_map

Columnar storage engine for LLKV.

This crate provides the low-level columnar layer that persists Apache Arrow RecordBatches to disk and supports efficient scans, filters, and updates. It serves as the foundation for llkv-table and higher-level query execution.

§Role in the Story

The column map is where LLKV’s Arrow-first design meets pager-backed persistence. Every sqllogictest case shipped with SQLite, along with an expanding set of DuckDB suites, ultimately routes through these descriptors and chunk walkers. The storage layer therefore carries the burden of matching SQLite semantics while staying efficient enough for OLAP workloads. Gaps uncovered by the logic tests are treated as defects in this crate, not as harness exceptions.

The engine is maintained in the open by a single developer. These docs aim to give newcomers the same context captured in the README and DeepWiki pages so the story remains accessible as the project grows.

§Architecture

The storage engine is organized into several key components:

  • ColumnStore: Primary interface for storing and retrieving columnar data. Manages column descriptors, metadata catalogs, and coordinates with the pager for persistent storage.

  • ScanBuilder: Builder pattern for constructing column scans with various options (filters, ordering, row ID inclusion).

  • Visitor Pattern: Scans emit data through visitor callbacks rather than materializing entire columns in memory, enabling streaming and aggregation.
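The visitor-driven scan path can be sketched in plain Rust. The trait and function names below are illustrative stand-ins, not the crate's actual API; the point is that the scan pushes chunk slices to a callback instead of materializing the full column:

```rust
/// Illustrative visitor trait: a scan pushes each chunk to the visitor
/// rather than buffering the entire column in memory.
trait ChunkVisitor {
    fn on_chunk(&mut self, values: &[i64]);
}

/// A visitor that keeps a running sum, so aggregation streams over chunks.
struct SumVisitor {
    total: i64,
}

impl ChunkVisitor for SumVisitor {
    fn on_chunk(&mut self, values: &[i64]) {
        self.total += values.iter().sum::<i64>();
    }
}

/// A toy "scan" that streams fixed-size chunks to the visitor.
fn scan_chunks(column: &[i64], chunk_rows: usize, visitor: &mut dyn ChunkVisitor) {
    for chunk in column.chunks(chunk_rows) {
        visitor.on_chunk(chunk);
    }
}

fn main() {
    let column: Vec<i64> = (1..=10).collect();
    let mut v = SumVisitor { total: 0 };
    scan_chunks(&column, 4, &mut v);
    // 1 + 2 + ... + 10 = 55, computed without holding more than one chunk.
    assert_eq!(v.total, 55);
}
```

The same shape lets a `SUM` or `COUNT` run over a multi-gigabyte column with memory bounded by the chunk size, which is why the real scan API emits through callbacks.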

§Storage Model

Data is stored in columnar chunks:

  • Each column is identified by a LogicalFieldId
  • Columns are broken into chunks for incremental writes
  • Each chunk stores Arrow-serialized data plus metadata (row count, min/max values)
  • Shadow columns track row IDs separately from user data
  • MVCC columns (created_by, deleted_by) track transaction visibility
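The per-chunk min/max metadata enables chunk pruning during filtered scans. A minimal sketch of that idea, with a hypothetical `ChunkMeta` struct that only loosely models the crate's real metadata:

```rust
/// Illustrative chunk metadata: row count plus min/max bounds,
/// loosely modeling what each chunk stores alongside its Arrow data.
struct ChunkMeta {
    row_count: usize,
    min: i64,
    max: i64,
}

/// A chunk can be skipped entirely when the filter range [lo, hi]
/// cannot overlap the chunk's [min, max] interval.
fn chunk_may_match(meta: &ChunkMeta, lo: i64, hi: i64) -> bool {
    meta.max >= lo && meta.min <= hi
}

fn main() {
    let chunks = vec![
        ChunkMeta { row_count: 1024, min: 0, max: 99 },
        ChunkMeta { row_count: 1024, min: 100, max: 199 },
        ChunkMeta { row_count: 512, min: 200, max: 250 },
    ];
    // Filter: value BETWEEN 120 AND 180 — only the middle chunk can match,
    // so only it needs to be read from the pager and deserialized.
    let candidates: Vec<usize> = chunks
        .iter()
        .enumerate()
        .filter(|(_, m)| chunk_may_match(m, 120, 180))
        .map(|(i, _)| i)
        .collect();
    assert_eq!(candidates, vec![1]);
}
```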

§Namespaces

Columns are organized into namespaces to prevent ID collisions:

  • UserData: Regular table columns
  • RowIdShadow: Internal row ID tracking for each column
  • TxnCreatedBy: MVCC transaction that created each row
  • TxnDeletedBy: MVCC transaction that deleted each row
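One common way to make such namespaces collision-proof is to pack a namespace tag into the high bits of the field identifier. The sketch below illustrates the idea; the actual bit layout of LogicalFieldId in this crate may differ:

```rust
/// Illustrative namespace tag; mirrors the four namespaces listed above.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Namespace {
    UserData = 0,
    RowIdShadow = 1,
    TxnCreatedBy = 2,
    TxnDeletedBy = 3,
}

/// Hypothetical packing: namespace in the high 32 bits, column id in the
/// low 32 bits, so the same user column id never collides across namespaces.
fn logical_field_id(ns: Namespace, column_id: u32) -> u64 {
    ((ns as u64) << 32) | column_id as u64
}

fn main() {
    let user = logical_field_id(Namespace::UserData, 7);
    let shadow = logical_field_id(Namespace::RowIdShadow, 7);
    // Same user column id, distinct logical field ids.
    assert_ne!(user, shadow);
}
```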

§Test Coverage

  • SQLite suites: The storage layer powers every SQLite sqllogictest case that upstream publishes. Passing those suites provides a baseline for SQLite compatibility, but LLKV still diverges from SQLite behavior in places and should not be treated as a drop-in replacement yet.
  • DuckDB extensions: DuckDB-focused suites exercise MVCC edge cases and typed transaction flows. Coverage is early and informs the roadmap rather than proving full DuckDB parity today. All suites run through the sqllogictest crate.

§Thread Safety

ColumnStore is thread-safe (Send + Sync) with internal locking for catalog updates. Read operations can occur concurrently; writes are serialized through the catalog lock.
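The "concurrent reads, serialized writes" discipline maps naturally onto a reader-writer lock. A minimal sketch with a hypothetical `Catalog` type (not the crate's real internals):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Illustrative catalog guarded by an RwLock: many readers may hold the
/// shared lock at once, while writers take the lock exclusively,
/// mirroring the locking discipline described above.
struct Catalog {
    columns: RwLock<HashMap<u64, String>>,
}

impl Catalog {
    fn new() -> Self {
        Catalog { columns: RwLock::new(HashMap::new()) }
    }

    /// Reads take the shared lock and can run concurrently.
    fn lookup(&self, id: u64) -> Option<String> {
        self.columns.read().unwrap().get(&id).cloned()
    }

    /// Writes take the exclusive lock, serializing catalog updates.
    fn register(&self, id: u64, name: &str) {
        self.columns.write().unwrap().insert(id, name.to_string());
    }
}

fn main() {
    let catalog = Catalog::new();
    catalog.register(1, "user_id");
    assert_eq!(catalog.lookup(1).as_deref(), Some("user_id"));
    assert_eq!(catalog.lookup(2), None);
}
```

Because the struct holds only `RwLock`-wrapped state, it is automatically `Send + Sync`, which is the property the real ColumnStore advertises.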

§Macros and Type Dispatch

This crate provides macros for efficient type-specific operations without runtime dispatch overhead. See with_integer_arrow_type! for details.
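The dispatch style can be illustrated with a toy `macro_rules!` macro: the match over a type tag is resolved once, and each arm expands the caller's body with a concrete Rust type bound in, so the inner code is fully monomorphized. This is a simplified stand-in for with_integer_arrow_type!, not its real signature:

```rust
/// Toy type tag standing in for an Arrow integer DataType.
#[derive(Clone, Copy)]
enum IntType {
    Int32,
    Int64,
}

/// Illustrative dispatch macro: binds the caller's type parameter to a
/// concrete type per arm, so the body compiles to static (non-dyn) code.
macro_rules! with_int_type {
    ($tag:expr, | $t:ident | $body:block) => {
        match $tag {
            IntType::Int32 => {
                type $t = i32;
                $body
            }
            IntType::Int64 => {
                type $t = i64;
                $body
            }
        }
    };
}

/// Example use: compute the byte width of the dispatched type.
fn width_of(tag: IntType) -> usize {
    with_int_type!(tag, |T| { std::mem::size_of::<T>() })
}

fn main() {
    assert_eq!(width_of(IntType::Int32), 4);
    assert_eq!(width_of(IntType::Int64), 8);
}
```

The runtime cost is a single match on the tag; everything inside the body is ordinary statically-typed code, which is the "no runtime dispatch overhead" property the crate's macros aim for.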

Re-exports§

pub use store::ColumnStore;
pub use store::IndexKind;
pub use store::ROW_ID_COLUMN_NAME;
pub use store::scan;
pub use store::scan::ScanBuilder;

Modules§

codecs
Manual little-endian codecs for fixed-width structs.
debug
gather
Row gathering helpers for assembling Arrow arrays across chunks.
parallel
Helper utilities for Rayon thread-pool management.
serialization
Zero-copy array persistence for fixed/var width Arrow arrays used by the store.
store
ColumnStore facade and supporting modules.

Macros§

llkv_for_each_arrow_boolean
llkv_for_each_arrow_numeric
Invokes a macro for each supported Arrow numeric type.
llkv_for_each_arrow_string
with_integer_arrow_type
Dispatches to type-specific code based on an Arrow DataType.

Enums§

Error
Unified error type for all LLKV operations.

Functions§

ensure_supported_arrow_type
is_supported_arrow_type
supported_arrow_types

Type Aliases§

Result
Result type alias used throughout LLKV.