LLKV: Arrow-Native SQL over Key-Value Storage
This crate serves as the primary entrypoint for the LLKV database toolkit.
It re-exports the high-level SQL engine and storage abstractions from the
underlying llkv-* crates so downstream applications see a single surface
for planning, execution, and storage.
§Why LLKV Exists
LLKV explores what an Apache Arrow-first SQL stack can look like when layered
on top of key-value pagers instead of a purpose-built storage format. The
project targets columnar OLAP workloads while insisting on compatibility with
the reference SQLite sqllogictest suite. Today every published SQLite test
runs unmodified against LLKV, giving the engine a broad SQL regression net as
new features land.
LLKV also ingests a growing set of DuckDB sqllogictest cases. Those tests help
keep transaction semantics honest as dual-dialect support takes shape, but the
DuckDB integration is early and still expanding.
§Story So Far
LLKV is an experimental SQL database that layers Arrow columnar storage, a streaming execution
engine, and MVCC transaction management on top of key-value pagers. Every crate in this workspace
serves that goal, keeping arrow::record_batch::RecordBatch as the interchange format from
storage through execution.
The surface begins with llkv-sql, which parses statements via
sqlparser and lowers them into execution plans. Those plans feed
into llkv-runtime, the orchestration layer that injects MVCC
metadata, coordinates transactions, and dispatches work across the execution and storage stacks.
Query evaluation lives in llkv-executor, which streams Arrow
RecordBatch results without owning MVCC state, while llkv-table
enforces schema rules and logical field tracking on top of the column store.
At the storage layer, llkv-column-map persists column chunks as
Arrow-serialized blobs keyed by pager-managed physical IDs. That layout lets backends such as
simd-r-drive provide zero-copy buffers, and it keeps higher layers working against Arrow data
structures end-to-end. Concurrency remains synchronous by default, leaning on Rayon and
Crossbeam while still embedding inside Tokio when async orchestration is required.
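The layout described above, column chunks stored as opaque blobs under pager-issued keys, can be sketched with the standard library alone. Everything named here (`Pager`, `InMemoryPager`, `PhysicalKey`, `encode_chunk`, `store_chunk`) is hypothetical and is not the llkv-storage or llkv-column-map API; the little-endian encoding is only a stand-in for Arrow serialization.

```rust
use std::collections::HashMap;

/// Hypothetical physical ID issued by the pager (the name is an assumption).
type PhysicalKey = u64;

/// Minimal stand-in for a key-value pager: opaque byte blobs addressed by key.
trait Pager {
    fn alloc_key(&mut self) -> PhysicalKey;
    fn put(&mut self, key: PhysicalKey, blob: Vec<u8>);
    fn get(&self, key: PhysicalKey) -> Option<&[u8]>;
}

/// In-memory pager backed by a hash map; illustrative only.
#[derive(Default)]
struct InMemoryPager {
    next_key: PhysicalKey,
    blobs: HashMap<PhysicalKey, Vec<u8>>,
}

impl Pager for InMemoryPager {
    fn alloc_key(&mut self) -> PhysicalKey {
        let key = self.next_key;
        self.next_key += 1;
        key
    }

    fn put(&mut self, key: PhysicalKey, blob: Vec<u8>) {
        self.blobs.insert(key, blob);
    }

    fn get(&self, key: PhysicalKey) -> Option<&[u8]> {
        self.blobs.get(&key).map(|b| b.as_slice())
    }
}

/// Stand-in for Arrow serialization: an i64 column chunk as little-endian bytes.
fn encode_chunk(values: &[i64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Inverse of `encode_chunk`; trailing bytes that do not fill 8 bytes are ignored.
fn decode_chunk(blob: &[u8]) -> Vec<i64> {
    blob.chunks_exact(8)
        .map(|bytes| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(bytes);
            i64::from_le_bytes(buf)
        })
        .collect()
}

/// Persists one chunk and returns the key a column index would record for it.
fn store_chunk<P: Pager>(pager: &mut P, values: &[i64]) -> PhysicalKey {
    let key = pager.alloc_key();
    pager.put(key, encode_chunk(values));
    key
}
```

Because `get` hands back a borrowed slice rather than an owned copy, a backend that memory-maps its pages could in principle return buffers without copying, which is the property the paragraph above attributes to simd-r-drive.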
Compatibility is measured continuously. llkv_slt_tester executes SQLite and DuckDB sqllogictest
suites via the sqllogictest crate, and CI spans Linux, macOS, and Windows. The documentation here
stays synchronized with the rest of the repository so the rustdoc narrative matches the public
design record.
§Crate Topology
LLKV ships as a layered Cargo workspace where higher crates depend on the ones below while sharing Arrow as their interchange format. The main strata are:
- SQL Interface: llkv-sql exposes the SQL entry point and delegates planning and execution.
- Query Planning: llkv-plan and llkv-expr define logical plans and expression ASTs.
- Runtime & Transactions: llkv-runtime orchestrates sessions and MVCC, while llkv-transaction manages transaction IDs and snapshot isolation.
- Query Execution: llkv-executor streams Arrow RecordBatch results with help from llkv-aggregate and llkv-join.
- Table & Metadata: llkv-table enforces schemas and catalogs atop llkv-column-map, the columnar storage engine.
- Storage & I/O: llkv-storage provides the pager abstraction and integrates with simd-r-drive backends.
- Supporting crates: llkv-result unifies error handling, llkv-csv handles CSV ingestion, llkv-test-utils supplies testing helpers, and llkv-slt-tester drives SQL logic tests.
§MVCC Snapshot Isolation
LLKV tracks visibility with MVCC metadata injected into every table: hidden row_id,
created_by, and deleted_by columns are managed by the runtime and storage layers. Transactions
obtain 64-bit IDs from the llkv_transaction stack, capture a
snapshot of the last committed transaction, and tag new or modified rows accordingly. Rows are
visible when created_by is at or below the snapshot watermark and deleted_by is absent or
greater than that watermark. UPDATE and DELETE use soft deletes (deleted_by = txn_id), so
old versions remain until future compaction work lands, and auto-commit statements reuse a fast
path that tags rows with the reserved auto-commit ID.
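The visibility rule and the soft-delete behavior above can be written down in a few lines. This is a minimal sketch, not LLKV's implementation: `RowVersion`, `is_visible`, `update_value`, and `scan` are hypothetical names, the single `value` field stands in for real columns, and the sketch assumes an updated row keeps its `row_id`. It also ignores how a transaction sees its own uncommitted writes.

```rust
/// Transaction identifier; the text above states that IDs are 64-bit.
type TxnId = u64;

/// One physical row version carrying the hidden MVCC columns.
#[derive(Debug, Clone, PartialEq)]
struct RowVersion {
    row_id: u64,
    created_by: TxnId,
    /// `None` means the version has never been deleted.
    deleted_by: Option<TxnId>,
    /// Stand-in for the user-visible columns.
    value: i64,
}

/// A version is visible when it was created at or below the snapshot
/// watermark and is either not deleted or deleted above that watermark.
fn is_visible(row: &RowVersion, snapshot: TxnId) -> bool {
    row.created_by <= snapshot && row.deleted_by.map_or(true, |d| d > snapshot)
}

/// UPDATE as a soft delete: tag the live version with `deleted_by = txn`
/// and append a replacement version created by the same transaction.
/// Reusing `row_id` for the new version is an assumption of this sketch.
fn update_value(rows: &mut Vec<RowVersion>, row_id: u64, new_value: i64, txn: TxnId) {
    let mut replaced = false;
    for row in rows.iter_mut() {
        if row.row_id == row_id && row.deleted_by.is_none() {
            row.deleted_by = Some(txn);
            replaced = true;
        }
    }
    if replaced {
        rows.push(RowVersion {
            row_id,
            created_by: txn,
            deleted_by: None,
            value: new_value,
        });
    }
}

/// Returns the values a reader at `snapshot` would observe.
fn scan(rows: &[RowVersion], snapshot: TxnId) -> Vec<i64> {
    rows.iter()
        .filter(|r| is_visible(r, snapshot))
        .map(|r| r.value)
        .collect()
}
```

With a row created by transaction 1 and updated by transaction 5, a reader whose snapshot is 4 still sees the old value, while a reader at snapshot 5 sees the new one. Both versions stay in storage, which is why the old versions linger until compaction exists.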
Each transaction operates with both a base context (existing tables) and a staging context (new tables created during the transaction). On commit, staged operations replay into the base pager once the transaction watermark advances, preserving snapshot isolation without copying entire tables during the unit of work.
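The split between base and staging contexts can be illustrated with a small standard-library model. This is a simplified sketch under stated assumptions: `BaseContext`, `Transaction`, and their methods are hypothetical, tables are plain `Vec<i64>` values, and commit moves staged tables into the base map rather than replaying individual operations into a pager as the runtime is described as doing.

```rust
use std::collections::BTreeMap;

/// Stand-in for a table's contents.
type Table = Vec<i64>;

/// Tables that existed before the transaction began.
#[derive(Default)]
struct BaseContext {
    tables: BTreeMap<String, Table>,
}

/// A transaction that keeps tables it creates in a private staging area.
#[derive(Default)]
struct Transaction {
    staged: BTreeMap<String, Table>,
}

impl Transaction {
    fn begin() -> Self {
        Self::default()
    }

    /// New tables live only in the staging context until commit.
    fn create_table(&mut self, name: &str) {
        self.staged.entry(name.to_string()).or_default();
    }

    /// Inserts into a staged table; returns false if the table is not staged.
    /// Writes to existing base tables are out of scope for this sketch.
    fn insert(&mut self, name: &str, value: i64) -> bool {
        match self.staged.get_mut(name) {
            Some(table) => {
                table.push(value);
                true
            }
            None => false,
        }
    }

    /// Name resolution checks the staging context first, then the base.
    fn lookup<'a>(&'a self, base: &'a BaseContext, name: &str) -> Option<&'a Table> {
        self.staged.get(name).or_else(|| base.tables.get(name))
    }

    /// On commit, staged tables become part of the base context.
    fn commit(self, base: &mut BaseContext) {
        for (name, table) in self.staged {
            base.tables.insert(name, table);
        }
    }
}
```

Until `commit` runs, nothing in the base context changes, so other readers never observe a half-built table and no existing table has to be copied for the duration of the transaction.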
§Roadmap Signals
Active work centers on extending the transaction lifecycle (the
TxnIdManager still carries TODOs for next-ID
management), expanding the constraint system across primary, unique, foreign-key, and check
metadata, and tightening performance around Arrow batch sizing and columnar access patterns. The
crates in this workspace continue to evolve together, keeping documentation and implementation in
lockstep.
§Dialect and Tooling Outlook
- SQLite compatibility: LLKV parses SQLite-flavored SQL, batches INSERT statements for throughput, and surfaces results in Arrow form. Passing the upstream sqllogictest suites establishes a baseline but does not yet make LLKV a drop-in SQLite replacement.
- DuckDB coverage: Early DuckDB suites exercise MVCC and typed transaction flows. They chart the roadmap rather than guarantee full DuckDB parity today.
- Tokio-friendly, synchronous core: Queries execute synchronously by default, delegating concurrency to Rayon and Crossbeam. Embedders can still tuck the engine inside Tokio, which is how the SQL Logic Test runner drives concurrent sessions.
See dev-docs/high-level-crate-linkage.md and the DeepWiki documentation for diagrams and extended commentary.
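The "synchronous core, concurrent callers" arrangement can be sketched without any async runtime. In the example below, plain OS threads stand in for the blocking tasks an embedder might hand to Tokio; `SyncEngine` and `run_sessions` are hypothetical and say nothing about how the SQL Logic Test runner is actually wired.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical engine whose `execute` call blocks until the statement is done.
struct SyncEngine {
    /// Records executed statements; stands in for real engine state.
    log: Mutex<Vec<String>>,
}

impl SyncEngine {
    /// Runs one statement synchronously and returns how many have run so far.
    fn execute(&self, sql: &str) -> usize {
        let mut log = self.log.lock().unwrap();
        log.push(sql.to_string());
        log.len()
    }
}

/// Drives one session per statement on its own thread, then reports the
/// total number of statements the shared engine has executed.
fn run_sessions(engine: Arc<SyncEngine>, statements: Vec<String>) -> usize {
    let handles: Vec<_> = statements
        .into_iter()
        .map(|sql| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || engine.execute(&sql))
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = engine.log.lock().unwrap().len();
    total
}
```

The engine itself contains no async code; concurrency comes entirely from the caller sharing one `Arc` across workers. An async embedder would likely swap the `thread::spawn` calls for its runtime's blocking-task facility.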
§Quick Start
Create an in-memory SQL engine and execute queries:
use std::sync::Arc;
use llkv::{SqlEngine, storage::MemPager};
let engine = SqlEngine::new(Arc::new(MemPager::default()));
let results = engine.execute("SELECT 42 AS answer").unwrap();
§Architecture
LLKV is organized as a layered workspace:
- SQL Interface (llkv-sql): Parses and executes SQL statements.
- Query Planning (llkv-plan, llkv-expr): Defines logical plans and expression ASTs.
- Runtime (llkv-runtime, llkv-transaction): Coordinates MVCC transactions and statement execution.
- Execution (llkv-executor, llkv-aggregate, llkv-join): Streams Arrow batches through operators.
- Storage (llkv-table, llkv-column-map, llkv-storage): Manages columnar storage and pager abstractions.
§Re-exports
This crate re-exports the following modules for convenient access:
Modules§
- storage
- Storage layer abstractions and pager implementations.
Structs§
Enums§
- Error
- Unified error type for all LLKV operations.
- RuntimeStatementResult
Type Aliases§
- Result
- Result type alias used throughout LLKV.