llkv 0.8.0-alpha

Columnar Storage Layer for Key-Value Stores

LLKV: Arrow-Native SQL over Key-Value Storage

Work in Progress

llkv is the primary entrypoint crate for the LLKV database toolkit. It provides both a command-line interface and a library that re-exports high-level APIs from the underlying workspace crates.

Arrow column chunks are stored under pager-managed physical keys, so pagers that already offer zero-copy reads—like the SIMD-backed simd-r-drive—can return contiguous buffers that stay SIMD-friendly.
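To make the idea of "column chunks stored under physical keys" concrete, here is a minimal sketch using only the standard library. `ToyPager`, `PhysicalKey`, `put_i32_chunk`, `get_raw`, and `decode_i32_chunk` are hypothetical names invented for illustration; they are not LLKV's or simd-r-drive's actual API. The sketch only shows the shape of the idea: one key maps to one contiguous buffer, and a read hands back a borrowed slice rather than a copy.

```rust
use std::collections::HashMap;

/// Hypothetical physical key type (an assumption, not LLKV's definition).
type PhysicalKey = u64;

/// Toy in-memory pager: each physical key maps to one contiguous byte buffer.
#[derive(Default)]
struct ToyPager {
    blobs: HashMap<PhysicalKey, Vec<u8>>,
}

impl ToyPager {
    /// Store a chunk of i32 values as a single little-endian buffer.
    fn put_i32_chunk(&mut self, key: PhysicalKey, values: &[i32]) {
        let mut buf = Vec::with_capacity(values.len() * 4);
        for v in values {
            buf.extend_from_slice(&v.to_le_bytes());
        }
        self.blobs.insert(key, buf);
    }

    /// Borrow the stored buffer without copying it.
    fn get_raw(&self, key: PhysicalKey) -> Option<&[u8]> {
        self.blobs.get(&key).map(|b| b.as_slice())
    }
}

/// Decode a little-endian buffer back into i32 values.
fn decode_i32_chunk(raw: &[u8]) -> Vec<i32> {
    raw.chunks_exact(4)
        .map(|b| i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```

Calling `put_i32_chunk(7, &[1, -2, 3])` and then `get_raw(7)` yields a 12-byte slice that borrows from the pager. The decode step copies for simplicity; a zero-copy reader would instead interpret the borrowed buffer in place.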

Design Notes

  • Execution stays synchronous by default. Query workloads rely on hot-path parallelism from Rayon and Crossbeam instead of a global async runtime, so scheduler costs stay predictable. The engine still embeds cleanly inside Tokio: the SLT harness uses Tokio to stage concurrent connection simulations.
  • Storage is built atop the simd-r-drive project rather than Parquet, prioritizing fast point updates over integration with existing data lake tooling.
  • The crate shares Apache DataFusion's SQL parser and Arrow memory model but intentionally keeps Tokio out of the query execution path. The goal is to explore how far a DataFusion-like stack can go without a task scheduler while still shipping MVCC transactions and sqllogictest coverage.
  • LLKV remains alpha software. DataFusion offers a broader production footprint today, while this project trials new ideas for specialized workloads.
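The "synchronous call, parallel hot path" pattern from the first note can be sketched without any async runtime. The example below uses `std::thread::scope` as a stand-in for Rayon so that it has no dependencies; `parallel_sum` is a hypothetical function written for illustration, not code from LLKV.

```rust
use std::thread;

/// Sum a column by splitting it into chunks and summing each chunk on its
/// own scoped thread. The call blocks until every worker has finished, so
/// the caller sees an ordinary synchronous function.
fn parallel_sum(values: &[i64], workers: usize) -> i64 {
    if values.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (values.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<i64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum()
    })
}
```

Because the function returns only after all workers have joined, it can be called from plain synchronous code or from inside an async task without the callee knowing about the runtime. A work-stealing pool such as Rayon would avoid spawning threads per call, which this sketch does not attempt.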

Command-Line Interface

The llkv binary supports three modes:

Interactive REPL

Start an interactive session with an in-memory database:

cargo run -p llkv

Available commands:

  • .help — Show usage information
  • .open FILE — Open a persistent database file (planned)
  • .exit or .quit — Exit the REPL
  • SQL statements are executed directly

Stream Processing

Pipe SQL scripts to stdin for batch execution:

echo "SELECT 42 AS answer" | cargo run -p llkv

SQL Logic Test Runner

Execute SLT test files or directories:

# Run a single test
cargo run -p llkv -- --slt tests/slt/example.slt

# Run a test directory
cargo run -p llkv -- --slt tests/slt/

# Use multi-threaded runtime
cargo run -p llkv -- --slt tests/slt/ --slt-runtime multi

Library Usage

The llkv crate re-exports the core SQL engine and storage abstractions for programmatic use.

Quick Start

use std::sync::Arc;
use llkv::{SqlEngine, storage::MemPager};

// Create an in-memory SQL engine
let engine = SqlEngine::new(Arc::new(MemPager::default()));

// Execute SQL statements
let results = engine.execute("CREATE TABLE users (id INT, name TEXT)").unwrap();
let results = engine.execute("INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob')").unwrap();

// Query data
let batches = engine.sql("SELECT * FROM users WHERE id > 0").unwrap();

Re-exported APIs

  • SqlEngine — Main SQL execution engine from llkv-sql
  • storage::MemPager — In-memory pager for transient databases
  • storage::Pager — Trait for custom storage backends
  • RuntimeStatementResult — Statement execution result types
  • Error / Result — Error handling types

Using Persistent Storage

For persistent databases, enable the simd-r-drive-support feature and use a file-backed pager:

[dependencies]
llkv = { version = "0.8.0-alpha", features = ["simd-r-drive-support"] }

use std::sync::Arc;
use llkv::{SqlEngine, storage::SimdRDrivePager};

let pager = SimdRDrivePager::open("database.llkv").unwrap();
let engine = SqlEngine::new(Arc::new(pager));

Architecture

LLKV is organized as a layered workspace with each crate focused on a specific responsibility:

  • SQL Interface (llkv-sql) — Parses SQL and manages the SqlEngine API
  • Planning (llkv-plan, llkv-expr) — Logical plans and expression ASTs
  • Runtime (llkv-runtime, llkv-transaction) — Transaction orchestration and MVCC
  • Execution (llkv-executor, llkv-aggregate, llkv-join) — Query evaluation and streaming results
  • Storage (llkv-table, llkv-column-map, llkv-storage) — Columnar storage and pager abstractions

See the workspace root README and DeepWiki documentation for detailed architecture information.
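For readers unfamiliar with MVCC, which the runtime layer is responsible for, the following is a generic sketch of a snapshot visibility check. It is a textbook-style rule, not LLKV's implementation: `TxnId`, `RowVersion`, and `is_visible` are hypothetical names, and the sketch assumes transaction ids are assigned in commit order and that every id at or below the snapshot has committed.

```rust
/// Hypothetical transaction id type (an assumption for this sketch).
type TxnId = u64;

/// One version of a row, tagged with the transaction that created it and,
/// if it has been deleted, the transaction that deleted it.
struct RowVersion {
    created_by: TxnId,
    deleted_by: Option<TxnId>,
}

/// A reader with snapshot id `snapshot` sees a row version if it was
/// created at or before the snapshot and not deleted at or before it.
fn is_visible(row: &RowVersion, snapshot: TxnId) -> bool {
    let created = row.created_by <= snapshot;
    let deleted = matches!(row.deleted_by, Some(d) if d <= snapshot);
    created && !deleted
}
```

Under these assumptions, a row created by transaction 5 and deleted by transaction 8 is visible to a reader at snapshot 7 but not to one at snapshot 8 or at snapshot 4.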

License

Licensed under the Apache-2.0 License.