cqlite-core 0.11.0

Core engine for CQLite — read Apache Cassandra 5.0 SSTables locally without a cluster

Status: v0.11.0 — Core reading, CLI, output writers, Python & Node.js bindings, and write support (with STCS compaction) are production-ready. See CHANGELOG.md.

CQLite provides SQLite-like local access to Apache Cassandra SSTables, enabling developers to read Cassandra 5.0+ data files without cluster dependencies. Built in Rust for performance and safety.

Documentation

Full documentation is at https://pmcfadin.github.io/cqlite/:

| Section | URL |
| --- | --- |
| User Docs — install, quick start, CLI, Python, Node.js | /cqlite/user-docs/ |
| SSTable Format Guide — binary format deep-dive | /cqlite/sstable-format/ |
| For Agents: Using CQLite — LLM/agent integration | /cqlite/agents-using/ |
| For Agents: Developing CQLite — contributor doctrine, gate contract | /cqlite/agents-developing/ |

Vision

CQLite aims to become the standard tool for Cassandra SSTable manipulation outside of the main Apache Cassandra project, enabling new workflows for data analytics, migration, testing, and edge computing.

Project Leadership

CQLite is designed by Patrick McFadin, an Apache Cassandra PMC member with 13 years of Cassandra experience. The project embodies Apache Cassandra community values and will be donated to the Apache Cassandra project upon maturity.

Install

CLI (from crates.io — requires Rust 1.85+)

cargo install cqlite-cli      # installs the `cqlite` binary
cqlite --help

CLI (prebuilt binaries — no Rust toolchain required)

Each GitHub release attaches a prebuilt cqlite CLI binary for the common platforms, each with a .sha256 checksum sidecar:

| Platform | Asset |
| --- | --- |
| macOS (Apple Silicon) | cqlite-aarch64-apple-darwin.tar.gz |
| macOS (Intel) | cqlite-x86_64-apple-darwin.tar.gz |
| Linux x86_64 (glibc) | cqlite-x86_64-unknown-linux-gnu.tar.gz |
| Linux x86_64 (static musl) | cqlite-x86_64-unknown-linux-musl.tar.gz |
| Linux arm64 (glibc) | cqlite-aarch64-unknown-linux-gnu.tar.gz |
| Windows x86_64 | cqlite-x86_64-pc-windows-gnu.zip |

# Example: macOS Apple Silicon
TARGET=aarch64-apple-darwin
curl -fsSLO https://github.com/pmcfadin/cqlite/releases/latest/download/cqlite-$TARGET.tar.gz
curl -fsSLO https://github.com/pmcfadin/cqlite/releases/latest/download/cqlite-$TARGET.tar.gz.sha256
shasum -a 256 -c cqlite-$TARGET.tar.gz.sha256   # verify (use sha256sum -c on Linux)
tar xzf cqlite-$TARGET.tar.gz
./cqlite --help
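
Where neither shasum nor sha256sum is available, the same check can be sketched in Python. This is a minimal sketch, not part of CQLite: the helper names sha256_of and verify_sidecar are hypothetical, and it assumes the .sha256 sidecar begins with the hex digest, which is the layout that shasum -c consumes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(archive: Path, sidecar: Path) -> bool:
    """Compare the archive's digest with the first token of the sidecar file.

    Assumption: the sidecar looks like "<hex digest>  <file name>".
    """
    tokens = sidecar.read_text().split()
    if not tokens:
        return False  # an empty sidecar can never match
    return sha256_of(archive) == tokens[0].lower()
```

A call such as verify_sidecar(Path("cqlite-aarch64-apple-darwin.tar.gz"), Path("cqlite-aarch64-apple-darwin.tar.gz.sha256")) returns True only when the downloaded archive matches its published checksum.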

Rust library

cargo add cqlite-core         # use cqlite-core as a dependency

See Using cqlite-core as a dependency and the API docs.

Language bindings

pip install cqlite-py        # Python
npm install @cqlite/node     # Node.js

Quick Start

# Clone the repository
git clone https://github.com/pmcfadin/cqlite.git
cd cqlite

# Build the project
cargo build --release

# Run the CLI tool
cargo run --package cqlite-cli -- \
  --schema test-data/schemas/basic-types.cql \
  --data-dir test-data/datasets/sstables \
  --query "SELECT * FROM test_basic.simple_table LIMIT 5" \
  --out json

Python

pip install cqlite-py

import cqlite

with cqlite.open('path/to/sstables', schema='schema.cql') as db:
    for row in db.execute('SELECT * FROM keyspace.table LIMIT 5'):
        print(row.to_dict())
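
Because each row exposes to_dict(), results can be handed to other tools one record at a time. The sketch below writes rows as JSON Lines. write_jsonl is a hypothetical helper, not part of cqlite-py, and it relies only on the to_dict() method shown above. Values that the json module cannot encode natively, such as UUIDs, are stringified with default=str; that is an illustrative choice, not necessarily CQLite's own JSON mapping.

```python
import json
from typing import Any, Iterable, TextIO


def write_jsonl(rows: Iterable[Any], out: TextIO) -> int:
    """Write one JSON object per line for every row and return the row count.

    Assumption: each row provides to_dict(), as in the quick start above.
    """
    count = 0
    for row in rows:
        record = row.to_dict()
        # default=str turns non-JSON values (UUIDs, timestamps) into strings.
        out.write(json.dumps(record, default=str, sort_keys=True))
        out.write("\n")
        count += 1
    return count
```

Inside the with block above, this could be called as write_jsonl(db.execute('SELECT * FROM keyspace.table LIMIT 5'), sys.stdout). Rows are consumed one at a time, so the helper never builds a list of the full result.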

Node.js

npm install @cqlite/node

import { Database } from '@cqlite/node';

const db = await Database.open('path/to/sstables', { schema: 'schema.cql' });
const result = await db.execute('SELECT * FROM keyspace.table LIMIT 5');
for (const row of result.rows) {
  console.log(row.name);
}
await db.close();

Write Support

Write support shipped in CQLite v0.9.0 (M5) and is available across all interfaces: Rust core, Python, Node.js, and CLI. Written data flushes to portable Cassandra 5.0 SSTables that Cassandra can load directly via nodetool refresh.

The schema file used in the examples below is included in the repository at test-data/schemas/write-test.cql.

Python

import cqlite

# Open in writable mode — write_dir stores the WAL and flushed SSTables
with cqlite.open(
    'test-data/datasets/sstables',
    schema='test-data/schemas/write-test.cql',
    writable=True,
    write_dir='/tmp/my-writes',
) as db:
    db.execute(
        "INSERT INTO test_basic.simple_table (id, name, age) "
        "VALUES (11111111-1111-1111-1111-111111111111, 'Alice', 30)"
    )
    path = db.flush_run()
    print(f'Flushed SSTable: {path}')

Node.js

// ES module import: the top-level await calls below are not valid in a CommonJS script
import { Database } from '@cqlite/node';

const db = await Database.open('test-data/datasets/sstables', {
  schema: 'test-data/schemas/write-test.cql',
  writable: true,
  writeDir: '/tmp/my-writes',
});
await db.execute(
  "INSERT INTO test_basic.simple_table (id, name, age) " +
  "VALUES (22222222-2222-2222-2222-222222222222, 'Bob', 25)"
);
const path = await db.flushRun();
console.log('Flushed SSTable:', path);
await db.close();

CLI

# Build with write support
cargo build --package cqlite-cli --features write-support

# Write via CQL INSERT
cargo run --package cqlite-cli --features write-support -- \
  --writable --write-dir /tmp/my-writes \
  --schema test-data/schemas/write-test.cql \
  --execute "INSERT INTO test_basic.simple_table (id, name, age) \
             VALUES (33333333-3333-3333-3333-333333333333, 'Carol', 28)"

# Flush memtable to SSTable
cargo run --package cqlite-cli --features write-support -- \
  --writable --write-dir /tmp/my-writes \
  --schema test-data/schemas/write-test.cql \
  --flush

See docs/write-support.md for the full write guide, including the Cassandra export workflow and known limitations. To embed cqlite-core in your own Rust project (dependency line, feature flags, and a compiling write example), see docs/using-cqlite-core-as-a-dependency.md.

Feature Flags

cqlite-core gates optional functionality behind Cargo features. The table below maps the public API you're likely to reach for to the feature that enables it.

| Want… | Enable feature | In defaults? |
| --- | --- | --- |
| Read / query path (Database::open, execute, scan, get) | state_machine | ✅ yes |
| Compression (LZ4 / Snappy / Deflate / Zstd) | all-compression | ✅ yes |
| Write path (WriteEngine, Mutation, WriteEngine::write/flush) | write-support | ✅ yes |
| Database::flush / Database::compact (high-level convenience) | experimental | ❌ opt-in |
| CLI ingestion / REPL helpers (cqlite-cli) | cli-helpers | ❌ opt-in |
| Performance metrics collection | metrics | ❌ opt-in |

Default features are ["all-compression", "state_machine", "write-support"] (see cqlite-core/Cargo.toml). write-support was folded into the defaults in #558 — it gates only first-party code and adds no extra dependencies, so read-only consumers pay nothing for it. flush/compact on the high-level Database type remain behind experimental; the equivalent engine-level WriteEngine::flush is part of write-support.

Building with Custom Features

# Default build (read + write + compression)
cargo build

# Read-only consumer: drop the write path explicitly (optional, since keeping it adds no dependencies)
cargo build -p cqlite-core --no-default-features --features all-compression,state_machine

# Opt into high-level Database::flush / compact
cargo build -p cqlite-core --features experimental

# Minimal build (no compression, no query engine)
cargo build -p cqlite-core --no-default-features

Features

✅ Complete (M1/M2)

  • Cassandra 5+ SSTable format parsing (100% of test tables)
  • All CQL types including collections and UDTs
  • All compression codecs (LZ4, Snappy, Deflate, Zstd)
  • CLI tool with REPL and one-shot query modes
  • SELECT with WHERE clause (partition/clustering key equality)
  • Output formats: Table, JSON, CSV

✅ M3 Complete (Jan 2026)

  • Parquet output format with Snappy compression
  • Export command (cqlite export)
  • Streaming export for large datasets
  • Output formats: CSV, JSON, Parquet, CQL

✅ M4 Complete (Jan 2026)

  • Python bindings with full CQL type support
  • Node.js bindings with TypeScript definitions
  • Streaming API for memory-efficient queries
  • pip/npm installable packages (5 platform builds each)
  • Type stubs for IDE support (Python mypy, TypeScript)

✅ M5 Complete — v0.9.0 (May 2026)

  • Write support: WAL + memtable + flush to Cassandra SSTables
  • STCS compaction via maintenance_step()
  • Write API in Python, Node.js, and CLI
  • Full type coverage: Inet, Varint, Duration, Tuple, Frozen
  • E2E readback gate: write → flush → Cassandra nodetool refresh → verify
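
For readers unfamiliar with size-tiered compaction (STCS), the core idea is to group SSTables of similar size into buckets and compact a bucket once it holds enough tables. The following is a minimal sketch of that bucketing step, modeled on Apache Cassandra's STCS options and their defaults (bucket_low 0.5, bucket_high 1.5, min_threshold 4, min_sstable_size 50 MiB). It is not CQLite's implementation, and the function names are hypothetical.

```python
from typing import List

DEFAULT_MIN_SSTABLE_SIZE = 50 * 1024 * 1024  # bytes; Cassandra's STCS default


def stcs_buckets(
    sizes: List[int],
    bucket_low: float = 0.5,
    bucket_high: float = 1.5,
    min_sstable_size: int = DEFAULT_MIN_SSTABLE_SIZE,
) -> List[List[int]]:
    """Group SSTable sizes (in bytes) into buckets of similar size."""
    buckets: List[dict] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket["avg"]
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket["members"].append(size)
                # Keep the running average current for later comparisons.
                bucket["avg"] = sum(bucket["members"]) / len(bucket["members"])
                break
        else:
            # No existing bucket fits, so this size starts a new one.
            buckets.append({"avg": float(size), "members": [size]})
    return [bucket["members"] for bucket in buckets]


def compaction_candidates(sizes: List[int], min_threshold: int = 4, **options) -> List[List[int]]:
    """Return only the buckets holding at least min_threshold SSTables."""
    return [b for b in stcs_buckets(sizes, **options) if len(b) >= min_threshold]
```

With min_sstable_size=0, the sizes [100, 110, 95, 105, 1000, 5] fall into three buckets, and only the four similarly sized tables qualify for compaction. With the default 50 MiB floor, all such tiny tables would share a single bucket.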

✅ Since v0.9.0 (v0.10 → v0.11.0, Jun 2026)

  • Embeddable Parquet writer in cqlite-core (behind a parquet feature) + export_parquet in Python/Node
  • Version-gated reads for the Cassandra 5.0 oa format; graceful handling of da (BTI)
  • Real BTI trie node-type dispatch and schema-typed query result columns
  • Published documentation site at pmcfadin.github.io/cqlite
  • See CHANGELOG.md for the full per-release detail

📋 Roadmap

  • M6: WASM bindings for browser deployment
  • M7: Performance validation + v1.0 release

Architecture Highlights

Design Philosophy:

  • No cluster dependency - Read and write SSTables directly, with no running Cassandra node
  • CQL parser - Native CQL support using an ANTLR4 grammar
  • Cassandra 5+ focus - Modern 'oa' format with BTI support
  • Memory efficient - <128MB usage target for large files
  • Self-contained engine - Pure-Rust parsing and writing, including STCS compaction

Getting Involved

CQLite is developed in the open as an Apache-licensed project. We welcome contributions from the Cassandra community!

Development Setup

# Prerequisites
# - Rust 1.85+

# Clone and build
git clone https://github.com/pmcfadin/cqlite.git
cd cqlite
cargo build

# Fetch test data (JSONL reference files are in git, SSTable binaries fetched separately)
bash test-data/scripts/fetch-datasets.sh

# Run tests
env CQLITE_DATASETS_ROOT=$PWD/test-data/datasets cargo test --package cqlite-core

Contributing

  1. Check Issues: Look for good-first-issue labels
  2. Discuss: Join our community discussions
  3. Code: Follow Rust best practices and include tests
  4. Test: Ensure compatibility with real Cassandra data
  5. Document: Update docs for user-facing changes

Current Status

✅ M1 Complete (Dec 2025)

  • All SSTable components parsed (Data.db, Index.db, Summary.db, Statistics.db, TOC)
  • 33/33 test tables passing (100% validation)
  • All 21 CQL primitive types + collections + UDTs + frozen types
  • All compression algorithms working
  • Tiered test coverage targets (see PRD Section 5.1)

✅ M2 Complete (Jan 2026)

  • CLI with one-shot and REPL modes
  • SELECT queries with WHERE clause support
  • Multiple output formats (Table, JSON, CSV)

✅ M3 Complete (Jan 2026)

  • Parquet output format with Snappy compression
  • Export command with CSV, JSON, Parquet, CQL formats
  • Streaming export for memory-efficient large dataset handling
  • Progress bar and statistics for exports

✅ M4 Complete (Jan 2026)

  • Python bindings via PyO3 with sync-first API
  • Node.js bindings via napi-rs with Promise-based API
  • Full CQL type system (20+ types including collections, UDTs)
  • Thread-safe database handles
  • 500+ tests with 98%+ pass rate across both bindings

✅ M5 Complete — v0.9.0 (May 2026)

  • Write support: WAL-backed memtable + flush to portable Cassandra 5.0 SSTables
  • STCS compaction (maintenance_step())
  • Write API exposed in Python (flush_run, maintenance_step, write_stats), Node.js (flushRun, maintenanceStep, writeStats), and CLI (--writable, --write-dir, --flush, maintenance, write-stats, export-sstable)
  • Type roundtrips verified for all major types including Inet, Varint, Duration, Tuple, Frozen
  • E2E validation against live Cassandra 5.0 (write → flush → nodetool refresh → cqlsh)

See docs/development/PRD.md for milestone details.

Technical Details

Supported Formats

  • Cassandra 5.0+: 'oa' format with BTI support
  • File Types: Data.db, Index.db, Summary.db, Statistics.db
  • Compression: LZ4, Snappy, Deflate, Zstd

Performance Targets

  • Parse Speed: 1GB files in <10 seconds
  • Memory Usage: <128MB for large SSTables
  • Query Latency: Sub-millisecond partition lookups
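
To make the parse-speed target concrete, a back-of-the-envelope calculation shows the sustained throughput it implies. This sketch assumes "1GB" means 1 GiB (2^30 bytes); with decimal gigabytes the figure is 100 MB/s instead. The function name is hypothetical.

```python
def required_throughput_mib_s(file_bytes: int, seconds: float) -> float:
    """Return the minimum sustained read rate, in MiB/s, to finish in time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return file_bytes / seconds / 2**20


if __name__ == "__main__":
    # 1 GiB in 10 seconds is the boundary of the "<10 seconds" target.
    print(f"{required_throughput_mib_s(2**30, 10):.1f} MiB/s")  # prints "102.4 MiB/s"
```

Meeting the target therefore means sustaining a little over 100 MiB/s end to end, including decompression.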

Community

CQLite is an independent open-source project, not an Apache Software Foundation project. It is built in the spirit of the Apache Cassandra community, with the goal of contributing it upstream as it matures.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Acknowledgments

Special thanks to the Apache Cassandra community and the many contributors who make projects like this possible. CQLite builds on decades of database engineering innovation from the Cassandra project.


Note: M1 through M5 milestones are complete and the project is at v0.11.0. Core SSTable reading, CLI, output writers (including Parquet), Python and Node.js bindings, and write support with STCS compaction are production-ready. Next: M6 (WASM bindings) and M7 (performance validation + v1.0).