cqlite-cli 0.11.0

Command-line interface for CQLite: read Apache Cassandra 5.0 SSTables without a cluster
<p align="center">
  <img src="website/src/assets/cqlite.png" alt="CQLite" width="480">
</p>

<p align="center"><strong>A high-performance Rust library for local Apache Cassandra SSTable access</strong></p>

<p align="center">
  <a href="https://github.com/pmcfadin/cqlite/actions/workflows/ci.yml"><img src="https://github.com/pmcfadin/cqlite/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://crates.io/crates/cqlite-cli"><img src="https://img.shields.io/crates/v/cqlite-cli.svg?label=crates.io%20cqlite-cli" alt="crates.io"></a>
  <a href="https://docs.rs/cqlite-core"><img src="https://img.shields.io/docsrs/cqlite-core.svg?label=docs.rs" alt="docs.rs"></a>
  <a href="https://pypi.org/project/cqlite-py/"><img src="https://img.shields.io/pypi/v/cqlite-py.svg?label=pypi%20cqlite-py" alt="PyPI"></a>
  <a href="https://www.npmjs.com/package/@cqlite/node"><img src="https://img.shields.io/npm/v/@cqlite/node.svg?label=npm%20%40cqlite%2Fnode" alt="npm"></a>
  <a href="https://pmcfadin.github.io/cqlite/"><img src="https://img.shields.io/badge/docs-pmcfadin.github.io%2Fcqlite-blue.svg" alt="Docs"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="Apache License"></a>
  <a href="https://www.rust-lang.org"><img src="https://img.shields.io/badge/rust-1.85+-red.svg" alt="Rust"></a>
  <a href="https://cassandra.apache.org"><img src="https://img.shields.io/badge/cassandra-5.0+-green.svg" alt="Cassandra"></a>
</p>

> **Status**: v0.11.0. Core reading, CLI, output writers, Python & Node.js bindings, and write support (with STCS compaction) are production-ready. See [CHANGELOG.md](CHANGELOG.md).

CQLite provides SQLite-like local access to Apache Cassandra SSTables, enabling developers to read Cassandra 5.0+ data files without cluster dependencies. Built in Rust for performance and safety.

## Documentation

Full documentation is at **[https://pmcfadin.github.io/cqlite/](https://pmcfadin.github.io/cqlite/)**:

| Section | URL |
|---------|-----|
| User Docs: install, quick start, CLI, Python, Node.js | [/cqlite/user-docs/](https://pmcfadin.github.io/cqlite/user-docs/) |
| SSTable Format Guide: binary format deep-dive | [/cqlite/sstable-format/](https://pmcfadin.github.io/cqlite/sstable-format/) |
| For Agents: Using CQLite (LLM/agent integration) | [/cqlite/agents-using/](https://pmcfadin.github.io/cqlite/agents-using/) |
| For Agents: Developing CQLite (contributor doctrine, gate contract) | [/cqlite/agents-developing/](https://pmcfadin.github.io/cqlite/agents-developing/) |

## Vision

CQLite aims to become the standard tool for Cassandra SSTable manipulation outside of the main Apache Cassandra project, enabling new workflows for data analytics, migration, testing, and edge computing.

## Project Leadership

CQLite is designed by **Patrick McFadin**, Apache Cassandra PMC member with 13 years of Cassandra experience. The project embodies Apache Cassandra community values and will be donated to the Apache Cassandra project upon maturity.

## Install

### CLI (from crates.io, requires Rust 1.85+)

```bash
cargo install cqlite-cli      # installs the `cqlite` binary
cqlite --help
```

### CLI (prebuilt binaries, no Rust toolchain required)

Each [GitHub release](https://github.com/pmcfadin/cqlite/releases) attaches a
prebuilt `cqlite` CLI binary for the common platforms, each with a `.sha256`
checksum sidecar:

| Platform | Asset |
|----------|-------|
| macOS (Apple Silicon) | `cqlite-aarch64-apple-darwin.tar.gz` |
| macOS (Intel) | `cqlite-x86_64-apple-darwin.tar.gz` |
| Linux x86_64 (glibc) | `cqlite-x86_64-unknown-linux-gnu.tar.gz` |
| Linux x86_64 (static musl) | `cqlite-x86_64-unknown-linux-musl.tar.gz` |
| Linux arm64 (glibc) | `cqlite-aarch64-unknown-linux-gnu.tar.gz` |
| Windows x86_64 | `cqlite-x86_64-pc-windows-gnu.zip` |

```bash
# Example: macOS Apple Silicon
TARGET=aarch64-apple-darwin
curl -fsSLO https://github.com/pmcfadin/cqlite/releases/latest/download/cqlite-$TARGET.tar.gz
curl -fsSLO https://github.com/pmcfadin/cqlite/releases/latest/download/cqlite-$TARGET.tar.gz.sha256
shasum -a 256 -c cqlite-$TARGET.tar.gz.sha256   # verify (use sha256sum -c on Linux)
tar xzf cqlite-$TARGET.tar.gz
./cqlite --help
```
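
If `shasum` or `sha256sum` is not available (for example on a stock Windows machine), the same check can be sketched in Python. This is a minimal illustration, not part of CQLite: the helper names are hypothetical, and it assumes the `.sha256` sidecar uses the common `<hex digest>  <filename>` layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(archive: Path, sidecar: Path) -> bool:
    """Compare the archive digest with the first token in the sidecar file.

    Assumption: the sidecar looks like `<hex digest>  <filename>`.
    """
    expected = sidecar.read_text().split()[0].lower()
    return sha256_of(archive) == expected
```

Call `verify_sidecar(Path("cqlite-aarch64-apple-darwin.tar.gz"), Path("cqlite-aarch64-apple-darwin.tar.gz.sha256"))` after downloading both files, and only extract the archive if it returns `True`.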

### Rust library

```bash
cargo add cqlite-core         # use cqlite-core as a dependency
```

See [Using cqlite-core as a dependency](docs/using-cqlite-core-as-a-dependency.md) and the [API docs](https://docs.rs/cqlite-core).

### Language bindings

```bash
pip install cqlite-py        # Python
npm install @cqlite/node     # Node.js
```

## Quick Start

```bash
# Clone the repository
git clone https://github.com/pmcfadin/cqlite.git
cd cqlite

# Build the project
cargo build --release

# Run the CLI tool
cargo run --package cqlite-cli -- \
  --schema test-data/schemas/basic-types.cql \
  --data-dir test-data/datasets/sstables \
  --query "SELECT * FROM test_basic.simple_table LIMIT 5" \
  --out json
```
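
To call the CLI from a script rather than a shell, the invocation above can be wrapped with `subprocess`. The sketch below uses only the flags shown in the command above. The function names are hypothetical, and it assumes an installed `cqlite` binary on `PATH` that writes a single JSON document to stdout when given `--out json`; check the actual output shape before relying on it.

```python
import json
import subprocess
from typing import Any, List


def build_query_command(schema: str, data_dir: str, query: str,
                        binary: str = "cqlite") -> List[str]:
    """Assemble the argv for a one-shot query using the flags shown above."""
    return [
        binary,
        "--schema", schema,
        "--data-dir", data_dir,
        "--query", query,
        "--out", "json",
    ]


def run_query(schema: str, data_dir: str, query: str,
              binary: str = "cqlite") -> Any:
    """Run the query and parse stdout as JSON.

    Assumption: `--out json` produces one JSON document on stdout.
    Raises subprocess.CalledProcessError if the CLI exits non-zero.
    """
    completed = subprocess.run(
        build_query_command(schema, data_dir, query, binary),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with the CQL string.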

### Python

```bash
pip install cqlite-py
```

```python
import cqlite

with cqlite.open('path/to/sstables', schema='schema.cql') as db:
    for row in db.execute('SELECT * FROM keyspace.table LIMIT 5'):
        print(row.to_dict())
```
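
A common next step is to dump rows as JSON Lines. The helper below is a sketch, not part of `cqlite-py`: `write_jsonl` is a hypothetical name, and the only thing it assumes about a row is the `to_dict()` method used in the example above. Values that `json` cannot encode natively (such as UUIDs) are converted with `str`, which is an assumption about what you want in the output.

```python
import json
from typing import Any, Iterable, TextIO


def write_jsonl(rows: Iterable[Any], out: TextIO) -> int:
    """Write one JSON object per row and return the number of rows written.

    Assumption: each row exposes `to_dict()`, as in the example above.
    `default=str` stringifies values json cannot encode natively.
    """
    count = 0
    for row in rows:
        out.write(json.dumps(row.to_dict(), default=str) + "\n")
        count += 1
    return count
```

Because it consumes the rows one at a time, it can be passed the result of `db.execute(...)` directly, for example `write_jsonl(db.execute(query), sys.stdout)`.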

### Node.js

```bash
npm install @cqlite/node
```

```typescript
import { Database } from '@cqlite/node';

const db = await Database.open('path/to/sstables', { schema: 'schema.cql' });
const result = await db.execute('SELECT * FROM keyspace.table LIMIT 5');
for (const row of result.rows) {
  console.log(row.name);
}
await db.close();
```

## Write Support

CQLite v0.9.0 (M5) ships write support across all interfaces: Rust core, Python,
Node.js, and CLI. Written data flushes to portable Cassandra 5.0 SSTables that
Cassandra can read directly via `nodetool refresh`.

The schema file below is included in the repository at
`test-data/schemas/write-test.cql`.

### Python

```python
import cqlite

# Open in writable mode; write_dir stores the WAL and flushed SSTables
with cqlite.open(
    'test-data/datasets/sstables',
    schema='test-data/schemas/write-test.cql',
    writable=True,
    write_dir='/tmp/my-writes',
) as db:
    db.execute(
        "INSERT INTO test_basic.simple_table (id, name, age) "
        "VALUES (11111111-1111-1111-1111-111111111111, 'Alice', 30)"
    )
    path = db.flush_run()
    print(f'Flushed SSTable: {path}')
```

### Node.js

```javascript
const { Database } = require('@cqlite/node');

const db = await Database.open('test-data/datasets/sstables', {
  schema: 'test-data/schemas/write-test.cql',
  writable: true,
  writeDir: '/tmp/my-writes',
});
await db.execute(
  "INSERT INTO test_basic.simple_table (id, name, age) " +
  "VALUES (22222222-2222-2222-2222-222222222222, 'Bob', 25)"
);
const path = await db.flushRun();
console.log('Flushed SSTable:', path);
await db.close();
```

### CLI

```bash
# Build with write support
cargo build --package cqlite-cli --features write-support

# Write via CQL INSERT
cargo run --package cqlite-cli --features write-support -- \
  --writable --write-dir /tmp/my-writes \
  --schema test-data/schemas/write-test.cql \
  --execute "INSERT INTO test_basic.simple_table (id, name, age) \
             VALUES (33333333-3333-3333-3333-333333333333, 'Carol', 28)"

# Flush memtable to SSTable
cargo run --package cqlite-cli --features write-support -- \
  --writable --write-dir /tmp/my-writes \
  --schema test-data/schemas/write-test.cql \
  --flush
```

See [docs/write-support.md](docs/write-support.md) for the full write guide,
including the Cassandra export workflow and known limitations. To embed
`cqlite-core` in your own Rust project (dependency line, feature flags, and a
compiling write example), see
[docs/using-cqlite-core-as-a-dependency.md](docs/using-cqlite-core-as-a-dependency.md).

## Feature Flags

`cqlite-core` gates optional functionality behind Cargo features. The table below
maps the public API you're likely to reach for to the feature that enables it.

| Want… | Enable feature | In defaults? |
|-------|----------------|--------------|
| Read / query path (`Database::open`, `execute`, `scan`, `get`) | `state_machine` | ✅ yes |
| Compression (LZ4 / Snappy / Deflate / Zstd) | `all-compression` | ✅ yes |
| Write path (`WriteEngine`, `Mutation`, `WriteEngine::write`/`flush`) | `write-support` | ✅ yes |
| `Database::flush` / `Database::compact` (high-level convenience) | `experimental` | ❌ opt-in |
| CLI ingestion / REPL helpers (`cqlite-cli`) | `cli-helpers` | ❌ opt-in |
| Performance metrics collection | `metrics` | ❌ opt-in |

Default features are `["all-compression", "state_machine", "write-support"]`
(see `cqlite-core/Cargo.toml`). `write-support` was folded into the defaults in
[#558](https://github.com/pmcfadin/cqlite/issues/558); it gates only first-party
code and adds **no extra dependencies**, so read-only consumers pay nothing for it.
`flush`/`compact` on the high-level `Database` type remain behind `experimental`;
the equivalent engine-level `WriteEngine::flush` is part of `write-support`.

### Building with Custom Features

```bash
# Default build (read + write + compression)
cargo build

# Read-only consumer: explicitly drop the write path (keeping it adds no dependencies)
cargo build -p cqlite-core --no-default-features --features all-compression,state_machine

# Opt into high-level Database::flush / compact
cargo build -p cqlite-core --features experimental

# Minimal build (no compression, no query engine)
cargo build -p cqlite-core --no-default-features
```
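
For build scripts that choose features programmatically, the rule behind the commands above can be sketched as follows: keep Cargo's defaults when every default feature is wanted, otherwise pass `--no-default-features` and list the features explicitly. The helper name `cargo_build_args` is hypothetical, and the default list is copied from this README rather than read from `cqlite-core/Cargo.toml`.

```python
from typing import Iterable, List

# Default features as listed in this README (an assumption if Cargo.toml changes).
DEFAULT_FEATURES = ("all-compression", "state_machine", "write-support")


def cargo_build_args(features: Iterable[str]) -> List[str]:
    """Return `cargo build` arguments that enable exactly `features` on cqlite-core."""
    wanted = list(dict.fromkeys(features))  # de-duplicate, keep order
    args = ["cargo", "build", "-p", "cqlite-core"]
    if all(default in wanted for default in DEFAULT_FEATURES):
        # Defaults stay on; only the opt-in extras need naming.
        extras = [f for f in wanted if f not in DEFAULT_FEATURES]
    else:
        # At least one default is unwanted, so switch defaults off entirely.
        args.append("--no-default-features")
        extras = wanted
    if extras:
        args += ["--features", ",".join(extras)]
    return args
```

For example, `cargo_build_args(["all-compression", "state_machine"])` yields the read-only command shown above, and an empty list yields the minimal build.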

## Features

### ✅ Complete (M1/M2)
- [x] Cassandra 5+ SSTable format parsing (100% of test tables)
- [x] All CQL types including collections and UDTs
- [x] All compression codecs (LZ4, Snappy, Deflate, Zstd)
- [x] CLI tool with REPL and one-shot query modes
- [x] SELECT with WHERE clause (partition/clustering key equality)
- [x] Output formats: Table, JSON, CSV

### ✅ M3 Complete (Jan 2026)
- [x] Parquet output format with Snappy compression
- [x] Export command (`cqlite export`)
- [x] Streaming export for large datasets
- [x] Output formats: CSV, JSON, Parquet, CQL

### ✅ M4 Complete (Jan 2026)
- [x] Python bindings with full CQL type support
- [x] Node.js bindings with TypeScript definitions
- [x] Streaming API for memory-efficient queries
- [x] pip/npm installable packages (5 platform builds each)
- [x] Type stubs for IDE support (Python mypy, TypeScript)

### ✅ M5 Complete: v0.9.0 (May 2026)
- [x] Write support: WAL + memtable + flush to Cassandra SSTables
- [x] STCS compaction via `maintenance_step()`
- [x] Write API in Python, Node.js, and CLI
- [x] Full type coverage: Inet, Varint, Duration, Tuple, Frozen
- [x] E2E readback gate: write → flush → Cassandra `nodetool refresh` → verify

### ✅ Since v0.9.0 (v0.10 → v0.11.0, Jun 2026)
- [x] Embeddable Parquet writer in `cqlite-core` (behind a `parquet` feature) + `export_parquet` in Python/Node
- [x] Version-gated reads for the Cassandra 5.0 `oa` format; graceful handling of `da` (BTI)
- [x] Real BTI trie node-type dispatch and schema-typed query result columns
- [x] Published documentation site at [pmcfadin.github.io/cqlite](https://pmcfadin.github.io/cqlite/)
- See [CHANGELOG.md](CHANGELOG.md) for the full per-release detail

### 📋 Roadmap
- [ ] M6: WASM bindings for browser deployment
- [ ] M7: Performance validation + v1.0 release

## Architecture Highlights

**Design Philosophy:**
- **No cluster dependency** - Read and write SSTables directly, with no running Cassandra node
- **CQL parser** - Native CQL support using an Antlr4 grammar
- **Cassandra 5+ focus** - Modern 'oa' format with BTI support
- **Memory efficient** - <128MB usage target for large files
- **Self-contained engine** - Pure-Rust parsing and writing, including STCS compaction

## Getting Involved

CQLite is developed in the open as an Apache-licensed project. We welcome contributions from the Cassandra community!

### Development Setup

```bash
# Prerequisites
# - Rust 1.85+

# Clone and build
git clone https://github.com/pmcfadin/cqlite.git
cd cqlite
cargo build

# Fetch test data (JSONL reference files are in git, SSTable binaries fetched separately)
bash test-data/scripts/fetch-datasets.sh

# Run tests
env CQLITE_DATASETS_ROOT=$PWD/test-data/datasets cargo test --package cqlite-core
```

### Contributing

1. **Check Issues**: Look for `good-first-issue` labels
2. **Discuss**: Join our community discussions
3. **Code**: Follow Rust best practices and include tests
4. **Test**: Ensure compatibility with real Cassandra data
5. **Document**: Update docs for user-facing changes

## Current Status

### ✅ M1 Complete (Dec 2025)
- All SSTable components parsed (Data.db, Index.db, Summary.db, Statistics.db, TOC)
- 33/33 test tables passing (100% validation)
- All 21 CQL primitive types + collections + UDTs + frozen types
- All compression algorithms working
- Tiered test coverage targets (see [PRD Section 5.1](docs/development/PRD.md#51--tiered-coverage-targets))

### ✅ M2 Complete (Jan 2026)
- CLI with one-shot and REPL modes
- SELECT queries with WHERE clause support
- Multiple output formats (Table, JSON, CSV)

### ✅ M3 Complete (Jan 2026)
- Parquet output format with Snappy compression
- Export command with CSV, JSON, Parquet, CQL formats
- Streaming export for memory-efficient large dataset handling
- Progress bar and statistics for exports

### ✅ M4 Complete (Jan 2026)
- Python bindings via PyO3 with sync-first API
- Node.js bindings via napi-rs with Promise-based API
- Full CQL type system (20+ types including collections, UDTs)
- Thread-safe database handles
- 500+ tests with 98%+ pass rate across both bindings

### ✅ M5 Complete: v0.9.0 (May 2026)
- Write support: WAL-backed memtable + flush to portable Cassandra 5.0 SSTables
- STCS compaction (`maintenance_step()`)
- Write API exposed in Python (`flush_run`, `maintenance_step`, `write_stats`),
  Node.js (`flushRun`, `maintenanceStep`, `writeStats`), and CLI (`--writable`,
  `--write-dir`, `--flush`, `maintenance`, `write-stats`, `export-sstable`)
- Type roundtrips verified for all major types including Inet, Varint, Duration, Tuple, Frozen
- E2E validation against live Cassandra 5.0 (write → flush → `nodetool refresh` → `cqlsh`)

See [docs/development/PRD.md](docs/development/PRD.md) for milestone details.

## Technical Details

### Supported Formats
- **Cassandra 5.0+**: 'oa' format with BTI support
- **File Types**: Data.db, Index.db, Summary.db, Statistics.db
- **Compression**: LZ4, Snappy, Deflate, Zstd

### Performance Targets
- **Parse Speed**: 1GB files in <10 seconds
- **Memory Usage**: <128MB for large SSTables
- **Query Latency**: Sub-millisecond partition lookups
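
To make the parse-speed target concrete, it can be turned into a required sustained throughput. The calculation below is a worked illustration only: it assumes "1GB" means 1 GiB (2^30 bytes) and reports MiB/s, and the function name is hypothetical. These are targets from the list above, not measured results.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def required_throughput_mib_s(file_bytes: int, budget_seconds: float) -> float:
    """Sustained read-and-parse rate needed to finish within the time budget."""
    return file_bytes / budget_seconds / MIB


# Parse-speed target: 1 GiB in under 10 seconds.
print(required_throughput_mib_s(GIB, 10.0))  # 102.4

# Memory target relative to file size: 128 MiB for a 1 GiB SSTable.
print(128 * MIB / GIB)  # 0.125
```

In other words, meeting the target means sustaining a little over 100 MiB/s while holding at most about one eighth of a 1 GiB file in memory, which implies streaming rather than loading the file whole.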

### Language Bindings
- **Python**: Production-ready sync API (see [Python README](bindings/python/README.md))
- **Node.js**: Production-ready Promise API (see [Node.js README](bindings/node/README.md))
- **WASM**: Planned (M6+)

## Resources

- **Documentation site**: [https://pmcfadin.github.io/cqlite/](https://pmcfadin.github.io/cqlite/) (user docs, SSTable format guide, agent integration docs)
- **API docs (rustdoc)**: [latest tag](https://pmcfadin.github.io/cqlite/api/latest/) · published per release tag at `https://pmcfadin.github.io/cqlite/api/<tag>/`
- **Changelog**: [CHANGELOG.md](CHANGELOG.md) (what each tagged release contains)
- **Performance**: [Methodology, local repro, and CI gate policy](docs/performance.md)
- **CQL Grammar**: [Patrick's Antlr4 CQL Grammar](https://github.com/pmcfadin/cassandra-antlr4-grammar)
- **Issues**: [GitHub Issues](https://github.com/pmcfadin/cqlite/issues)
- **Discussions**: [GitHub Discussions](https://github.com/pmcfadin/cqlite/discussions)

## Community

- **Questions & ideas**: [GitHub Discussions](https://github.com/pmcfadin/cqlite/discussions)
- **Bugs & feature requests**: [GitHub Issues](https://github.com/pmcfadin/cqlite/issues)
- **Contributing**: see [CONTRIBUTING.md](CONTRIBUTING.md) and our [Code of Conduct](CODE_OF_CONDUCT.md)

CQLite is an independent open-source project, not an Apache Software Foundation
project. It is built in the spirit of the Apache Cassandra community, with the
goal of contributing it upstream as it matures.

## License

Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.

## Acknowledgments

Special thanks to the Apache Cassandra community and the many contributors who make projects like this possible. CQLite builds on decades of database engineering innovation from the Cassandra project.

---

**Note**: M1 through M5 milestones are complete and the project is at **v0.11.0**. Core SSTable reading, CLI, output writers (including Parquet), Python and Node.js bindings, and write support with STCS compaction are production-ready. Next: M6 (WASM bindings) and M7 (performance validation + v1.0).