shuflr (library)
Library crate for shuflr — streaming shuffled JSONL.
This crate holds the engine: chunk pool, shuffle algorithms (passthrough,
buffer, chunk-shuffled, index-perm, reservoir), I/O (pread + streaming
decoders + zstd-seekable reader/writer + parallel writer), index builder,
analyzer, and the optional service edge (HTTP + shuflr-wire/1 + future
gRPC, all behind feature flags).
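To make the "chunk-shuffled" idea concrete, here is a minimal, dependency-free sketch of one plausible reading of the name: split the records into fixed-size chunks, visit the chunks in random order, and shuffle the records inside each chunk. This is not the crate's implementation. `SplitMix64`, `shuffle`, and `chunk_shuffle_sketch` are hypothetical names introduced only for illustration, and the real engine works on byte ranges and a chunk pool rather than in-memory string slices.

```rust
/// SplitMix64: a tiny deterministic PRNG, used only to keep this sketch
/// free of external crates. Not cryptographically secure.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns a value in 0..bound (bound must be non-zero).
    /// Modulo bias is ignored for brevity.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// In-place Fisher-Yates shuffle.
fn shuffle<T>(items: &mut [T], rng: &mut SplitMix64) {
    for i in (1..items.len()).rev() {
        let j = rng.below(i + 1);
        items.swap(i, j);
    }
}

/// Hypothetical chunk shuffle: randomize chunk order, then randomize the
/// record order within each chunk. The same seed gives the same output.
fn chunk_shuffle_sketch(lines: &[&str], chunk_len: usize, seed: u64) -> Vec<String> {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    let mut rng = SplitMix64(seed);

    // Materialize the chunks so their order can be permuted.
    let mut chunks: Vec<Vec<&str>> = lines.chunks(chunk_len).map(|c| c.to_vec()).collect();
    shuffle(&mut chunks, &mut rng);

    let mut out = Vec::with_capacity(lines.len());
    for mut chunk in chunks {
        shuffle(&mut chunk, &mut rng);
        out.extend(chunk.into_iter().map(String::from));
    }
    out
}
```

The trade-off this illustrates is locality versus randomness: records only move within their chunk, so reads stay mostly sequential, while a full permutation (the index-perm mode named above) would need random access to every record.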
The CLI binary lives in the shuflr-cli crate; the Python client lives in
shuflr-client.
Library use

    use std::io;
    use shuflr::…;
    use shuflr::…::SeekableReader;

    let reader = SeekableReader::open(…)?;
    let cfg = ChunkShuffledConfig { … };
    let stats = chunk_shuffled(…)?;
    # Ok::<(), …>(())
Sinks accept impl Write. Stdout is treated as the data channel — library
code never println!s; logging goes through tracing to stderr.
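The sink convention can be sketched with the standard library alone. The helper below is hypothetical (it is not a shuflr function): it accepts any `Write` implementation, writes one record per line, and sends its diagnostic to stderr. `eprintln!` stands in for `tracing` here only to avoid an external dependency.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes each record as one line to any `Write` sink
/// and returns the number of records written.
fn emit_records<W: Write>(mut sink: W, records: &[&str]) -> io::Result<usize> {
    for record in records {
        sink.write_all(record.as_bytes())?;
        sink.write_all(b"\n")?;
    }
    sink.flush()?;
    // Diagnostics go to stderr so they never mix with data on stdout.
    eprintln!("emitted {} records", records.len());
    Ok(records.len())
}
```

Because the sink is generic, the same function can target `io::stdout().lock()`, a `File`, or an in-memory `Vec<u8>` in tests, and piping stdout to another process yields only data lines.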
Features
| Feature | Adds |
|---|---|
| zstd | zstd streaming input + seekable-zstd reader/writer/parallel |
| gzip | streaming gzip input |
| bzip2, xz | additional streaming-input codecs |
| parquet | parquet + HuggingFace Hub input |
| serve | HTTP/1.1 NDJSON listener (rustls TLS, bearer/mTLS auth) |
| grpc | gRPC listener (PR-35) |
| prom, otlp | metrics export |
| uring | Linux io_uring fast path |
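As a rough illustration of how Cargo feature flags like these gate code at compile time, the sketch below reports which codecs a build includes. `enabled_codecs` is a hypothetical helper, not part of shuflr, and the assumption that uncompressed input needs no feature is mine rather than something the table states.

```rust
/// Hypothetical helper (not part of shuflr): reports which input codecs this
/// build was compiled with, keyed on the Cargo feature names in the table.
fn enabled_codecs() -> Vec<&'static str> {
    // Assumption: uncompressed input needs no feature flag.
    let mut codecs = vec!["plain"];
    // `cfg!` evaluates to a compile-time boolean for each feature.
    if cfg!(feature = "zstd") {
        codecs.push("zstd");
    }
    if cfg!(feature = "gzip") {
        codecs.push("gzip");
    }
    if cfg!(feature = "bzip2") {
        codecs.push("bzip2");
    }
    if cfg!(feature = "xz") {
        codecs.push("xz");
    }
    codecs
}
```

In a real crate the gated code paths would more often use `#[cfg(feature = "...")]` on modules or items, so that a disabled codec's dependencies are never compiled at all.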
Design
docs/design/002-revised-plan.md (in the parent repo) is the authoritative
v1 spec. Amendments: 003 (compression), 004 (convert +
seekable invariants), 005 (serve transports).
License
MIT OR Apache-2.0.