Crate shuflr

shuflr — streaming shuffled JSONL.

See docs/design/002-revised-plan.md for the authoritative v1 specification, plus amendments 003-compression-formats.md, 004-convert-subcommand.md, and 005-serve-multi-transport.md.

Re-exports

pub use error::Error;
pub use error::Result;
pub use framing::OnError;
pub use framing::Stats;
pub use index::Fingerprint;
pub use index::IndexFile;
pub use sampling::SamplingReader;
pub use seed::Seed;

Modules

analyze
shuflr analyze — detect source-order locality in a seekable-zstd file that would make --shuffle=chunk-shuffled a bad choice (ML review 02 §1, 002 §6.4).
error
Library-wide error type, per 002 §10.3.
framing
Record-framing primitives: how bytes become lines, and what to do on mishaps.
index
.shuflr-idx byte-offset index for --shuffle=index-perm (002 §2.2).
io
Input sources.
json_validate
Minimal JSON syntactic validator used by shuflr verify --deep.
pipeline
Engine pipelines. Each module here is a complete shuffle mode or orchestrated flow. v1 modes arrive in the order documented by 002 §2 and 004 §9.
sampling
Record-level sampling transforms that wrap any Read and re-expose a Read with filtered contents. Three orthogonal modes, composable:
seed
PRF hierarchy rooted at a master seed (002 §3).
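
The sampling module's "wrap any Read and re-expose a Read" pattern can be sketched as follows. This is only an illustration under assumptions: `EveryNth` and its `step` parameter are hypothetical names, not part of shuflr, and the crate's actual `SamplingReader` may be structured quite differently. The sketch keeps every `step`-th line of the wrapped reader and drops the rest.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical adapter (not shuflr's `SamplingReader`): keeps every
/// `step`-th line of the inner reader, starting with the first.
pub struct EveryNth<R: Read> {
    inner: BufReader<R>,
    step: u64,
    seen: u64,        // number of lines pulled from `inner` so far
    pending: Vec<u8>, // bytes of the line currently being handed out
    pos: usize,       // how much of `pending` has already been returned
}

impl<R: Read> EveryNth<R> {
    pub fn new(inner: R, step: u64) -> Self {
        assert!(step > 0, "step must be at least 1");
        Self {
            inner: BufReader::new(inner),
            step,
            seen: 0,
            pending: Vec::new(),
            pos: 0,
        }
    }
}

impl<R: Read> Read for EveryNth<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Once the previous kept line is drained, pull lines until one is kept.
        while self.pos == self.pending.len() {
            self.pending.clear();
            self.pos = 0;
            if self.inner.read_until(b'\n', &mut self.pending)? == 0 {
                return Ok(0); // inner reader reached EOF
            }
            let keep = self.seen % self.step == 0;
            self.seen += 1;
            if !keep {
                self.pending.clear(); // drop this line and try the next one
            }
        }
        // Copy as much of the kept line as fits into the caller's buffer.
        let n = out.len().min(self.pending.len() - self.pos);
        out[..n].copy_from_slice(&self.pending[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

Because the adapter is itself a `Read`, it can be stacked on a file, a decompressor, or another adapter, which is presumably what makes such transforms composable.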

Functions

physical_cores
Physical CPU core count (not logical/SMT). Defaults to 1 on systems where detection fails. Preferred over std::thread::available_parallelism for compute-heavy workloads like zstd compression; see docs/bench/001-edgar-31gb-gzip.md §thread-scaling.
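
As a rough illustration of the fallback behaviour described above, the sketch below uses only the standard library. The signature of `physical_cores` is not shown on this page, so the physical count is passed in as a plain `Option<usize>`; `logical_cpus` and `compute_workers` are hypothetical helper names, not shuflr APIs.

```rust
use std::thread;

/// Logical CPU count as reported by std (includes SMT siblings),
/// or 1 if the query fails.
pub fn logical_cpus() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Hypothetical helper: choose a worker count for compute-heavy work.
/// `physical` stands in for the result of a physical-core probe;
/// `None` (or a zero count) means detection failed, so fall back to 1.
pub fn compute_workers(physical: Option<usize>) -> usize {
    physical.filter(|&n| n > 0).unwrap_or(1)
}
```

On a machine with SMT enabled, `logical_cpus()` typically reports more CPUs than there are physical cores, which is the gap the function above is meant to avoid for workloads such as zstd compression.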