samkhya-cli 1.0.0

samkhya command-line tools: inspect, stats, sketch, puffin pack/verify

samkhya-cli

The operator-facing CLI for samkhya. Surfaces the same primitives that samkhya-core exposes to embedded engines — Puffin sidecars, sketches, feedback stores — so operators can debug a production sidecar, inspect a feedback database, or build a sketch from a CSV without writing any Rust.

Part of the samkhya project — portable, feedback-driven cardinality correction for embedded analytical engines.

What this crate provides

A single binary, samkhya, with four top-level subcommands:

samkhya
├── inspect <path>           dump a Puffin sidecar
├── stats <path>             summarize a FeedbackStore SQLite file
├── sketch
│   ├── hll                  HyperLogLog (distinct count)
│   ├── bloom                Bloom filter (membership)
│   ├── cms                  Count-Min sketch (frequency)
│   └── histogram            equi-depth histogram (range)
└── puffin
    ├── pack                 bundle sketch payloads into one .puffin file
    └── verify               full structural validation

Every sketch builder reads a CSV by 0-based column index. Pass --header when the CSV has a header row.
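Because the index is 0-based, --column 3 selects the fourth field, whereas tools such as cut and awk count from 1. The helper below is a minimal sketch for previewing what a given index selects before building a sketch. The name preview_column is hypothetical and not part of samkhya, and the parsing assumes a plain comma-separated file with no quoted fields that contain commas.

```shell
# preview_column FILE INDEX [header]
# Prints the cells found at the 0-based INDEX of each row.
# Pass the literal word "header" as the third argument to skip the first row.
# Assumption: fields never contain quoted commas.
preview_column() {
    awk -F',' -v idx="$2" -v skip="${3:-}" '
        NR == 1 && skip == "header" { next }
        { print $(idx + 1) }   # awk fields are 1-based, so shift by one
    ' "$1"
}
```

For example, preview_column rows.csv 3 header lists the values that an invocation with --column 3 --header would be expected to read.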

Quick start

# Inspect any Puffin sidecar — footer JSON plus decoded sketch summaries.
samkhya inspect ./stats.puffin

# Build an HLL sketch from column 3 of a CSV.
samkhya sketch hll \
    --input rows.csv \
    --column 3 \
    --precision 14 \
    --header \
    --output col3.hll

# Bundle several sketch payloads into one Puffin sidecar.
samkhya puffin pack stats.puffin \
    --hll col3.hll \
    --bloom col3.bloom \
    --cms col3.cms \
    --histogram col0.hist

# Validate a sidecar end-to-end (footer, every blob, every decoded payload).
samkhya puffin verify stats.puffin
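The --precision 14 value in the HLL example can be put in context with the textbook HyperLogLog figures: a precision of p gives 2^p registers and a relative standard error of roughly 1.04 / sqrt(2^p). The helper below is only a sketch of that arithmetic. The name hll_error is hypothetical, and samkhya's own HLL variant may use a different register layout or bias correction.

```shell
# hll_error PRECISION
# Prints the register count and approximate relative standard error (percent)
# from the standard HyperLogLog analysis. Not a samkhya command.
hll_error() {
    awk -v p="$1" 'BEGIN {
        m = 2 ^ p                                   # number of registers
        printf "registers=%d rse_percent=%.4f\n", m, 1.04 / sqrt(m) * 100
    }'
}
```

Under those assumptions, hll_error 14 reports 16384 registers and an error of about 0.8 percent.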

Subcommand reference

inspect <path>

Dump the sidecar's footer (JSON) and decode every blob whose kind matches a known samkhya sketch. Blobs of unknown kinds are listed but not decoded; this is the Puffin coexistence contract.

stats <path>

Open a FeedbackStore SQLite file and print total observations, distinct template hashes, latency percentiles, and per-template avg/max q-error.
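For readers unfamiliar with the metric, q-error is commonly defined as max(estimated / actual, actual / estimated), so a perfect estimate scores 1 and both over- and under-estimates score higher. The snippet below sketches that common definition. The name qerror is hypothetical, and how samkhya treats zero or missing cardinalities is not described here, so the zero check is an assumption.

```shell
# qerror ESTIMATED ACTUAL
# Prints max(est/actual, actual/est) using the common q-error definition.
# Assumption: non-positive inputs are rejected rather than clamped.
qerror() {
    awk -v e="$1" -v a="$2" 'BEGIN {
        if (e <= 0 || a <= 0) { print "nan"; exit 1 }
        q = (e > a) ? e / a : a / e
        printf "%.2f\n", q
    }'
}
```

An estimate of 100 rows against 1000 actual rows gives a q-error of 10, the same as an estimate of 10000.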

sketch bloom

samkhya sketch bloom \
    --input rows.csv --column 3 \
    --capacity 1000000 --fp-rate 0.01 \
    --header --output col3.bloom
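To get a feel for what --capacity and --fp-rate imply, the textbook sizing for a Bloom filter with n expected items and false-positive rate p is m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below applies those formulas. The name bloom_size is hypothetical, and samkhya may round or lay out its filter differently.

```shell
# bloom_size CAPACITY FP_RATE
# Prints the textbook-optimal bit count and hash-function count.
# Not a samkhya command; the real filter may be sized differently.
bloom_size() {
    awk -v n="$1" -v p="$2" 'BEGIN {
        ln2 = log(2)
        bits = -n * log(p) / (ln2 * ln2)
        m = int(bits); if (m < bits) m++     # round up to a whole bit
        k = int(m / n * ln2 + 0.5)           # nearest whole hash count
        printf "bits=%d hashes=%d\n", m, k
    }'
}
```

With the values used above, bloom_size 1000000 0.01 works out to 9585059 bits (about 1.2 MB) and 7 hash functions.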

sketch cms

samkhya sketch cms \
    --input rows.csv --column 3 \
    --depth 5 --width 1024 \
    --header --output col3.cms
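The --depth and --width values trade memory for accuracy. In the standard Count-Min analysis, a sketch of width w and depth d overestimates any single count by more than epsilon * N (N being the total number of insertions) with probability at most delta, where epsilon = e / w and delta = e^(-d). The helper below sketches that arithmetic. The name cms_bounds is hypothetical, and the bounds assume the pairwise-independent hashing of the textbook construction, which may not match samkhya's implementation exactly.

```shell
# cms_bounds WIDTH DEPTH
# Prints the textbook Count-Min error factor and failure probability.
# Not a samkhya command.
cms_bounds() {
    awk -v w="$1" -v d="$2" 'BEGIN {
        printf "epsilon=%.5f delta=%.5f\n", exp(1) / w, exp(-d)
    }'
}
```

For --depth 5 --width 1024 this gives an epsilon of roughly 0.0027 and a delta of roughly 0.0067.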

sketch histogram

Numeric-only: column cells must parse as f64; empty cells are skipped.

samkhya sketch histogram \
    --input rows.csv --column 0 \
    --buckets 64 \
    --header --output col0.hist
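Since non-numeric cells are a likely cause of failure here, a pre-flight check on the column can be useful. The helper below is a rough sketch: the name check_numeric_column is hypothetical, the parsing assumes no quoted commas, and the regular expression only approximates f64 parsing (it rejects values such as inf and NaN, which Rust's parser accepts).

```shell
# check_numeric_column FILE INDEX [header]
# Counts numeric, empty, and invalid cells at the 0-based INDEX.
# Assumption: the regex is a conservative stand-in for f64 parsing.
check_numeric_column() {
    awk -F',' -v idx="$2" -v skip="${3:-}" '
        NR == 1 && skip == "header" { next }
        {
            cell = $(idx + 1)
            if (cell == "") empty++
            else if (cell ~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$/) numeric++
            else invalid++
        }
        END { printf "numeric=%d empty=%d invalid=%d\n", numeric, empty, invalid }
    ' "$1"
}
```

A non-zero invalid count suggests cleaning the column before running samkhya sketch histogram.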

puffin pack

Wrap one or more sketch payload files (produced by samkhya sketch ... --output) into a single Puffin sidecar with the correct KIND tags. Each of the sketch flags (--hll, --bloom, --cms, --histogram) may be repeated to bundle multiple sketches of the same kind. The packer decodes each payload through the matching Sketch::from_bytes before writing, so a corrupt input fails fast.
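Repeating a flag is convenient when one sketch exists per column. The wrapper below is a sketch of how a script might turn a list of HLL payload files into repeated --hll flags. The name pack_hlls is hypothetical; the command shape follows the puffin pack example in the quick start.

```shell
# pack_hlls OUTPUT FILE...
# Calls samkhya puffin pack with one --hll flag per payload file.
pack_hlls() {
    out="$1"; shift
    # The loop list is expanded once, so it is safe to rewrite "$@" inside:
    # each pass appends a flag pair and drops one original argument.
    for f in "$@"; do
        set -- "$@" --hll "$f"
        shift
    done
    samkhya puffin pack "$out" "$@"
}
```

For instance, pack_hlls stats.puffin col0.hll col3.hll runs samkhya puffin pack stats.puffin --hll col0.hll --hll col3.hll.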

puffin verify

Full structural validation — parses the footer, reads every blob, and re-decodes any known-kind payload. Exits non-zero on the first failure.

Feature flags

This crate has no cargo features. It depends on samkhya-core and clap; the binary builds with a stock Rust toolchain and links no native libraries beyond what rusqlite already vendors.

Exit codes

  • 0 on success
  • 1 on any operational error (invalid sketch, missing file, decode failure, verify rejection)
  • 2 on CLI usage error (clap-driven)
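In scripts, the distinction between codes 1 and 2 makes it possible to separate bad data from a bad invocation. The wrapper below is one way to do that. The name run_samkhya and the log messages are hypothetical; only the three exit codes come from the list above.

```shell
# run_samkhya ARGS...
# Runs samkhya, prints a short diagnosis on stderr, and returns its exit code.
run_samkhya() {
    status=0
    samkhya "$@" || status=$?
    case "$status" in
        0) ;;
        1) echo "samkhya: operational error (bad input, missing file, or verify rejection)" >&2 ;;
        2) echo "samkhya: usage error, check the flags" >&2 ;;
        *) echo "samkhya: unexpected exit code $status" >&2 ;;
    esac
    return "$status"
}
```

A CI job could call run_samkhya puffin verify stats.puffin and fail the build on any non-zero return.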

Integration

The CLI is the operator escape hatch: every primitive an embedded engine adapter uses (sketch construction, Puffin pack/verify, FeedbackStore introspection) is also reachable from the shell. A typical workflow is to build sketches in a nightly ELT batch with samkhya sketch ... --output, bundle them with samkhya puffin pack, then verify the resulting sidecar in CI with samkhya puffin verify.
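The workflow described above can be sketched as a single shell function that stops at the first failing step. The function name build_sidecar, the intermediate file naming, and the choice of a single HLL sketch are illustrative assumptions; the flags are the ones shown in the quick start.

```shell
# build_sidecar CSV COLUMN OUTPUT
# Builds one HLL sketch, packs it into a Puffin sidecar, then verifies it.
# The && chain stops at the first step that exits non-zero.
build_sidecar() {
    csv="$1"; col="$2"; out="$3"
    sketch="col${col}.hll"   # hypothetical naming scheme for the payload
    samkhya sketch hll --input "$csv" --column "$col" \
        --precision 14 --header --output "$sketch" &&
    samkhya puffin pack "$out" --hll "$sketch" &&
    samkhya puffin verify "$out"
}
```

A nightly job might run build_sidecar rows.csv 3 stats.puffin and publish the sidecar only when the function returns 0.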

License

Apache-2.0. Sole author: Prateek Singh.