samkhya-cli
The operator-facing CLI for samkhya. Surfaces the same primitives that
samkhya-core exposes to embedded engines — Puffin sidecars, sketches,
feedback stores — so operators can debug a production sidecar, inspect a
feedback database, or build a sketch from a CSV without writing any Rust.
Part of the samkhya project — portable, feedback-driven cardinality correction for embedded analytical engines.
What this crate provides
A single binary, samkhya, with four top-level subcommands:
samkhya
├── inspect <path>     dump a Puffin sidecar
├── stats <path>       summarize a FeedbackStore SQLite file
├── sketch
│   ├── hll            HyperLogLog (distinct count)
│   ├── bloom          Bloom filter (membership)
│   ├── cms            Count-Min sketch (frequency)
│   └── histogram      equi-depth histogram (range)
└── puffin
    ├── pack           bundle sketch payloads into one .puffin file
    └── verify         full structural validation
Every sketch builder reads a CSV by 0-based column index. Pass --header
when the CSV has a header row.
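The column-selection rule can be illustrated outside the CLI. The following is a minimal Python sketch, not samkhya's implementation: `read_column` and its parameters are hypothetical names, and skipping rows that are too short to contain the column is an assumption made here for simplicity.

```python
import csv
import io

def read_column(text, index, header=False):
    """Return the cells of the 0-based column `index`.

    When `header` is true the first row is discarded, mirroring the
    intent of the --header flag. Rows with too few cells are skipped
    (an assumption of this sketch, not documented samkhya behaviour).
    """
    rows = csv.reader(io.StringIO(text))
    if header:
        next(rows, None)  # drop the header row if there is one
    return [row[index] for row in rows if len(row) > index]

sample = "id,city\n1,Pune\n2,Oslo\n"
print(read_column(sample, 1, header=True))  # ['Pune', 'Oslo']
print(read_column(sample, 1))               # ['city', 'Pune', 'Oslo']
```

The second call shows why the header flag matters: without it, the header cell is treated as data and would be fed into the sketch.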
Quick start
# Inspect any Puffin sidecar — footer JSON plus decoded sketch summaries.
# Build an HLL sketch from column 3 of a CSV.
# Bundle several sketch payloads into one Puffin sidecar.
# Validate a sidecar end-to-end (footer, every blob, every decoded payload).
Subcommand reference
inspect <path>
Dump the sidecar's footer (JSON) and decode every blob whose kind matches
a known samkhya sketch. Unknown kinds are listed but not decoded — that's
the Puffin coexistence contract.
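To make the footer dump concrete, here is a minimal Python sketch of reading a Puffin footer. It assumes the sidecar follows the Apache Iceberg Puffin layout (magic bytes `PFA1`, then blobs, then a footer of magic, JSON payload, 4-byte little-endian payload size, 4 flag bytes, magic) and that the footer is uncompressed. The blob kind `example-kind` and the abbreviated blob metadata are made up for illustration; this is not how samkhya itself is implemented.

```python
import json
import struct

MAGIC = b"PFA1"  # Puffin magic bytes per the Apache Iceberg Puffin spec

def read_footer(data):
    """Parse the footer JSON of a Puffin file held in memory as bytes."""
    if len(data) < 20 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing Puffin magic")
    # Footer tail: payload size (4-byte LE int), 4 flag bytes, magic.
    (payload_size,) = struct.unpack("<i", data[-12:-8])
    flags = data[-8:-4]
    if flags[0] & 1:
        raise ValueError("compressed footer not handled in this sketch")
    payload_end = len(data) - 12
    payload_start = payload_end - payload_size
    if payload_size < 0 or payload_start < 8:
        raise ValueError("footer size out of range")
    if data[payload_start - 4:payload_start] != MAGIC:
        raise ValueError("footer start magic not found")
    return json.loads(data[payload_start:payload_end].decode("utf-8"))

def build_example():
    """Assemble a tiny in-memory Puffin file with one 3-byte blob."""
    footer = json.dumps(
        {"blobs": [{"type": "example-kind", "offset": 4, "length": 3}]}
    ).encode("utf-8")
    return (MAGIC + b"abc" + MAGIC + footer
            + struct.pack("<i", len(footer)) + bytes(4) + MAGIC)

for blob in read_footer(build_example())["blobs"]:
    print(blob["type"], blob["offset"], blob["length"])  # example-kind 4 3
```

A reader that only understands some blob types can still list the others from the footer metadata without decoding them, which is the behaviour described above for unknown kinds.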
stats <path>
Open a FeedbackStore SQLite file and print total observations, distinct
template hashes, latency percentiles, and per-template avg/max q-error.
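For readers new to the metric, q-error is conventionally defined as max(estimated/actual, actual/estimated), so 1.0 is a perfect estimate and larger values are worse in either direction. The Python sketch below computes per-template average and maximum q-error over in-memory tuples. The tuple layout and the clamping of zero cardinalities to 1 are assumptions of this example; the actual FeedbackStore schema is not shown here.

```python
from collections import defaultdict

def q_error(estimated, actual):
    """q-error = max(est/act, act/est), with both sides clamped to >= 1 row."""
    est = max(float(estimated), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)

def per_template(observations):
    """Map template hash -> (avg q-error, max q-error).

    `observations` is an iterable of (template_hash, estimated, actual).
    """
    grouped = defaultdict(list)
    for template, estimated, actual in observations:
        grouped[template].append(q_error(estimated, actual))
    return {t: (sum(v) / len(v), max(v)) for t, v in grouped.items()}

rows = [("t1", 100, 400), ("t1", 50, 25), ("t2", 10, 10)]
print(per_template(rows))  # {'t1': (3.0, 4.0), 't2': (1.0, 1.0)}
```

Template `t1` was underestimated by 4x once and overestimated by 2x once; q-error treats both as misses, which is why the maximum is usually the more telling column.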
sketch bloom
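As background, a Bloom filter answers "might this value be present?" with possible false positives but no false negatives. The following is a minimal Python sketch of the data structure, independent of samkhya's implementation and serialized format; the class name, the sizes, and the SHA-256 based double hashing are choices made for this illustration only.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by double hashing over one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep the stride odd
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True may be a false positive; False is always definitive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for city in ["Pune", "Oslo"]:
    bf.add(city)
print(bf.might_contain("Pune"))  # True
```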
sketch cms
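As background, a Count-Min sketch estimates how often a value occurs; it can overestimate because of hash collisions but never underestimates. The Python sketch below shows the idea with a small table. It is an illustration only: the class name, the dimensions, and the salted SHA-256 hashing are assumptions and do not reflect samkhya's implementation or payload layout.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, obtained by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum across rows is the least collision-inflated counter.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

cms = CountMinSketch()
for value in ["a"] * 5 + ["b"] * 2:
    cms.add(value)
print(cms.estimate("a") >= 5)  # True: estimates never fall below the true count
```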
sketch histogram
Numeric-only: column cells must parse as f64; empty cells are skipped.
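The numeric rule and the equi-depth idea can be sketched together in Python. This is an illustration under stated assumptions: Python's `float` stands in for Rust's `f64` parsing (the two differ on some edge cases), and the boundary rule used here (the upper bound of bucket b is the value at rank ceil(b*n/buckets)) is one common definition, not necessarily the one samkhya uses. Function names are hypothetical.

```python
def parse_cells(cells):
    """Convert cells to floats, skipping empty ones.

    A non-empty cell that is not numeric raises ValueError.
    """
    return [float(c) for c in cells if c.strip() != ""]

def equi_depth_bounds(values, buckets):
    """Return one upper bound per bucket so each bucket holds ~n/buckets values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or buckets <= 0:
        return []
    bounds = []
    for b in range(1, buckets + 1):
        last = (b * n + buckets - 1) // buckets - 1  # ceil(b*n/buckets) - 1
        bounds.append(ordered[last])
    return bounds

cells = ["1", "2", "", "3", "4", "5", "6", "7", "8"]
print(equi_depth_bounds(parse_cells(cells), 4))  # [2.0, 4.0, 6.0, 8.0]
```

Each of the four buckets covers two of the eight parsed values, which is what makes range selectivity easy to estimate: a predicate spanning one full bucket matches roughly a quarter of the rows.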
puffin pack
Wrap one or more sketch payload files (produced by samkhya sketch ... --output) into a single Puffin sidecar with the correct KIND tags. Any
flag may be repeated to bundle multiple sketches of the same kind. The
packer decodes each payload through the matching Sketch::from_bytes
before writing, so a corrupt input fails fast.
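The decode-before-write idea can be sketched as follows in Python. It assumes the Apache Iceberg Puffin layout with an uncompressed footer and writes abbreviated blob metadata (only type, offset, length). The kind `example-kind`, the decoder `decode_counter`, and the function `pack` are hypothetical stand-ins for the real KIND tags and `Sketch::from_bytes`.

```python
import json
import struct

MAGIC = b"PFA1"  # Puffin magic bytes per the Apache Iceberg Puffin spec

def decode_counter(raw):
    """Hypothetical decoder: a payload is exactly one little-endian i64."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack("<q", raw)[0]

def pack(blobs, decoders):
    """Build Puffin bytes from (kind, payload) pairs.

    Every payload is decoded first, so a corrupt input raises before
    any output bytes are returned to the caller.
    """
    for kind, payload in blobs:
        decoders[kind](payload)  # raises on corrupt input
    body = bytearray(MAGIC)
    entries = []
    for kind, payload in blobs:
        entries.append({"type": kind, "offset": len(body), "length": len(payload)})
        body += payload
    footer = json.dumps({"blobs": entries}).encode("utf-8")
    return (bytes(body) + MAGIC + footer
            + struct.pack("<i", len(footer)) + bytes(4) + MAGIC)

data = pack([("example-kind", struct.pack("<q", 42))],
            {"example-kind": decode_counter})
print(len(data) > 0, data[:4] == MAGIC)  # True True
```

Validating all inputs in a first pass keeps the failure mode simple: either a complete sidecar is produced or nothing is.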
puffin verify
Full structural validation — parses the footer, reads every blob, and re-decodes any known-kind payload. Exits non-zero on the first failure.
Feature flags
This crate has no cargo features. It depends on samkhya-core and clap;
the binary builds with a stock Rust toolchain and links no native
libraries beyond what rusqlite already vendors.
Exit codes
0 on success
1 on any operational error (invalid sketch, missing file, decode failure, verify rejection)
2 on CLI usage error (clap-driven)
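A wrapper script can branch on these codes. The Python sketch below is one way to do it; `describe_exit` and `run_samkhya` are hypothetical helper names, and the arguments are passed through unchanged, so nothing about the CLI's flags is assumed here.

```python
import subprocess

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "operational error",
    2: "CLI usage error",
}

def describe_exit(code):
    """Translate a samkhya exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")

def run_samkhya(args):
    """Run `samkhya` with caller-supplied arguments.

    Returns (exit code, description). Requires the binary on PATH.
    """
    result = subprocess.run(["samkhya", *args])
    return result.returncode, describe_exit(result.returncode)

print(describe_exit(1))  # operational error
```

Separating usage errors (2) from operational errors (1) lets a CI job tell a broken invocation apart from a genuinely bad sidecar.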
Integration
The CLI is the operator escape hatch: every primitive an embedded engine
adapter uses (sketch construction, Puffin pack/verify, FeedbackStore
introspection) is also reachable from the shell. A typical workflow is to
build sketches in a nightly ELT batch with samkhya sketch ... --output,
bundle them with samkhya puffin pack, then verify the resulting sidecar
in CI with samkhya puffin verify.
License
Apache-2.0. Sole author: Prateek Singh.