# samkhya-core
[crates.io](https://crates.io/crates/samkhya-core) | [docs.rs](https://docs.rs/samkhya-core) | [license](https://github.com/singhpratech/samkhya/blob/main/LICENSE)

The foundational crate of the [samkhya](https://github.com/singhpratech/samkhya)
project — portable, feedback-driven cardinality correction for embedded
analytical engines.
`samkhya-core` is engine-agnostic. It contains every primitive that the
per-engine adapters (`samkhya-datafusion`, `samkhya-duckdb-ext`,
`samkhya-polars`, `samkhya-postgres`, `samkhya-py`) build on top of: the four
foundational sketches, the 2D correlated histogram, the Puffin sidecar
reader/writer, the LpBound envelope family, the feedback store, and the
residual corrector trait. Nothing here links to a specific query engine.
## What this crate provides
```
samkhya-core
├── sketch
│   ├── HllSketch                    HyperLogLog (distinct count)
│   ├── BloomFilter                  membership
│   ├── CountMinSketch               point frequency
│   ├── EquiDepthHistogram           1D range
│   └── CorrelatedHistogram2D        2D joint distribution
├── puffin
│   ├── PuffinReader / PuffinWriter  Iceberg Puffin v1 sidecars
│   └── KIND tags                    samkhya.hll-v1, .bloom-v1, .cms-v1, ...
├── lpbound
│   ├── ProductBound                 coarse n_1 * n_2 * ... ceiling
│   ├── AgmBound                     AGM fractional edge cover
│   ├── ChainBound                   chain-join specialisation
│   └── LpJoinBound                  LP-derived (feature `lp_solver`)
├── feedback
│   ├── FeedbackStore                SQLite-backed observation log
│   └── TemplateHash                 query-template fingerprint
└── corrector
    ├── IdentityCorrector            no-op (passes the clamped ceiling)
    ├── GbtCorrector                 gradient-boosted residual
    ├── AdditiveGbtCorrector         per-template additive residual
    └── (trait Corrector)            implement your own
```
## Quick start
```rust
use samkhya_core::sketch::HllSketch;

let mut hll = HllSketch::new(14);
for i in 0..10_000u64 {
    hll.add(&i.to_le_bytes());
}

// 200 is a 2% tolerance; the textbook HLL standard error at p=14 is ~0.8%.
let estimate = hll.estimate();
assert!((estimate as i64 - 10_000).abs() < 200);

// Round-trip through the byte codec; the estimate must survive unchanged.
let bytes = hll.to_bytes();
let restored = HllSketch::from_bytes(&bytes).unwrap();
assert_eq!(restored.estimate(), estimate);
```
A larger end-to-end example — building four sketches, writing them to a
Puffin sidecar, and reading them back — is at
[`examples/sketch_to_puffin.rs`](examples/sketch_to_puffin.rs).
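
The tolerance used in the quick start can be sanity-checked against the textbook HyperLogLog error model, where the relative standard error for `m = 2^p` registers is roughly `1.04 / sqrt(m)`. The sketch below is standalone arithmetic, not part of the `samkhya-core` API; the function names are hypothetical, and whether `HllSketch` applies extra bias correction that changes the observed error is not something this README states.

```rust
/// Textbook HyperLogLog relative standard error for precision `p`
/// (m = 2^p registers): approximately 1.04 / sqrt(m).
fn hll_relative_std_error(p: u32) -> f64 {
    let m = (1u64 << p) as f64;
    1.04 / m.sqrt()
}

/// Absolute error that corresponds to `k` standard errors
/// when the true cardinality is `n`.
fn hll_abs_tolerance(p: u32, n: u64, k: f64) -> f64 {
    k * hll_relative_std_error(p) * n as f64
}
```

Under this model, `hll_relative_std_error(14)` is about 0.8%, so two standard errors at 10,000 distinct values is about 163, which sits inside the `< 200` window the quick start asserts.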
## Cardinality envelope
Every corrector output in samkhya is clamped above by a provable ceiling
derived from sketch-level statistics. The four envelopes are ordered by
tightness, from tightest to loosest:
```
LpJoinBound <= AgmBound <= ChainBound <= ProductBound
(tightest)                                  (loosest)
```
`LpJoinBound` requires the `lp_solver` feature (pulls in `good_lp` +
`microlp`). The default build ships `ProductBound`/`AgmBound`/`ChainBound`
without any LP dependency.
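
To make the idea of a ceiling concrete, here is a minimal standalone sketch of the two dependency-free ends of that ordering for a triangle join `R(a,b) ⋈ S(b,c) ⋈ T(a,c)`, followed by the clamp step. It does not use the crate's `ProductBound` or `AgmBound` types, whose signatures are not shown in this README; all function names below are hypothetical.

```rust
/// Coarse ceiling: the product of the input relation sizes.
/// Saturates at u64::MAX instead of overflowing.
fn product_bound(sizes: &[u64]) -> u64 {
    sizes.iter().fold(1u64, |acc, &n| acc.saturating_mul(n))
}

/// AGM bound for the triangle query R(a,b) ⋈ S(b,c) ⋈ T(a,c).
/// The fractional edge cover polytope has four vertices:
/// (1/2, 1/2, 1/2), (1, 1, 0), (0, 1, 1) and (1, 0, 1).
/// The bound is the smallest product of sizes raised to those weights.
fn agm_triangle_bound(r: u64, s: u64, t: u64) -> f64 {
    let (r, s, t) = (r as f64, s as f64, t as f64);
    let half_cover = (r * s * t).sqrt();
    half_cover.min(r * s).min(s * t).min(r * t)
}

/// Clamp a corrector's raw estimate into [0, ceiling].
fn clamp_to_ceiling(raw_estimate: f64, ceiling: f64) -> f64 {
    raw_estimate.max(0.0).min(ceiling)
}
```

With three relations of 1,000 rows each, the product ceiling is 10^9 while the AGM ceiling is about 31,623, so a raw estimate of 5 × 10^7 would be pulled down to the AGM value, and an estimate of 100 would pass through unchanged.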
## Feature flags
| Feature | Default | Description |
|---|---|---|
| `lp_solver` | off | `LpJoinBound` via `good_lp` + `microlp` |
| `gbt` | on | `GbtCorrector` via `gbdt` |
| `tabpfn_http` | off | `TabPfnHttpCorrector` (foundation-model HTTP backend) |
| `iceberg_compat` | on | Puffin sidecar reader strictness for Iceberg payloads |

Disabling `gbt` removes the only ML dependency in the default build, which suits pure-sketch deployments.
## Safety / format stability
All `from_bytes` constructors take untrusted input and are in-scope for the
project's SECURITY.md. They are fuzzed (`cargo fuzz`) on every release and
must never panic on adversarial bytes — they return `Err` instead.
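
The "return `Err`, never panic" contract can be illustrated with a toy decoder. The wire layout here (one version byte, one precision byte, then `2^p` register bytes) and the `ToySketch` and `DecodeError` types are invented for this example and are not samkhya's actual payload codec; the point is only the pattern of validating every field before indexing or allocating.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    Truncated,
    BadVersion(u8),
    BadPrecision(u8),
    LengthMismatch { expected: usize, actual: usize },
}

#[derive(Debug, PartialEq)]
struct ToySketch {
    precision: u8,
    registers: Vec<u8>,
}

impl ToySketch {
    /// Decode untrusted bytes without panicking (hypothetical format).
    fn from_bytes(bytes: &[u8]) -> Result<Self, DecodeError> {
        // Slice pattern instead of bytes[0] / bytes[1]: short input is an Err.
        let (version, precision, body) = match bytes {
            [v, p, rest @ ..] => (*v, *p, rest),
            _ => return Err(DecodeError::Truncated),
        };
        if version != 1 {
            return Err(DecodeError::BadVersion(version));
        }
        // Bounding the precision keeps the shift below from overflowing
        // and caps the size an attacker can make us expect.
        if !(4..=18).contains(&precision) {
            return Err(DecodeError::BadPrecision(precision));
        }
        let expected = 1usize << precision;
        if body.len() != expected {
            return Err(DecodeError::LengthMismatch { expected, actual: body.len() });
        }
        Ok(ToySketch { precision, registers: body.to_vec() })
    }
}
```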
Sketch payload codecs and the Puffin KIND tags are pinned at v1 for the
v1.x line. Format bumps will introduce new `kind`s (`samkhya.hll-v2`, …) and
rely on the reader's coexistence contract: unknown kinds are skipped and never
produce an error.
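
A minimal sketch of that coexistence contract, assuming blobs arrive as `(kind, payload)` pairs: known kinds are dispatched, and anything else is silently skipped. The `Decoded` enum and `decode_known_blobs` function are hypothetical and stand in for whatever `PuffinReader` actually exposes.

```rust
#[derive(Debug, PartialEq)]
enum Decoded {
    Hll(Vec<u8>),
    Bloom(Vec<u8>),
}

/// Keep the blobs whose kind this reader understands; skip the rest.
/// An unknown kind (for example a future `-v2` payload) is not an error.
fn decode_known_blobs(blobs: &[(&str, &[u8])]) -> Vec<Decoded> {
    blobs
        .iter()
        .filter_map(|&(kind, payload)| match kind {
            "samkhya.hll-v1" => Some(Decoded::Hll(payload.to_vec())),
            "samkhya.bloom-v1" => Some(Decoded::Bloom(payload.to_vec())),
            _ => None, // unknown or newer kind: skip, do not fail
        })
        .collect()
}
```

A v1 reader given a sidecar that holds both `samkhya.hll-v1` and `samkhya.hll-v2` blobs would, under this pattern, return only the v1 sketch and leave the v2 blob for newer readers.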
## Integration
`samkhya-core` is the only crate the engine adapters depend on. If you're
embedding samkhya into a new engine, start by depending on this crate and
mirroring the integration pattern in `samkhya-datafusion`.
## License
Apache-2.0. Sole author: Prateek Singh.