kuji 0.1.7

Stochastic sampling primitives: Gumbel-Softmax, reservoir sampling, and latent permutations
Documentation

kuji

crates.io Documentation CI

Stochastic sampling primitives for unbiased data selection and stream processing. Implements Gumbel-max top-k, Gumbel-Softmax relaxations, reservoir sampling (Algorithm L/R), and weighted reservoir (A-Res).

Dual-licensed under MIT or Apache-2.0.

crates.io | docs.rs

Quickstart

[dependencies]
kuji = "0.1.5"
use kuji::reservoir::ReservoirSampler;

let mut sampler = ReservoirSampler::new(5);
for i in 0..100 {
    sampler.add(i);
}
let samples = sampler.samples();
assert_eq!(samples.len(), 5);

Operations

Function / Type Description
gumbel_max_sample Categorical sample via Gumbel-max trick
gumbel_topk_sample Top-k without replacement via Gumbel perturbation
gumbel_softmax Differentiable categorical approximation
relaxed_topk_gumbel Relaxed k-hot via iterated Gumbel-Softmax
ReservoirSampler Algorithm L (Li, 1994) -- O(k(1 + log(N/k)))
ReservoirSamplerR Algorithm R (Vitter, 1985) -- O(N) baseline
WeightedReservoirSampler A-Res (Efraimidis & Spirakis, 2006)
NeighborSampler Graph neighborhood sampling (with/without replacement)

vs rand: rand::seq provides uniform sampling but not reservoir sampling over streams, weighted reservoir (A-Res), or Gumbel-max top-k.

Examples

  • cargo run --example distribution_demo: ASCII histograms showing uniform vs weighted sampling distributions.
  • cargo run --example weighted_topk: compare Gumbel-top-k (Plackett--Luce) vs weighted reservoir (A-Res) on the same weight vector.
  • cargo run --example streaming_reservoir: stream 1M items through a reservoir of size 100 and verify uniformity.

Output example

cargo run --example distribution_demo:

Reservoir sampling (Algorithm L) -- uniform stream, k=10:
  [ 0.. 9] 10057 #######################################
  [10..19]  9860 ######################################
  [20..29] 10007 #######################################
  [30..39] 10021 #######################################
  [40..49]  9822 ######################################
  [50..59] 10164 ########################################
  [60..69] 10111 #######################################
  [70..79]  9898 ######################################
  [80..89] 10098 #######################################
  [90..99]  9962 #######################################

Weighted reservoir (A-Res) -- power-law weights w(i)=1/(1+i)^1.5:
  [ 0.. 9] 60176 ########################################
  [10..19] 15999 ##########
  [20..29]  7687 #####
  [30..39]  4760 ###
  [40..49]  3124 ##
  [50..59]  2415 #
  [60..69]  1925 #
  [70..79]  1560 #
  [80..89]  1252
  [90..99]  1102

Gumbel-top-k single draw (k=10, seed=42): [1, 0, 12, 82, 3, 11, 2, 7, 8, 49]

Tests

cargo test -p kuji

Performance

Benchmark throughput

Apple Silicon (NEON). Run cargo bench to reproduce on your hardware.

References (what these implementations are trying to be faithful to)

  • Vitter (1985): reservoir sampling "Algorithm R".
  • Li (1994): reservoir sampling "Algorithm L" (skip-based; reduces RNG calls).
  • Efraimidis & Spirakis (2006): weighted reservoir sampling (A-Res / A-ExpJ family).
  • Gumbel-max trick: classical extreme value sampling identity (often cited via modern ML papers):
    • Jang, Gu, Poole (2017): Categorical Reparameterization with Gumbel-Softmax.
    • Maddison, Mnih, Teh (2017): The Concrete Distribution.

License

MIT OR Apache-2.0