# probability-rs
A small, dependency-free Rust library for probability distributions focused on numerical clarity, clean APIs, and reproducible random sampling.
Current scope:
- Internal RNGs: SplitMix64, Xoroshiro128++, Xoshiro256**, PCG32 (all non-cryptographic)
- Traits: `Distribution`, `Continuous`, `Discrete`, `Moments`
- Distributions:
  - Continuous: Uniform, Normal, Exponential
  - Discrete: Bernoulli, Poisson
## Why
- No external dependencies
- Deterministic sampling (seeded), useful for tests and teaching
- Simple, explicit math with careful domain and parameter checks
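
For example, because sampling is driven entirely by the seed, two identically seeded generators yield identical draws. A minimal sketch using the API from the quick start below:

```rust
use probability_rs::dist::{normal, Distribution};
use probability_rs::rng::SplitMix64;

fn main() {
    let n = normal::Normal::new(0.0, 1.0).unwrap();
    let mut a = SplitMix64::seed_from_u64(42);
    let mut b = SplitMix64::seed_from_u64(42);
    // Same seed, same stream: the draws are bit-for-bit identical across runs,
    // which is what makes seeded tests reproducible.
    assert_eq!(n.sample(&mut a), n.sample(&mut b));
}
```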
## Status
This is a work-in-progress library. APIs may evolve. Contributions and feedback are welcome.
## Quick start
Add the crate to your workspace as a path dependency:
```toml
# Cargo.toml
[dependencies]
probability-rs = { path = "./probability-rs" }
```
Example: sampling and basic queries
```rust
use probability_rs::dist::{
    bernoulli, exponential, normal, poisson, uniform,
    Continuous, Discrete, Distribution, Moments,
};
use probability_rs::rng::SplitMix64;

fn main() {
    let normal = normal::Normal::new(0.0, 1.0).unwrap();
    let uniform = uniform::Uniform::new(-1.0, 1.0).unwrap();
    let expo = exponential::Exponential::new(2.0).unwrap();
    let bern = bernoulli::Bernoulli::new(0.4).unwrap();
    let pois = poisson::Poisson::new(3.0).unwrap();

    let mut rng = SplitMix64::seed_from_u64(2024);
    let x_n = normal.sample(&mut rng);
    let x_u = uniform.sample(&mut rng);
    let x_e = expo.sample(&mut rng);
    let x_b = bern.sample(&mut rng);
    let x_p = pois.sample(&mut rng);

    println!("Normal sample: {x_n:.6} pdf(0)={:.6}", normal.pdf(0.0));
    println!("Uniform sample: {x_u:.6} mean={:.3} var={:.3}", uniform.mean(), uniform.variance());
    println!("Exponential sample: {x_e:.6} CDF(1)={:.6}", expo.cdf(1.0));
    println!("Bernoulli sample: {x_b} p=0.4 var={:.3}", bern.variance());
    println!("Poisson sample: {x_p} lambda=3 pmf(3)={:.6}", pois.pmf(3));
}
```
Run tests:
```bash
cargo test --all
```
## API at a glance
- `Distribution` (common):
  - `cdf(x) -> f64`, `in_support(x) -> bool`, `sample(&mut Rng) -> Value`
- `Continuous` (f64): `pdf(x) -> f64`, `inv_cdf(p) -> f64`
- `Discrete` (i64): `pmf(k) -> f64`, `inv_cdf(p) -> i64`
- `Moments`: `mean() -> f64`, `variance() -> f64`
- RNG: `rng::RngCore` plus the generators described below (`rng::SplitMix64`, `rng::Xoroshiro128PlusPlus`, `rng::xoshiro256::Xoshiro256StarStar`, `rng::Pcg32`)
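
As a quick illustration of how the traits compose, here is a minimal sketch that exercises `Moments`, `Distribution`, and `Continuous` on a standard normal; the round-trip tolerance is deliberately loose because the CDF and quantile are approximations (see the numerical notes below).

```rust
use probability_rs::dist::{normal, Continuous, Distribution, Moments};

fn main() {
    let n = normal::Normal::new(0.0, 1.0).unwrap();

    // Moments of the standard normal.
    println!("mean={} var={}", n.mean(), n.variance());

    // The CDF and its inverse should (approximately) round-trip.
    let p = n.cdf(1.5);
    let x = n.inv_cdf(p);
    // Loose, illustrative tolerance: both functions are approximations.
    assert!((x - 1.5).abs() < 1e-3);
    println!("cdf(1.5)={p:.6}, inv_cdf({p:.6})={x:.6}");
}
```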
## RNGs: picking the right generator
This crate ships a few small, non-cryptographic PRNGs with a common trait `rng::RngCore`.
- SplitMix64
  - Best for: seeding other RNGs, quick-and-simple deterministic tests.
  - Pros: tiny, very fast, good bit diffusion; great seed expander.
  - Cons: not the strongest statistical quality for long streams compared to xoshiro/PCG.
  - Use:
    - `use probability_rs::rng::SplitMix64;`
    - `let mut rng = SplitMix64::seed_from_u64(123);`
- Xoroshiro128++
  - Best for: fast simulations with a small memory footprint (128-bit state).
  - Pros: excellent speed, good quality in practice for 64-bit outputs.
  - Cons: period 2^128−1; for massive parallel use, consider jump/long_jump to split streams.
  - Use:
    - `use probability_rs::rng::Xoroshiro128PlusPlus;`
    - `let mut rng = Xoroshiro128PlusPlus::seed_from_u64(123);`
- Xoshiro256**
  - Best for: general-purpose high-quality streams (256-bit state).
  - Pros: period 2^256−1, excellent statistical properties, jump/long_jump available.
  - Cons: slightly larger state than Xoroshiro128++.
  - Use:
    - `use probability_rs::rng::xoshiro256::Xoshiro256StarStar;`
    - `let mut rng = Xoshiro256StarStar::seed_from_u64(123);`
- PCG32 (XSH RR 64/32)
  - Best for: small-state RNG with good 32-bit outputs, reproducible parallel streams.
  - Pros: configurable streams via `from_seed_and_stream(seed, stream)`; great distribution quality.
  - Cons: 32-bit output per step (we combine two for 64-bit).
  - Use:
    - `use probability_rs::rng::Pcg32;`
    - `let mut rng = Pcg32::seed_from_u64(123);`
    - or `let mut rng = Pcg32::from_seed_and_stream(STATE, STREAM_ID);`
Guidelines by scenario:
- Reproducible tests, quick examples: SplitMix64
- High-throughput simulations (low memory): Xoroshiro128++
- High-quality general-purpose streams: Xoshiro256**
- Many independent parallel streams with small state: PCG32 (use a different `stream` for each)
Note: none of these RNGs are cryptographic. For security-sensitive contexts, use a proper CSPRNG.
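
Because every generator implements `rng::RngCore`, swapping one for another is a one-line change. The sketch below assumes `sample` accepts any of these generators and that `from_seed_and_stream` takes plain integer arguments; treat it as illustrative rather than as exact signatures.

```rust
use probability_rs::dist::{poisson, Distribution};
use probability_rs::rng::{Pcg32, SplitMix64, Xoroshiro128PlusPlus};
use probability_rs::rng::xoshiro256::Xoshiro256StarStar;

fn main() {
    let pois = poisson::Poisson::new(3.0).unwrap();

    // Each generator is seeded explicitly, so runs are reproducible.
    let mut sm = SplitMix64::seed_from_u64(42);
    let mut xo = Xoroshiro128PlusPlus::seed_from_u64(42);
    let mut xs = Xoshiro256StarStar::seed_from_u64(42);
    // Distinct `stream` per parallel worker keeps sequences independent.
    let mut pcg = Pcg32::from_seed_and_stream(42, 1);

    println!("SplitMix64:      {}", pois.sample(&mut sm));
    println!("Xoroshiro128++:  {}", pois.sample(&mut xo));
    println!("Xoshiro256**:    {}", pois.sample(&mut xs));
    println!("PCG32 stream 1:  {}", pois.sample(&mut pcg));
}
```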
## Numerical notes
- Normal CDF/quantile use classic approximations (erf and Acklam’s probit). Tolerances in tests reflect expected approximation error.
- Poisson sampling uses a hybrid approach (inversion, mode-based, and quantile-anchored) depending on λ. PTRS may be added later for λ ≫ 1.
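
For reference, the inversion branch is the textbook sequential search over the CDF. The standalone sketch below is not the crate's internal code; it only illustrates the idea for small λ (the `exp(-λ)` starting term underflows for very large λ, which is why other branches exist):

```rust
/// Textbook inversion sampling for Poisson(lambda), suitable for small lambda.
/// `u` is a uniform variate in [0, 1). Illustrative only, not the crate's
/// internal implementation.
fn poisson_inversion(lambda: f64, u: f64) -> i64 {
    let mut k: i64 = 0;
    let mut p = (-lambda).exp(); // P(X = 0)
    let mut cdf = p;
    // Walk up the support, accumulating pmf values until the CDF exceeds u.
    while u > cdf {
        k += 1;
        p *= lambda / k as f64; // recurrence: P(X = k) = P(X = k-1) * lambda / k
        cdf += p;
    }
    k
}

fn main() {
    // With u = 0.5 and lambda = 3, the sample lands at the median (3).
    println!("{}", poisson_inversion(3.0, 0.5));
}
```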
## Benchmarks
We use Criterion for micro-benchmarks. To run:
```bash
cargo bench
```
The included benchmark compares Poisson sampling for small (λ=2.5) and large (λ=250) regimes.
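
If you want to add a benchmark of your own alongside it, a minimal Criterion harness looks roughly like this (the file name and the `[[bench]]` entry in `Cargo.toml` are up to you; the distribution and RNG calls follow the quick-start example):

```rust
// benches/poisson.rs (illustrative sketch, not the crate's actual benchmark file)
use criterion::{criterion_group, criterion_main, Criterion};
use probability_rs::dist::{poisson, Distribution};
use probability_rs::rng::SplitMix64;

fn bench_poisson(c: &mut Criterion) {
    let small = poisson::Poisson::new(2.5).unwrap();
    let large = poisson::Poisson::new(250.0).unwrap();

    c.bench_function("poisson sample (lambda = 2.5)", |b| {
        let mut rng = SplitMix64::seed_from_u64(1);
        b.iter(|| small.sample(&mut rng))
    });
    c.bench_function("poisson sample (lambda = 250)", |b| {
        let mut rng = SplitMix64::seed_from_u64(1);
        b.iter(|| large.sample(&mut rng))
    });
}

criterion_group!(benches, bench_poisson);
criterion_main!(benches);
```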
## Roadmap
- Distributions and structure
  - More distributions: Gamma, Beta, Binomial, Geometric, Lognormal, Chi-squared, Dirichlet, Multivariate Normal
  - Truncation and affine transforms (shift/scale) as generic wrappers
  - Mixture models (finite mixtures) with EM fitting
- Inference and model assessment
  - Parameter estimation: MLE/MOM with uncertainty (Fisher information)
  - Model selection: AIC/BIC, automated “best fit” among candidates
  - Goodness-of-fit tests: Kolmogorov–Smirnov, Anderson–Darling, chi-squared
  - Robust statistics and empirical quantiles with confidence intervals
- Advanced sampling and performance
  - Faster samplers: Ziggurat or Ratio-of-Uniforms (Normal/Exponential), PTRS for Poisson (λ ≫ 1)
  - Alias method (Walker/Vose) for arbitrary categorical distributions
  - Variance reduction: antithetic variates, control variates, stratification
  - Vectorization/batching (`std::simd` where feasible), allocation-free `sample_n` and `sample_iter`
- Dependence and multivariate
  - Copulas (Gaussian, Student-t) to construct multivariate dependencies
  - Multivariate families: Multivariate Normal, Wishart/Inverse-Wishart, Dirichlet
- Stochastic processes and simulation
  - Poisson processes (homogeneous/inhomogeneous), renewal processes, simple Hawkes
  - Brownian motion, Ornstein–Uhlenbeck; SDE discretizations (Euler–Maruyama)
  - Time-series generators: AR(1), light ARMA components for simulations
- Practical statistics and summaries
  - Histograms, KDE, ECDF, descriptive summaries (median, MAD, etc.)
  - Streaming quantiles (P² algorithm, optional t-digest via feature flag)
  - Distances/divergences: KL, Jensen–Shannon, Wasserstein (1D)
- API ergonomics and safety
  - `logpdf`/`logpmf`/`logcdf`/`logccdf` for numerical stability; `ccdf` for tail work
  - Additional moments: entropy, skewness, kurtosis, cumulants
  - SeedableRng-style helper trait; domain types (Probability, Positive, Interval)
  - Feature flags: `serde`, `no_std` (where viable), `simd`, `special-fns`
- Numerics and special functions
  - Special functions: gamma/incomplete gamma, beta/incomplete beta, digamma/trigamma
  - Generic numerical inversion for CDFs (bracketing + Newton/Halley) with tolerances
  - Tail-accuracy improvements using `log1p`/`expm1` and complemented functions
- Tooling and quality
  - Expanded benchmarks (Criterion) and a lightweight statistical test harness
  - CI with lint/test/bench sanity checks; performance tracking
  - Rich documentation with runnable examples and optional notebooks
## License
MIT