probability-rs

A small, dependency-free Rust library for probability distributions focused on numerical clarity, clean APIs, and reproducible random sampling.

Current scope:

Internal RNGs (non-cryptographic): SplitMix64, Xoroshiro128++, Xoshiro256**, PCG32
Traits: Distribution, Continuous, Discrete, Moments
Distributions:
- Continuous: Uniform, Normal, Exponential, Lognormal, Gamma, Beta, Chi-squared
- Discrete: Bernoulli, Poisson, Geometric, Binomial

Why

No external dependencies
Deterministic sampling (seeded), useful for tests and teaching
Simple, explicit math with careful domain and parameter checks

Status

This is a work-in-progress library. APIs may evolve. Contributions and feedback are welcome.

Quick start

Add the crate to your workspace as a path dependency:

# Cargo.toml
[dependencies]
probability-rs = { path = "./probability-rs" }

Example: sampling and basic queries

use probability_rs::dist::{normal, uniform, exponential, bernoulli, poisson, Distribution, Continuous, Discrete, Moments};
use probability_rs::rng::SplitMix64;

fn main() {
    // new() checks its parameters; unwrap() panics if they are rejected.
    let normal = normal::Normal::new(0.0, 1.0).unwrap();
    let uniform = uniform::Uniform::new(-1.0, 1.0).unwrap();
    let expo = exponential::Exponential::new(2.0).unwrap();
    let bern = bernoulli::Bernoulli::new(0.4).unwrap();
    let pois = poisson::Poisson::new(3.0).unwrap();

    // A single seeded generator drives every draw, so reruns print the same values.
    let mut rng = SplitMix64::seed_from_u64(2024);
    let x_n = normal.sample(&mut rng);
    let x_u = uniform.sample(&mut rng);
    let x_e = expo.sample(&mut rng);
    let x_b = bern.sample(&mut rng);
    let x_p = pois.sample(&mut rng);

    // pdf, pmf, and the moments come from the Continuous, Discrete, and Moments traits.
    println!("Normal sample: {x_n:.6} pdf(0)={:.6}", normal.pdf(0.0));
    println!("Uniform sample: {x_u:.6} mean={:.3} var={:.3}", uniform.mean(), uniform.variance());
    println!("Exponential sample: {x_e:.6} CDF(1)={:.6}", expo.cdf(1.0));
    println!("Bernoulli sample: {x_b} p=0.4 var={:.3}", bern.variance());
    println!("Poisson sample: {x_p} lambda=3 pmf(3)={:.6}", pois.pmf(3));
}

Run tests:

cargo test --all

API at a glance

Distribution (common): cdf(x) -> f64, in_support(x) -> bool, sample(&mut Rng) -> Value
Continuous (f64): pdf(x) -> f64, inv_cdf(p) -> f64
Discrete (i64): pmf(k) -> f64, inv_cdf(p) -> i64
Moments: mean() -> f64, variance() -> f64, skewness() -> f64, kurtosis() -> f64 (excess), kurtosis_full() -> f64
RNG: rng::RngCore, rng::SplitMix64
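
To make the trait split above concrete, here is a self-contained sketch of how traits with these method names could be laid out and implemented for a toy uniform distribution. It does not use the crate: the associated type Value, the Rng64 trait, ToyRng, and ToyUniform are assumptions made for illustration, and the crate's real definitions may differ.

```rust
/// Hypothetical stand-in for the crate's rng::RngCore.
trait Rng64 {
    fn next_u64(&mut self) -> u64;
}

/// Assumed layout: common queries plus sampling, with an associated value type.
trait Distribution {
    type Value;
    fn cdf(&self, x: Self::Value) -> f64;
    fn in_support(&self, x: Self::Value) -> bool;
    fn sample<R: Rng64>(&self, rng: &mut R) -> Self::Value;
}

/// Continuous distributions take f64 values and add a density and a quantile.
trait Continuous: Distribution<Value = f64> {
    fn pdf(&self, x: f64) -> f64;
    fn inv_cdf(&self, p: f64) -> f64;
}

/// Only two of the listed moments, to keep the sketch short.
trait Moments {
    fn mean(&self) -> f64;
    fn variance(&self) -> f64;
}

/// Toy uniform distribution on [a, b].
struct ToyUniform {
    a: f64,
    b: f64,
}

impl ToyUniform {
    /// Rejects non-finite or empty intervals up front.
    fn new(a: f64, b: f64) -> Result<Self, String> {
        if a.is_finite() && b.is_finite() && a < b {
            Ok(Self { a, b })
        } else {
            Err(format!("invalid interval [{a}, {b}]"))
        }
    }
}

impl Distribution for ToyUniform {
    type Value = f64;
    fn cdf(&self, x: f64) -> f64 {
        ((x - self.a) / (self.b - self.a)).clamp(0.0, 1.0)
    }
    fn in_support(&self, x: f64) -> bool {
        x >= self.a && x <= self.b
    }
    fn sample<R: Rng64>(&self, rng: &mut R) -> f64 {
        // Top 53 bits give a uniform f64 in [0, 1); map it through the quantile.
        let u = (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        self.inv_cdf(u)
    }
}

impl Continuous for ToyUniform {
    fn pdf(&self, x: f64) -> f64 {
        if self.in_support(x) { 1.0 / (self.b - self.a) } else { 0.0 }
    }
    fn inv_cdf(&self, p: f64) -> f64 {
        self.a + p * (self.b - self.a)
    }
}

impl Moments for ToyUniform {
    fn mean(&self) -> f64 {
        0.5 * (self.a + self.b)
    }
    fn variance(&self) -> f64 {
        (self.b - self.a).powi(2) / 12.0
    }
}

/// Tiny deterministic generator (a SplitMix64 step) used only to drive the demo.
struct ToyRng(u64);

impl Rng64 for ToyRng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

fn main() {
    let u = ToyUniform::new(-1.0, 1.0).unwrap();
    let mut rng = ToyRng(2024);
    println!("sample={:.6} pdf(0)={:.3} cdf(0)={:.3}", u.sample(&mut rng), u.pdf(0.0), u.cdf(0.0));
    println!("mean={:.3} var={:.3}", u.mean(), u.variance());
}
```

Tying Continuous to Distribution<Value = f64> is one way to let generic code call both cdf and pdf on the same type without extra bounds.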

RNGs: picking the right generator

This crate ships a few small, non-cryptographic PRNGs with a common trait rng::RngCore.

SplitMix64
- Best for: seeding other RNGs, quick-and-simple deterministic tests.
- Pros: tiny, very fast, good bit diffusion; great seed expander.
- Cons: weaker statistical quality on long streams than Xoshiro256** or PCG32.
- Use:
  - use probability_rs::rng::SplitMix64;
  - let mut rng = SplitMix64::seed_from_u64(123);
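
For readers who want to see why SplitMix64 suits deterministic tests, the sketch below reimplements the published SplitMix64 step as a standalone type. It is not the crate's source: SplitMix64Sketch and the next_f64 helper are names invented here, and only seed_from_u64 mirrors the call shown above.

```rust
/// Standalone SplitMix64 sketch (illustrative, not the crate's implementation).
#[derive(Clone, Debug)]
struct SplitMix64Sketch {
    state: u64,
}

impl SplitMix64Sketch {
    fn seed_from_u64(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the counter by the golden-ratio increment, then mix the bits.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Hypothetical helper: uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
}

fn main() {
    let mut a = SplitMix64Sketch::seed_from_u64(123);
    let mut b = SplitMix64Sketch::seed_from_u64(123);

    // Two generators with the same seed walk through the same sequence.
    let xs: Vec<u64> = (0..4).map(|_| a.next_u64()).collect();
    let ys: Vec<u64> = (0..4).map(|_| b.next_u64()).collect();
    assert_eq!(xs, ys);

    println!("first outputs: {xs:?}");
    println!("uniform draw: {:.6}", a.next_f64());
}
```

Because the whole state is a single u64 counter, a test can record the seed and replay any failure exactly.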
Xoroshiro128++
- Best for: fast simulations with small memory footprint (128-bit state).
- Pros: excellent speed, good quality in practice for 64-bit outputs.
- Cons: period 2^128−1; for massive parallel use, consider jump/long_jump to split streams.
- Use:
  - use probability_rs::rng::Xoroshiro128PlusPlus;
  - let mut rng = Xoroshiro128PlusPlus::seed_from_u64(123);
Xoshiro256**
- Best for: general-purpose high-quality streams (256-bit state).
- Pros: period 2^256−1, excellent statistical properties, jump/long_jump available.
- Cons: slightly larger state than Xoroshiro128++.
- Use:
  - use probability_rs::rng::xoshiro256::Xoshiro256StarStar;
  - let mut rng = Xoshiro256StarStar::seed_from_u64(123);
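
The "seed expander" role of SplitMix64 can be illustrated with a standalone xoshiro256** sketch whose 256-bit state is filled from a single u64 seed. This follows the publicly documented algorithm and a common seeding recipe; whether the crate seeds this way is an assumption, and Xoshiro256Sketch and splitmix64_next are names invented for the example.

```rust
/// One SplitMix64 step over a caller-owned counter.
fn splitmix64_next(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Standalone xoshiro256** sketch (illustrative, not the crate's implementation).
#[derive(Clone, Debug)]
struct Xoshiro256Sketch {
    s: [u64; 4],
}

impl Xoshiro256Sketch {
    /// Expand one u64 into four state words. Four consecutive SplitMix64
    /// outputs are never all zero, which xoshiro requires of its state.
    fn seed_from_u64(seed: u64) -> Self {
        let mut sm = seed;
        let s = [
            splitmix64_next(&mut sm),
            splitmix64_next(&mut sm),
            splitmix64_next(&mut sm),
            splitmix64_next(&mut sm),
        ];
        Self { s }
    }

    fn next_u64(&mut self) -> u64 {
        // The "**" scrambler: multiply, rotate, multiply on the second word.
        let result = self.s[1].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
        // Linear state update (xor, shift, rotate).
        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);
        result
    }
}

fn main() {
    let mut rng = Xoshiro256Sketch::seed_from_u64(123);
    let draws: Vec<u64> = (0..3).map(|_| rng.next_u64()).collect();
    println!("{draws:?}");
}
```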
PCG32 (XSH RR 64/32)
- Best for: small-state RNG with good 32-bit outputs, reproducible parallel streams.
- Pros: configurable streams via from_seed_and_stream(seed, stream); well-distributed output.
- Cons: 32-bit output per step (we combine two for 64-bit).
- Use:
  - use probability_rs::rng::Pcg32;
  - let mut rng = Pcg32::seed_from_u64(123);
  - or: let mut rng = Pcg32::from_seed_and_stream(STATE, STREAM_ID);
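
The stream idea behind PCG32 is easiest to see in code. The sketch below follows the reference PCG XSH RR 64/32 step and seeding routine in a standalone type; Pcg32Sketch is a name invented here, and the high-word-first order used to build a 64-bit value is an assumption, since the text above only says two outputs are combined.

```rust
const PCG_MULT: u64 = 6364136223846793005;

/// Standalone PCG32 (XSH RR 64/32) sketch, not the crate's implementation.
#[derive(Clone, Debug)]
struct Pcg32Sketch {
    state: u64,
    inc: u64, // always odd; selects the stream
}

impl Pcg32Sketch {
    fn from_seed_and_stream(seed: u64, stream: u64) -> Self {
        // Reference seeding: force an odd increment, step, add the seed, step again.
        let mut rng = Self { state: 0, inc: (stream << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(seed);
        rng.next_u32();
        rng
    }

    fn next_u32(&mut self) -> u32 {
        let old = self.state;
        // LCG step on the 64-bit state.
        self.state = old.wrapping_mul(PCG_MULT).wrapping_add(self.inc);
        // Output permutation: xorshift the high bits, then a data-dependent rotation.
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }

    /// Assumed combination order: first draw becomes the high 32 bits.
    fn next_u64(&mut self) -> u64 {
        let hi = self.next_u32() as u64;
        let lo = self.next_u32() as u64;
        (hi << 32) | lo
    }
}

fn main() {
    // Same seed, different stream IDs: one generator per worker.
    let mut worker0 = Pcg32Sketch::from_seed_and_stream(42, 0);
    let mut worker1 = Pcg32Sketch::from_seed_and_stream(42, 1);
    println!("worker0: {:#010x} {:#010x}", worker0.next_u32(), worker0.next_u32());
    println!("worker1: {:#010x} {:#010x}", worker1.next_u32(), worker1.next_u32());
    println!("64-bit draw: {:#018x}", worker0.next_u64());
}
```

Giving each worker its own stream ID while sharing one seed keeps a parallel run reproducible without coordinating state between workers.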

Guidelines by scenario:

Reproducible tests, quick examples: SplitMix64
High-throughput simulations (low memory): Xoroshiro128++
High-quality general-purpose streams: Xoshiro256**
Many independent parallel streams with small state: PCG32 (give each worker its own stream ID)

Note: none of these RNGs are cryptographic. For security-sensitive contexts, use a proper CSPRNG.

Numerical notes

The Normal CDF and quantile use classic approximations (an erf approximation for the CDF and Acklam’s probit approximation for the quantile). Test tolerances reflect the expected approximation error.
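
As an illustration of an erf-based Normal CDF and the tolerance it implies, here is a sketch built on the Abramowitz and Stegun formula 7.1.26, whose absolute error in erf is about 1.5e-7. Which erf approximation the crate actually uses is not stated here, so treat this choice, and the function names erf_approx and normal_cdf, as assumptions.

```rust
/// erf via Abramowitz & Stegun 7.1.26 (absolute error about 1.5e-7).
fn erf_approx(x: f64) -> f64 {
    // The formula is defined for x >= 0; extend by odd symmetry.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t
        + 0.254829592)
        * t;
    sign * (1.0 - poly * (-x * x).exp())
}

/// Normal(mu, sigma) CDF expressed through erf.
fn normal_cdf(x: f64, mu: f64, sigma: f64) -> f64 {
    let z = (x - mu) / (sigma * std::f64::consts::SQRT_2);
    0.5 * (1.0 + erf_approx(z))
}

fn main() {
    for x in [-1.96, 0.0, 1.0, 1.96] {
        println!("Phi({x:+.2}) ~ {:.7}", normal_cdf(x, 0.0, 1.0));
    }
}
```

With an approximation of this accuracy, a test comparing against reference values would need a tolerance near 1e-6 rather than machine epsilon.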
Poisson sampling uses a hybrid approach (inversion, mode-based, and quantile-anchored) depending on λ. PTRS (Hörmann's transformed rejection with squeeze) may be added later for λ ≫ 1.
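
The inversion branch of a Poisson sampler can be sketched in a few lines: walk the CDF upward from k = 0 until it passes a uniform draw. This is a generic textbook version for small λ, not the crate's hybrid sampler; poisson_inversion and the small SplitMix64-based uniform source are hypothetical names for the example.

```rust
/// Inversion sampling for Poisson(lambda), intended for small lambda only:
/// exp(-lambda) underflows and the loop gets long as lambda grows.
fn poisson_inversion(lambda: f64, u: f64) -> i64 {
    let mut k: i64 = 0;
    let mut p = (-lambda).exp(); // pmf(0)
    let mut cdf = p;
    loop {
        if u <= cdf {
            return k;
        }
        k += 1;
        p *= lambda / k as f64; // pmf(k) = pmf(k - 1) * lambda / k
        let next = cdf + p;
        if next == cdf {
            // Remaining tail mass is below f64 resolution; stop instead of looping forever.
            return k;
        }
        cdf = next;
    }
}

/// Minimal deterministic uniform source (a SplitMix64 step), used only for the demo.
struct UniformSource(u64);

impl UniformSource {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^= z >> 31;
        (z >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
}

fn main() {
    let lambda = 2.5;
    let mut src = UniformSource(2024);
    let n = 20_000;
    let mut sum: i64 = 0;
    for _ in 0..n {
        sum += poisson_inversion(lambda, src.next_f64());
    }
    // The sample mean should land close to lambda.
    println!("sample mean over {n} draws: {:.4}", sum as f64 / n as f64);
}
```

The expected number of loop iterations grows with λ, which is one reason a sampler would switch strategies for large λ.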

Benchmarks

We use Criterion for micro-benchmarks. To run:

cargo bench

The included benchmark compares Poisson sampling for small (λ=2.5) and large (λ=250) regimes.

Roadmap

Distributions and structure
- More distributions
- Truncation and affine transforms (shift/scale) as generic wrappers
- Mixture models (finite mixtures) with EM fitting
Inference and model assessment
- Parameter estimation: MLE/MOM with uncertainty (Fisher information)
- Model selection: AIC/BIC, automated “best fit” among candidates
- Goodness-of-fit tests: Kolmogorov–Smirnov, Anderson–Darling, chi-squared
- Robust statistics and empirical quantiles with confidence intervals
Advanced sampling and performance
- Faster samplers: Ziggurat or Ratio-of-Uniforms (Normal/Exponential), PTRS for Poisson (λ ≫ 1)
- Alias method (Walker/Vose) for arbitrary categorical distributions
- Variance reduction: antithetic variates, control variates, stratification
- Vectorization/batching (std::simd where feasible), allocation-free sample_n and sample_iter
Dependence and multivariate
- Copulas (Gaussian, Student-t) to construct multivariate dependencies
- Multivariate families: Multivariate Normal, Wishart/Inverse-Wishart, Dirichlet
Stochastic processes and simulation
- Poisson processes (homogeneous/inhomogeneous), renewal processes, simple Hawkes
- Brownian motion, Ornstein–Uhlenbeck; SDE discretizations (Euler–Maruyama)
- Time-series generators: AR(1), light ARMA components for simulations
Practical statistics and summaries
- Histograms, KDE, ECDF, descriptive summaries (median, MAD, etc.)
- Streaming quantiles (P² algorithm, optional t-digest via feature flag)
- Distances/divergences: KL, Jensen–Shannon, Wasserstein (1D)
API ergonomics and safety
- logpdf/logpmf/logcdf/logccdf for numerical stability; ccdf for tail work
- Additional summaries beyond the current Moments trait: entropy, cumulants
- SeedableRng-style helper trait; domain types (Probability, Positive, Interval)
- Feature flags: serde, no_std (where viable), simd, special-fns
Numerics and special functions
- Special functions: gamma/incomplete gamma, beta/incomplete beta, digamma/trigamma
- Generic numerical inversion for CDFs (bracketing + Newton/Halley) with tolerances
- Tail-accuracy improvements using log1p/expm1 and complemented functions
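
As a small illustration of the tail-accuracy item, the sketch below compares a naive Exponential CDF with one written using exp_m1 and ln_1p from the Rust standard library. The function names are hypothetical, the parameter is assumed to be a rate, and none of this reflects how the crate currently computes these quantities.

```rust
/// Naive CDF: 1 - exp(-rate * x). Loses all precision when rate * x is tiny.
fn exp_cdf_naive(rate: f64, x: f64) -> f64 {
    if x <= 0.0 { 0.0 } else { 1.0 - (-rate * x).exp() }
}

/// Stable CDF: -expm1(-rate * x) keeps full relative precision near zero.
fn exp_cdf(rate: f64, x: f64) -> f64 {
    if x <= 0.0 { 0.0 } else { -(-rate * x).exp_m1() }
}

/// Complemented CDF (survival function), computed directly instead of as 1 - cdf.
fn exp_ccdf(rate: f64, x: f64) -> f64 {
    if x <= 0.0 { 1.0 } else { (-rate * x).exp() }
}

/// Quantile: -ln(1 - p) / rate, written with log1p so small p stays accurate.
fn exp_inv_cdf(rate: f64, p: f64) -> f64 {
    -(-p).ln_1p() / rate
}

fn main() {
    let (rate, x) = (1.0, 1e-20);
    println!("naive  cdf = {:e}", exp_cdf_naive(rate, x));
    println!("stable cdf = {:e}", exp_cdf(rate, x));
    println!("ccdf(50)   = {:e}", exp_ccdf(rate, 50.0));
    println!("round trip = {:e}", exp_inv_cdf(rate, exp_cdf(rate, x)));
}
```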
Tooling and quality
- Expanded benchmarks (Criterion) and lightweight statistical test harness
- CI with lint/test/bench sanity; performance tracking
- Rich documentation with runnable examples and optional notebooks

License

MIT

probability-rs 0.1.2