rs-stats
A comprehensive statistical library written in Rust, designed for data scientists, researchers and engineers who need reliable, production-grade statistics.
rs-stats covers the full statistical pipeline: probability functions, 14 parametric distributions with MLE/MOM fitting, automatic distribution detection, Kolmogorov-Smirnov goodness-of-fit tests, hypothesis testing, and regression analysis — all panic-free via StatsResult<T>.
Table of Contents
- Key Features
- Installation
- Quick Start — Medical Example
- Distributions
- Automatic Distribution Fitting
- Hypothesis Testing
- Regression Analysis
- Error Handling
- Documentation
✨ Key Features
- 14 parametric distributions: continuous and discrete, each with `fit()` (MLE or MOM), PDF/PMF, CDF, quantile, mean, variance, AIC, BIC
- Unified trait interface: the `Distribution` and `DiscreteDistribution` traits enable polymorphism and `Box<dyn Distribution>` at runtime
- Auto-fit API: detect data type, fit all candidates, rank by AIC/BIC/KS-test in a single call
- Kolmogorov-Smirnov goodness-of-fit: continuous and discrete variants
- Special functions: `ln_gamma`, regularized incomplete gamma and beta (Lanczos + Numerical Recipes)
- Hypothesis testing: t-tests (one-sample, two-sample, paired), ANOVA, chi-square, chi-square independence
- Regression: linear, multiple linear, decision trees (regression and classification)
- Panic-free: every computation returns `StatsResult<T>`, ready for production
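To illustrate what a unified trait interface makes possible, here is a minimal, std-only sketch. The `SketchDistribution` trait and both structs are hypothetical stand-ins written for this example; they are not the crate's `Distribution` trait and do not reflect its actual signatures.

```rust
/// Hypothetical stand-in for a distribution trait (not the rs-stats definition).
trait SketchDistribution {
    fn name(&self) -> &'static str;
    fn cdf(&self, x: f64) -> f64;
    fn mean(&self) -> f64;
}

struct Uniform { a: f64, b: f64 }
struct Exponential { rate: f64 }

impl SketchDistribution for Uniform {
    fn name(&self) -> &'static str { "Uniform" }
    fn cdf(&self, x: f64) -> f64 {
        // Linear between a and b, clamped to [0, 1] outside the support.
        ((x - self.a) / (self.b - self.a)).clamp(0.0, 1.0)
    }
    fn mean(&self) -> f64 { 0.5 * (self.a + self.b) }
}

impl SketchDistribution for Exponential {
    fn name(&self) -> &'static str { "Exponential" }
    fn cdf(&self, x: f64) -> f64 {
        if x <= 0.0 { 0.0 } else { 1.0 - (-self.rate * x).exp() }
    }
    fn mean(&self) -> f64 { 1.0 / self.rate }
}

/// Different concrete types stored behind one trait object type.
fn candidates() -> Vec<Box<dyn SketchDistribution>> {
    let list: Vec<Box<dyn SketchDistribution>> = vec![
        Box::new(Uniform { a: 2.0, b: 6.0 }),
        Box::new(Exponential { rate: 0.5 }),
    ];
    list
}

fn main() {
    for d in candidates() {
        println!("{:<12} mean = {:.3}, P(X <= 3) = {:.3}", d.name(), d.mean(), d.cdf(3.0));
    }
}
```

Because every candidate sits behind the same trait object, code that ranks or compares distributions does not need to know the concrete type.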
Installation
[dependencies]
rs-stats = "2.0.2"
Or, from the command line:
cargo add rs-stats
Quick Start — Medical Example
Scenario: You receive anonymised blood pressure measurements from 1 200 patients in a hypertension study. You want to identify the best-fitting distribution, compute the probability of a dangerously high reading, and compare two treatment arms.
use ;
use Normal;
use two_sample_t_test;
// ── Step 1: systolic blood pressure data (mmHg) ──────────────────────────────
// A real study would have 1 200 values; a small sample is used for illustration
let systolic_bp = vec!;
// ── Step 2: auto-detect the distribution and find the best fit ────────────────
let best = auto_fit?;
println!; // → Normal
println!;
println!;
// ── Step 3: use the fitted Normal to answer clinical questions ────────────────
let bp_dist = fit?;
println!;
// P(BP > 140 mmHg) — hypertensive threshold
let p_hyper = 1.0 - bp_dist.cdf?;
println!;
// 95th percentile — what value do 95% of patients fall below?
let p95 = bp_dist.inverse_cdf?;
println!;
// ── Step 4: compare two treatment arms ───────────────────────────────────────
let control_arm = vec!;
let treatment_arm = vec!;
let t_result = two_sample_t_test?;
println!;
if t_result.p_value < 0.05
Distributions
Continuous Distributions
All continuous distributions implement the Distribution trait and expose:
- `Dist::new(params)`: validated constructor
- `Dist::fit(data)`: maximum likelihood (MLE) or method-of-moments (MOM) estimation
- `.pdf(x)`, `.logpdf(x)`, `.cdf(x)`, `.inverse_cdf(p)`: core functions
- `.mean()`, `.variance()`, `.std_dev()`: moments
- `.aic(data)`, `.bic(data)`: model selection criteria
Normal — distributions::normal_distribution::Normal
When to use: Symmetric continuous measurements that cluster around a mean. Medical examples: Blood pressure, height, weight in large populations, IQ scores, measurement errors in lab instruments.
use rs_stats::distributions::normal_distribution::Normal;
use Distribution;
// Diastolic blood pressure in a healthy cohort: N(80, 8)
let bp = new?;
// P(diastolic BP > 90 mmHg) — stage 1 hypertension threshold
let p_high = 1.0 - bp.cdf?;
println!; // ≈ 10.6%
// 97.5th percentile — upper reference range
let upper_ref = bp.inverse_cdf?;
println!; // ≈ 95.7
// Fit to patient data (MLE: μ̂ = mean, σ̂ = pop std-dev)
let measurements = vec!;
let fitted = fit?;
println!;
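The N(80, 8) figures quoted in the comments above can be cross-checked without the library. The sketch below is std-only: it approximates the error function with the Abramowitz and Stegun formula 7.1.26 (an assumption of this example; the crate may compute the CDF differently) and shows the closed-form MLE. The readings passed to `normal_mle` are made up for illustration.

```rust
/// Error function via Abramowitz & Stegun 7.1.26 (absolute error below 1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592
        + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// CDF of a Normal with mean `mu` and standard deviation `sigma`.
fn normal_cdf(x: f64, mu: f64, sigma: f64) -> f64 {
    0.5 * (1.0 + erf((x - mu) / (sigma * std::f64::consts::SQRT_2)))
}

/// MLE for a Normal: sample mean and population (divide-by-n) standard deviation.
fn normal_mle(data: &[f64]) -> Option<(f64, f64)> {
    if data.is_empty() {
        return None;
    }
    let n = data.len() as f64;
    let mu = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mu).powi(2)).sum::<f64>() / n;
    Some((mu, var.sqrt()))
}

fn main() {
    let p_high = 1.0 - normal_cdf(90.0, 80.0, 8.0);
    println!("P(diastolic > 90) = {:.1}%", 100.0 * p_high);
    // Hypothetical readings, for illustration only.
    if let Some((mu, sigma)) = normal_mle(&[78.0, 82.0, 80.0, 76.0, 84.0]) {
        println!("mu = {:.2}, sigma = {:.2}", mu, sigma);
    }
}
```

With z = (90 - 80) / 8 = 1.25 the upper tail is about 0.1057, consistent with the ≈ 10.6% comment, and 80 + 1.96 × 8 ≈ 95.7 matches the quoted upper reference value.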
Log-Normal — distributions::lognormal::LogNormal
When to use: Right-skewed positive data — concentrations, durations, biological assays. Medical examples: CRP (C-reactive protein) levels, serum creatinine, drug plasma concentrations, tumour volumes, hospital length-of-stay.
use rs_stats::distributions::lognormal::LogNormal;
use Distribution;
// CRP levels (mg/L) in an outpatient cohort
// CRP is log-normally distributed: healthy < 5, elevated 5–100, critical > 100
let crp_data = vec!;
let crp = fit?;
println!;
// Median CRP (more informative than mean for skewed data)
let median = crp.inverse_cdf?;
println!;
// P(CRP > 10 mg/L) — significant inflammation threshold
let p_inflamed = 1.0 - crp.cdf?;
println!;
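For intuition about what a log-normal fit estimates, the following std-only sketch computes the MLE from the logarithms of the data and derives the median and mean from it. The function names are hypothetical and not part of the crate; whether the crate uses exactly this estimator is an assumption.

```rust
/// MLE for LogNormal(mu, sigma): mean and population std-dev of ln(x).
/// Returns None for empty input or non-positive values.
fn lognormal_mle(data: &[f64]) -> Option<(f64, f64)> {
    if data.is_empty() || data.iter().any(|&x| x <= 0.0) {
        return None;
    }
    let n = data.len() as f64;
    let logs: Vec<f64> = data.iter().map(|x| x.ln()).collect();
    let mu = logs.iter().sum::<f64>() / n;
    let var = logs.iter().map(|l| (l - mu).powi(2)).sum::<f64>() / n;
    Some((mu, var.sqrt()))
}

/// Median of a log-normal: exp(mu).
fn lognormal_median(mu: f64) -> f64 {
    mu.exp()
}

/// Mean of a log-normal: exp(mu + sigma^2 / 2), always at or above the median.
fn lognormal_mean(mu: f64, sigma: f64) -> f64 {
    (mu + 0.5 * sigma * sigma).exp()
}

fn main() {
    // Hypothetical right-skewed values, for illustration only.
    let data = [1.2, 2.5, 3.1, 4.8, 9.7, 22.0];
    if let Some((mu, sigma)) = lognormal_mle(&data) {
        println!("mu = {:.3}, sigma = {:.3}", mu, sigma);
        println!("median = {:.2}, mean = {:.2}", lognormal_median(mu), lognormal_mean(mu, sigma));
    }
}
```

The gap between mean and median grows with sigma, which is why the median is usually the more informative summary for skewed data.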
Weibull — distributions::weibull::Weibull
When to use: Time-to-event data where the hazard rate changes over time. Medical examples: Time to relapse after cancer treatment, medical device/implant survival, time until a drug loses efficacy, organ transplant survival.
use rs_stats::distributions::weibull::Weibull;
use Distribution;
// Time to relapse (months) after chemotherapy — k > 1 means increasing hazard
let relapse_times = vec!;
let w = fit?;
println!;
// k > 1 → hazard rate increases over time (survivors become more at risk)
// Median relapse-free survival
let median_rfs = w.inverse_cdf?;
println!;
// P(relapse within 6 months) — short-term risk
let p_6mo = w.cdf?;
println!;
// 1-year survival probability
let p_1yr = 1.0 - w.cdf?;
println!;
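The Weibull quantities used above have simple closed forms, sketched here with std only. `WeibullSketch` and its parameter values are hypothetical; the crate's own type, field names, and fitted values will differ.

```rust
/// Hypothetical Weibull with shape k and scale lambda (not the crate's type).
struct WeibullSketch {
    shape: f64,
    scale: f64,
}

impl WeibullSketch {
    /// F(t) = 1 - exp(-(t / lambda)^k) for t > 0.
    fn cdf(&self, t: f64) -> f64 {
        if t <= 0.0 { 0.0 } else { 1.0 - (-(t / self.scale).powf(self.shape)).exp() }
    }

    /// Quantile: lambda * (-ln(1 - p))^(1 / k), for p in (0, 1).
    fn quantile(&self, p: f64) -> f64 {
        self.scale * (-(1.0 - p).ln()).powf(1.0 / self.shape)
    }

    /// Hazard rate: (k / lambda) * (t / lambda)^(k - 1); increasing when k > 1.
    fn hazard(&self, t: f64) -> f64 {
        (self.shape / self.scale) * (t / self.scale).powf(self.shape - 1.0)
    }
}

fn main() {
    // Illustrative parameters only: k = 1.5, lambda = 12 months.
    let w = WeibullSketch { shape: 1.5, scale: 12.0 };
    println!("median = {:.2} months", w.quantile(0.5));
    println!("P(event within 6 months) = {:.3}", w.cdf(6.0));
    println!("1-year survival = {:.3}", 1.0 - w.cdf(12.0));
    println!("hazard at 6 vs 12 months: {:.4} vs {:.4}", w.hazard(6.0), w.hazard(12.0));
}
```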
Gamma — distributions::gamma_distribution::Gamma
When to use: Positive right-skewed data, especially waiting times or accumulated effects. Medical examples: ICU length-of-stay, time between hospital readmissions, blood glucose AUC in OGTT.
use rs_stats::distributions::gamma_distribution::Gamma;
use Distribution;
// ICU length-of-stay (days) — Gamma naturally models positive skewed durations
let icu_los = vec!;
let gamma = fit?;
println!;
println!;
println!;
// P(LOS > 7 days) — prolonged ICU stay threshold for resource planning
let p_prolonged = 1.0 - gamma.cdf?;
println!;
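As a rough picture of how a Gamma fit can be obtained, this std-only sketch uses the method of moments: shape = mean² / variance and scale = variance / mean. It is an assumption that this resembles the crate's estimator (which may use MLE instead); `gamma_mom` is a hypothetical helper and the data are made up.

```rust
/// Method-of-moments estimate for Gamma(shape, scale) using the population variance.
/// Returns None when there are fewer than two values or the moments are not positive.
fn gamma_mom(data: &[f64]) -> Option<(f64, f64)> {
    if data.len() < 2 {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    if mean <= 0.0 || var <= 0.0 {
        return None;
    }
    Some((mean * mean / var, var / mean))
}

fn main() {
    // Hypothetical lengths of stay in days, for illustration only.
    let los = [1.5, 2.0, 3.5, 4.0, 6.5, 11.0];
    if let Some((shape, scale)) = gamma_mom(&los) {
        println!("shape = {:.3}, scale = {:.3}, implied mean = {:.3}", shape, scale, shape * scale);
    }
}
```

By construction shape × scale reproduces the sample mean, which is a quick sanity check on any Gamma fit.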
Beta — distributions::beta::Beta
When to use: Proportions, rates, and probabilities bounded in (0, 1). Medical examples: Diagnostic test sensitivity and specificity, medication adherence rates, tumour response rates, proportion of time in therapeutic range (TTR) for anticoagulant patients.
use rs_stats::distributions::beta::Beta;
use Distribution;
// Time-in-therapeutic-range (TTR) for warfarin patients (values in 0–1)
// TTR ≥ 0.70 is considered well-controlled anticoagulation
let ttr_data = vec!;
let beta = fit?;
println!;
// P(TTR ≥ 0.70) — probability of being well-controlled
let p_well = 1.0 - beta.cdf?;
println!;
// Median TTR across the population
let median_ttr = beta.inverse_cdf?;
println!;
Student's t — distributions::student_t::StudentT
When to use: Symmetric distributions with heavier tails than Normal; small-sample inference. Medical examples: Standardised effect sizes in small pilot studies, residuals from mixed-effects models, computing critical values for paired-samples tests.
use rs_stats::distributions::student_t::StudentT;
use Distribution;
// Small diabetes pilot study (n=12): t-distribution with df = n-1 = 11
let t_dist = new?;
// Two-sided critical value at α = 0.05
let t_crit = t_dist.inverse_cdf?;
println!; // ≈ 2.201
// p-value for an observed t-statistic of 2.5 (two-tailed)
let p_value = 2.0 * ;
println!;
Exponential — distributions::exponential_distribution::Exponential
When to use: Time between events when events occur at a constant rate (memoryless property). Medical examples: Inter-arrival times in an emergency department, time between seizures in epilepsy patients, spontaneous adverse events during a trial.
use rs_stats::distributions::exponential_distribution::Exponential;
use Distribution;
// Time (minutes) between patient arrivals in an ED
let inter_arrivals = vec!;
let exp = fit?;
println!;
println!;
// P(next patient within 5 minutes) — triage planning
let p_5min = exp.cdf?;
println!;
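The exponential model is simple enough to write out in full. This std-only sketch shows the MLE (rate = 1 / mean), the CDF, and a numerical check of the memoryless property mentioned above. Function names and data are hypothetical, not the crate's API.

```rust
/// MLE of the exponential rate: 1 / sample mean. None for empty or non-positive mean.
fn exponential_mle(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    if mean > 0.0 { Some(1.0 / mean) } else { None }
}

/// P(X <= x) = 1 - exp(-rate * x) for x > 0.
fn exponential_cdf(x: f64, rate: f64) -> f64 {
    if x <= 0.0 { 0.0 } else { 1.0 - (-rate * x).exp() }
}

/// Survival function P(X > x).
fn exponential_survival(x: f64, rate: f64) -> f64 {
    1.0 - exponential_cdf(x, rate)
}

fn main() {
    // Hypothetical inter-arrival times in minutes.
    let gaps = [2.0, 4.0, 6.0, 8.0];
    if let Some(rate) = exponential_mle(&gaps) {
        println!("rate = {:.3} per minute", rate);
        println!("P(next arrival within 5 min) = {:.3}", exponential_cdf(5.0, rate));
        // Memoryless: P(X > s + t | X > s) equals P(X > t).
        let conditional = exponential_survival(7.0, rate) / exponential_survival(3.0, rate);
        println!("conditional = {:.4}, unconditional = {:.4}", conditional, exponential_survival(4.0, rate));
    }
}
```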
Chi-Squared — distributions::chi_squared::ChiSquared
When to use: Distribution of sums of squared standard normals; used in goodness-of-fit tests and variance confidence intervals. Medical examples: Testing whether observed disease frequencies match expected proportions, variance confidence intervals for measurement devices.
use rs_stats::distributions::chi_squared::ChiSquared;
use Distribution;
// 6-category goodness-of-fit test: df = 6 - 1 = 5
let chi2 = new?;
// Critical value at α = 0.05
let chi2_crit = chi2.inverse_cdf?;
println!; // ≈ 11.07
// p-value for an observed χ² = 9.2
let p_value = 1.0 - chi2.cdf?;
println!;
F-Distribution — distributions::f_distribution::FDistribution
When to use: Ratio of two chi-squared variables; used in ANOVA and regression significance tests. Medical examples: Comparing biomarker variance across patient groups, multi-arm ANOVA F-statistic, F-test in multiple regression predicting clinical outcomes.
use rs_stats::distributions::f_distribution::FDistribution;
use Distribution;
// ANOVA with 4 groups, total n=52: F(3, 48)
let f_dist = new?;
// Critical value at α = 0.05
let f_crit = f_dist.inverse_cdf?;
println!; // ≈ 2.80
// p-value for observed F = 4.5
let p_value = 1.0 - f_dist.cdf?;
println!;
Uniform — distributions::uniform_distribution::Uniform
When to use: All values in a range are equally likely. Medical examples: Randomisation checks in clinical trials, uncertainty about a drug's effective window, boundary-condition stress testing.
use rs_stats::distributions::uniform_distribution::Uniform;
use Distribution;
// Drug release window: effective between 2 h and 6 h post-ingestion
let release = new?;
// P(effective within the first 3 hours)
let p_3h = release.cdf?;
println!; // 25%
Discrete Distributions
All discrete distributions implement the DiscreteDistribution trait and expose:
- `Dist::new(params)`: validated constructor
- `Dist::fit(data)`: MLE or MOM from `&[f64]`
- `.pmf(k)`, `.logpmf(k)`, `.cdf(k)`: core functions
- `.mean()`, `.variance()`, `.std_dev()`: moments
- `.aic(data)`, `.bic(data)`: model selection
Poisson — distributions::poisson_distribution::Poisson
When to use: Count of rare independent events in a fixed time or space window. Medical examples: Adverse drug reactions per 1 000 prescriptions, surgical site infections per month, emergency calls per hour, mutations per cell division.
use rs_stats::distributions::poisson_distribution::Poisson;
use DiscreteDistribution;
// Hospital-acquired infections (HAI) per ward per month: λ = 2.3
let hai = new?;
println!; // ≈ 10.0%
println!; // alert threshold
// Fit from 12 months of observed counts
let monthly_counts = vec!;
let fitted = fit?;
println!;
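The Poisson PMF and its MLE can be written with std only, as sketched below. The helper names are hypothetical and the monthly counts are made up. For λ = 2.3, P(X = 0) = e^(-2.3) ≈ 0.100, which would be consistent with the ≈ 10.0% comment above if that line prints the probability of zero infections (an assumption, since the printed expression is not shown).

```rust
/// Poisson PMF computed in log space: exp(k * ln(lambda) - lambda - ln(k!)).
/// Assumes lambda > 0.
fn poisson_pmf(k: u64, lambda: f64) -> f64 {
    let ln_factorial: f64 = (1..=k).map(|i| (i as f64).ln()).sum();
    ((k as f64) * lambda.ln() - lambda - ln_factorial).exp()
}

/// P(X <= k) as a direct sum of PMF terms.
fn poisson_cdf(k: u64, lambda: f64) -> f64 {
    (0..=k).map(|i| poisson_pmf(i, lambda)).sum()
}

/// MLE of lambda: the sample mean of the counts.
fn poisson_mle(counts: &[u64]) -> Option<f64> {
    if counts.is_empty() {
        return None;
    }
    Some(counts.iter().sum::<u64>() as f64 / counts.len() as f64)
}

fn main() {
    let lambda = 2.3;
    println!("P(X = 0)  = {:.4}", poisson_pmf(0, lambda));
    println!("P(X >= 5) = {:.4}", 1.0 - poisson_cdf(4, lambda));
    // Hypothetical monthly counts, for illustration only.
    if let Some(l) = poisson_mle(&[1, 3, 2, 4, 0, 2]) {
        println!("fitted lambda = {:.3}", l);
    }
}
```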
Binomial — distributions::binomial_distribution::Binomial
When to use: Number of successes in n independent trials with constant probability p. Medical examples: Responders in a treatment cohort, positive tests in a screening batch, side-effect events in a treated group.
use rs_stats::distributions::binomial_distribution::Binomial;
use DiscreteDistribution;
// Trial: n=100 patients, literature response rate p=0.35
let trial = new?;
println!; // 35
// P(≥ 45 responders) — detect a meaningful improvement
let p_improved = 1.0 - trial.cdf?;
println!;
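The tail probability P(X ≥ 45) for n = 100, p = 0.35 is one minus the CDF at 44. The std-only sketch below evaluates it by summing exact PMF terms in log space; the helper names are hypothetical and the crate may use a different algorithm (for example the regularized incomplete beta function).

```rust
/// ln C(n, k) as a sum of logs, avoiding overflow for large n. Requires k <= n.
fn ln_choose(n: u64, k: u64) -> f64 {
    (1..=k).map(|i| ((n - k + i) as f64).ln() - (i as f64).ln()).sum()
}

/// Binomial PMF for 0 < p < 1 and k <= n.
fn binomial_pmf(k: u64, n: u64, p: f64) -> f64 {
    (ln_choose(n, k) + (k as f64) * p.ln() + ((n - k) as f64) * (1.0 - p).ln()).exp()
}

/// P(X <= k) as a direct sum of PMF terms.
fn binomial_cdf(k: u64, n: u64, p: f64) -> f64 {
    (0..=k.min(n)).map(|i| binomial_pmf(i, n, p)).sum()
}

fn main() {
    let (n, p) = (100_u64, 0.35_f64);
    println!("expected responders = {}", n as f64 * p);
    // P(X >= 45) = 1 - P(X <= 44)
    println!("P(X >= 45) = {:.4}", 1.0 - binomial_cdf(44, n, p));
}
```

Note the off-by-one: for a discrete variable, "at least 45" corresponds to the CDF at 44, not 45.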
Geometric — distributions::geometric::Geometric
When to use: Number of trials until the first success (k ≥ 1). Medical examples: Screening cycles until a lesion is detected, treatment attempts until remission, needle passes until a successful lumbar puncture.
use rs_stats::distributions::geometric::Geometric;
use DiscreteDistribution;
// Colonoscopy screening: P(detecting polyp per session) = 0.18
let screening = new?;
println!; // ≈ 5.6
let p_within_3 = screening.cdf?;
println!;
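For the "trials until first success" parameterisation (k ≥ 1) stated above, the geometric formulas are short enough to verify by hand. This std-only sketch uses hypothetical helper names, not the crate's API.

```rust
/// P(X = k) = (1 - p)^(k - 1) * p for k >= 1.
fn geometric_pmf(k: u32, p: f64) -> f64 {
    if k == 0 { 0.0 } else { (1.0 - p).powi(k as i32 - 1) * p }
}

/// P(X <= k) = 1 - (1 - p)^k.
fn geometric_cdf(k: u32, p: f64) -> f64 {
    1.0 - (1.0 - p).powi(k as i32)
}

/// Expected number of trials until the first success: 1 / p.
fn geometric_mean(p: f64) -> f64 {
    1.0 / p
}

fn main() {
    let p = 0.18;
    println!("expected sessions until detection = {:.1}", geometric_mean(p));
    println!("P(detected within 3 sessions) = {:.3}", geometric_cdf(3, p));
}
```

With p = 0.18 the mean is 1 / 0.18 ≈ 5.6, matching the comment above, and the probability of detection within three sessions is 1 - 0.82³ ≈ 0.449.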
Negative Binomial — distributions::negative_binomial::NegativeBinomial
When to use: Overdispersed count data (variance > mean), or number of failures before r-th success. Medical examples: Hospitalisations before stable remission, overdispersed adverse event counts, recurrences before sustained response.
use rs_stats::distributions::negative_binomial::NegativeBinomial;
use DiscreteDistribution;
// Re-admissions before stable remission — overdispersed (variance > mean)
let admissions = vec!;
let nb = fit?;
println!;
println!;
println!;
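Overdispersion is what makes a negative binomial fit possible at all. The std-only sketch below uses the method of moments for the "failures before the r-th success" parameterisation, where mean = r(1 - p)/p and variance = r(1 - p)/p², giving p = mean / variance and r = mean² / (variance - mean). Whether the crate uses this estimator is an assumption; `negbin_mom` is a hypothetical helper.

```rust
/// Method-of-moments estimate (r, p) for the negative binomial.
/// Returns None unless the data are overdispersed (variance > mean > 0).
fn negbin_mom(counts: &[u64]) -> Option<(f64, f64)> {
    if counts.len() < 2 {
        return None;
    }
    let n = counts.len() as f64;
    let mean = counts.iter().map(|&c| c as f64).sum::<f64>() / n;
    let var = counts.iter().map(|&c| (c as f64 - mean).powi(2)).sum::<f64>() / n;
    if mean <= 0.0 || var <= mean {
        return None; // not overdispersed: a Poisson or Binomial model fits better
    }
    let p = mean / var;
    let r = mean * mean / (var - mean);
    Some((r, p))
}

fn main() {
    // Hypothetical re-admission counts, for illustration only.
    let counts = [0, 1, 0, 5, 2, 0, 7, 1];
    match negbin_mom(&counts) {
        Some((r, p)) => println!("r = {:.3}, p = {:.3}", r, p),
        None => println!("data are not overdispersed"),
    }
}
```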
Automatic Distribution Fitting
Scenario: A pharmacokineticist wants to know which distribution best describes drug half-life across 80 patients, without assuming Normality.
use ;
// Drug half-life (hours) — typically log-normal or Weibull in PK studies
let half_lives = vec!;
// One-call: auto-detect type + best AIC
let best = auto_fit?;
println!;
// Full ranking — compare all candidates
println!;
println!;
for r in fit_all?
// Typical output:
// Distribution AIC BIC KS p-value
// -----------------------------------------
// LogNormal 82.34 84.12 0.8231
// Gamma 83.71 85.49 0.7654
// Weibull 84.02 85.80 0.7412
// Normal 89.45 91.23 0.4103
Available candidates
| Type | Distributions |
|---|---|
| Continuous (`fit_all`) | Normal, Exponential, Uniform, Gamma, LogNormal, Weibull, Beta, StudentT, F, ChiSquared |
| Discrete (`fit_all_discrete`) | Poisson, Geometric, NegativeBinomial, Binomial |
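To make the ranking criterion concrete, the following std-only sketch fits two candidates (Normal and LogNormal) by MLE and orders them by AIC = 2k - 2 ln L, also reporting BIC = k ln n - 2 ln L. It is a simplified illustration under stated assumptions, not the crate's `fit_all` implementation; all function names and the data are hypothetical.

```rust
use std::f64::consts::PI;

/// Maximised Normal log-likelihood: -n/2 * (ln(2*pi*var) + 1), with the MLE variance.
fn normal_max_loglik(data: &[f64]) -> f64 {
    let n = data.len() as f64;
    let mu = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mu).powi(2)).sum::<f64>() / n;
    -0.5 * n * ((2.0 * PI * var).ln() + 1.0)
}

/// Maximised LogNormal log-likelihood: Normal log-likelihood of ln(x) minus sum of ln(x).
/// Assumes all values are strictly positive.
fn lognormal_max_loglik(data: &[f64]) -> f64 {
    let logs: Vec<f64> = data.iter().map(|x| x.ln()).collect();
    normal_max_loglik(&logs) - logs.iter().sum::<f64>()
}

fn aic(loglik: f64, k: usize) -> f64 {
    2.0 * (k as f64) - 2.0 * loglik
}

fn bic(loglik: f64, k: usize, n: usize) -> f64 {
    (k as f64) * (n as f64).ln() - 2.0 * loglik
}

/// Returns (name, AIC, BIC) rows sorted by ascending AIC (best first).
fn rank_by_aic(data: &[f64]) -> Vec<(&'static str, f64, f64)> {
    let n = data.len();
    let fits = vec![
        ("Normal", normal_max_loglik(data)),
        ("LogNormal", lognormal_max_loglik(data)),
    ];
    // Both candidates have k = 2 free parameters.
    let mut rows: Vec<(&'static str, f64, f64)> = fits
        .into_iter()
        .map(|(name, ll)| (name, aic(ll, 2), bic(ll, 2, n)))
        .collect();
    rows.sort_by(|a, b| a.1.total_cmp(&b.1));
    rows
}

fn main() {
    // Hypothetical right-skewed half-lives in hours.
    let data = [2.1, 2.8, 3.0, 3.6, 4.2, 5.5, 7.9, 12.4];
    for (name, a, b) in rank_by_aic(&data) {
        println!("{:<10} AIC = {:>7.2}  BIC = {:>7.2}", name, a, b);
    }
}
```

Lower AIC and BIC indicate a better trade-off between fit and parameter count; with equal k, as here, the ranking reduces to comparing log-likelihoods.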
Hypothesis Testing
Scenario: A clinical trial compares HbA1c reduction across three diabetes treatments.
use ;
// ── Paired t-test: before vs after treatment ──────────────────────────────────
let before = vec!; // HbA1c %
let after = vec!;
let paired = paired_t_test?;
println!;
// p < 0.05 → significant reduction in HbA1c
// ── One-way ANOVA: compare three treatment arms ───────────────────────────────
let drug_a = vec!; // HbA1c change %
let drug_b = vec!;
let drug_c = vec!;
let groups: = vec!;
let anova = one_way_anova?;
println!;
// ── Chi-square independence: side-effect rate by treatment ────────────────────
// Rows: Drug A, B, C | Cols: No side-effect, Side-effect occurred
let observed = vec!;
let = chi_square_independence?;
println!;
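The statistic behind a paired t-test is t = d̄ / (s_d / √n), where d are the per-patient differences and the test has n - 1 degrees of freedom. The std-only sketch below computes only the statistic and the degrees of freedom; turning it into a p-value needs the Student's t CDF, which is omitted here. `paired_t_statistic` and the data are hypothetical and do not mirror the crate's `paired_t_test` signature.

```rust
/// Returns (t statistic, degrees of freedom) for paired samples.
fn paired_t_statistic(before: &[f64], after: &[f64]) -> Result<(f64, usize), String> {
    if before.len() != after.len() {
        return Err("length mismatch between paired samples".to_string());
    }
    if before.len() < 2 {
        return Err("need at least two pairs".to_string());
    }
    let n = before.len() as f64;
    let diffs: Vec<f64> = before.iter().zip(after).map(|(b, a)| b - a).collect();
    let mean = diffs.iter().sum::<f64>() / n;
    // Sample variance of the differences (divide by n - 1).
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    if var == 0.0 {
        return Err("differences have zero variance".to_string());
    }
    Ok((mean / (var / n).sqrt(), before.len() - 1))
}

fn main() {
    // Hypothetical HbA1c values (%) before and after treatment.
    let before = [8.0, 7.5, 9.0, 8.5];
    let after = [7.0, 7.0, 8.0, 8.0];
    match paired_t_statistic(&before, &after) {
        Ok((t, df)) => println!("t = {:.3}, df = {}", t, df),
        Err(e) => eprintln!("error: {}", e),
    }
}
```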
Regression Analysis
Scenario: Predict post-operative recovery time from patient characteristics.
use LinearRegression;
use MultipleLinearRegression;
// ── Simple linear regression: age → recovery time ────────────────────────────
let age = vec!;
let recovery_days = vec!;
let mut model = new;
model.fit?;
println!;
// Predict for a 52-year-old patient with 95% CI
let predicted = model.predict;
let = model.confidence_interval?;
println!;
// ── Multiple regression: age + BMI + comorbidity score → recovery ─────────────
let features = vec!;
let outcomes = vec!;
let mut mlr = new;
mlr.fit?;
println!;
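For reference, simple linear regression has a closed-form least-squares solution: slope = Sxy / Sxx, intercept = ȳ - slope · x̄, and R² = Sxy² / (Sxx · Syy). The std-only sketch below implements exactly that; `fit_line` is a hypothetical helper, not the crate's `LinearRegression` API, and the data are made up.

```rust
/// Ordinary least squares for y = intercept + slope * x.
/// Returns (slope, intercept, r_squared), or None for degenerate input.
fn fit_line(x: &[f64], y: &[f64]) -> Option<(f64, f64, f64)> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let sxx: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    let syy: f64 = y.iter().map(|yi| (yi - mean_y).powi(2)).sum();
    let sxy: f64 = x.iter().zip(y).map(|(xi, yi)| (xi - mean_x) * (yi - mean_y)).sum();
    if sxx == 0.0 || syy == 0.0 {
        return None; // constant predictor or constant response
    }
    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    Some((slope, intercept, sxy * sxy / (sxx * syy)))
}

fn main() {
    // Hypothetical ages (years) and recovery times (days).
    let age = [35.0, 42.0, 50.0, 58.0, 66.0];
    let days = [6.0, 7.5, 9.0, 11.0, 12.5];
    if let Some((slope, intercept, r2)) = fit_line(&age, &days) {
        println!("days = {:.3} + {:.3} * age (R^2 = {:.3})", intercept, slope, r2);
        println!("predicted for age 52: {:.1} days", intercept + slope * 52.0);
    }
}
```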
Error Handling
All functions return StatsResult<T> — a type alias for Result<T, StatsError>. The library never panics.
use ;
use Normal;
use Distribution;
match reference_range
Error Variants
| Variant | Raised when |
|---|---|
| `InvalidInput` | Out-of-domain parameter (negative σ, p ∉ [0,1], …) |
| `EmptyData` | Empty slice passed to `fit()` or statistical functions |
| `DimensionMismatch` | Mismatched array lengths (regression, paired tests) |
| `ConversionError` | Type conversion failures |
| `NumericalError` | Numerical instability (overflow, NaN propagation) |
| `NotFitted` | `predict()` called before `fit()` on a regression model |
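The general pattern of a result alias plus an error enum can be sketched with std only. `SketchError` and `SketchResult` below are hypothetical stand-ins that borrow two variant names from the table; the real `StatsError` variants may carry different payloads.

```rust
/// Hypothetical error type; only mirrors two variant names from the table above.
#[derive(Debug, PartialEq)]
enum SketchError {
    EmptyData,
    InvalidInput(String),
}

/// Alias in the spirit of `StatsResult<T>`.
type SketchResult<T> = Result<T, SketchError>;

fn mean(data: &[f64]) -> SketchResult<f64> {
    if data.is_empty() {
        return Err(SketchError::EmptyData);
    }
    Ok(data.iter().sum::<f64>() / data.len() as f64)
}

/// Sample standard deviation; errors propagate with `?` instead of panicking.
fn std_dev(data: &[f64]) -> SketchResult<f64> {
    let m = mean(data)?; // EmptyData bubbles up to the caller
    if data.len() < 2 {
        return Err(SketchError::InvalidInput("need at least two values".to_string()));
    }
    let var = data.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (data.len() as f64 - 1.0);
    Ok(var.sqrt())
}

fn main() {
    for sample in [&[][..], &[5.0][..], &[2.0, 4.0][..]] {
        match std_dev(sample) {
            Ok(sd) => println!("std-dev = {:.4}", sd),
            Err(SketchError::EmptyData) => eprintln!("no data supplied"),
            Err(SketchError::InvalidInput(msg)) => eprintln!("invalid input: {}", msg),
        }
    }
}
```

Matching on the variant lets callers decide per failure mode whether to retry, skip the record, or surface the error.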
Documentation
Contributing
- Fork the repository
- Create a branch: `git checkout -b feat/my-feature`
- Commit: `git commit -m "feat(scope): description"`
- Push and open a pull request
All PRs must pass cargo test, cargo clippy -- -D warnings, and cargo fmt --check.
License
MIT — see LICENSE.