rs-stats 2.0.2

Statistics library in rust
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
# rs-stats

[![Rust](https://img.shields.io/badge/rust-1.56%2B-orange.svg)](https://www.rust-lang.org/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Version](https://img.shields.io/badge/version-2.0.2-green.svg)](https://crates.io/crates/rs-stats)
[![Tests](https://img.shields.io/badge/tests-343%20passing-brightgreen.svg)](https://github.com/lsh0x/rs-stats/actions)
[![CI](https://github.com/lsh0x/rs-stats/workflows/CI/badge.svg)](https://github.com/lsh0x/rs-stats/actions)
[![Docs](https://docs.rs/rs-stats/badge.svg)](https://docs.rs/rs-stats)
[![Crates.io](https://img.shields.io/crates/v/rs-stats.svg)](https://crates.io/crates/rs-stats)

A comprehensive statistical library written in Rust, designed for data scientists, researchers and engineers who need reliable, production-grade statistics.

**rs-stats** covers the full statistical pipeline: probability functions, 14 parametric distributions with MLE/MOM fitting, automatic distribution detection, Kolmogorov-Smirnov goodness-of-fit tests, hypothesis testing, and regression analysis — all **panic-free** via `StatsResult<T>`.

---

## Table of Contents

- [Key Features]#-key-features
- [Installation]#installation
- [Quick Start — Medical Example]#quick-start--medical-example
- [Distributions]#distributions
  - [Continuous Distributions]#continuous-distributions
  - [Discrete Distributions]#discrete-distributions
- [Automatic Distribution Fitting]#automatic-distribution-fitting
- [Hypothesis Testing]#hypothesis-testing
- [Regression Analysis]#regression-analysis
- [Error Handling]#error-handling
- [Documentation]#documentation

---

## ✨ Key Features

- **14 parametric distributions** — continuous and discrete, each with `fit()` (MLE or MOM), PDF/PMF, CDF, quantile, mean, variance, AIC, BIC
- **Unified trait interface**`Distribution` and `DiscreteDistribution` traits enable polymorphism and `Box<dyn Distribution>` at runtime
- **Auto-fit API** — detect data type, fit all candidates, rank by AIC/BIC/KS-test in a single call
- **Kolmogorov-Smirnov goodness-of-fit** — continuous and discrete variants
- **Special functions**`ln_gamma`, regularized incomplete gamma and beta (Lanczos + Numerical Recipes)
- **Hypothesis testing** — t-tests (one-sample, two-sample, paired), ANOVA, chi-square, chi-square independence
- **Regression** — linear, multiple linear, decision trees (regression and classification)
- **Panic-free** — every computation returns `StatsResult<T>`, ready for production

---

## Installation

```toml
[dependencies]
rs-stats = "2.0.2"
```

Or:

```bash
cargo add rs-stats
```

---

## Quick Start — Medical Example

> **Scenario**: You receive anonymised blood pressure measurements from 1 200 patients in a hypertension study. You want to identify the best-fitting distribution, compute the probability of a dangerously high reading, and compare two treatment arms.

```rust
use rs_stats::{auto_fit, fit_all, Distribution};
use rs_stats::distributions::normal_distribution::Normal;
use rs_stats::hypothesis_tests::t_test::two_sample_t_test;

// ── Step 1: systolic blood pressure data (mmHg) ──────────────────────────────
// Real study would have 1 200 values; small sample used for illustration
let systolic_bp = vec![
    115.0, 122.0, 118.0, 130.0, 125.0, 119.0, 128.0, 132.0,
    121.0, 117.0, 126.0, 135.0, 123.0, 120.0, 127.0, 131.0,
];

// ── Step 2: auto-detect the distribution and find the best fit ────────────────
let best = auto_fit(&systolic_bp)?;
println!("Best distribution : {}", best.name);   // → Normal
println!("  AIC             : {:.2}", best.aic);
println!("  KS p-value      : {:.4}", best.ks_p_value);

// ── Step 3: use the fitted Normal to answer clinical questions ────────────────
let bp_dist = Normal::fit(&systolic_bp)?;
println!("Fitted Normal(μ={:.1}, σ={:.1})", bp_dist.mean(), bp_dist.std_dev());

// P(BP > 140 mmHg) — hypertensive threshold
let p_hyper = 1.0 - bp_dist.cdf(140.0)?;
println!("P(BP > 140 mmHg) = {:.2}%", p_hyper * 100.0);

// 95th percentile — what value do 95% of patients fall below?
let p95 = bp_dist.inverse_cdf(0.95)?;
println!("95th percentile  = {:.1} mmHg", p95);

// ── Step 4: compare two treatment arms ───────────────────────────────────────
let control_arm   = vec![128.0, 132.0, 125.0, 130.0, 129.0, 131.0, 127.0, 133.0];
let treatment_arm = vec![118.0, 122.0, 115.0, 120.0, 119.0, 121.0, 117.0, 123.0];

let t_result = two_sample_t_test(&control_arm, &treatment_arm, false)?;
println!("Two-sample t-test: t={:.3}, p={:.4}", t_result.t_statistic, t_result.p_value);
if t_result.p_value < 0.05 {
    println!("→ Statistically significant BP reduction (α = 0.05)");
}
```

---

## Distributions

### Continuous Distributions

All continuous distributions implement the `Distribution` trait and expose:
- `Dist::new(params)` — validated constructor
- `Dist::fit(data)` — maximum likelihood (MLE) or method-of-moments (MOM) estimation
- `.pdf(x)`, `.logpdf(x)`, `.cdf(x)`, `.inverse_cdf(p)` — core functions
- `.mean()`, `.variance()`, `.std_dev()` — moments
- `.aic(data)`, `.bic(data)` — model selection criteria

---

#### Normal — `distributions::normal_distribution::Normal`

> **When to use**: Symmetric continuous measurements that cluster around a mean.
> **Medical examples**: Blood pressure, height, weight in large populations, IQ scores, measurement errors in lab instruments.

```rust
use rs_stats::distributions::normal_distribution::Normal;
use rs_stats::Distribution;

// Diastolic blood pressure in a healthy cohort: N(80, 8)
let bp = Normal::new(80.0, 8.0)?;

// P(diastolic BP > 90 mmHg) — stage 1 hypertension threshold
let p_high = 1.0 - bp.cdf(90.0)?;
println!("P(DBP > 90) = {:.1}%", p_high * 100.0);   // ≈ 10.6%

// 97.5th percentile — upper reference range
let upper_ref = bp.inverse_cdf(0.975)?;
println!("Upper reference (97.5th pct) = {:.1} mmHg", upper_ref);  // ≈ 95.7

// Fit to patient data (MLE: μ̂ = mean, σ̂ = pop std-dev)
let measurements = vec![78.0, 82.0, 79.0, 85.0, 81.0, 77.0, 83.0, 80.0];
let fitted = Normal::fit(&measurements)?;
println!("Fitted μ = {:.2}, σ = {:.2}", fitted.mean(), fitted.std_dev());
```

---

#### Log-Normal — `distributions::lognormal::LogNormal`

> **When to use**: Right-skewed positive data — concentrations, durations, biological assays.
> **Medical examples**: CRP (C-reactive protein) levels, serum creatinine, drug plasma concentrations, tumour volumes, hospital length-of-stay.

```rust
use rs_stats::distributions::lognormal::LogNormal;
use rs_stats::Distribution;

// CRP levels (mg/L) in an outpatient cohort
// CRP is log-normally distributed: healthy < 5, elevated 5–100, critical > 100
let crp_data = vec![
    1.2, 0.8, 2.1, 1.5, 45.0, 3.2, 0.9, 12.4, 1.8, 88.0,
    2.4, 1.1, 5.6, 0.7, 22.3, 3.9, 1.3, 9.7,  0.6,  0.5,
];

let crp = LogNormal::fit(&crp_data)?;
println!("LogNormal(μ={:.2}, σ={:.2})", crp.mu, crp.sigma);

// Median CRP (more informative than mean for skewed data)
let median = crp.inverse_cdf(0.5)?;
println!("Median CRP     = {:.2} mg/L", median);

// P(CRP > 10 mg/L) — significant inflammation threshold
let p_inflamed = 1.0 - crp.cdf(10.0)?;
println!("P(CRP > 10)    = {:.1}%", p_inflamed * 100.0);
```

---

#### Weibull — `distributions::weibull::Weibull`

> **When to use**: Time-to-event data where the hazard rate changes over time.
> **Medical examples**: Time to relapse after cancer treatment, medical device/implant survival, time until a drug loses efficacy, organ transplant survival.

```rust
use rs_stats::distributions::weibull::Weibull;
use rs_stats::Distribution;

// Time to relapse (months) after chemotherapy — k > 1 means increasing hazard
let relapse_times = vec![3.1, 7.4, 12.5, 2.8, 18.2, 5.9, 9.6, 15.3, 4.2, 22.0];

let w = Weibull::fit(&relapse_times)?;
println!("Weibull(k={:.2}, λ={:.2})", w.k, w.lambda);
// k > 1 → hazard rate increases over time (survivors become more at risk)

// Median relapse-free survival
let median_rfs = w.inverse_cdf(0.5)?;
println!("Median relapse-free survival = {:.1} months", median_rfs);

// P(relapse within 6 months) — short-term risk
let p_6mo = w.cdf(6.0)?;
println!("P(relapse < 6 months)        = {:.1}%", p_6mo * 100.0);

// 1-year survival probability
let p_1yr = 1.0 - w.cdf(12.0)?;
println!("1-year relapse-free survival = {:.1}%", p_1yr * 100.0);
```

---

#### Gamma — `distributions::gamma_distribution::Gamma`

> **When to use**: Positive right-skewed data, especially waiting times or accumulated effects.
> **Medical examples**: ICU length-of-stay, time between hospital readmissions, blood glucose AUC in OGTT.

```rust
use rs_stats::distributions::gamma_distribution::Gamma;
use rs_stats::Distribution;

// ICU length-of-stay (days) — Gamma naturally models positive skewed durations
let icu_los = vec![
    1.5, 2.0, 4.5, 1.2, 7.8, 3.1, 2.4, 10.2, 1.8, 5.6,
    3.9, 2.1, 6.3, 1.4, 8.9, 4.0, 2.7,  3.5, 1.9, 12.1,
];

let gamma = Gamma::fit(&icu_los)?;
println!("Gamma(α={:.2}, β={:.2})", gamma.alpha, gamma.beta);
println!("Mean ICU stay   = {:.2} days", gamma.mean());
println!("Std-dev         = {:.2} days", gamma.std_dev());

// P(LOS > 7 days) — prolonged ICU stay threshold for resource planning
let p_prolonged = 1.0 - gamma.cdf(7.0)?;
println!("P(LOS > 7 days) = {:.1}%", p_prolonged * 100.0);
```

---

#### Beta — `distributions::beta::Beta`

> **When to use**: Proportions, rates, and probabilities bounded in (0, 1).
> **Medical examples**: Diagnostic test sensitivity and specificity, medication adherence rates, tumour response rates, proportion of time in therapeutic range (TTR) for anticoagulant patients.

```rust
use rs_stats::distributions::beta::Beta;
use rs_stats::Distribution;

// Time-in-therapeutic-range (TTR) for warfarin patients (values in 0–1)
// TTR ≥ 0.70 is considered well-controlled anticoagulation
let ttr_data = vec![
    0.72, 0.65, 0.88, 0.55, 0.91, 0.78, 0.62, 0.84,
    0.70, 0.58, 0.79, 0.93, 0.67, 0.75, 0.48, 0.82,
];

let beta = Beta::fit(&ttr_data)?;
println!("Beta(α={:.2}, β={:.2})", beta.alpha, beta.beta);

// P(TTR ≥ 0.70) — probability of being well-controlled
let p_well = 1.0 - beta.cdf(0.70)?;
println!("P(TTR ≥ 0.70) = {:.1}%", p_well * 100.0);

// Median TTR across the population
let median_ttr = beta.inverse_cdf(0.5)?;
println!("Median TTR    = {:.1}%", median_ttr * 100.0);
```

---

#### Student's t — `distributions::student_t::StudentT`

> **When to use**: Symmetric distributions with heavier tails than Normal; small-sample inference.
> **Medical examples**: Standardised effect sizes in small pilot studies, residuals from mixed-effects models, computing critical values for paired-samples tests.

```rust
use rs_stats::distributions::student_t::StudentT;
use rs_stats::Distribution;

// Small diabetes pilot study (n=12): t-distribution with df = n-1 = 11
let t_dist = StudentT::new(0.0, 1.0, 11.0)?;

// Two-sided critical value at α = 0.05
let t_crit = t_dist.inverse_cdf(0.975)?;
println!("t-critical (α=0.05, df=11) = {:.3}", t_crit);   // ≈ 2.201

// p-value for an observed t-statistic of 2.5 (two-tailed)
let p_value = 2.0 * (1.0 - t_dist.cdf(2.5)?);
println!("p-value for |t|=2.5        = {:.4}", p_value);
```

---

#### Exponential — `distributions::exponential_distribution::Exponential`

> **When to use**: Time between events when events occur at a constant rate (memoryless property).
> **Medical examples**: Inter-arrival times in an emergency department, time between seizures in epilepsy patients, spontaneous adverse events during a trial.

```rust
use rs_stats::distributions::exponential_distribution::Exponential;
use rs_stats::Distribution;

// Time (minutes) between patient arrivals in an ED
let inter_arrivals = vec![8.2, 12.5, 4.1, 9.8, 6.3, 15.0, 3.7, 11.2, 7.4, 9.1];

let exp = Exponential::fit(&inter_arrivals)?;
println!("Exponential(λ={:.3} arrivals/min)", exp.lambda);
println!("Mean inter-arrival = {:.1} min", exp.mean());

// P(next patient within 5 minutes) — triage planning
let p_5min = exp.cdf(5.0)?;
println!("P(arrival < 5 min) = {:.1}%", p_5min * 100.0);
```

---

#### Chi-Squared — `distributions::chi_squared::ChiSquared`

> **When to use**: Distribution of sums of squared standard normals; used in goodness-of-fit tests and variance confidence intervals.
> **Medical examples**: Testing whether observed disease frequencies match expected proportions, variance confidence intervals for measurement devices.

```rust
use rs_stats::distributions::chi_squared::ChiSquared;
use rs_stats::Distribution;

// 6-category goodness-of-fit test: df = 6 - 1 = 5
let chi2 = ChiSquared::new(5.0)?;

// Critical value at α = 0.05
let chi2_crit = chi2.inverse_cdf(0.95)?;
println!("χ²(5) critical value (α=0.05) = {:.3}", chi2_crit);  // ≈ 11.07

// p-value for an observed χ² = 9.2
let p_value = 1.0 - chi2.cdf(9.2)?;
println!("p-value for χ²=9.2            = {:.4}", p_value);
```

---

#### F-Distribution — `distributions::f_distribution::FDistribution`

> **When to use**: Ratio of two chi-squared variables; used in ANOVA and regression significance tests.
> **Medical examples**: Comparing biomarker variance across patient groups, multi-arm ANOVA F-statistic, F-test in multiple regression predicting clinical outcomes.

```rust
use rs_stats::distributions::f_distribution::FDistribution;
use rs_stats::Distribution;

// ANOVA with 4 groups, total n=52: F(3, 48)
let f_dist = FDistribution::new(3.0, 48.0)?;

// Critical value at α = 0.05
let f_crit = f_dist.inverse_cdf(0.95)?;
println!("F(3,48) critical value (α=0.05) = {:.3}", f_crit);  // ≈ 2.80

// p-value for observed F = 4.5
let p_value = 1.0 - f_dist.cdf(4.5)?;
println!("p-value for F=4.5               = {:.4}", p_value);
```

---

#### Uniform — `distributions::uniform_distribution::Uniform`

> **When to use**: All values in a range are equally likely.
> **Medical examples**: Randomisation checks in clinical trials, uncertainty about a drug's effective window, boundary-condition stress testing.

```rust
use rs_stats::distributions::uniform_distribution::Uniform;
use rs_stats::Distribution;

// Drug release window: effective between 2 h and 6 h post-ingestion
let release = Uniform::new(2.0, 6.0)?;

// P(effective within the first 3 hours)
let p_3h = release.cdf(3.0)?;
println!("P(effective by 3h) = {:.1}%", p_3h * 100.0);  // 25%
```

---

### Discrete Distributions

All discrete distributions implement the `DiscreteDistribution` trait and expose:
- `Dist::new(params)` — validated constructor
- `Dist::fit(data)` — MLE or MOM from `&[f64]`
- `.pmf(k)`, `.logpmf(k)`, `.cdf(k)` — core functions
- `.mean()`, `.variance()`, `.std_dev()` — moments
- `.aic(data)`, `.bic(data)` — model selection

---

#### Poisson — `distributions::poisson_distribution::Poisson`

> **When to use**: Count of rare independent events in a fixed time or space window.
> **Medical examples**: Adverse drug reactions per 1 000 prescriptions, surgical site infections per month, emergency calls per hour, mutations per cell division.

```rust
use rs_stats::distributions::poisson_distribution::Poisson;
use rs_stats::DiscreteDistribution;

// Hospital-acquired infections (HAI) per ward per month: λ = 2.3
let hai = Poisson::new(2.3)?;

println!("P(0 HAI)     = {:.1}%", hai.pmf(0)? * 100.0);   // ≈ 10.0%
println!("P(≥5 HAI)    = {:.1}%", (1.0 - hai.cdf(4)?) * 100.0);  // alert threshold

// Fit from 12 months of observed counts
let monthly_counts = vec![1.0, 3.0, 2.0, 0.0, 4.0, 2.0, 1.0, 3.0, 2.0, 1.0, 5.0, 2.0];
let fitted = Poisson::fit(&monthly_counts)?;
println!("Estimated λ  = {:.2} infections/month", fitted.lambda);
```

---

#### Binomial — `distributions::binomial_distribution::Binomial`

> **When to use**: Number of successes in n independent trials with constant probability p.
> **Medical examples**: Responders in a treatment cohort, positive tests in a screening batch, side-effect events in a treated group.

```rust
use rs_stats::distributions::binomial_distribution::Binomial;
use rs_stats::DiscreteDistribution;

// Trial: n=100 patients, literature response rate p=0.35
let trial = Binomial::new(100, 0.35)?;

println!("E[responders]   = {:.0}", trial.mean());   // 35

// P(≥ 45 responders) — detect a meaningful improvement
let p_improved = 1.0 - trial.cdf(44)?;
println!("P(≥45 respond)  = {:.2}%", p_improved * 100.0);
```

---

#### Geometric — `distributions::geometric::Geometric`

> **When to use**: Number of trials until the first success (k ≥ 1).
> **Medical examples**: Screening cycles until a lesion is detected, treatment attempts until remission, needle passes until a successful lumbar puncture.

```rust
use rs_stats::distributions::geometric::Geometric;
use rs_stats::DiscreteDistribution;

// Colonoscopy screening: P(detecting polyp per session) = 0.18
let screening = Geometric::new(0.18)?;

println!("E[sessions to detect] = {:.1}", screening.mean());   // ≈ 5.6
let p_within_3 = screening.cdf(3)?;
println!("P(detected ≤ 3 sessions) = {:.1}%", p_within_3 * 100.0);
```

---

#### Negative Binomial — `distributions::negative_binomial::NegativeBinomial`

> **When to use**: Overdispersed count data (variance > mean), or number of failures before r-th success.
> **Medical examples**: Hospitalisations before stable remission, overdispersed adverse event counts, recurrences before sustained response.

```rust
use rs_stats::distributions::negative_binomial::NegativeBinomial;
use rs_stats::DiscreteDistribution;

// Re-admissions before stable remission — overdispersed (variance > mean)
let admissions = vec![
    0.0, 2.0, 1.0, 5.0, 3.0, 0.0, 4.0, 1.0, 2.0, 6.0,
    1.0, 0.0, 3.0, 2.0, 1.0, 4.0, 0.0, 2.0, 3.0, 1.0,
];

let nb = NegativeBinomial::fit(&admissions)?;
println!("NegBin(r={:.2}, p={:.3})", nb.r, nb.p);
println!("Mean re-admissions = {:.2}", nb.mean());
println!("P(0 re-admissions) = {:.1}%", nb.pmf(0)? * 100.0);
```

---

## Automatic Distribution Fitting

> **Scenario**: A pharmacokineticist wants to know which distribution best describes drug half-life across 80 patients, without assuming Normality.

```rust
use rs_stats::{auto_fit, fit_all};

// Drug half-life (hours) — typically log-normal or Weibull in PK studies
let half_lives = vec![
    4.2, 6.1, 3.8, 9.5, 5.3, 7.4, 4.9, 11.2, 3.5, 6.8,
    8.1, 4.4, 5.7, 7.0, 3.9, 10.3, 5.1,  6.5, 4.7,  8.6,
];

// One-call: auto-detect type + best AIC
let best = auto_fit(&half_lives)?;
println!("Best fit: {} (AIC={:.2}, KS p={:.3})", best.name, best.aic, best.ks_p_value);

// Full ranking — compare all candidates
println!("\n{:<15} {:>8} {:>8} {:>10}", "Distribution", "AIC", "BIC", "KS p-value");
println!("{}", "-".repeat(45));
for r in fit_all(&half_lives)? {
    println!("{:<15} {:>8.2} {:>8.2} {:>10.4}", r.name, r.aic, r.bic, r.ks_p_value);
}
// Typical output:
// Distribution    AIC      BIC   KS p-value
// -----------------------------------------
// LogNormal     82.34    84.12     0.8231
// Gamma         83.71    85.49     0.7654
// Weibull       84.02    85.80     0.7412
// Normal        89.45    91.23     0.4103
```

### Available candidates

| Type | Distributions |
|------|--------------|
| Continuous (`fit_all`) | Normal, Exponential, Uniform, Gamma, LogNormal, Weibull, Beta, StudentT, F, ChiSquared |
| Discrete (`fit_all_discrete`) | Poisson, Geometric, NegativeBinomial, Binomial |

---

## Hypothesis Testing

> **Scenario**: A clinical trial compares HbA1c reduction across three diabetes treatments.

```rust
use rs_stats::hypothesis_tests::{
    t_test::{one_sample_t_test, two_sample_t_test, paired_t_test},
    anova::one_way_anova,
    chi_square_test::chi_square_independence,
};

// ── Paired t-test: before vs after treatment ──────────────────────────────────
let before = vec![8.2, 7.9, 8.6, 9.1, 8.4, 8.0, 8.8, 9.3];  // HbA1c %
let after  = vec![7.4, 7.1, 7.9, 8.3, 7.5, 7.2, 8.0, 8.5];
let paired = paired_t_test(&before, &after)?;
println!("Paired t-test: t={:.3}, p={:.4}", paired.t_statistic, paired.p_value);
// p < 0.05 → significant reduction in HbA1c

// ── One-way ANOVA: compare three treatment arms ───────────────────────────────
let drug_a = vec![-0.8, -1.2, -0.5, -1.5, -0.9];  // HbA1c change %
let drug_b = vec![-1.4, -1.8, -1.1, -2.0, -1.6];
let drug_c = vec![-0.4, -0.6, -0.3, -0.8, -0.5];
let groups: Vec<&[f64]> = vec![&drug_a, &drug_b, &drug_c];
let anova = one_way_anova(&groups)?;
println!("ANOVA: F={:.3}, p={:.4}", anova.f_statistic, anova.p_value);

// ── Chi-square independence: side-effect rate by treatment ────────────────────
// Rows: Drug A, B, C  |  Cols: No side-effect, Side-effect occurred
let observed = vec![
    vec![42, 8],   // Drug A
    vec![36, 14],  // Drug B
    vec![45, 5],   // Drug C
];
let (_chi2, _df, p) = chi_square_independence(&observed)?;
println!("χ² independence: p={:.4}", p);
```

---

## Regression Analysis

> **Scenario**: Predict post-operative recovery time from patient characteristics.

```rust
use rs_stats::regression::linear_regression::LinearRegression;
use rs_stats::regression::multiple_linear_regression::MultipleLinearRegression;

// ── Simple linear regression: age → recovery time ────────────────────────────
let age           = vec![35.0, 45.0, 55.0, 62.0, 70.0, 48.0, 58.0, 40.0];
let recovery_days = vec![ 4.0,  5.5,  7.0,  8.5, 10.0,  6.0,  7.5,  5.0];

let mut model = LinearRegression::new();
model.fit(&age, &recovery_days)?;
println!("Recovery ~ Age: slope={:.3} d/yr, R²={:.4}", model.slope, model.r_squared);

// Predict for a 52-year-old patient with 95% CI
let predicted = model.predict(52.0);
let (lo, hi)  = model.confidence_interval(52.0, 0.95)?;
println!("Predicted (age=52): {:.1} days  95% CI [{:.1}, {:.1}]", predicted, lo, hi);

// ── Multiple regression: age + BMI + comorbidity score → recovery ─────────────
let features = vec![
    vec![35.0, 23.0, 1.0],   // [age, BMI, comorbidity score]
    vec![55.0, 28.0, 2.0],
    vec![62.0, 31.0, 3.0],
    vec![45.0, 25.0, 1.0],
    vec![70.0, 33.0, 4.0],
    vec![48.0, 26.0, 2.0],
];
let outcomes = vec![4.5, 7.2, 9.8, 5.1, 12.3, 6.4];

let mut mlr = MultipleLinearRegression::new();
mlr.fit(&features, &outcomes)?;
println!("MLR R² = {:.4}, Adj. R² = {:.4}", mlr.r_squared, mlr.adjusted_r_squared);
```

---

## Error Handling

All functions return `StatsResult<T>` — a type alias for `Result<T, StatsError>`. The library **never panics**.

```rust
use rs_stats::{StatsError, StatsResult};
use rs_stats::distributions::normal_distribution::Normal;
use rs_stats::Distribution;

fn reference_range(mean: f64, sd: f64) -> StatsResult<(f64, f64)> {
    let dist  = Normal::new(mean, sd)?;          // Err if sd ≤ 0
    let lower = dist.inverse_cdf(0.025)?;
    let upper = dist.inverse_cdf(0.975)?;
    Ok((lower, upper))
}

match reference_range(80.0, -5.0) {             // Invalid: negative SD
    Ok((lo, hi)) => println!("Ref range: [{:.1}, {:.1}]", lo, hi),
    Err(StatsError::InvalidInput { message }) =>
        println!("Invalid params: {}", message), // → "Normal::new: std_dev must be positive"
    Err(e) => println!("Error: {}", e),
}
```

### Error Variants

| Variant | Raised when |
|---------|-------------|
| `InvalidInput` | Out-of-domain parameter (negative σ, p ∉ [0,1], …) |
| `EmptyData` | Empty slice passed to `fit()` or statistical functions |
| `DimensionMismatch` | Mismatched array lengths (regression, paired tests) |
| `ConversionError` | Type conversion failures |
| `NumericalError` | Numerical instability (overflow, NaN propagation) |
| `NotFitted` | `predict()` called before `fit()` on a regression model |

---

## Documentation

```bash
cargo doc --open          # full API documentation
cargo test                # 343 unit tests + 66 doc tests
cargo clippy              # linting
cargo fmt --check         # formatting
```

---

## Contributing

1. Fork the repository
2. Create a branch: `git checkout -b feat/my-feature`
3. Commit: `git commit -m "feat(scope): description"`
4. Push and open a pull request

All PRs must pass `cargo test`, `cargo clippy -- -D warnings`, and `cargo fmt --check`.

---

## License

MIT — see [LICENSE](LICENSE).