tukey_test 0.2.0

Statistical post-hoc tests: Tukey HSD, Games-Howell, Dunnett, one-way ANOVA, and studentized range critical values — pure Rust, zero required dependencies
Documentation
  • Coverage
  • 96.97%
    96 out of 99 items documented10 out of 22 items with examples
  • Size
  • Source code size: 126.94 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 5.04 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 13s Average build duration of successful builds.
  • all releases: 13s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • TechieTeee/Rust_Tukey_Test
    0 0 0
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • TechieTeee

The Problem

You're comparing three drug dosages, five marketing campaigns, or four manufacturing processes. A simple t-test won't work - running multiple pairwise t-tests inflates your false positive rate. With 5 groups, that's 10 comparisons, and your chance of a spurious "significant" result climbs from 5% to nearly 40%.

Post-hoc tests solve this. They control the family-wise error rate, giving you honest answers about which groups truly differ - not statistical noise.

ANOVA tells you something differs. Post-hoc tests tell you what.


Real-World Applications

Medicine & Clinical Trials - Compare patient outcomes across treatment arms. A hospital testing three pain medications needs to know not just that outcomes differ, but which drug works best and by how much, while controlling for multiple comparisons that could lead to approving an ineffective treatment.

Agriculture & Food Science - Compare crop yields across fertilizer types, or taste scores across formulations. The Tukey test was originally developed by John Tukey at Princeton for exactly these kinds of agricultural experiments - its design is purpose-built for "which of these treatments actually made a difference?"

Manufacturing & Quality Control - Compare defect rates or tolerances across production lines, shifts, or suppliers. When a factory runs five machines making the same part, Dunnett's test can flag which machines deviate from the reference line without false alarms.

Software & A/B Testing - Compare conversion rates, latency, or engagement metrics across multiple variants. With 4+ variants, pairwise t-tests give misleading results; Tukey HSD or Games-Howell gives you defensible answers.

Education & Social Science - Compare test scores across teaching methods, survey responses across demographics, or behavioral outcomes across intervention groups. These fields routinely analyze 3-10 groups and need post-hoc tests that reviewers and journals accept.

Environmental Science - Compare pollution levels across sites, species counts across habitats, or water quality across treatment methods. Ragged data (unequal sample sizes) is the norm here, which is why Tukey-Kramer and Games-Howell matter.


Why Rust?

Rust has been ranked the most admired programming language in the Stack Overflow Developer Survey for nine consecutive years (2016–2024). Developer adoption is accelerating - Rust is now used in the Linux kernel, the Windows kernel, the Android platform, and major infrastructure at AWS, Microsoft, Google, and Meta. It is no longer a niche systems language; it is becoming the default choice for any new software where correctness, performance, and long-term maintainability matter.

The statistical computing ecosystem hasn't kept up. tukey_test is part of closing that gap.


R, Python, and Mathematica are great for interactive analysis in a notebook. But when you're building something that runs statistical tests in production - a clinical trial pipeline, a manufacturing quality system, a real-time A/B testing platform - those tools create serious problems:

  • You have to shell out to another language mid-pipeline, adding subprocess overhead, serialization, and a fragile runtime dependency
  • Python and R's numerical output can vary by OS, version, and BLAS library - meaning results on your laptop may not match production
  • Deploying a Python + scipy + numpy stack (or an R environment) on edge hardware, in Docker, or in WebAssembly is genuinely painful

This is the gap tukey_test fills. Post-hoc statistical tests have essentially zero native Rust coverage. If you needed Tukey HSD in a Rust service today, your options were: reimplement it yourself, shell out to R, or change languages. None of those are acceptable in a serious production codebase.

Four reasons this matters

1. Correctness you can trust Rust's type system and borrow checker eliminate whole categories of bugs - buffer overruns, data races, silent memory corruption - that have caused real errors in scientific software. Reproducibility is a well-documented crisis in research. A statistically correct implementation that cannot silently corrupt memory is a different class of software than a Python script that happens to produce the right answer most of the time.

2. Deterministic results everywhere Rust produces bit-for-bit identical output across platforms and operating systems. For auditable systems - FDA regulatory submissions, financial reporting, clinical trial analysis - that determinism is not a nice-to-have, it's a requirement. R and Python cannot make that guarantee.

3. Zero-overhead integration A Rust service can call tukey_hsd() directly in the same process, with no FFI, no subprocess, no round-trip serialization. For high-frequency data processing or embedded applications - real-time sensor analysis, manufacturing QC on the line - that's the difference between feasible and not.

4. Simple, self-contained deployment A Rust binary with zero required dependencies ships as a single executable. No runtime to install, no package manager to invoke, no version conflicts. Compare that to deploying a full R or Python scientific stack on edge hardware or in a serverless function.

Who this is for

The intended user is not a statistician choosing between tools for a one-off analysis - use R or Python for that. This crate is for the Rust engineer who already knows what statistical test they need and is now building the system that runs it reliably, repeatedly, and at scale. That person has had no good option until now.


Available Tests

Test When to Use Function
One-way ANOVA Check if any group means differ overall one_way_anova()
Tukey HSD All pairwise comparisons (equal variances) tukey_hsd()
Games-Howell All pairwise comparisons (unequal variances) games_howell()
Dunnett's test Compare treatments against a single control dunnett()
Levene's test Test equality of variances (Brown-Forsythe variant) levene_test()
q critical value Studentized range distribution lookup q_critical()
Dunnett critical value Dunnett distribution lookup dunnett_critical()
Studentized range CDF Exact p-values for post-hoc tests ptukey_cdf()

All tests support unequal group sizes. Tukey HSD applies the Tukey-Kramer adjustment automatically.

All pairwise comparison results include an exact p-value computed via ptukey_cdf. ANOVA results include η² (eta-squared) and ω² (omega-squared) effect sizes. q_critical supports any α and any number of groups k (not limited to the table range).


Quick Start

Add to your Cargo.toml:

[dependencies]
tukey_test = "0.2"

ANOVA + Tukey HSD

The most common workflow - test for an overall effect, then find which pairs differ:

use tukey_test::{one_way_anova, tukey_hsd};

let data = vec![
    vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],   // Group A
    vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],  // Group B
    vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0], // Group C
];

// Step 1: Is there an overall difference?
let anova = one_way_anova(&data).unwrap();
println!("{anova}");
// F = 13.18, p = 0.0005

// Step 2: Which pairs differ?
let result = tukey_hsd(&data, 0.05).unwrap();
for pair in result.significant_pairs() {
    println!("Groups {} and {} differ (q = {:.4})",
        pair.group_i, pair.group_j, pair.q_statistic);
}

Games-Howell - when variances aren't equal

use tukey_test::games_howell;

let data = vec![
    vec![4.0, 5.0, 3.0, 4.0, 6.0],
    vec![20.0, 30.0, 25.0, 35.0, 28.0],  // high variance
    vec![5.0, 7.0, 6.0, 4.0, 5.0],
];
let result = games_howell(&data, 0.05).unwrap();
println!("{result}");

Dunnett's test - compare against a control

use tukey_test::dunnett;

let data = vec![
    vec![10.0, 12.0, 11.0, 9.0, 10.0],   // control (group 0)
    vec![15.0, 17.0, 14.0, 16.0, 15.0],   // treatment A
    vec![11.0, 13.0, 10.0, 12.0, 11.0],   // treatment B
];
let result = dunnett(&data, 0, 0.05).unwrap();
for t in result.significant_treatments() {
    println!("Treatment {} differs from control (t = {:.4})",
        t.treatment, t.t_statistic);
}

Levene's test - check variance equality before choosing Tukey vs Games-Howell

use tukey_test::levene_test;

let data = vec![
    vec![5.0, 6.0, 5.5, 5.2, 5.8],   // group A (low variance)
    vec![1.0, 9.0, 3.0, 8.0, 5.0],   // group B (high variance)
];
let result = levene_test(&data, 0.05).unwrap();
if result.significant {
    println!("Unequal variances detected - use games_howell()");
} else {
    println!("Variances are homogeneous - tukey_hsd() is appropriate");
}

Exact p-values and effect sizes

Every pairwise comparison result includes an exact p-value computed via the studentized range CDF:

use tukey_test::tukey_hsd;

let data = vec![
    vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],
    vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],
    vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0],
];
let result = tukey_hsd(&data, 0.05).unwrap();
for pair in &result.comparisons {
    println!("Groups ({}, {}): p = {:.4}", pair.group_i, pair.group_j, pair.p_value);
}

ANOVA results include η² (eta-squared) and ω² (omega-squared) for practical effect size:

use tukey_test::one_way_anova;

let data = vec![
    vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],
    vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],
    vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0],
];
let result = one_way_anova(&data).unwrap();
println!("η² = {:.3}, ω² = {:.3}", result.eta_squared, result.omega_squared);
// η² = 0.637, ω² = 0.589

Data Input

Flexible types in the library

All test functions accept any type implementing AsRef<[f64]> for groups - no need to convert into Vec<Vec<f64>>:

use tukey_test::one_way_anova;

// Slices
let data: &[&[f64]] = &[&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]];
one_way_anova(data).unwrap();

// Fixed-size arrays
let data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
one_way_anova(&data).unwrap();

Load from CSV

Read data from CSV files in wide format (each column is a group). Headers are auto-detected and skipped. Columns can have unequal lengths. Values of NaN, Infinity, and -Infinity are rejected with a ParseError.

control,treatment_a,treatment_b
10,15,11
12,17,13
11,14,10
9,16,12
10,15,11
use tukey_test::{parse_csv_file, tukey_hsd};

let groups = parse_csv_file("experiment.csv").unwrap();
let result = tukey_hsd(&groups, 0.05).unwrap();
println!("{result}");

You can also parse from any std::io::Read source:

use tukey_test::parse_csv;

let csv_data = "a,b\n1,4\n2,5\n3,6\n";
let groups = parse_csv(csv_data.as_bytes()).unwrap();

Optional Features

Serde

Serialize any result to JSON, CSV, or any serde-supported format:

[dependencies]
tukey_test = { version = "0.2", features = ["serde"] }

CLI

The crate also ships as a command-line tool.

git clone https://github.com/TechieTeee/Rust_Tukey_Test.git
cd Rust_Tukey_Test
cargo build --release

Data input options

# Inline - separate groups with --
tukey_test hsd 0.05 6 8 4 5 3 4 -- 8 12 9 11 6 8 -- 13 9 11 8 12 14

# From a CSV file
tukey_test hsd 0.05 --file experiment.csv

# From stdin (pipe-friendly)
cat data.csv | tukey_test hsd 0.05 --file -

Commands

# One-way ANOVA
tukey_test anova 6 8 4 5 3 4 -- 8 12 9 11 6 8 -- 13 9 11 8 12 14
tukey_test anova --file data.csv

# Tukey HSD (alpha = 0.05)
tukey_test hsd 0.05 6 8 4 5 3 4 -- 8 12 9 11 6 8 -- 13 9 11 8 12 14
tukey_test hsd 0.05 --file data.csv

# Games-Howell (alpha = 0.05)
tukey_test games-howell 0.05 4 5 3 4 6 -- 20 30 25 35 28 -- 5 7 6 4 5
tukey_test games-howell 0.05 --file data.csv

# Dunnett (alpha = 0.05, control = group 0)
tukey_test dunnett 0.05 0 10 12 11 9 10 -- 15 17 14 16 15 -- 11 13 10 12 11
tukey_test dunnett 0.05 0 --file data.csv

# Critical value lookup
tukey_test q 3 15 0.05

Example Output

One-Way ANOVA

Source                 SS     df           MS          F    p-value
--------------------------------------------------------------------
Between          117.4444      2      58.7222    13.1796   0.000497
Within            66.8333     15       4.4556
Total            184.2778     17
Tukey HSD Test Results (alpha = 0.05, df = 15, MSE = 4.4556, q_critical = 3.6700)

Comparison      Mean Diff     q-stat  Significant   CI
------------------------------------------------------------------------
( 0,  1)           4.0000     4.6418          Yes   [-7.1626, -0.8374]
( 0,  2)           6.1667     7.1561          Yes   [-9.3292, -3.0041]
( 1,  2)           2.1667     2.5143           No   [-5.3292, 0.9959]
Dunnett's Test Results (alpha = 0.05, control = group 0, df = 12, d_critical = 2.3800)

Treatment           Mean Diff     t-stat  Significant   CI
--------------------------------------------------------------------------
Group  1 vs  0        5.0000     6.9338          Yes   [3.2838, 6.7162]
Group  2 vs  0        1.0000     1.3868           No   [-0.7162, 2.7162]

How to Choose the Right Test

  Got multiple groups to compare?
  |
  |-- Start with one_way_anova()
  |   p < 0.05? --> significant overall difference
  |   eta_squared / omega_squared --> practical effect size
  |
  |-- Want all pairwise comparisons?
  |   |
  |   |-- Check variances first: levene_test()
  |   |   |-- Not significant (homogeneous)? --> tukey_hsd()
  |   |   +-- Significant (unequal)?         --> games_howell()
  |   |
  |   +-- All pairwise results include exact p-values
  |
  +-- Comparing treatments to a control?
      +-- dunnett()

Roadmap

This crate aims to be the go-to resource for post-hoc and related statistical tests in Rust. Future plans include:

  • Scheffe's test - the most conservative post-hoc test, for complex contrasts
  • Bonferroni / Holm corrections - general-purpose p-value adjustment for multiple comparisons
  • Welch's ANOVA - one-way ANOVA without equal variance assumption
  • Cohen's d - pairwise effect size for individual comparisons
  • Dunnett's test for any k/df - remove table-based limits via numerical multivariate-t integration (currently requires df ≥ 5 and k ≤ 9)
  • Games-Howell non-integer df - fix Welch-Satterthwaite df truncation for more accurate p-values at small sample sizes
  • ptukey_cdf extended validation - comprehensive accuracy benchmarks against R across the full (k, df, q) grid, especially for k > 20 and df < 5
  • no_std / f32 support - for embedded and WASM scientific computing targets

These items are targeted for 0.3.0.

Contributions, feature requests, and bug reports are welcome!


Prerequisites

Rust 1.56+ (edition 2021). Install from rust-lang.org.

License

MIT