Statistical significance testing for benchmarking.
This crate provides an implementation of Welch’s t-test for detecting statistically significant performance changes between benchmark runs.
Part of the perfgate workspace.
§Statistical Methodology
§Welch’s t-test
Welch’s t-test is an adaptation of Student’s t-test that is more reliable when the two samples have unequal variances and/or unequal sample sizes. This makes it ideal for benchmarking where:
- Baseline and current runs may have different numbers of samples
- Variance can differ significantly between runs due to system noise
- We want to detect real performance changes, not just noise
§Formula
The test statistic is computed as:
t = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2)
The degrees of freedom is approximated using the Welch-Satterthwaite equation:
df = (var_1/n_1 + var_2/n_2)² / (var_1²/(n_1²(n_1-1)) + var_2²/(n_2²(n_2-1)))
§Interpretation
- The p-value represents the probability of observing a difference as extreme as (or more extreme than) the measured difference, assuming no real change.
- A small p-value (≤ alpha, typically 0.05) indicates strong evidence against the null hypothesis, suggesting a statistically significant change.
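The test statistic and the Welch-Satterthwaite degrees of freedom can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's code: the `Summary` type and the `welch_t_and_df` function are hypothetical names, and the sketch assumes each group has at least two samples and that the variances are not both zero. Turning `t` and `df` into a p-value additionally requires the Student's t distribution CDF, which is omitted here.

```rust
/// Summary statistics for one group of timings.
/// Hypothetical type for this sketch, not part of the crate's API.
#[derive(Debug, Clone, Copy)]
pub struct Summary {
    pub mean: f64,
    /// Unbiased sample variance.
    pub var: f64,
    /// Sample count, stored as f64 to keep the arithmetic simple.
    pub n: f64,
}

/// Returns `(t, df)` for two groups using Welch's formulas.
/// Assumes `n >= 2` in both groups and a non-zero combined variance.
pub fn welch_t_and_df(a: Summary, b: Summary) -> (f64, f64) {
    let va = a.var / a.n; // var_1 / n_1
    let vb = b.var / b.n; // var_2 / n_2

    // t = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2)
    let t = (a.mean - b.mean) / (va + vb).sqrt();

    // df = (va + vb)^2 / (va^2/(n_1 - 1) + vb^2/(n_2 - 1))
    let df = (va + vb).powi(2) / (va.powi(2) / (a.n - 1.0) + vb.powi(2) / (b.n - 1.0));

    (t, df)
}
```

When both groups have the same size and the same variance, `df` reduces to `n_1 + n_2 - 2`, the value used by Student's t-test, which is a convenient sanity check for an implementation.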
§Limitations
- Minimum samples: Requires at least min_samples in both groups (typically 8) for reliable results; with smaller sample sizes, the test returns None
- Zero variance: When all values in a group are identical, the test handles this edge case explicitly (returns p-value 1.0 if means are equal, 0.0 otherwise)
- Assumptions: Assumes data is approximately normally distributed; for highly skewed distributions, consider non-parametric alternatives
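The first two limitations amount to checks that run before the t-test itself. The sketch below shows one way such checks could look. The `Precheck` enum and the `precheck` function are hypothetical and do not reflect the crate's actual signatures. It also assumes the zero-variance shortcut applies only when both groups are constant (so the standard error is zero); the crate may treat a single constant group differently.

```rust
/// Outcome of the edge-case checks (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Precheck {
    /// At least one group is smaller than `min_samples`; a caller would return `None`.
    TooFewSamples,
    /// Both groups are constant, so the p-value is decided without a t-test.
    Degenerate { p_value: f64 },
    /// Inputs look usable; proceed with Welch's t-test.
    Proceed,
}

/// True when every value in the slice is identical (zero variance).
fn is_constant(xs: &[f64]) -> bool {
    xs.windows(2).all(|w| w[0] == w[1])
}

pub fn precheck(baseline: &[f64], current: &[f64], min_samples: usize) -> Precheck {
    if baseline.len() < min_samples || current.len() < min_samples {
        return Precheck::TooFewSamples;
    }
    if let (Some(&b0), Some(&c0)) = (baseline.first(), current.first()) {
        if is_constant(baseline) && is_constant(current) {
            // For constant groups the mean equals any element, so comparing
            // the first elements compares the means.
            let p_value = if b0 == c0 { 1.0 } else { 0.0 };
            return Precheck::Degenerate { p_value };
        }
    }
    Precheck::Proceed
}
```

Returning a distinct variant for each case keeps the "not enough data" outcome separate from a genuine p-value, which mirrors the documented distinction between None and a result of 1.0 or 0.0.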
Functions§
- compute_significance - Compute statistical significance using Welch’s t-test.
- mean_and_variance - Compute sample mean and unbiased variance (Bessel’s correction).
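As an illustration of what "unbiased variance (Bessel’s correction)" means, the following sketch divides the sum of squared deviations by n - 1 rather than n. The function name `sample_mean_and_variance` and its `Option` return type are assumptions made for this example; the signature of the crate's own mean_and_variance is not shown on this page and may differ.

```rust
/// Returns `(mean, unbiased variance)`, or `None` when fewer than two
/// samples are given (the n - 1 divisor would be zero).
/// Hypothetical helper for illustration only.
pub fn sample_mean_and_variance(xs: &[f64]) -> Option<(f64, f64)> {
    if xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;

    // Sum of squared deviations from the mean.
    let sum_sq: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();

    // Bessel's correction: divide by n - 1 instead of n.
    Some((mean, sum_sq / (n - 1.0)))
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the unbiased variance is 32/7 (about 4.571) rather than the population value 32/8 = 4.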