Crate perfgate_significance


Statistical significance testing for benchmarking.

This crate provides Welch’s t-test implementation for detecting statistically significant performance changes between benchmark runs.

Part of the perfgate workspace.

§Statistical Methodology

§Welch’s t-test

Welch’s t-test is an adaptation of Student’s t-test that is more reliable when the two samples have unequal variances and/or unequal sample sizes. This makes it well suited to benchmarking, where:

Baseline and current runs may have different numbers of samples
Variance can differ significantly between runs due to system noise
We want to detect real performance changes, not just noise

§Formula

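The test statistic is computed as follows, where mean_i, var_i, and n_i are the sample mean, unbiased sample variance, and sample size of group i: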

t = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2)

The degrees of freedom are approximated with the Welch-Satterthwaite equation:

df = (var_1/n_1 + var_2/n_2)² / ((var_1/n_1)²/(n_1 - 1) + (var_2/n_2)²/(n_2 - 1))
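Both formulas can be computed directly from per-group summary statistics. The sketch below is illustrative only: `welch_t_and_df` is a hypothetical helper, not part of this crate's documented API, and it assumes each group has at least two samples and that the combined variance is nonzero (the cases excluded under Limitations).

```rust
/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
/// Hypothetical helper; `var_1` and `var_2` are unbiased sample variances.
/// Assumes n_1, n_2 >= 2 and var_1/n_1 + var_2/n_2 > 0.
fn welch_t_and_df(
    mean_1: f64,
    var_1: f64,
    n_1: usize,
    mean_2: f64,
    var_2: f64,
    n_2: usize,
) -> (f64, f64) {
    let (n_1, n_2) = (n_1 as f64, n_2 as f64);
    // Squared standard error of each group's mean; Welch keeps them separate
    // instead of pooling the variances as Student's t-test does.
    let se2_1 = var_1 / n_1;
    let se2_2 = var_2 / n_2;
    let se2 = se2_1 + se2_2;
    let t = (mean_1 - mean_2) / se2.sqrt();
    let df = se2 * se2 / (se2_1 * se2_1 / (n_1 - 1.0) + se2_2 * se2_2 / (n_2 - 1.0));
    (t, df)
}
```

With equal variances and equal sample sizes the Welch-Satterthwaite df reduces to n_1 + n_2 - 2, the classic Student value; with unequal variances it falls below that, which widens the reference distribution and makes the test more conservative.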

§Interpretation

The p-value is the probability of observing a difference at least as extreme as the measured one, assuming the null hypothesis of no real change (equal population means) holds.
A small p-value (≤ alpha, typically 0.05) is evidence against the null hypothesis and indicates a statistically significant change.
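One standard way to turn t and df into a p-value uses the regularized incomplete beta function: for a two-sided test, p = I_x(df/2, 1/2) with x = df / (df + t²). The sketch below implements that identity with a Lanczos log-gamma and a continued-fraction evaluation (modified Lentz's method). It assumes a two-sided test and is not necessarily how this crate computes its p-value; all function names are hypothetical.

```rust
/// ln Γ(x) via the Lanczos approximation (g = 7, n = 9); used here with x >= 0.5.
fn ln_gamma(x: f64) -> f64 {
    const G: f64 = 7.0;
    const COEF: [f64; 9] = [
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
    ];
    let z = x - 1.0;
    let mut sum = COEF[0];
    for (i, &c) in COEF.iter().enumerate().skip(1) {
        sum += c / (z + i as f64);
    }
    let t = z + G + 0.5;
    0.5 * (2.0 * std::f64::consts::PI).ln() + (z + 0.5) * t.ln() - t + sum.ln()
}

/// Continued fraction for the incomplete beta function (modified Lentz's method).
fn beta_cf(a: f64, b: f64, x: f64) -> f64 {
    const MAX_ITER: usize = 300;
    const EPS: f64 = 1e-15;
    const TINY: f64 = 1e-300; // keeps denominators away from zero
    let (qab, qap, qam) = (a + b, a + 1.0, a - 1.0);
    let mut c: f64 = 1.0;
    let mut d: f64 = 1.0 - qab * x / qap;
    if d.abs() < TINY { d = TINY; }
    d = 1.0 / d;
    let mut h = d;
    for m in 1..=MAX_ITER {
        let m = m as f64;
        let m2 = 2.0 * m;
        // Even term of the fraction.
        let aa = m * (b - m) * x / ((qam + m2) * (a + m2));
        d = 1.0 + aa * d;
        if d.abs() < TINY { d = TINY; }
        c = 1.0 + aa / c;
        if c.abs() < TINY { c = TINY; }
        d = 1.0 / d;
        h *= d * c;
        // Odd term of the fraction.
        let aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
        d = 1.0 + aa * d;
        if d.abs() < TINY { d = TINY; }
        c = 1.0 + aa / c;
        if c.abs() < TINY { c = TINY; }
        d = 1.0 / d;
        let delta = d * c;
        h *= delta;
        if (delta - 1.0).abs() < EPS { break; }
    }
    h
}

/// Regularized incomplete beta function I_x(a, b).
fn reg_inc_beta(a: f64, b: f64, x: f64) -> f64 {
    if x <= 0.0 { return 0.0; }
    if x >= 1.0 { return 1.0; }
    let ln_front = ln_gamma(a + b) - ln_gamma(a) - ln_gamma(b)
        + a * x.ln() + b * (1.0 - x).ln();
    let front = ln_front.exp();
    // Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0) {
        front * beta_cf(a, b, x) / a
    } else {
        1.0 - front * beta_cf(b, a, 1.0 - x) / b
    }
}

/// Two-sided p-value for statistic `t` with `df` degrees of freedom.
fn two_sided_p_value(t: f64, df: f64) -> f64 {
    reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
}

/// Decision rule from the paragraph above: significant when p <= alpha.
fn is_significant(p_value: f64, alpha: f64) -> bool {
    p_value <= alpha
}
```

In practice a maintained statistics crate is usually preferable to hand-rolled special functions; the sketch only makes the link between t, df, and the p-value concrete.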

§Limitations

Minimum samples: Requires at least min_samples observations in each group (typically 8) for reliable results; with fewer samples, the test returns None
Zero variance: When all values in a group are identical, the test handles this edge case explicitly (returns p-value 1.0 if means are equal, 0.0 otherwise)
Assumptions: Assumes data is approximately normally distributed; for highly skewed distributions, consider non-parametric alternatives
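The first two limitations amount to guards that run before the t statistic is computed. The sketch below is a hypothetical illustration (`Summary`, `Precheck`, and `precheck` are not the crate's API). It treats the zero-variance case as degenerate only when the combined standard error is zero, that is, when both groups are constant; whether the crate also special-cases a single constant group is not stated here.

```rust
/// Per-group summary statistics (hypothetical type for this sketch).
struct Summary {
    mean: f64,
    var: f64, // unbiased sample variance
    n: usize,
}

/// Outcome of the pre-checks (hypothetical type).
#[derive(Debug, PartialEq)]
enum Precheck {
    /// Too few samples in at least one group; mirrors returning `None`.
    Insufficient,
    /// Zero standard error: t is undefined, so the p-value is decided directly.
    Degenerate { p_value: f64 },
    /// Regular case: go on to compute t, df, and the p-value.
    Proceed,
}

fn precheck(base: &Summary, cur: &Summary, min_samples: usize) -> Precheck {
    // An unbiased variance needs n >= 2, whatever min_samples is set to.
    let floor = min_samples.max(2);
    if base.n < floor || cur.n < floor {
        return Precheck::Insufficient;
    }
    let se2 = base.var / base.n as f64 + cur.var / cur.n as f64;
    if se2 == 0.0 {
        // Both groups are constant: equal means give 1.0, different means give 0.0.
        let p_value = if base.mean == cur.mean { 1.0 } else { 0.0 };
        return Precheck::Degenerate { p_value };
    }
    Precheck::Proceed
}
```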

Functions§

compute_significance: Compute statistical significance using Welch’s t-test.
mean_and_variance: Compute sample mean and unbiased variance (Bessel’s correction).
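As an illustration of what mean_and_variance computes, here is a two-pass sketch with Bessel's correction (dividing by n - 1 instead of n). The name `mean_and_variance_sketch`, its `Option` return type, and the choice to return `None` for fewer than two values are assumptions; the real function's signature is not shown on this page.

```rust
/// Sample mean and unbiased variance (Bessel's correction).
/// Hypothetical signature; returns None when fewer than two values are given.
fn mean_and_variance_sketch(xs: &[f64]) -> Option<(f64, f64)> {
    if xs.len() < 2 {
        return None; // dividing by n - 1 requires n >= 2
    }
    let n = xs.len() as f64;
    // First pass: arithmetic mean.
    let mean = xs.iter().sum::<f64>() / n;
    // Second pass: sum of squared deviations from the mean.
    let ss: f64 = xs.iter().map(|&x| (x - mean) * (x - mean)).sum();
    Some((mean, ss / (n - 1.0)))
}
```

Summing squared deviations in a second pass avoids the cancellation error of the one-pass formula (sum of squares minus n times the squared mean), which matters for timing data whose spread is small relative to its magnitude.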