Crate perfgate_significance

Statistical significance testing for benchmarking.

This crate provides Welch’s t-test implementation for detecting statistically significant performance changes between benchmark runs.

Part of the perfgate workspace.

§Statistical Methodology

§Welch’s t-test

Welch’s t-test is an adaptation of Student’s t-test that is more reliable when the two samples have unequal variances and/or unequal sample sizes. This makes it ideal for benchmarking where:

  • Baseline and current runs may have different numbers of samples
  • Variance can differ significantly between runs due to system noise
  • We want to detect real performance changes, not just noise

§Formula

The test statistic is computed as:

t = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2)

The degrees of freedom are approximated using the Welch-Satterthwaite equation:

df = (var_1/n_1 + var_2/n_2)² / ((var_1/n_1)²/(n_1-1) + (var_2/n_2)²/(n_2-1))
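The two formulas above can be sketched in Rust as follows. This is a minimal illustration, not the crate's implementation: the `GroupStats` type and the `welch_t_and_df` function are hypothetical names introduced here, and returning `None` when both variances are zero is an assumption of the sketch.

```rust
/// Summary statistics for one benchmark group.
/// Hypothetical type for this sketch; not part of the crate's documented API.
#[derive(Debug, Clone, Copy)]
pub struct GroupStats {
    pub mean: f64,
    /// Unbiased sample variance (divides by n - 1).
    pub variance: f64,
    pub n: usize,
}

/// Returns `(t, df)` for Welch's t-test, or `None` when the statistic is
/// undefined (fewer than two samples in a group, or zero variance in both).
pub fn welch_t_and_df(g1: GroupStats, g2: GroupStats) -> Option<(f64, f64)> {
    if g1.n < 2 || g2.n < 2 {
        return None;
    }
    let n1 = g1.n as f64;
    let n2 = g2.n as f64;

    // Per-group squared standard errors: var_i / n_i.
    let s1 = g1.variance / n1;
    let s2 = g2.variance / n2;
    let sum = s1 + s2;
    if sum <= 0.0 {
        // The denominator of t would be zero (assumption: caller handles this case).
        return None;
    }

    // t = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2)
    let t = (g1.mean - g2.mean) / sum.sqrt();

    // Welch-Satterthwaite approximation for the degrees of freedom.
    let df = sum * sum / (s1 * s1 / (n1 - 1.0) + s2 * s2 / (n2 - 1.0));

    Some((t, df))
}
```

When both groups have the same variance and the same size, the approximation reduces to n_1 + n_2 - 2, which is the degrees of freedom of the classical Student's t-test (14 for two groups of 8 samples).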

§Interpretation

  • The p-value is the probability of observing a difference at least as extreme as the measured one, assuming the null hypothesis that there is no real change.
  • A small p-value (≤ alpha, typically 0.05) is evidence against the null hypothesis, so the change is treated as statistically significant.

§Limitations

  • Minimum samples: Requires at least min_samples in both groups (typically 8) for reliable results; with smaller sample sizes, the test returns None
  • Zero variance: When all values in a group are identical, the test handles this edge case explicitly (returns p-value 1.0 if means are equal, 0.0 otherwise)
  • Assumptions: Assumes data is approximately normally distributed; for highly skewed distributions, consider non-parametric alternatives
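The first two limitations amount to guards that run before the t statistic is computed. The sketch below shows one way such guards could look. The `PreCheck` enum and `pre_check` function are hypothetical names, not the crate's API, and the sketch assumes the zero-variance shortcut applies only when both groups are constant, which the text above does not state explicitly.

```rust
/// Result of the guards in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum PreCheck {
    /// A group has fewer than `min_samples` values; a caller would return `None`.
    TooFewSamples,
    /// Both groups are constant; the p-value is fixed without running the test.
    Degenerate(f64),
    /// Neither guard applies; run Welch's t-test.
    RunTest,
}

/// True when every value in the slice equals its neighbour.
fn is_constant(values: &[f64]) -> bool {
    values.windows(2).all(|w| w[0] == w[1])
}

pub fn pre_check(baseline: &[f64], current: &[f64], min_samples: usize) -> PreCheck {
    // Require at least one value so the indexing below is always safe.
    let required = min_samples.max(1);
    if baseline.len() < required || current.len() < required {
        return PreCheck::TooFewSamples;
    }

    // Assumption: the shortcut applies only when BOTH groups have zero variance.
    if is_constant(baseline) && is_constant(current) {
        // For a constant group the mean equals any element.
        let p = if baseline[0] == current[0] { 1.0 } else { 0.0 };
        return PreCheck::Degenerate(p);
    }

    PreCheck::RunTest
}
```

Keeping the guards separate from the statistic makes the edge cases explicit: two constant groups with equal means yield a p-value of 1.0, two constant groups with different means yield 0.0, and undersized groups never reach the test.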

Functions§

compute_significance
Compute statistical significance using Welch’s t-test.
mean_and_variance
Compute sample mean and unbiased variance (Bessel’s correction).
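As an illustration of what "unbiased variance (Bessel's correction)" means, here is a minimal two-pass sketch. The function name `sample_mean_and_variance` and its `Option` return type are assumptions made for this example; the actual signature of `mean_and_variance` is not shown on this page and may differ.

```rust
/// Returns `(mean, variance)` where the variance divides by `n - 1`
/// (Bessel's correction). Returns `None` for fewer than two values,
/// because the unbiased variance is undefined in that case.
/// Hypothetical helper for illustration only.
pub fn sample_mean_and_variance(values: &[f64]) -> Option<(f64, f64)> {
    let n = values.len();
    if n < 2 {
        return None;
    }
    let n_f = n as f64;

    // First pass: arithmetic mean.
    let mean = values.iter().sum::<f64>() / n_f;

    // Second pass: sum of squared deviations from the mean.
    let sum_sq: f64 = values.iter().map(|v| (v - mean).powi(2)).sum();

    // Dividing by n - 1 instead of n removes the bias of the sample variance.
    Some((mean, sum_sq / (n_f - 1.0)))
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the unbiased variance is 32/7 rather than the population value 32/8.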