Crate statskit


statskit

Statistical judgment and evaluation.

This layer is the statistical mirror for the stack: it turns “it seems better” into “it is better, under a stated metric and stated assumptions.”

No metric without a use case: add a metric only when it is used downstream and has tests.
Uncertainty is explicit when present: when a function makes a statistical claim (CI, p-value), assumptions must be spelled out in the rustdoc.
Small surface: prefer a narrow set of well-specified primitives over a grab-bag.
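
As an illustration of the second principle, here is a minimal sketch of a function whose rustdoc states the assumptions behind its statistical claim. The name `mean_ci_normal` and its signature are hypothetical and are not taken from this crate's API; the interval is the textbook normal approximation, not the BCa bootstrap listed below.

```rust
/// Normal-approximation confidence interval for the mean of `xs`.
/// (Hypothetical helper for illustration; not part of the crate's API.)
///
/// # Assumptions
/// - Observations are independent and identically distributed.
/// - The sample is large enough that the sample mean is approximately normal.
/// - `z` is the standard-normal critical value chosen by the caller
///   (for example, 1.96 for a two-sided interval at roughly 95%).
///
/// Returns `None` when fewer than two observations are given, because the
/// sample variance is undefined in that case.
pub fn mean_ci_normal(xs: &[f64], z: f64) -> Option<(f64, f64)> {
    let n = xs.len();
    if n < 2 {
        return None;
    }
    let nf = n as f64;
    let mean = xs.iter().sum::<f64>() / nf;
    // Unbiased sample variance (divides by n - 1).
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (nf - 1.0);
    // Half-width = critical value times the standard error of the mean.
    let half_width = z * (var / nf).sqrt();
    Some((mean - half_width, mean + half_width))
}
```

Returning `Option` rather than a NaN-filled tuple keeps the "no claim without its preconditions" rule visible at the call site.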

stats: descriptive statistics (mean, variance, stddev) and statistical tests (bootstrap BCa, Wilcoxon, permutation, ASO, multiple-comparison corrections, effect sizes)
classify: classification metrics (precision, recall, F1, MCC, ROC-AUC, PR-AUC, confusion matrix, classification report, log loss, balanced accuracy, specificity, Cohen’s kappa, Hamming loss, Jaccard score)
calibration: calibration metrics (Brier score, ECE, MCE, reliability diagram)
regression: regression metrics (MSE, RMSE, MAE, R-squared)
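
To give a feel for the kind of primitive each metrics module covers, the sketch below computes precision, recall, and F1 from binary confusion counts, a Brier score, and RMSE. All names here (`Confusion`, `brier_score`, `rmse`) are hypothetical and chosen for illustration; the crate's actual types and signatures may differ.

```rust
/// Binary confusion counts (hypothetical type, not the crate's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Confusion {
    pub tp: u64,
    pub fp: u64,
    pub tn: u64,
    pub fn_: u64,
}

impl Confusion {
    /// Tally counts from parallel slices. Panics if the lengths differ.
    pub fn from_labels(truth: &[bool], pred: &[bool]) -> Self {
        assert_eq!(truth.len(), pred.len(), "length mismatch");
        let mut c = Confusion { tp: 0, fp: 0, tn: 0, fn_: 0 };
        for (&t, &p) in truth.iter().zip(pred) {
            match (t, p) {
                (true, true) => c.tp += 1,
                (false, true) => c.fp += 1,
                (false, false) => c.tn += 1,
                (true, false) => c.fn_ += 1,
            }
        }
        c
    }

    /// TP / (TP + FP); `None` when nothing was predicted positive.
    pub fn precision(&self) -> Option<f64> {
        let d = self.tp + self.fp;
        (d > 0).then(|| self.tp as f64 / d as f64)
    }

    /// TP / (TP + FN); `None` when there are no positive examples.
    pub fn recall(&self) -> Option<f64> {
        let d = self.tp + self.fn_;
        (d > 0).then(|| self.tp as f64 / d as f64)
    }

    /// 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    pub fn f1(&self) -> Option<f64> {
        let d = 2 * self.tp + self.fp + self.fn_;
        (d > 0).then(|| (2 * self.tp) as f64 / d as f64)
    }
}

/// Brier score: mean squared difference between the predicted probability
/// and the 0/1 outcome. `None` on empty or mismatched input.
pub fn brier_score(probs: &[f64], outcomes: &[bool]) -> Option<f64> {
    if probs.is_empty() || probs.len() != outcomes.len() {
        return None;
    }
    let sum: f64 = probs
        .iter()
        .zip(outcomes)
        .map(|(&p, &o)| (p - if o { 1.0 } else { 0.0 }).powi(2))
        .sum();
    Some(sum / probs.len() as f64)
}

/// Root mean squared error. `None` on empty or mismatched input.
pub fn rmse(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    if y_true.is_empty() || y_true.len() != y_pred.len() {
        return None;
    }
    let mse = y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| (t - p).powi(2))
        .sum::<f64>()
        / y_true.len() as f64;
    Some(mse.sqrt())
}
```

Undefined cases (a zero denominator, empty input) surface as `None` instead of silently becoming 0.0 or NaN, so the caller has to decide how to report them.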

Modules

calibration
classify
regression
stats: Statistical helpers: descriptive statistics, bootstrap, hypothesis tests, and effect sizes.
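
As one concrete instance of a hypothesis test of the kind `stats` describes, here is a sketch of an exact paired permutation test that enumerates every sign flip of the paired differences. The function name and signature are hypothetical, the cap of 20 pairs is an arbitrary choice for this example, and the crate's own permutation test may be implemented differently (for instance, by random resampling).

```rust
/// Exact two-sided paired permutation (sign-flip) test on the sum of
/// paired differences. (Hypothetical helper; not the crate's API.)
///
/// # Assumptions
/// - Pairs are independent of each other.
/// - Under the null hypothesis each difference is symmetric about zero,
///   so flipping its sign is equally likely.
///
/// Enumerates all 2^n sign assignments, so it returns `None` for empty
/// input or for more than 20 pairs.
pub fn paired_sign_flip_p(diffs: &[f64]) -> Option<f64> {
    let n = diffs.len();
    if n == 0 || n > 20 {
        return None;
    }
    let observed = diffs.iter().sum::<f64>().abs();
    let total: u32 = 1u32 << n;
    let mut extreme: u32 = 0;
    for mask in 0..total {
        // Bit i of `mask` decides whether difference i has its sign flipped.
        let s: f64 = diffs
            .iter()
            .enumerate()
            .map(|(i, d)| if (mask >> i) & 1 == 1 { -d } else { *d })
            .sum();
        // Small tolerance so ties with the observed statistic are counted.
        if s.abs() >= observed - 1e-12 {
            extreme += 1;
        }
    }
    // The identity assignment is always counted, so p is never zero.
    Some(f64::from(extreme) / f64::from(total))
}
```

For example, with differences `[1.0, 2.0, 3.0]` only the all-positive and all-negative assignments reach the observed magnitude of 6, so the p-value is 2/8 = 0.25.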