End-to-end quality comparison against mature, standard statistical tools.
The harness lets an integration test (run with cargo test) fit the same data with a
trusted reference implementation and assert that gam’s fitted function,
coefficients, effective degrees of freedom, predictions, or uncertainty
agree with what practitioners already trust. It is deliberately
tool-agnostic: a test supplies an arbitrary R or Python body and the harness
handles all of the data plumbing and result parsing.
Reference toolchains supported today:
- R via Rscript: mgcv, gamlss, survival, and any package the body chooses to library().
- Python via python3: scikit-learn, scipy, statsmodels, lifelines, scikit-survival, and anything else importable.
There is no skip path. If the interpreter or a required package is not
installed, run_r/run_python fail loudly and the test fails — a missing
reference dependency is a real failure, not a silent pass. CI is expected to
provision the reference stack. (Only genuine hardware gates, e.g. CUDA, are
allowed to skip; that lives in tests/common/gpu_gate.rs, not here.)
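The no-skip policy described above can be sketched as follows. This is a minimal illustration, not the harness's actual code: run_reference and run_reference_or_fail are hypothetical names, and the real run_r/run_python signatures are not shown in this page. The sketch assumes the interpreter is launched as a child process and that a failed launch or a non-zero exit must surface as a test failure.

```rust
use std::process::Command;

/// Hypothetical helper (not the harness's real API): run an interpreter and
/// return its stdout, or an error message explaining why the run is unusable.
fn run_reference(program: &str, args: &[&str]) -> Result<String, String> {
    // A missing interpreter shows up here as a launch error.
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|e| format!("could not launch `{}`: {}", program, e))?;

    // A missing package or a raised error shows up as a non-zero exit status.
    if !output.status.success() {
        return Err(format!(
            "`{}` exited with {}:\n{}",
            program,
            output.status,
            String::from_utf8_lossy(&output.stderr)
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Turn any failure into a panic so the calling test fails instead of skipping.
fn run_reference_or_fail(program: &str, args: &[&str]) -> String {
    run_reference(program, args)
        .unwrap_or_else(|msg| panic!("reference run failed: {}", msg))
}
```

The point of the design is that there is no branch returning early with a "skipped" status: every path that does not produce reference output ends in a panic, which cargo test reports as a failure.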
Wire protocol (kept dependency-free on purpose — no JSON crate on the R/
Python side): the test body calls emit("key", numeric_vector) for every
quantity it wants to return. The harness reads these back as
key: v1 v2 v3 ... lines and exposes them as f64 scalars / vectors.
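To make the line format concrete, here is a minimal sketch of how key: v1 v2 v3 ... lines could be read back on the Rust side. The function name parse_emitted and its error handling are assumptions for illustration; the harness's real parser and its ReferenceResult type may differ (for example in how they treat tokens such as NA).

```rust
use std::collections::HashMap;

/// Hypothetical parser for the `key: v1 v2 v3 ...` line format.
/// Assumptions in this sketch: lines without a colon are ignored, and any
/// token that does not parse as f64 is reported as an error.
fn parse_emitted(text: &str) -> Result<HashMap<String, Vec<f64>>, String> {
    let mut out = HashMap::new();
    for line in text.lines() {
        // Split at the first colon: left side is the key, right side the values.
        let (key, rest) = match line.split_once(':') {
            Some(parts) => parts,
            None => continue,
        };
        let key = key.trim();
        let mut values = Vec::new();
        for token in rest.split_whitespace() {
            let v: f64 = token
                .parse()
                .map_err(|e| format!("bad value `{}` for key `{}`: {}", token, key, e))?;
            values.push(v);
        }
        out.insert(key.to_string(), values);
    }
    Ok(out)
}

/// Read a key as a scalar: Some only when exactly one value was emitted.
fn scalar(parsed: &HashMap<String, Vec<f64>>, key: &str) -> Option<f64> {
    match parsed.get(key).map(|v| v.as_slice()) {
        Some([single]) => Some(*single),
        _ => None,
    }
}
```

A whitespace-separated text format like this needs nothing beyond base R's cat() or Python's print() on the reference side, which is what keeps the protocol free of a JSON dependency.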
Structs
- Column
- A named numeric column handed to the reference body as a data.frame column (R) or a NumPy array df["name"] (Python).
- DesignDiagnostics
- DmlPartialLinearReference
- A Double Machine Learning (DML) reference estimate of the average linear effect θ = E[∂E(Y|D,X)/∂D] of a treatment/dose D on outcome Y after partialling out confounders X, computed by a mature Python DML library (DoubleML’s partially-linear model, with EconML’s LinearDML as fallback).
- PenaltyDiagnostics
- PredictionFingerprint
- QualityDiagnostics
- Compact, reusable diagnostics for truth/reference quality tests.
- ReferenceResult
- Parsed results emitted by a reference-tool body via emit(key, values).
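For readers unfamiliar with what "partialling out" produces, the final stage of a partially-linear DML estimate is commonly written as a residual-on-residual regression. The sketch below shows only that last step in Rust, under loud assumptions: the function name is hypothetical, the nuisance predictions are taken as already computed, and cross-fitting and standard errors (which the Python libraries handle) are omitted. It is not the code the reference libraries run.

```rust
/// Hypothetical illustration of the residual-on-residual estimate of θ in
/// Y = θ·D + g(X) + ε.
///
/// `y_res` holds Y minus a prediction of E[Y | X];
/// `d_res` holds D minus a prediction of m(X) = E[D | X].
/// Returns None for mismatched lengths or when the treatment residuals
/// carry no variation (zero denominator).
fn partialling_out_theta(y_res: &[f64], d_res: &[f64]) -> Option<f64> {
    if y_res.len() != d_res.len() {
        return None;
    }
    // Numerator: sum of products of outcome and treatment residuals.
    let num: f64 = y_res.iter().zip(d_res).map(|(u, v)| u * v).sum();
    // Denominator: sum of squared treatment residuals.
    let den: f64 = d_res.iter().map(|v| v * v).sum();
    if den == 0.0 {
        None
    } else {
        Some(num / den)
    }
}
```

With treatment residuals [1, -1, 2, -2] and outcome residuals exactly twice those values, the function returns 2, the slope linking the two sets of residuals.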
Functions
- design_diagnostics
- dml_partial_linear_reference
- Fit a partially-linear DML model Y = θ·D + g(X) + ε, D = m(X) + ν with a mature Python DML library and return its orthogonal estimate of θ.
- held_out_r2
- Out-of-sample coefficient of determination against the held-out mean.
- max_abs_diff
- Maximum absolute difference between two equal-length vectors.
- pad_to
- Right-pad a vector with its last value, or 0.0 when empty.
- pearson
- Pearson correlation between two equal-length vectors.
- penalty_diagnostics
- prediction_fingerprint
- r2
- Coefficient of determination against the mean predictor.
- r_package_available
- Probe whether an R package can actually be loaded (namespace + any native dyn.load) in the reference interpreter, without raising. Returns true only when requireNamespace reports the package is usable.
- relative_l2
- Relative L2 distance ||a - b|| / max(||b||, eps), the natural scale-free measure of how closely a fitted function tracks a reference function evaluated on the same grid.
- rmse
- Root-mean-square difference between two equal-length vectors.
- run_python
- Run a Python reference body. The columns are exposed as a pandas df (or, when pandas is unavailable, a dict of NumPy arrays). The body calls emit("key", iterable) to return results. Fails the test with captured stderr when Python exits non-zero (missing python3, missing module, or a raised exception).
- run_r
- Run an R reference body. The columns are exposed as a data.frame named df; the body calls emit("key", numeric_vector) to return results. The harness prepends the df, output path, and emit helper. Fails the test with the captured stderr when R exits non-zero: a broken or unavailable reference run (missing Rscript, missing package, R error) is a hard test failure, never a silent skip.
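The comparison metrics listed above are short enough to sketch directly from their one-line descriptions. The versions below are illustrations only: the signatures, the argument order of r2, the value of EPS, the behaviour on empty input, and the target-length parameter of pad_to are assumptions, since this page does not show the real definitions.

```rust
/// Assumed floor for the denominator of `relative_l2`; the real `eps` is not shown.
const EPS: f64 = 1e-12;

/// Root-mean-square difference between two equal-length vectors.
fn rmse(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let sse: f64 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum();
    (sse / a.len() as f64).sqrt()
}

/// Maximum absolute difference between two equal-length vectors.
fn max_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0_f64, f64::max)
}

/// Relative L2 distance ||a - b|| / max(||b||, eps), with `b` as the reference.
fn relative_l2(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let diff: f64 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt();
    let norm_b: f64 = b.iter().map(|y| y * y).sum::<f64>().sqrt();
    diff / norm_b.max(EPS)
}

/// Coefficient of determination of `pred` against the mean of `y`.
fn r2(y: &[f64], pred: &[f64]) -> f64 {
    assert_eq!(y.len(), pred.len(), "vectors must have equal length");
    let mean = y.iter().sum::<f64>() / y.len() as f64;
    let ss_res: f64 = y.iter().zip(pred).map(|(t, p)| (t - p).powi(2)).sum();
    let ss_tot: f64 = y.iter().map(|t| (t - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

/// Right-pad with the last value (0.0 when empty) up to `len`.
/// Assumption: inputs already at least `len` long are returned unchanged.
fn pad_to(v: &[f64], len: usize) -> Vec<f64> {
    let fill = v.last().copied().unwrap_or(0.0);
    let mut out = v.to_vec();
    if out.len() < len {
        out.resize(len, fill);
    }
    out
}
```

In a test, these would typically be applied to the harness's fitted values and the vector read back from the reference run, for example asserting that relative_l2(fitted, reference) stays below a tolerance chosen for the model under comparison.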