gam
gam is a formula-first CLI and Rust engine for penalized regression models.
The current CLI supports:
- Standard mean models with penalized linear terms, random effects, and smooths
- A surfaced location-scale fitting path via --predict-noise
- Survival models via Surv(entry, exit, event)
- An advanced Bernoulli marginal-slope workflow via --logslope-formula and --z-column
- Prediction, HTML reports, ALO diagnostics, posterior sampling, and synthetic-data generation
The CLI is the primary interface. The Rust modules exported by the crate are used internally and can change without compatibility guarantees.
Requirements
- Rust 1.93+ for local source builds
- CSV input data with a header row
- A shell that supports quoted formulas (bash/zsh examples below)
Install
Prebuilt binary
macOS, Linux, and Windows Git Bash:
Build from source
If gam is not on your PATH, use ./target/release/gam in the examples below.
Command Overview
| Command | Purpose | Required arguments | Output |
|---|---|---|---|
| gam fit | Fit a model | <DATA> <FORMULA> | No file unless --out is provided |
| gam predict | Score new data | <MODEL> <NEW_DATA> plus --out | Prediction CSV |
| gam report | Build a standalone HTML report | <MODEL> [DATA] [OUT] | [OUT] or <model-stem>.report.html in the current directory |
| gam diagnose | Run terminal diagnostics | <MODEL> <DATA> | Prints a diagnostics table |
| gam sample | Draw posterior samples | <MODEL> <DATA> | posterior_samples.csv by default, plus a summary CSV |
| gam generate | Sample synthetic outcomes conditional on input rows | <MODEL> <DATA> | synthetic.csv by default |
Aliases:
- gam train -> gam fit
- gam simulate -> gam generate
Inspect full options with:
Verified Quickstart
These commands were checked against the current binary and the checked-in lidar dataset.
# 1) Fit a Gaussian GAM
# 2) Predict with uncertainty
# 3) Build an HTML report
# writes: lidar.model.report.html
# 4) Generate synthetic response draws
Formula Language
gam fit expects:
response ~ term + term + ...
Response forms:
- Standard regression/classification: y
- Survival: Surv(entry, exit, event)
Important constraints:
- Surv(...) currently requires exactly three columns
- Intercept removal (0 or -1) is not supported
- At most one link(...), one linkwiggle(...), one timewiggle(...), and one survmodel(...) may appear in a formula
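The "at most one" rule can be illustrated with a small standalone check. This is only a sketch of the constraint as stated above, not the crate's parser: the function names are hypothetical, and it assumes a term is written as a name immediately followed by an opening parenthesis.

```rust
use std::collections::HashMap;

/// Formula-level terms that may appear at most once (taken from the list above).
const SINGLETON_TERMS: [&str; 4] = ["link", "linkwiggle", "timewiggle", "survmodel"];

/// Count call-like tokens of the form `name(` in a formula string.
/// Hypothetical helper: a real parser would tokenize more carefully.
fn count_calls(formula: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    let mut ident = String::new();
    for ch in formula.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            ident.push(ch);
        } else {
            // An identifier directly followed by '(' is treated as a term call.
            if ch == '(' && !ident.is_empty() {
                *counts.entry(ident.clone()).or_insert(0) += 1;
            }
            ident.clear();
        }
    }
    counts
}

/// Return the singleton terms that occur more than once in the formula.
fn repeated_singletons(formula: &str) -> Vec<&'static str> {
    let counts = count_calls(formula);
    SINGLETON_TERMS
        .iter()
        .copied()
        .filter(|name| counts.get(*name).copied().unwrap_or(0) > 1)
        .collect()
}
```

Because whole identifiers are compared, linkwiggle(...) is not mistaken for a second link(...).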
Bare RHS terms
A bare column on the right-hand side is interpreted from the training schema:
- Continuous or binary column: penalized linear term
- Categorical column: random-effect block
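As a rough illustration of that mapping, the sketch below encodes the two rules as a match. The enum and function names are hypothetical and are not part of the crate's API.

```rust
/// Column kinds as described by the training schema (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Continuous,
    Binary,
    Categorical,
}

/// What a bare right-hand-side column turns into (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BareTerm {
    PenalizedLinear,
    RandomEffect,
}

/// Continuous and binary columns become penalized linear terms;
/// categorical columns become random-effect blocks.
fn interpret_bare_term(kind: ColumnKind) -> BareTerm {
    match kind {
        ColumnKind::Continuous | ColumnKind::Binary => BareTerm::PenalizedLinear,
        ColumnKind::Categorical => BareTerm::RandomEffect,
    }
}
```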
Term wrappers
Linear and constrained coefficients:
- linear(x)
- linear(x, min=..., max=...)
- constrain(x, min=..., max=...)
- nonnegative(x) / nonnegative_coef(x)
- nonpositive(x) / nonpositive_coef(x)
- bounded(x, min=..., max=...)
bounded(...) also supports:
- prior=none|uniform|log-jacobian|center
- beta_a=..., beta_b=...
- target=..., strength=...
Random effects:
- group(x) or re(x)
Smooths:
- smooth(...) or s(...)
- thinplate(...), thin_plate(...), tps(...)
- matern(...)
- duchon(...)
- tensor(...), interaction(...), te(...)
Formula-level configuration terms:
- link(type=...)
- linkwiggle(...)
- timewiggle(...)
- survmodel(spec=..., distribution=...)
Smooth defaults
- smooth(x) with one variable defaults to a B-spline / P-spline style basis
- smooth(x1, x2, ...) defaults to thin-plate
- te(...) defaults to tensor-product B-splines
Notable smooth options:
- B-spline: degree, knots, k, penalty_order
- Thin-plate: centers or k
- Matérn: centers or k, nu, length_scale
- Duchon: centers or k, power, order, optional length_scale
- Tensor: k / basis_dim for marginal basis size
Spatial smooths can use per-axis anisotropy:
- Global CLI flag: --scale-dimensions
- Per-term override: scale_dims=true or scale_dims=false
Fit Modes
1. Standard mean-only fits
Auto family resolution:
- Binary {0,1} response -> binomial logit
- Anything else -> gaussian identity
- --predict-noise does not change that default; write link(type=probit) (or another explicit link) in the mean formula when you want a different binomial base link
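The auto-resolution rule can be sketched as follows. This is an illustration of the rule as stated, not the crate's code: the names are hypothetical, the response is assumed to be already parsed to f64, and treating an empty response as Gaussian is an assumption made only to keep the function total.

```rust
/// Default family/link pairs named in the rule above (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Family {
    BinomialLogit,
    GaussianIdentity,
}

/// Pick the default family: a response whose values are all 0 or 1 resolves
/// to binomial logit, anything else to gaussian identity.
/// Assumption: an empty response falls through to gaussian identity.
fn resolve_family(response: &[f64]) -> Family {
    let is_binary = !response.is_empty()
        && response.iter().all(|&v| v == 0.0 || v == 1.0);
    if is_binary {
        Family::BinomialLogit
    } else {
        Family::GaussianIdentity
    }
}
```

An explicit link(type=...) in the formula would take precedence over this default; the sketch covers only the case where no link is given.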
2. Location-scale fits
Use a second formula for the scale/noise block:
If you want a probit-vs-probit comparison between mean-only and location-scale fits, declare the link explicitly in both formulas:
The CLI exposes this path for Gaussian and binomial families, and for Surv(...) formulas it routes into the survival location-scale fitter. Runtime behavior is still uneven enough that you should treat it as experimental and verify it on your exact formula/data combination before relying on it.
3. Survival fits
Use Surv(entry, exit, event) on the left-hand side:
Current survival likelihood modes:
- transformation
- weibull
- location-scale
Distributional survival fits can use a second formula for log-sigma:
When --predict-noise is present on a Surv(...) formula, the CLI uses the survival location-scale fit path.
Current survival-specific formula/config support:
- survmodel(spec=net, distribution=...)
- timewiggle(...)
- link(...)
- linkwiggle(...) only in supported survival sub-modes
4. Bernoulli marginal-slope fits
This is an advanced binary-response mode that adds a second formula for the log-slope surface plus an auxiliary standardized score column:
Current restrictions:
- Response must be binary {0,1}
- --predict-noise is not allowed
- --firth is not allowed
- link(...) and linkwiggle(...) are not allowed in this family or in --logslope-formula
Link Functions
Links are configured in-formula via link(type=...).
Supported type values:
- identity
- logit
- probit
- cloglog
- sas
- beta-logistic
- blended(a,b,...) / mixture(a,b,...)
- flexible(<single-link>)
- flexible(blended(...))
Advanced link parameters:
- rho= for blended/mixture links
- sas_init="epsilon,log_delta"
- beta_logistic_init="epsilon,delta"
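For orientation, the textbook inverse links for three of the listed types are sketched here. These are the standard mathematical definitions, not code taken from the crate, and the function names are hypothetical. probit is left out only because the Rust standard library has no normal CDF; the parameterized links (sas, beta-logistic, blended, flexible) are not covered.

```rust
/// identity: the mean equals the linear predictor.
fn inv_identity(eta: f64) -> f64 {
    eta
}

/// logit: mean = 1 / (1 + exp(-eta)).
fn inv_logit(eta: f64) -> f64 {
    1.0 / (1.0 + (-eta).exp())
}

/// cloglog: mean = 1 - exp(-exp(eta)).
fn inv_cloglog(eta: f64) -> f64 {
    1.0 - (-(eta.exp())).exp()
}
```

In the prediction CSV described later in this README, eta is the linear predictor and mean is its image under the inverse link.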
Output and Data Semantics
Saved models
- gam fit writes nothing unless --out is provided
- Saved model JSON includes training schema and header metadata
- Prediction-like commands reload new data using that saved schema
- If a model predates current metadata requirements, refit it with the current CLI
Prediction CSV schema
Standard and Bernoulli marginal-slope models:
- default: eta, mean
- with --uncertainty: eta, mean, effective_se, mean_lower, mean_upper
Gaussian location-scale models, when the fit path succeeds:
- default: eta, mean, sigma
- with --uncertainty: eta, mean, sigma, mean_lower, mean_upper
Survival models:
- default: eta, mean, survival_prob, risk_score, failure_prob
- with --uncertainty: eta, mean, survival_prob, risk_score, failure_prob, effective_se, mean_lower, mean_upper
Notes:
- In survival output, mean is the same quantity as survival_prob
- risk_score is risk-oriented and currently tracks the linear predictor direction
- effective_se is estimator uncertainty, not observation noise
Sampling output
gam sample writes:
- Raw draws CSV with columns beta_0, beta_1, ...
- A second summary CSV at <out with extension summary.csv>
Defaults when --out is omitted:
- Draws: posterior_samples.csv
- Summary: posterior_samples.summary.csv
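The summary file name follows from the draws file name by swapping the extension. A minimal sketch of that naming rule, assuming it behaves like the standard library's Path::with_extension (the function name here is hypothetical):

```rust
use std::path::{Path, PathBuf};

/// Derive the summary CSV path from the draws CSV path by replacing the
/// final extension with `summary.csv`.
fn summary_path(draws_out: &Path) -> PathBuf {
    draws_out.with_extension("summary.csv")
}
```

With the default draws file posterior_samples.csv, this yields posterior_samples.summary.csv, matching the defaults listed above.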
Current sampling support:
- Standard models
- Survival models on the non-location-scale path
Not currently available for:
- Gaussian location-scale models
- Binomial location-scale models
- Bernoulli marginal-slope models
Synthetic generation output
gam generate writes a numeric matrix:
- One row per sampled dataset
- One column per conditioning-data row
- Column names are draw_0, draw_1, ... indexed by input row position
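The orientation is easy to get backwards, so here is a small sketch of the layout: each sampled dataset is one CSV row, and each input row becomes one column. The function is hypothetical and only mirrors the description above; number formatting in the real output may differ.

```rust
/// Render sampled datasets as CSV text in the layout described above.
/// `draws` holds one inner Vec per sampled dataset, each with `n_rows` values
/// (one per conditioning-data row).
fn synthetic_csv(draws: &[Vec<f64>], n_rows: usize) -> String {
    // Header: draw_0, draw_1, ... indexed by input row position.
    let header: Vec<String> = (0..n_rows).map(|i| format!("draw_{i}")).collect();
    let mut out = header.join(",");
    out.push('\n');
    for dataset in draws {
        assert_eq!(dataset.len(), n_rows, "one value per input row is required");
        let cells: Vec<String> = dataset.iter().map(|v| v.to_string()).collect();
        out.push_str(&cells.join(","));
        out.push('\n');
    }
    out
}
```

So with 3 input rows and 100 sampled datasets, the file has 3 columns and 100 data rows.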
Defaults when --out is omitted:
- synthetic.csv
Not currently available for:
- Survival models
- Bernoulli marginal-slope models
Report output
gam report <MODEL> [DATA] [OUT] writes:
- [OUT] if provided
- Otherwise <model-stem>.report.html in the current working directory
The report is standalone HTML. With data input it includes data-dependent diagnostics; without data input those sections are omitted.
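The output-path rule can be sketched like this. The function name is hypothetical, the stem is assumed to be the model file name minus its final extension, and the "model" fallback for paths without a usable stem is an assumption of this sketch only.

```rust
use std::path::{Path, PathBuf};

/// Resolve where the report is written: the explicit [OUT] if given,
/// otherwise `<model-stem>.report.html` relative to the current directory.
fn default_report_path(model: &Path, out: Option<&Path>) -> PathBuf {
    if let Some(explicit) = out {
        return explicit.to_path_buf();
    }
    // file_stem drops only the last extension, so a model file named
    // lidar.model.json (hypothetical name) has the stem "lidar.model".
    let stem = model
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("model"); // assumption: fallback when no stem is available
    PathBuf::from(format!("{stem}.report.html"))
}
```

Note that the directory of the model file is dropped: the default report lands in the current working directory, not next to the model.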
Schema compatibility
Prediction, reporting, sampling, and generation expect the new data to match the saved training schema:
- Column names must match
- Column types must match
- Unseen categorical levels are treated as errors
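These three checks can be sketched as a single validation pass. The types and function below are hypothetical and do not reflect the saved-model JSON layout; the sketch also assumes that only columns present in the training schema are checked, since the README does not say how extra columns in the new data are handled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Column type in a schema (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Numeric,
    /// Categorical column with the set of levels it contains.
    Categorical(BTreeSet<String>),
}

type Schema = BTreeMap<String, ColumnType>;

/// Convenience constructor for a level set.
fn levels(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Check new data against the training schema: names must be present,
/// types must agree, and categorical levels must have been seen in training.
fn check_schema(training: &Schema, new_data: &Schema) -> Result<(), String> {
    for (name, trained) in training {
        let incoming = new_data
            .get(name)
            .ok_or_else(|| format!("missing column `{name}`"))?;
        match (trained, incoming) {
            (ColumnType::Numeric, ColumnType::Numeric) => {}
            (ColumnType::Categorical(known), ColumnType::Categorical(seen)) => {
                if let Some(level) = seen.difference(known).next() {
                    return Err(format!("unseen level `{level}` in column `{name}`"));
                }
            }
            _ => return Err(format!("type mismatch in column `{name}`")),
        }
    }
    Ok(())
}
```

New data may contain a subset of the training levels; only levels absent from training are rejected.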
Current CLI Limitations
- diagnose currently only exposes --alo
- diagnose --alo is not supported for models containing bounded(...) coefficients
- --predict-noise is exposed in the CLI, but current Gaussian, binomial, and survival location-scale fits still have rough edges; verify behavior on your exact workload before depending on that path
- linkwiggle(...) belongs in the mean formula, not --predict-noise
- Flexible links are only supported in specific binomial and survival paths
- Some benchmark datasets in bench/datasets/ are meant for harness scenarios rather than copy-paste README demos
Development
Common local checks:
Benchmark harness:
Repository layout:
- src/: CLI, model code, fitting/inference, smooth construction, and survival machinery
- bench/: benchmark harness, scenario configs, datasets, and comparison tooling
- tests/: Rust integration tests plus benchmark helper tools
Lean checks for the Rust-matched .lean files under src/: