gam 0.3.7

Generalized penalized likelihood engine

gam

gam is a formula-first CLI and Rust engine for generalized additive models.

It fits Gaussian, binomial, Poisson, and Gamma GLMs with smooth terms, random effects, location-scale extensions, survival likelihoods, and flexible/learnable link functions. Smoothing parameters are selected by REML (restricted maximum likelihood) or LAML (Laplace approximate marginal likelihood). Posterior sampling uses NUTS (the No-U-Turn Sampler).

Please open an issue if anything doesn't work as expected, if you'd like a new feature, or for questions.

What's different

  • Three-part penalty structure. Each smooth gets separate penalties for magnitude, gradient, and curvature. Most GAM libraries use one (curvature only) or two (curvature + combined magnitude/gradient). Penalizing the three components separately lets the fit distinguish flat-but-offset functions from sloped or wiggly ones, which a curvature-only penalty treats alike.
  • Flexible link functions. A spline offset from a base link (e.g. probit) lets the data correct for link misspecification while encoding the belief that the base link is approximately right. Marginal-slope models use a calibrated de-nested probit transport kernel for score-warp/link-deviation terms, not an exact nested link composition or post-hoc calibration. The same mechanism applies to survival time basis functions.
  • Surface smooths. Thin-plate splines, Duchon radial bases with triple operator regularization, and Matérn covariance-based smooths in arbitrary dimension, with automatic knot placement.
  • Adaptive anisotropy. Per-axis spatial anisotropy (--scale-dimensions) lets the model shrink or stretch each feature axis independently within a single joint smooth, instead of assuming isotropic smoothness. Matérn and hybrid Duchon optimize a global scale plus per-axis contrasts; pure Duchon optimizes the per-axis contrasts directly without introducing a global length scale.
  • Composable basis/kernel. You can combine the kernel of one spline family with the length-scale behavior of another (e.g. Duchon kernel with Matérn-style global kappa scaling).
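
To make the three-part penalty concrete, here is a minimal sketch of a discrete analog on a coefficient vector: sums of squared values (magnitude), squared first differences (gradient), and squared second differences (curvature). The function names and the finite-difference form are illustrative assumptions, not the engine's actual operators.

```python
def diff(values, order=1):
    """Return the order-th finite difference of a list of numbers."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def three_part_penalty(coefs):
    """Discrete magnitude, gradient, and curvature penalties (illustrative only)."""
    magnitude = sum(c * c for c in coefs)
    gradient = sum(d * d for d in diff(coefs, 1))
    curvature = sum(d * d for d in diff(coefs, 2))
    return magnitude, gradient, curvature


flat_offset = [2.0] * 6                          # flat, but offset from zero
straight_line = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # constant slope, no bending
wiggly = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]          # oscillating

for name, coefs in [("flat", flat_offset), ("line", straight_line), ("wiggly", wiggly)]:
    print(name, three_part_penalty(coefs))
```

In this toy setting a curvature-only penalty scores the flat offset and the straight line identically (both zero); the magnitude and gradient terms are what separate them.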

Install

Prebuilt binary

macOS, Linux, and Windows Git Bash:

curl -fsSL https://raw.githubusercontent.com/SauersML/gam/main/install.sh | bash

Build from source

Requires Rust.

git clone https://github.com/SauersML/gam.git
cd gam
cargo build --release

The binary is at ./target/release/gam. Add it to your PATH or use the full path in the examples below.

Python package

The repo includes a mixed Rust/Python package built with PyO3 and maturin.

uv venv
source .venv/bin/activate
uv pip install maturin
maturin develop --manifest-path crates/gam-pyffi/Cargo.toml
python -c "import gamfit; print(gamfit.build_info())"

Formula-first usage:

import gamfit

train = [
    {"y": 1.0, "x": 0.0},
    {"y": 2.0, "x": 1.0},
    {"y": 3.0, "x": 2.0},
]

model = gamfit.fit(train, "y ~ x")
pred = model.predict([{"x": 1.5}, {"x": 2.5}], interval=0.95)
summary = model.summary()
check = model.check([{"x": 1.5}])
diagnostics = model.diagnose(train)
gamfit.validate_formula(train, "y ~ x")
model.plot(train, kind="prediction")
html = model.report()
model.save("linear.gam")

scikit-learn usage:

from gamfit.sklearn import GAMRegressor

# `train` is the list of row dicts defined in the formula-first example above.
est = GAMRegressor(formula="y ~ x")
est.fit(train)
pred = est.predict([{"x": 1.5}, {"x": 2.5}])

The native extension is gamfit._rust, while the public Python API lives under gamfit/.

Quick start

# Fit a GAM with a smooth term
gam fit data.csv 'y ~ smooth(x)' --out model.json

# Predict with uncertainty intervals
gam predict model.json new_data.csv --out predictions.csv --uncertainty

# Build a standalone HTML report
gam report model.json data.csv

# Draw posterior samples
gam sample model.json data.csv --out samples.csv

# Generate synthetic response draws
gam generate model.json data.csv --n-draws 5 --out synthetic.csv
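
The commands above expect a data file whose columns match the names used in the formula. A minimal sketch for producing such a file to experiment with, assuming a CSV with a header row and columns y and x (the sine-plus-noise signal and the helper names are made up for illustration):

```python
import csv
import io
import math
import random


def make_rows(n, seed=0):
    """Simulate n rows with x in [0, 1) and y = sin(2*pi*x) + Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2)
        rows.append({"y": y, "x": x})
    return rows


def write_csv(rows, handle):
    """Write rows to an open text handle with a y,x header."""
    writer = csv.DictWriter(handle, fieldnames=["y", "x"])
    writer.writeheader()
    writer.writerows(rows)


# Demo: write to an in-memory buffer. Swap in open("data.csv", "w", newline="")
# to create a file for `gam fit data.csv 'y ~ smooth(x)'`.
buffer = io.StringIO()
write_csv(make_rows(200), buffer)
print(buffer.getvalue().splitlines()[0])
```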

Commands

| Command | What it does | Usage |
| --- | --- | --- |
| fit | Fit a model | gam fit <DATA> <FORMULA> [--out model.json] |
| predict | Score new data | gam predict <MODEL> <DATA> --out predictions.csv |
| report | Standalone HTML report | gam report <MODEL> [DATA] [OUT] |
| diagnose | Terminal diagnostics | gam diagnose <MODEL> <DATA> |
| sample | Posterior draws (NUTS) | gam sample <MODEL> <DATA> [--out samples.csv] |
| generate | Synthetic outcomes | gam generate <MODEL> <DATA> [--out synthetic.csv] |

train is an alias for fit. simulate is an alias for generate.

Run gam <command> --help for full options.

Formula language

response ~ term + term + ...

Response

  • Continuous, binary, count, or positive continuous: y
  • Survival (interval-censored): Surv(entry_time, exit_time, event)

Terms

Linear and constrained coefficients:

| Syntax | Effect |
| --- | --- |
| x or linear(x) | Penalized linear term |
| linear(x, min=0) | Non-negative coefficient |
| linear(x, min=..., max=...) | Box-constrained coefficient |
| nonnegative(x) | Sugar for linear(x, min=0) |
| nonpositive(x) | Sugar for linear(x, max=0) |
| bounded(x, min=0, max=1) | Exact interval transform (no ridge) |
| bounded(x, ..., prior=uniform) | Flat prior on bounded scale |
| bounded(x, ..., target=0.5, strength=3) | Informative interior prior |

Random effects:

| Syntax | Effect |
| --- | --- |
| group(id) or re(id) | Random intercept per level of id |

Smooths:

| Syntax | Default basis |
| --- | --- |
| smooth(x) or s(x) | P-spline (B-spline + difference penalty) |
| smooth(x1, x2) | Thin-plate spline |
| thinplate(x1, x2) or tps(x1, x2) | Thin-plate spline |
| matern(x1, x2, ...) | Matérn covariance smooth |
| duchon(x1, x2, ...) | Duchon radial basis with triple operator regularization (scale-free) |
| tensor(x, z) or te(x, z) | Tensor-product B-splines |

Common smooth options: knots=, k=, centers=, degree=, penalty_order=, type=ps|tps|matern|duchon. double_penalty=true|false applies to P-spline, thin-plate, tensor, and Matérn smooths; Duchon smooths use mass, tension, and stiffness operator penalties.

Spatial smooths support per-axis anisotropy via scale_dims=true or the global --scale-dimensions flag. For pure Duchon this stays scale-free: the optimizer updates only centered per-axis shape contrasts, not a scalar length_scale.
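
One way to picture "centered per-axis contrasts" is as log-scales that are shifted to sum to zero, so each axis can stretch or shrink relative to the others while the product of the axis scales stays at 1. The sketch below illustrates that idea only; the exponentiated zero-sum parameterization and the function names are assumptions, not the engine's documented internals.

```python
import math


def center(contrasts):
    """Subtract the mean so the log-scale contrasts sum to zero."""
    mean = sum(contrasts) / len(contrasts)
    return [c - mean for c in contrasts]


def anisotropic_distance(a, b, contrasts):
    """Euclidean distance after scaling axis j by exp(centered contrast j)."""
    scales = [math.exp(c) for c in center(contrasts)]
    return math.sqrt(sum((s * (u - v)) ** 2 for s, u, v in zip(scales, a, b)))


contrasts = [0.5, -0.1, 0.2]   # hypothetical raw per-axis log-scales
scales = [math.exp(c) for c in center(contrasts)]

# The product of the axis scales is 1 up to rounding: no global length scale is introduced.
print(math.prod(scales))
print(anisotropic_distance([0, 0, 0], [1, 1, 1], contrasts))
```

Because of the centering, adding the same constant to every contrast leaves the distance unchanged, which is the sense in which only the relative shape of the axes is being adjusted in this sketch.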

Formula-level configuration:

| Syntax | Effect |
| --- | --- |
| link(type=logit) | Set link function |
| linkwiggle(internal_knots=10) | Spline deviation from the base link |
| timewiggle(internal_knots=8) | Spline deviation from the time basis (survival) |
| survmodel(spec=net, distribution=gaussian) | Survival model configuration |

Auto-detection

The family is inferred from the response column:

  • Binary {0, 1}: binomial with logit link
  • Everything else: Gaussian with identity link

Override with link(type=...) in the formula. Poisson and Gamma families are available via explicit link specification.
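
The detection rule above can be written down in a few lines. This is a sketch of the rule as documented, not the engine's code; the function name is hypothetical, and treating a column that contains only 0s or only 1s as binary is an assumption.

```python
def infer_family(response):
    """Apply the documented rule: {0, 1} responses -> binomial/logit, else Gaussian/identity."""
    values = set(response)
    # Assumption: any non-empty column whose values all lie in {0, 1} counts as binary.
    if values and values <= {0, 1}:
        return ("binomial", "logit")
    return ("gaussian", "identity")


print(infer_family([0, 1, 1, 0]))
print(infer_family([0.2, 1.7, 3.1]))
print(infer_family([0, 1, 2, 5]))
```

Note that under this rule a count response such as [0, 1, 2, 5] still falls through to the Gaussian default, which is why Poisson and Gamma fits need an explicit override.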

Fit modes

Standard

gam fit data.csv 'y ~ age + smooth(bmi) + group(site)' --out model.json

Location-scale (jointly model mean and variance)

gam fit data.csv 'y ~ smooth(x1) + smooth(x2)' \
  --predict-noise 'smooth(x1)' \
  --out model.json

Works for Gaussian and binomial families. For survival formulas, --predict-noise routes to the survival location-scale fitter.
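
As a rough illustration of what "jointly model mean and variance" means for the Gaussian case, the sketch below evaluates a Gaussian negative log-likelihood in which both the mean and the log standard deviation vary by row. The log-sigma parameterization and the function name are assumptions, and the engine's actual objective also includes smoothing penalties that are omitted here.

```python
import math


def gaussian_nll(y, mu, log_sigma):
    """Negative log-likelihood of y_i under N(mu_i, exp(log_sigma_i)^2)."""
    total = 0.0
    for yi, mi, ls in zip(y, mu, log_sigma):
        sigma = math.exp(ls)
        total += 0.5 * math.log(2 * math.pi) + ls + 0.5 * ((yi - mi) / sigma) ** 2
    return total


# Toy data: two low-noise rows followed by two high-noise rows.
y = [0.1, -0.1, 2.0, -2.0]
mu = [0.0, 0.0, 0.0, 0.0]

# One shared sigma (its maximum-likelihood value) versus a row-specific sigma.
shared_sigma = math.sqrt(sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / len(y))
constant = gaussian_nll(y, mu, [math.log(shared_sigma)] * len(y))
varying = gaussian_nll(y, mu, [math.log(0.1), math.log(0.1), math.log(2.0), math.log(2.0)])

print(constant, varying)
```

On this toy data the row-specific noise model attains a lower negative log-likelihood than any single shared sigma can, which is the situation a noise formula such as --predict-noise 'smooth(x1)' is meant to capture.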

Survival

gam fit data.csv \
  'Surv(t0, t1, event) ~ age + smooth(bmi) + survmodel(spec=net, distribution=gaussian)' \
  --survival-likelihood transformation \
  --out model.json

Likelihood modes: transformation, weibull, location-scale.

Add --predict-noise for distributional (location-scale) survival:

gam fit data.csv \
  'Surv(t0, t1, event) ~ age + smooth(bmi) + survmodel(spec=net, distribution=gaussian)' \
  --predict-noise 'smooth(age)' \
  --out model.json

Bernoulli marginal-slope

Models P(case | covariates, z) where z is a standardized score (e.g. a polygenic risk score). The key idea: the baseline risk surface and the effect of z are decoupled into separate formulas. The main formula controls the population-level risk landscape (how risk varies with age, ancestry PCs, etc.), while --logslope-formula controls how strongly z modifies that risk at each point in covariate space. This decoupling lets you estimate spatially-varying effect sizes for z without the baseline absorbing signal that belongs to the slope, or vice versa.

gam fit data.csv \
  'case ~ smooth(age) + matern(pc1, pc2, pc3)' \
  --logslope-formula 'matern(pc1, pc2, pc3)' \
  --z-column prs_z \
  --out model.json

Link functions

Set via link(type=...) in the formula.

| Link | Syntax |
| --- | --- |
| Identity | link(type=identity) |
| Logit | link(type=logit) |
| Probit | link(type=probit) |
| Complementary log-log | link(type=cloglog) |
| SAS (sinh-arcsinh) | link(type=sas) |
| Beta-logistic | link(type=beta-logistic) |
| Blended mixture | link(type=blended(logit, probit)) |
| Flexible (data-driven) | link(type=flexible(logit)) |

Flexible links add a spline offset to the base link, letting the data correct for link misspecification.
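
For reference, the inverse functions of the standard logit, probit, and complementary log-log links are short to write out. The sketch below also shows one plausible reading of a flexible link, where a smooth deviation is added to the linear predictor before the base inverse link is applied. That additive form, the helper names, and the tanh stand-in for a fitted spline are all assumptions for illustration.

```python
import math


def inv_logit(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def inv_probit(eta):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse complementary log-log: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def flexible_mean(eta, base_inverse, offset):
    """Base inverse link applied to eta plus a deviation (assumed form)."""
    return base_inverse(eta + offset(eta))


def toy_offset(eta):
    """Hypothetical smooth deviation standing in for a fitted spline."""
    return 0.1 * math.tanh(eta)


for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta),
          flexible_mean(eta, inv_probit, toy_offset))
```

When the deviation is zero everywhere, flexible_mean reduces to the base link, which matches the idea that the base link is treated as approximately right.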

Prediction output

| Model type | Default columns | With --uncertainty |
| --- | --- | --- |
| Standard / binomial | eta, mean | + effective_se, mean_lower, mean_upper |
| Gaussian location-scale | eta, mean, sigma | + mean_lower, mean_upper |
| Survival | eta, mean, survival_prob, risk_score, failure_prob | + effective_se, mean_lower, mean_upper |
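
A short sketch of post-processing a predictions file written with --uncertainty for a standard model. It assumes the output is a CSV with a header row using the column names listed above; the numeric values in the sample are made up, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical predictions.csv content. Only the column names come from the
# table above; the numbers are invented for the example.
SAMPLE = """eta,mean,effective_se,mean_lower,mean_upper
0.20,0.20,0.05,0.10,0.30
1.10,1.10,0.08,0.94,1.26
"""


def read_predictions(handle):
    """Parse prediction rows into dicts of floats keyed by column name."""
    return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(handle)]


def interval_widths(rows):
    """Width of the reported interval for each predicted row."""
    return [row["mean_upper"] - row["mean_lower"] for row in rows]


rows = read_predictions(io.StringIO(SAMPLE))
ordered = all(r["mean_lower"] <= r["mean"] <= r["mean_upper"] for r in rows)
print(ordered, interval_widths(rows))
```

To read a real file, replace io.StringIO(SAMPLE) with open("predictions.csv", newline="").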

Other outputs

gam report writes a standalone HTML file with model summary, smooth plots, and diagnostics. Pass training data for data-dependent diagnostics.

gam sample writes posterior draws (beta_0, beta_1, ...) and a summary CSV. Uses NUTS (No-U-Turn Sampler).

gam generate writes a matrix of synthetic outcomes (rows = draws, columns = data rows).
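
Given that orientation (rows = draws, columns = data rows), summarizing the synthetic outcomes for each data row means reducing down the columns. A minimal sketch with a made-up 3-by-3 matrix and a hypothetical helper name:

```python
def column_summary(draws):
    """Per-data-row mean, min, and max across draws (rows = draws, columns = data rows)."""
    summary = []
    for column in zip(*draws):  # transpose: one tuple of draws per data row
        summary.append({
            "mean": sum(column) / len(column),
            "min": min(column),
            "max": max(column),
        })
    return summary


# Invented example: 3 draws for 3 data rows.
draws = [
    [1.0, 2.0, 3.0],
    [1.5, 1.5, 3.5],
    [0.5, 2.5, 2.5],
]

for index, stats in enumerate(column_summary(draws)):
    print(index, stats)
```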

gam diagnose prints terminal diagnostics. Supports --alo for approximate leave-one-out.

Development

cargo fmt --all
cargo clippy --all-targets --all-features -- -A warnings -D clippy::correctness -D clippy::suspicious
cargo test --all-features

Benchmark suite:

python3 bench/run_suite.py --help
python3 bench/run_suite.py

Layout:

  • src/ -- CLI, fitting engine, inference, smooth construction, survival machinery
  • bench/ -- benchmark harness, scenario configs, datasets, comparison tooling
  • tests/ -- integration tests