gam 0.3.21

Generalized penalized likelihood engine

A formula-first generalized additive model engine. Written in Rust for speed, with a polished Python library on top.

Fits Gaussian, binomial, Poisson, and Gamma GLMs with smooth terms, random effects, bounded/constrained coefficients, location-scale extensions, survival likelihoods, and flexible/learnable link functions. Smoothing parameters are selected by REML or LAML. Posterior sampling uses NUTS.

Docs: https://gamfit.readthedocs.io/ · PyPI: https://pypi.org/project/gamfit/

Two ways to use it

# Python library (gamfit)
import gamfit
model = gamfit.fit(train, "y ~ s(x) + group(site)")
preds = model.predict(test, interval=0.95)
# Rust CLI (gam)
gam fit data.csv 'y ~ smooth(x) + group(site)' --out model.json
gam predict model.json new_data.csv --uncertainty
gam report model.json data.csv

Pick whichever fits your workflow. The two share one engine, one formula DSL, and one on-disk format — train in the CLI, score in Python, or vice versa.

Install

Python. Wheels for Linux (x86_64, aarch64), macOS (Intel + Apple silicon), and Windows. No Rust toolchain required.

uv add gamfit
# or
pip install gamfit

Optional extras: gamfit[pandas], gamfit[plot], gamfit[sklearn], gamfit[all].

Rust CLI. One-liner installer for macOS, Linux, and Windows Git Bash:

curl -fsSL https://raw.githubusercontent.com/SauersML/gam/main/install.sh | bash

Or build from source: cargo build --release — the binary lands at ./target/release/gam.

What makes it different

These are features that are hard to stitch together from existing GAM libraries.

Three-part penalty structure

Each smooth gets independent penalties on magnitude, gradient, and curvature. Most libraries collapse these into one (curvature only) or two — gamfit keeps them separate, so a flat-but-offset function and a wiggly function are penalized differently.

gamfit.fit(df, "z ~ duchon(pc1, pc2, pc3, pc4, centers=50)")
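The separation can be illustrated with a toy discrete analogue (pure Python, not gamfit internals; names and weights here are illustrative): magnitude is the sum of squared coefficients, gradient the squared first differences, curvature the squared second differences.

```python
# Toy discrete analogue of the three-part penalty (illustrative only,
# not gamfit API). For coefficients beta, penalize magnitude ||beta||^2,
# gradient ||D1 beta||^2, and curvature ||D2 beta||^2 separately.
def three_part_penalty(beta, lam=(1.0, 1.0, 1.0)):
    d1 = [b - a for a, b in zip(beta, beta[1:])]   # discrete gradient
    d2 = [b - a for a, b in zip(d1, d1[1:])]       # discrete curvature
    ssq = lambda v: sum(x * x for x in v)
    return lam[0] * ssq(beta) + lam[1] * ssq(d1) + lam[2] * ssq(d2)

flat_offset = [2.0] * 8                            # flat, but offset from zero
wiggly = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # oscillating

# The flat function is punished only through the magnitude term (32.0);
# the wiggly one mostly through gradient and curvature (4 + 7 + 24 = 35.0).
print(three_part_penalty(flat_offset))  # 32.0
print(three_part_penalty(wiggly))       # 35.0
```

A curvature-only penalty would score the flat-but-offset function zero, which is exactly the collapse the three-part structure avoids.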

Adaptive per-axis anisotropy

Surface smooths can learn how much to shrink each axis independently. No more pretending that (latitude, age, log_income) deserves a single length-scale.

gamfit.fit(df, "z ~ matern(pc1, pc2, pc3, pc4)", scale_dimensions=True)

Side-by-side: isotropic global smoothing on the left, learned per-axis anisotropy on the right. Same data, same formula skeleton, very different population stability.

adaptive anisotropy: isotropic vs per-axis
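The effect of per-axis length-scales can be sketched with a minimal radial kernel (a toy standalone function, not gamfit's Matérn implementation): stretching one axis's scale shrinks variation along that axis out of the fit.

```python
import math

# Toy sketch of per-axis anisotropy: a radial kernel
#   k(x, x') = exp(-0.5 * sum(((x_i - x'_i) / s_i)^2))
# where each axis i gets its own length-scale s_i.
def kernel(x, y, scales):
    d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, scales))
    return math.exp(-0.5 * d2)

x, y = (0.0, 0.0), (1.0, 1.0)

iso = kernel(x, y, scales=(1.0, 1.0))     # one shared length-scale
aniso = kernel(x, y, scales=(1.0, 10.0))  # second axis smoothed much harder

# With a long length-scale on axis 2, differences along it barely count,
# so the two points look far more similar under the anisotropic kernel.
print(iso, aniso)
```

This is what `scale_dimensions=True` buys: each axis's scale is learned rather than fixed to one global value.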

Surface smooths in arbitrary dimension

P-spline, thin-plate, Matérn, and Duchon radial bases — the last with triple-operator regularization (mass + tension + stiffness) and scale-free behaviour by default. Mix kernels and scaling regimes freely.

gamfit.fit(df, "y ~ matern(x1, x2, x3, nu=5/2)")
gamfit.fit(df, "y ~ duchon(x1, x2, x3, x4, centers=80)")
gamfit.fit(df, "y ~ te(space, time, k=10)")   # tensor product

Flexible / learnable link functions

A spline offset on top of a base link lets the data correct for link misspecification. Alternatively, pick blended(logit, probit) for a learned mixture, or sas / beta-logistic links with shape parameters learned from the data.

gamfit.fit(df, "case ~ s(age) + link(type=flexible(probit))"
                 " + linkwiggle(internal_knots=6)")
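A blended link of the kind named above can be sketched as a convex mixture of the two inverse links (a hedged illustration, assuming a mixture on the mean scale; gamfit learns the weight w from the data rather than taking it as a constant):

```python
import math

# Sketch of a blended inverse link: mu = w * logistic(eta) + (1 - w) * Phi(eta).
# The weight w is fixed here for illustration; in the model it is learned.
def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inv(eta):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def blended_inv(eta, w):
    return w * logistic(eta) + (1.0 - w) * probit_inv(eta)

# At w=1 the blend reduces to pure logit; at w=0 to pure probit;
# in between it interpolates their tail behaviour.
print(blended_inv(1.0, 0.5))
```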

Marginal-slope models

For binary or survival outcomes with a calibrated risk score (e.g. a polygenic score), decouple baseline risk and score effect into separate formulas. The slope on the score becomes a smooth function of covariate space; the baseline can't absorb signal that belongs to it.

gamfit.fit(
    df,
    "case ~ matern(pc1, pc2, pc3)",
    family="bernoulli-marginal-slope",
    link="probit",
    z_column="pgs_z",
    logslope_formula="matern(pc1, pc2, pc3)",
)
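The decoupling can be written out in a toy form (an assumed structure for illustration, not gamfit internals): the linear predictor is a baseline surface plus the score times an exponentiated, covariate-dependent log-slope, so the score's effect size varies over covariate space independently of baseline risk.

```python
import math

# Toy marginal-slope predictor: eta = baseline(x) + exp(logslope(x)) * z.
# exp() keeps the slope on the score positive everywhere.
def eta(baseline, logslope, z):
    return baseline + math.exp(logslope) * z

def probit_inv(e):
    # standard normal CDF (probit inverse link)
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))

# Same score z = 1.5 in two regions of covariate space: the baseline is
# identical, but the learned log-slope differs, so the score matters more
# in region B without the baseline absorbing any of that signal.
p_region_a = probit_inv(eta(baseline=-1.0, logslope=0.0, z=1.5))
p_region_b = probit_inv(eta(baseline=-1.0, logslope=0.5, z=1.5))
print(p_region_a, p_region_b)
```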

Survival with on-demand surfaces

Surv(entry, exit, event) + four likelihood modes (transformation, Weibull, location-scale, marginal-slope) + a SurvivalPrediction object that evaluates S(t), h(t), H(t) on any time grid:

pred = model.predict(test_df)
S = pred.survival_at([1, 5, 10, 20])     # (n_rows, 4)
H = pred.cumulative_hazard_at([10])      # (n_rows, 1)

For population-scale cohorts, stream straight to CSV without materialising the full matrix: pred.write_survival_at_csv("surv.csv", times=[...]).
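The three quantities SurvivalPrediction exposes are tied together by standard identities, H(t) = -log S(t) and h(t) = dH/dt, which a Weibull example (illustrative parameters, not a fitted gamfit model) makes concrete:

```python
import math

# Weibull survival with shape k and scale lam:
#   S(t) = exp(-(t/lam)^k),  H(t) = (t/lam)^k,  h(t) = (k/lam)(t/lam)^(k-1)
k, lam = 1.5, 10.0
S = lambda t: math.exp(-((t / lam) ** k))
H = lambda t: (t / lam) ** k
h = lambda t: (k / lam) * (t / lam) ** (k - 1)

t = 5.0
assert abs(H(t) + math.log(S(t))) < 1e-12            # H(t) = -log S(t)
dH = (H(t + 1e-6) - H(t - 1e-6)) / 2e-6              # central difference
assert abs(dH - h(t)) < 1e-6                         # h(t) = dH/dt
print(S(t), H(t), h(t))
```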

NUTS posteriors

model.sample(...) runs the No-U-Turn Sampler over the coefficient posterior conditional on the fitted smoothing parameters. Predictive bands stream in row chunks so memory stays bounded on large test sets.

posterior = model.sample(train, seed=42)
bands = posterior.predict(test, level=0.95)
# eta_mean, eta_lower, eta_upper, mean, mean_lower, mean_upper

Bounded coefficients with informative priors

Hard interval transforms with optional Beta priors — useful for proportions, mixing weights, or any coefficient that must live in [a, b]:

gamfit.fit(df,
    "y ~ age + bounded(prop, min=0, max=1, target=0.5, strength=3)")
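A hard interval transform of this kind can be sketched as a logistic squash (an assumed form for illustration): the optimizer works on an unconstrained parameter u, while the reported coefficient is mapped into (min, max) and so obeys the bounds by construction. The optional Beta prior (target, strength) would then act on this transformed scale.

```python
import math

# Sketch of a hard interval transform: map unconstrained u into (lo, hi)
# so the coefficient can never escape its bounds.
def to_interval(u, lo=0.0, hi=1.0):
    return lo + (hi - lo) / (1.0 + math.exp(-u))

for u in (-50.0, 0.0, 50.0):
    b = to_interval(u)
    assert 0.0 <= b <= 1.0     # always inside the interval

print(to_interval(0.0))        # u = 0 maps to the midpoint, 0.5
```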

scikit-learn drop-in

from gamfit.sklearn import GAMRegressor
est = GAMRegressor(formula="y ~ s(x)").fit(X, y)

How it stacks up

On the standard prostate benchmark (5-fold CV, binomial), gamfit's Rust engine reaches AUC 0.705 in 3.5 s fit time — slightly ahead of mgcv (AUC 0.704, 4.4 s) and pygam (AUC 0.701, 9.8 s) on accuracy, and faster than both. Benchmark machinery and scenarios live in bench/; aggregated results are in bench/prostate_gamair_results.json and friends.

Where to learn more

  • In-depth Python documentation: https://gamfit.readthedocs.io/ — getting started, the full formula DSL, families and links, survival, marginal-slope, posterior sampling, scikit-learn integration, a runnable cookbook, and an auto-generated API reference.
  • CLI help: gam <command> --help (commands: fit, predict, report, diagnose, sample, generate).
  • Cookbook of runnable recipes: see docs/cookbook.md.

Repository layout

Path               Contents
src/               Rust engine: fitting, inference, smooth construction, survival, CLI.
crates/gam-pyffi/  PyO3 bindings (the gamfit._rust native extension).
gamfit/            Pure-Python public API on top of the bindings.
docs/              MkDocs/Material documentation sources (built to RTD).
tests/             Rust + Python integration tests.
bench/             Benchmark harness, scenario configs, datasets, plots.

Development

# Rust
cargo fmt --all
cargo clippy --all-targets --all-features -- -A warnings -D clippy::correctness -D clippy::suspicious
cargo test --all-features

# Python docs (uses uv)
uv venv --python 3.12 .venv-docs
uv pip install --python .venv-docs/bin/python -r docs/requirements.txt
.venv-docs/bin/mkdocs serve

Benchmark suite: python3 bench/run_suite.py --help.

Issues, feedback, contributions

Please open a GitHub issue with bug reports, feature requests, or questions — including "this doesn't work the way I expect."

License

AGPL-3.0-or-later. See LICENSE.