# gam · gamfit

[![PyPI](https://img.shields.io/pypi/v/gamfit.svg)](https://pypi.org/project/gamfit/)
[![Python](https://img.shields.io/pypi/pyversions/gamfit.svg)](https://pypi.org/project/gamfit/)
[![Docs](https://img.shields.io/readthedocs/gamfit.svg)](https://gamfit.readthedocs.io/)
[![Rust CI](https://github.com/SauersML/gam/actions/workflows/test.yml/badge.svg)](https://github.com/SauersML/gam/actions/workflows/test.yml)
[![License](https://img.shields.io/badge/license-AGPL--3.0--or--later-blue.svg)](LICENSE)

A formula-first generalized additive model engine. Written in Rust for
speed, with a polished Python library on top.

Fits Gaussian, binomial, Poisson, and Gamma GLMs with smooth terms,
random effects, bounded/constrained coefficients, location-scale
extensions, survival likelihoods, and flexible/learnable link functions.
Smoothing parameters are selected by REML or LAML. Posterior sampling
uses NUTS.

**Docs:** <https://gamfit.readthedocs.io/> &middot; **PyPI:**
<https://pypi.org/project/gamfit/>

## Two ways to use it

```python
# Python library (gamfit)
import gamfit
model = gamfit.fit(train, "y ~ s(x) + group(site)")
preds = model.predict(test, interval=0.95)
```

```bash
# Rust CLI (gam)
gam fit data.csv 'y ~ smooth(x) + group(site)' --out model.json
gam predict model.json new_data.csv --uncertainty
gam report model.json data.csv
```

Pick whichever fits your workflow. The two share one engine, one
formula DSL, and one on-disk format — train in the CLI, score in
Python, or vice versa.

## Install

**Python.** Wheels for Linux (x86_64, aarch64), macOS (Intel + Apple
silicon), and Windows. No Rust toolchain required.

```bash
uv add gamfit
# or
pip install gamfit
```

Optional extras: `gamfit[pandas]`, `gamfit[plot]`, `gamfit[sklearn]`,
`gamfit[all]`.

**Rust CLI.** One-liner installer for macOS, Linux, and Windows Git Bash:

```bash
curl -fsSL https://raw.githubusercontent.com/SauersML/gam/main/install.sh | bash
```

Or build from source: `cargo build --release` — the binary lands at
`./target/release/gam`.

## What makes it different

These are features you can't easily stitch together from existing
GAM libraries.

### Three-part penalty structure

Each smooth gets independent penalties on **magnitude, gradient, and
curvature**. Most libraries collapse these into one (curvature only) or
two — gamfit keeps them separate, so a flat-but-offset function and a
wiggly function are penalized differently.

```python
gamfit.fit(df, "z ~ duchon(pc1, pc2, pc3, pc4, centers=50)")
```
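
To make the distinction concrete, here is a minimal, self-contained sketch in plain NumPy (illustrative only, not gamfit's internals) of the three penalty operators and why a flat-but-offset function and a wiggly one score differently:

```python
import numpy as np

def penalty_matrices(k):
    """Magnitude (identity), gradient (first-difference), and
    curvature (second-difference) penalty matrices for k coefficients."""
    I = np.eye(k)
    D1 = np.diff(I, n=1, axis=0)   # (k-1, k) first differences
    D2 = np.diff(I, n=2, axis=0)   # (k-2, k) second differences
    return I, D1.T @ D1, D2.T @ D2

def penalty(beta, lambdas):
    """lam0*||beta||^2 + lam1*||D1 beta||^2 + lam2*||D2 beta||^2."""
    S0, S1, S2 = penalty_matrices(len(beta))
    l0, l1, l2 = lambdas
    return l0 * beta @ S0 @ beta + l1 * beta @ S1 @ beta + l2 * beta @ S2 @ beta

flat_offset = np.full(10, 2.0)               # flat, but offset from zero
wiggly = np.array([0, 1] * 5, dtype=float)   # zig-zag
print(penalty(flat_offset, (1, 0, 0)))       # 40.0: magnitude sees the offset
print(penalty(flat_offset, (0, 0, 1)))       # 0.0: curvature ignores it
print(penalty(wiggly, (0, 0, 1)))            # 32.0: curvature punishes wiggles
```

A curvature-only penalty would treat `flat_offset` as perfectly smooth; keeping the three operators separate lets the model shrink offsets, trends, and wiggles independently.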

### Adaptive per-axis anisotropy

Surface smooths can learn how much to shrink each axis independently.
No more pretending that `(latitude, age, log_income)` deserves a single
length-scale.

```python
gamfit.fit(df, "z ~ matern(pc1, pc2, pc3, pc4)", scale_dimensions=True)
```
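
What per-axis shrinkage means mechanically can be pictured with a toy squared-exponential kernel (illustrative only; the length scales here are hand-picked, whereas gamfit learns them from the data):

```python
import numpy as np

def sqexp_kernel(x, y, length_scales):
    """exp(-0.5 * sum_d ((x_d - y_d) / ell_d)^2): one length-scale per axis."""
    d = (np.asarray(x) - np.asarray(y)) / np.asarray(length_scales)
    return float(np.exp(-0.5 * np.sum(d ** 2)))

x, y = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
iso   = sqexp_kernel(x, y, [1.0, 1.0, 1.0])    # one shared scale for all axes
aniso = sqexp_kernel(x, y, [0.5, 10.0, 2.0])   # e.g. latitude / age / log_income
# A long ell_d flattens that axis out; a short one lets the surface bend along it.
```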

Side-by-side: isotropic global smoothing on the left, learned per-axis
anisotropy on the right. Same data, same formula skeleton, very
different population stability.

![adaptive anisotropy: isotropic vs per-axis](bench/aniso_demo/02_normalized.png)

### Surface smooths in arbitrary dimension

P-spline, thin-plate, Matérn, and **Duchon** radial bases — the last
with triple-operator regularization (mass + tension + stiffness) and
scale-free behaviour by default. Mix kernels and scaling regimes
freely.

```python
gamfit.fit(df, "y ~ matern(x1, x2, x3, nu=5/2)")
gamfit.fit(df, "y ~ duchon(x1, x2, x3, x4, centers=80)")
gamfit.fit(df, "y ~ te(space, time, k=10)")   # tensor product
```
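
For reference, the Matérn ν = 5/2 basis used above has a standard closed form; this snippet is the textbook formula, not gamfit's source:

```python
import math

def matern52(r, length_scale=1.0):
    """k(r) = (1 + s + s^2/3) * exp(-s), where s = sqrt(5) * r / length_scale."""
    s = math.sqrt(5.0) * r / length_scale
    return (1.0 + s + s ** 2 / 3.0) * math.exp(-s)

print(matern52(0.0))   # 1.0: full correlation at zero distance
print(matern52(3.0))   # decays smoothly toward 0 as r grows
```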

### Flexible / learnable link functions

A spline offset on top of a base link lets the data correct for link
misspecification. Or pick `blended(logit, probit)` for a learned
mixture; or `sas` / `beta-logistic` for shape parameters learned from
the data.

```python
gamfit.fit(df, "case ~ s(age) + link(type=flexible(probit))"
                 " + linkwiggle(internal_knots=6)")
```
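
The `blended` option can be pictured as a convex mixture of inverse links. Here is a hedged sketch with a hand-set weight `w`; gamfit learns the weight, and its exact parametrization may differ:

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))   # standard normal CDF

def blended_mean(eta, w):
    """w * logistic + (1 - w) * probit; w in [0, 1] interpolates the links."""
    return w * inv_logit(eta) + (1.0 - w) * inv_probit(eta)

# Both links agree at eta = 0, so any blend passes through 0.5 there:
print(blended_mean(0.0, 0.3))   # 0.5
```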

### Marginal-slope models

For binary or survival outcomes with a calibrated risk score (e.g. a
polygenic score), decouple **baseline risk** and **score effect** into
separate formulas. The slope on the score becomes a smooth function of
covariate space; the baseline can't absorb signal that belongs to it.

```python
gamfit.fit(
    df,
    "case ~ matern(pc1, pc2, pc3)",
    family="bernoulli-marginal-slope",
    link="probit",
    z_column="pgs_z",
    logslope_formula="matern(pc1, pc2, pc3)",
)
```
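
Conceptually, the predictor decomposes into a baseline plus a covariate-dependent slope on the standardized score `z`. The `exp(logslope)` form below is an assumed mechanic for illustration, not a quote of gamfit's internals:

```python
import math

def linear_predictor(baseline, logslope, z):
    """eta = baseline(x) + exp(logslope(x)) * z: exponentiating keeps the
    score's slope positive while letting it vary over covariate space."""
    return baseline + math.exp(logslope) * z

# logslope = 0 corresponds to a unit slope on the score:
print(linear_predictor(baseline=-1.0, logslope=0.0, z=2.0))   # 1.0
```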

### Survival with on-demand surfaces

`Surv(entry, exit, event)` + four likelihood modes (transformation,
Weibull, location-scale, marginal-slope) + a `SurvivalPrediction`
object that evaluates `S(t)`, `h(t)`, `H(t)` on any time grid:

```python
pred = model.predict(test_df)
S = pred.survival_at([1, 5, 10, 20])     # (n_rows, 4)
H = pred.cumulative_hazard_at([10])      # (n_rows, 1)
```

For population-scale cohorts, stream straight to CSV without
materialising the full matrix:
`pred.write_survival_at_csv("surv.csv", times=[...])`.
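
The three quantities are tied together by the standard identities S(t) = exp(-H(t)) and h(t) = dH/dt, sketched here with a Weibull hazard (generic survival math, not gamfit code):

```python
import math

def weibull_H(t, shape, scale):
    """Cumulative hazard of a Weibull(shape, scale)."""
    return (t / scale) ** shape

def weibull_S(t, shape, scale):
    """Survival function: S(t) = exp(-H(t))."""
    return math.exp(-weibull_H(t, shape, scale))

def weibull_h(t, shape, scale):
    """Instantaneous hazard: h(t) = dH/dt."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Evaluate S on an arbitrary time grid, as SurvivalPrediction does:
grid = [1.0, 5.0, 10.0, 20.0]
S = [weibull_S(t, 1.5, 3.0) for t in grid]   # strictly decreasing in t
```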

### NUTS posteriors

`model.sample(...)` runs the No-U-Turn Sampler over the coefficient
posterior conditional on the fitted smoothing parameters. Predictive
bands stream in row chunks so memory stays bounded on large
test sets.

```python
posterior = model.sample(train, seed=42)
bands = posterior.predict(test, level=0.95)
# eta_mean, eta_lower, eta_upper, mean, mean_lower, mean_upper
```
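
At heart, equal-tailed bands are per-row quantiles over posterior draws; a generic, non-streaming version is easy to sketch with NumPy (illustrative, not gamfit's implementation):

```python
import numpy as np

def credible_bands(eta_draws, level=0.95):
    """eta_draws: (n_draws, n_rows) posterior samples of the linear predictor.
    Returns per-row mean and equal-tailed lower/upper bounds."""
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(eta_draws, alpha, axis=0)
    upper = np.quantile(eta_draws, 1.0 - alpha, axis=0)
    return eta_draws.mean(axis=0), lower, upper

rng = np.random.default_rng(42)
draws = rng.normal(loc=1.0, scale=0.5, size=(4000, 3))   # fake posterior draws
mean, lower, upper = credible_bands(draws)
# mean lands near 1.0 for each row, with lower < mean < upper
```

Streaming in row chunks produces the same quantities without ever holding the full `(n_draws, n_rows)` matrix in memory.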

### Bounded coefficients with informative priors

Hard interval transforms with optional Beta priors — useful for
proportions, mixing weights, or any coefficient that *must* live in
`[a, b]`:

```python
gamfit.fit(df,
    "y ~ age + bounded(prop, min=0, max=1, target=0.5, strength=3)")
```
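
One common way to implement such a constraint (an assumed mechanic for illustration; gamfit's parametrization may differ) is to optimize an unconstrained value, map it into the interval with a scaled logistic, and let a Beta prior pull it toward `target`:

```python
import math

def to_interval(theta, lo=0.0, hi=1.0):
    """Map unconstrained theta into (lo, hi) via a scaled logistic."""
    return lo + (hi - lo) / (1.0 + math.exp(-theta))

def beta_log_prior(p, target=0.5, strength=3.0):
    """Beta log-density up to a constant, with mean `target` and
    concentration `strength`: larger strength pulls p harder toward target."""
    a = target * strength
    b = (1.0 - target) * strength
    return (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)

print(to_interval(0.0))                            # 0.5: the interval midpoint
print(beta_log_prior(0.5) > beta_log_prior(0.1))   # True: prior favors target
```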

### scikit-learn drop-in

```python
from gamfit.sklearn import GAMRegressor
est = GAMRegressor(formula="y ~ s(x)").fit(X, y)
```

## How does it stack up?

On the standard `prostate` benchmark (5-fold CV, binomial), gamfit's
Rust engine reaches **AUC 0.705** with a **3.5 s** fit time, edging out
`mgcv` (AUC 0.704, 4.4 s) and `pygam` (AUC 0.701, 9.8 s) on accuracy
while fitting faster than both. Benchmark machinery and scenarios live in
[`bench/`](bench/); aggregated results are in
[`bench/prostate_gamair_results.json`](bench/prostate_gamair_results.json)
and friends.

## Where to learn more

- **In-depth Python documentation:** <https://gamfit.readthedocs.io/>
  &mdash; getting started, the full formula DSL, families and links,
  survival, marginal-slope, posterior sampling, scikit-learn
  integration, a runnable cookbook, and an auto-generated API
  reference.
- **CLI help:** `gam <command> --help` (commands: `fit`, `predict`,
  `report`, `diagnose`, `sample`, `generate`).
- **Cookbook of runnable recipes:** see
  [docs/cookbook.md](docs/cookbook.md).

## Repository layout

| Path | Contents |
| --- | --- |
| `src/` | Rust engine: fitting, inference, smooth construction, survival, CLI. |
| `crates/gam-pyffi/` | PyO3 bindings (the `gamfit._rust` native extension). |
| `gamfit/` | Pure-Python public API on top of the bindings. |
| `docs/` | MkDocs/Material documentation sources (built to RTD). |
| `tests/` | Rust + Python integration tests. |
| `bench/` | Benchmark harness, scenario configs, datasets, plots. |

## Development

```bash
# Rust
cargo fmt --all
cargo clippy --all-targets --all-features -- -A warnings -D clippy::correctness -D clippy::suspicious
cargo test --all-features

# Python docs (uses uv)
uv venv --python 3.12 .venv-docs
uv pip install --python .venv-docs/bin/python -r docs/requirements.txt
.venv-docs/bin/mkdocs serve
```

Benchmark suite: `python3 bench/run_suite.py --help`.

## Issues, feedback, contributions

Please open a [GitHub issue](https://github.com/SauersML/gam/issues)
with bug reports, feature requests, or questions — including "this
doesn't work the way I expect."

## License

AGPL-3.0-or-later. See [LICENSE](LICENSE).