Module multinomial

Penalized multinomial-logit (softmax) GLM driver — fixed-λ inner solve.

This is the vector-response companion to the scalar PIRLS path: the inner-loop Newton solver for a multi-class GAM at fixed smoothing parameters λ. It uses the canonical multinomial-logit likelihood (MultinomialLogitLikelihood) and the existing dense block-Fisher assembly in gam_solve::pirls::dense_block_xtwx and gam_solve::pirls::dense_block_xtwy.

§What this module does

Solve, for the reference-coded multinomial-logit GAM with K classes and design matrix X ∈ ℝ^{N×P},

    β̂ = argmin_β { − log L(β) + ½ Σ_{a=0}^{K-2} λ_a · β_a^T S β_a }

where β = [β_0; β_1; …; β_{K-2}] is the stacked coefficient vector in output-major order (β_a ∈ ℝ^P is the coefficient block for class a), S ∈ ℝ^{P×P} is the smoothing penalty matrix (shared across classes, replicated as I_{K-1} ⊗ S over the full parameter space), and λ_a is a per-class smoothing parameter.
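As a minimal sketch of the output-major layout and the penalty term above, the following evaluates ½ Σ_a λ_a · β_aᵀ S β_a on plain slices. The function name `penalty` and the row-major storage of S are assumptions made for illustration; they are not part of this module's API.

```rust
/// Penalty term ½ Σ_a λ_a · β_aᵀ S β_a for a stacked, output-major β.
///
/// Hypothetical helper: `beta` has length (K-1)·P with class block `a` at
/// `beta[a*p .. (a+1)*p]`, `s` is the P×P penalty stored row-major
/// (an assumption), and `lambdas` holds one entry per active class.
fn penalty(beta: &[f64], s: &[f64], lambdas: &[f64], p: usize) -> f64 {
    assert_eq!(beta.len(), lambdas.len() * p);
    assert_eq!(s.len(), p * p);
    let mut total = 0.0;
    for (a, &lam) in lambdas.iter().enumerate() {
        // Coefficient block for class a in the output-major stacking.
        let block = &beta[a * p..(a + 1) * p];
        // Quadratic form β_aᵀ S β_a.
        let mut quad = 0.0;
        for i in 0..p {
            for j in 0..p {
                quad += block[i] * s[i * p + j] * block[j];
            }
        }
        total += 0.5 * lam * quad;
    }
    total
}
```

With P = 2, S the identity, β = [1, 2, 3, 4] and λ = [2, 4], the two blocks contribute ½·2·5 and ½·4·25, so the penalty is 55.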

The likelihood uses class K - 1 as the reference (η_{K-1} ≡ 0), so the softmax gauge is fixed at the η level and no additional sum-to-zero projection is required.
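The reference coding can be illustrated with a small sketch that maps the K-1 active linear predictors of one row to K class probabilities, treating the reference class as η_{K-1} = 0. The name `softmax_with_reference` is hypothetical, and the max-subtraction for numerical stability is a standard technique assumed here rather than something the source states.

```rust
/// Class probabilities for one row from the K-1 active linear predictors,
/// with the reference class pinned at η_{K-1} = 0 (hypothetical helper).
fn softmax_with_reference(eta_active: &[f64]) -> Vec<f64> {
    // Largest predictor, including the implicit 0 of the reference class,
    // subtracted before exponentiating to avoid overflow.
    let m = eta_active.iter().fold(0.0_f64, |acc, &e| acc.max(e));
    let mut probs: Vec<f64> = eta_active.iter().map(|&e| (e - m).exp()).collect();
    // Reference class: exp(0 - m).
    probs.push((-m).exp());
    let z: f64 = probs.iter().sum();
    for p in probs.iter_mut() {
        *p /= z;
    }
    probs
}
```

For example, η = [ln 2, 0] yields probabilities [0.5, 0.25, 0.25]: the second active class and the reference class both sit at η = 0 and therefore share the same probability.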

§Layering

Fixed-λ inner solve: fit_penalized_multinomial is the canonical coefficient-space Newton solver at given smoothing parameters λ, built on the shared crate::penalized_vector_glm engine.
REML / LAML smoothing-parameter selection: fit_penalized_multinomial_formula routes through crate::custom_family::fit_custom_family_with_rho_prior, so the per-active-class λ_a are selected by the outer REML/LAML loop and the caller's init_lambda is only a warm-start seed. The CustomFamily impl for crate::multinomial_reml::MultinomialFamily calls the fixed-λ solver above as its inner solve at each ρ trial and supplies the dense per-row Hessian block for the outer trace terms.
Formula → design integration: build_formula_design_for_multinomial parses the Wilkinson formula and assembles X and the per-term S blocks. The fit_multinomial_formula_pyfunc FFI shim wires the Python gamfit.fit(..., family='multinomial') entry point straight to this path.

§Convergence

The damped-Newton-with-backtracking scaffold lives once in the shared crate::penalized_vector_glm engine. At each iteration the assembled penalized Hessian H_pen = H + diag(λ_0, …, λ_{K-2}) ⊗ S is factored via faer's symmetric-PD-with-fallback path and the full Newton step δ = −H_pen^{-1} ∇F is computed. The step is then accepted, with step halving if the objective fails to decrease (up to a small backtracking budget). The convergence test is the relative coefficient step norm ‖δ‖ / (1 + ‖β‖) ≤ tol, matching the existing pyffi reference path.

This module is the softmax adapter over that engine: it supplies the dense (K-1)×(K-1) Fisher block, the residual, and the log-likelihood through MultinomialLogitLikelihood, and owns the class-count / simplex preconditions. The independent-binomial sibling crate::binomial_multi uses the same engine with a row-diagonal Fisher block instead.
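The damped Newton iteration with step halving described in this section can be sketched generically as follows. This is not the crate::penalized_vector_glm implementation: the function `damped_newton`, its closure-based signature, the choice to test convergence on the full step before the line search, and the decision to stop when the halving budget is exhausted are all assumptions made for illustration.

```rust
/// Euclidean norm of a slice.
fn norm(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Damped Newton with step halving (hypothetical sketch).
///
/// `objective` evaluates F(β); `newton_step` returns the full step
/// δ = −H⁻¹∇F at β. Returns the final β and whether the relative
/// step-norm test ‖δ‖ / (1 + ‖β‖) ≤ tol was met.
fn damped_newton<F, S>(
    mut beta: Vec<f64>,
    objective: F,
    newton_step: S,
    tol: f64,
    max_iter: usize,
    max_halvings: usize,
) -> (Vec<f64>, bool)
where
    F: Fn(&[f64]) -> f64,
    S: Fn(&[f64]) -> Vec<f64>,
{
    let mut f_cur = objective(&beta);
    for _ in 0..max_iter {
        let delta = newton_step(&beta);
        // Assumption: convergence is tested on the full Newton step.
        if norm(&delta) / (1.0 + norm(&beta)) <= tol {
            return (beta, true);
        }
        // Backtracking: halve the step until the objective does not increase.
        let mut scale = 1.0;
        let mut accepted = false;
        for _ in 0..=max_halvings {
            let trial: Vec<f64> = beta
                .iter()
                .zip(&delta)
                .map(|(b, d)| b + scale * d)
                .collect();
            let f_trial = objective(&trial);
            if f_trial.is_finite() && f_trial <= f_cur {
                beta = trial;
                f_cur = f_trial;
                accepted = true;
                break;
            }
            scale *= 0.5;
        }
        if !accepted {
            // Backtracking budget exhausted: give up without convergence.
            return (beta, false);
        }
    }
    (beta, false)
}
```

A toy separable objective F(β) = Σ_i (exp(β_i) − c_i β_i), whose minimizer is β_i = ln c_i and whose Newton step is c_i·exp(−β_i) − 1 per coordinate, exercises the halving branch: starting from β = 0 with c = 5, the full first step overshoots and increases the objective, so the half step is taken instead.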

Structs§

MultinomialFitInputs: Inputs to fit_penalized_multinomial.
MultinomialFitOutputs: Outputs of fit_penalized_multinomial.
MultinomialSavedModel: Saved-model payload for a multinomial fit driven by a Wilkinson formula.
MultinomialSmoothSignificance: One row of the multinomial smooth-significance table (#1101): the Wood rank-truncated Wald test for one (active class, smooth term) pair.
MultinomialSmoothTermSpan: One smooth term’s coefficient span within a class block, plus its unpenalized nullspace dimension and a display label (#1101). The Wald smooth-significance test in summary() slices the joint covariance / influence at a·P + col_start .. a·P + col_end for active class a.
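The slicing rule for a smooth term's span can be written out as a tiny sketch. The helper `term_range` is hypothetical, and the half-open reading of a·P + col_start .. a·P + col_end (end exclusive, as with Rust's `..`) is an assumption.

```rust
/// Index range of one smooth term inside the stacked coefficient vector
/// for active class `a`, with `p` coefficients per class block.
/// Hypothetical helper; assumes `col_end` is exclusive.
fn term_range(a: usize, p: usize, col_start: usize, col_end: usize) -> std::ops::Range<usize> {
    assert!(col_start <= col_end && col_end <= p);
    let base = a * p; // start of class block a in output-major order
    (base + col_start)..(base + col_end)
}
```

With P = 10, a term spanning columns 3..7 of class block 1 maps to indices 13..17 of the joint covariance.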

Functions§

fit_penalized_multinomial: Fit a penalized multinomial-logit GAM at fixed λ.
fit_penalized_multinomial_formula: Top-level formula-driven multinomial fit.
predict_multinomial_formula: Replay the saved termspec to build the predict-time design on a fresh dataset, then evaluate softmax probabilities. The predict dataset must carry the same feature columns the training data did, matched by name — it need not reproduce the training column order, and in particular need not carry the response column (prediction is for label-free new data).
predict_multinomial_formula_with_se: Predict class probabilities and delta-method per-class probability standard errors for a saved multinomial model on fresh data (#1101). Replays the saved termspec to build the predict design exactly as predict_multinomial_formula does, then applies the softmax-Jacobian delta method against the stored joint posterior covariance. Returns (probs (N,K), prob_se (N,K) | None); prob_se is None for a legacy model fitted before covariance was surfaced.
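The softmax-Jacobian delta method can be sketched for a single row as follows. The function `softmax_prob_se` is hypothetical and simplifies the setting: it assumes the covariance of the row's K-1 active linear predictors has already been formed from the joint coefficient covariance, and that it is stored row-major. It uses the standard softmax derivative ∂p_k/∂η_a = p_k(δ_{ka} − p_a).

```rust
/// Delta-method standard errors of the K class probabilities for one row
/// (hypothetical sketch, not this module's implementation).
///
/// `probs` has length K with the reference class last; `cov_eta` is the
/// (K-1)×(K-1) covariance of the active linear predictors, row-major.
fn softmax_prob_se(probs: &[f64], cov_eta: &[f64]) -> Vec<f64> {
    let k = probs.len();
    assert!(k >= 2);
    let m = k - 1; // number of active classes
    assert_eq!(cov_eta.len(), m * m);
    (0..k)
        .map(|c| {
            // Row c of the softmax Jacobian with respect to the active η.
            let jac: Vec<f64> = (0..m)
                .map(|a| {
                    let kron = if a == c { 1.0 } else { 0.0 };
                    probs[c] * (kron - probs[a])
                })
                .collect();
            // Var(p_c) ≈ J_c · Cov(η) · J_cᵀ.
            let mut var = 0.0;
            for a in 0..m {
                for b in 0..m {
                    var += jac[a] * cov_eta[a * m + b] * jac[b];
                }
            }
            // Guard against tiny negative values from rounding.
            var.max(0.0).sqrt()
        })
        .collect()
}
```

In the binary case K = 2 with probabilities [0.5, 0.5] and Var(η_0) = 4, both Jacobian entries have magnitude 0.25, so both standard errors are 0.25·2 = 0.5.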