Taylor-jet tower algebra: write each family’s row log-likelihood ONCE,
derive the entire RowKernel<K> derivative tower mechanically (#932).
§The object
Tower4<K> is a truncated multivariate Taylor scalar in K primary
variables, carrying the value and ALL partial derivatives through fourth
order as full (unsymmetrized) tensors:
v          ℓ
g[a]       ∂ℓ/∂p_a
h[a][b]    ∂²ℓ/∂p_a∂p_b
t3[abc]    ∂³ℓ/∂p_a∂p_b∂p_c
t4[abcd]   ∂⁴ℓ/∂p_a∂p_b∂p_c∂p_d

Arithmetic (+ − × ÷, scalar mixes) propagates the tower by the exact
Leibniz rule; unary transcendentals propagate by the exact multivariate
Faà di Bruno formula given a [f, f′, f″, f‴, f⁗] stack evaluated at the
inner value. This is truncated Taylor ALGEBRA — exact derivatives of the
evaluated expression, not finite differences, not an approximation —
fully compatible with the exact-REML-only policy.
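To make the Leibniz / Faà di Bruno propagation concrete, here is a minimal sketch truncated at SECOND order. MiniTower2, variable and this compose_unary are hypothetical names invented for the illustration; they are not the module's Tower2 API, and the stack here is only [f, f′, f″] rather than the five-entry stack described above.

```rust
use std::ops::Mul;

/// Hypothetical second-order tower in K variables (illustration only).
#[derive(Clone, Copy, Debug)]
struct MiniTower2<const K: usize> {
    v: f64,
    g: [f64; K],
    h: [[f64; K]; K],
}

impl<const K: usize> MiniTower2<K> {
    /// Seed primary `i` at value `x`: unit gradient in slot `i`, zero Hessian.
    fn variable(x: f64, i: usize) -> Self {
        let mut g = [0.0; K];
        g[i] = 1.0;
        Self { v: x, g, h: [[0.0; K]; K] }
    }

    /// Faà di Bruno at second order, given d = [f, f', f''] at the inner value.
    fn compose_unary(self, d: [f64; 3]) -> Self {
        let [f0, f1, f2] = d;
        let mut g = [0.0; K];
        let mut h = [[0.0; K]; K];
        for a in 0..K {
            g[a] = f1 * self.g[a];
            for b in 0..K {
                h[a][b] = f2 * self.g[a] * self.g[b] + f1 * self.h[a][b];
            }
        }
        Self { v: f0, g, h }
    }
}

impl<const K: usize> Mul for MiniTower2<K> {
    type Output = Self;

    /// Leibniz product rule, truncated at second order.
    fn mul(self, o: Self) -> Self {
        let mut g = [0.0; K];
        let mut h = [[0.0; K]; K];
        for a in 0..K {
            g[a] = self.g[a] * o.v + self.v * o.g[a];
            for b in 0..K {
                h[a][b] = self.h[a][b] * o.v
                    + self.g[a] * o.g[b]
                    + self.g[b] * o.g[a]
                    + self.v * o.h[a][b];
            }
        }
        Self { v: self.v * o.v, g, h }
    }
}

/// exp(x * y) written once; value, gradient and Hessian come out together.
fn exp_of_product(x: f64, y: f64) -> MiniTower2<2> {
    let p = MiniTower2::<2>::variable(x, 0) * MiniTower2::<2>::variable(y, 1);
    let e = p.v.exp();
    p.compose_unary([e, e, e])
}
```

For exp(x·y) the mixed Hessian entry comes out as (1 + x·y)·exp(x·y), which is the cross term a hand-written block has to remember separately.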
One evaluation of a row NLL program at seeded variables yields, in a
single pass, every channel the [super::row_kernel::RowKernel] trait
demands: row_kernel (value/∇/H), row_third_contracted(dir) (contract
t3 with dir), and row_fourth_contracted(u, v) (contract t4 with
u and v). The directional cross-channels that hand-written towers
drop (#736’s residual gap) cannot be dropped here: there is no separate
“channel” to forget — every derivative of the one expression is carried.
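The two contracted channels amount to plain loops over the stored tensors. The sketch below assumes a nested-array layout (t3[a][b][c], t4[a][b][c][d]); the module's actual storage and function names may differ, and contract_third, contract_fourth and quartic_tensors are hypothetical helpers. The closed-form tensors of ℓ(p) = (w·p)⁴ serve only as test data.

```rust
/// out[a][b] = sum_c t3[a][b][c] * dir[c]
fn contract_third<const K: usize>(
    t3: &[[[f64; K]; K]; K],
    dir: &[f64; K],
) -> [[f64; K]; K] {
    let mut out = [[0.0; K]; K];
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                out[a][b] += t3[a][b][c] * dir[c];
            }
        }
    }
    out
}

/// out[a][b] = sum_{c,d} t4[a][b][c][d] * u[c] * v[d]
fn contract_fourth<const K: usize>(
    t4: &[[[[f64; K]; K]; K]; K],
    u: &[f64; K],
    v: &[f64; K],
) -> [[f64; K]; K] {
    let mut out = [[0.0; K]; K];
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                for d in 0..K {
                    out[a][b] += t4[a][b][c][d] * u[c] * v[d];
                }
            }
        }
    }
    out
}

/// Closed-form third and fourth derivative tensors of l(p) = (w.p)^4:
/// t3 = 24 (w.p) w_a w_b w_c and t4 = 24 w_a w_b w_c w_d.
fn quartic_tensors<const K: usize>(
    w: &[f64; K],
    p: &[f64; K],
) -> ([[[f64; K]; K]; K], [[[[f64; K]; K]; K]; K]) {
    let s: f64 = w.iter().zip(p).map(|(wi, pi)| wi * pi).sum();
    let mut t3 = [[[0.0; K]; K]; K];
    let mut t4 = [[[[0.0; K]; K]; K]; K];
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                t3[a][b][c] = 24.0 * s * w[a] * w[b] * w[c];
                for d in 0..K {
                    t4[a][b][c][d] = 24.0 * w[a] * w[b] * w[c] * w[d];
                }
            }
        }
    }
    (t3, t4)
}
```

Because the contraction runs after differentiation, changing dir, u or v costs only another pass over the stored tensor, not another evaluation of the row program.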
§Why this exists (the bug genus)
Every family today hand-writes its tower: value in one function,
gradient in another, pdfthird_derivative/pdffourth_derivative,
entry/exit-specific cross blocks — thousands of lines of calculus that
drift. #736 was a sign flip in a hand-written cross-Hessian block,
invisible until a new consumer touched it; #948 is a derivative path
that is not the derivative of the evaluated row loss (clamped-μ
surrogate); the objective↔gradient desync class is the same disease at
the criterion level. A tower-derived kernel is exact-by-construction:
the value channel IS the production loss expression, so its derivative
channels cannot desync from it.
§Relation to jet_partitions::MultiDirJet
The tree already carries a directional jet (bitmask coefficients over
distinct seeded directions, heap-allocated, Bell-partition compose) used
inside the marginal-slope and latent-survival families. It answers “the
derivative along THESE specific directions” and must be re-seeded and
re-evaluated per direction tuple (e.g. 10 symmetric (a,b) pairs for a
K=4 fourth contraction). Tower4 answers ALL of them from one
evaluation: contraction happens AFTER differentiation, as plain linear
algebra on the stored tensors. Use MultiDirJet when you need a handful
of directions of a huge-K expression; use Tower4 when you need the
complete small-K tower — which is exactly the RowKernel<K≤4> shape.
The [f64; 5] unary-derivative stacks
(unary_derivatives_neglog_phi, …) are signature-compatible with
Tower4::compose_unary, so the families’ existing special-function
stacks are directly reusable.
§Stability discipline (why this is NOT autodiff)
Differentiating the primal code path inherits its instabilities: a jet
pushed through a naive ln(1 + e^η) is garbage in the saturated tail
even though the true derivative σ(η) is benign there. This module
therefore splits responsibility: humans own primitive stability,
the algebra owns combinatorics. Tail-critical special functions enter
a program ONLY as hand-certified [f64; 5] derivative stacks through
Tower4::compose_unary — the same stacks the families already write
(unary_derivatives_neglog_phi and friends, built on erfcx/log_ndtr) —
and the tower mechanizes only the Leibniz/Faà di Bruno composition,
which is where hand-written towers actually fail (#736 was a
composition sign flip, not a primitive error). Program authors must use
a stable primitive stack wherever the f64 production loss does; the
convenience methods (exp, ln, sqrt, …) are for expressions whose
arguments are tame by construction.
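The softplus example can be sketched as follows. softplus_derivative_stack and stable_sigmoid are hypothetical helpers written for this illustration (they are not functions of this module); the point is only that the stack is built from σ(η) and σ(−η) directly instead of differentiating the naive formula.

```rust
/// Naive value and first derivative of ln(1 + e^eta).
/// Both break in the saturated tail: e^eta overflows to +inf.
fn naive_softplus(eta: f64) -> (f64, f64) {
    let e = eta.exp();
    ((1.0 + e).ln(), e / (1.0 + e))
}

/// Logistic sigmoid evaluated without overflow on either side.
fn stable_sigmoid(eta: f64) -> f64 {
    if eta >= 0.0 {
        1.0 / (1.0 + (-eta).exp())
    } else {
        let e = eta.exp();
        e / (1.0 + e)
    }
}

/// Hypothetical [f, f', f'', f''', f''''] stack for softplus at `eta`.
fn softplus_derivative_stack(eta: f64) -> [f64; 5] {
    // max(eta, 0) + ln(1 + e^{-|eta|}) never exponentiates a large positive number.
    let f = eta.max(0.0) + (-eta.abs()).exp().ln_1p();
    let s = stable_sigmoid(eta);
    // 1 - s computed as sigma(-eta), which avoids cancellation when s is near 1.
    let q = stable_sigmoid(-eta);
    let sq = s * q;
    // f' = s, f'' = s q, f''' = s q (q - s), f'''' = s q (1 - 6 s q).
    [f, s, sq, sq * (q - s), sq * (1.0 - 6.0 * sq)]
}
```

At η = 800 the naive pair evaluates to (+inf, NaN), while the stack stays finite with value 800 and slope 1. A stack like this has the [f64; 5] shape that Tower4::compose_unary is described as consuming.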
§Storage convention
Tensors are stored FULL, not symmetric-packed: t4 for K=4 is 256
doubles where 35 would do. This is deliberate clarity-over-speed for the
oracle role — indexing is trivially auditable, contraction loops are
obvious, and the redundancy is itself a checked invariant (the algebra
only ever writes symmetric values). Symmetric packing is a later,
profile-justified optimization behind the same API.
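The 256-versus-35 figure, and the symmetry invariant, can be checked with a few lines. This is a sketch with hypothetical helper names (full_len, symmetric_len, is_symmetric_t3) and an assumed nested-array layout for t3.

```rust
/// Entries in a full order-`order` tensor over `k` variables: k^order.
fn full_len(k: usize, order: u32) -> usize {
    k.pow(order)
}

/// Distinct entries of a symmetric tensor: C(k + order - 1, order).
fn symmetric_len(k: usize, order: usize) -> usize {
    // After step i, c == C(k - 1 + i, i), so every division is exact.
    let mut c = 1usize;
    for i in 1..=order {
        c = c * (k + i - 1) / i;
    }
    c
}

/// Check that a full third-order tensor is symmetric up to `tol`.
fn is_symmetric_t3<const K: usize>(t3: &[[[f64; K]; K]; K], tol: f64) -> bool {
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                // The two adjacent swaps generate every permutation of (a, b, c).
                let x = t3[a][b][c];
                if (x - t3[b][a][c]).abs() > tol || (x - t3[a][c][b]).abs() > tol {
                    return false;
                }
            }
        }
    }
    true
}
```

For K = 4 this gives 256 full versus 35 distinct fourth-order entries, and 10 distinct second-order index pairs.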
§Deployment ladder (#932)
- This module: the algebra + the program seam + the oracle.
- Universal oracle: every hand-written RowKernel gains a CI test asserting channel-by-channel agreement with a RowNllProgram written once; see verify_kernel_channels. This alone would have caught #736 at introduction.
- Migrate error-dense / cold towers to derived_row_kernel et al.; keep hand-tuned hot paths, now verified against the single-expression truth instead of being the only definition.
- New families (#914/#916/#917 ZI/ordinal/expectile, #921’s location-scale port) implement ONLY RowNllProgram and get an exact fourth-order tower for the price of writing the likelihood.
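The oracle step reduces to an entry-by-entry comparison that names the first disagreement. The sketch below is a hypothetical stand-in, not the signature of verify_kernel_channels: compare_channel, its flat-slice arguments and its relative tolerance are all assumptions made for the illustration.

```rust
/// Compare one channel entry by entry and report the first disagreement.
fn compare_channel(
    name: &str,
    claimed: &[f64],
    truth: &[f64],
    rel_tol: f64,
) -> Result<(), String> {
    if claimed.len() != truth.len() {
        return Err(format!(
            "{}: length {} vs {}",
            name,
            claimed.len(),
            truth.len()
        ));
    }
    for (i, (&c, &t)) in claimed.iter().zip(truth).enumerate() {
        // Negated `<=` so that a NaN on either side counts as a disagreement.
        if !((c - t).abs() <= rel_tol * (1.0 + t.abs())) {
            return Err(format!("{}[{}]: claimed {}, true {}", name, i, c, t));
        }
    }
    Ok(())
}
```

A sign flip in a single cross-Hessian entry, the #736 failure mode, surfaces as an Err that begins with the channel name and flat index, for example hessian[1].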
Structs§
- KernelChannels: One row’s worth of hand-written kernel outputs, as claimed by a RowKernel implementation, packaged for verification against the tower truth. Plain data (no trait coupling) so any kernel, whatever its visibility, can be audited from its own test module.
- Tower2: Truncated SECOND-order multivariate Taylor scalar in K variables.
- Tower3: Truncated THIRD-order multivariate Taylor scalar in K variables.
- Tower4: Truncated fourth-order multivariate Taylor scalar in K variables.
Traits§
- RowNllProgram: A family’s row negative log-likelihood written ONCE over tower scalars.
- RowNllProgramGeneric: A family’s row negative log-likelihood written ONCE over the generic crate::jet_scalar::JetScalar interface, so the SAME expression can be re-instantiated at whatever order / representation a consumer needs (crate::jet_scalar::Order2 for (v, g, H), crate::jet_scalar::OneSeed for the contracted third, crate::jet_scalar::TwoSeed for the contracted fourth, or the full Tower4 for every channel at once).
Functions§
- cell_moving_boundary_flux_tower: The boundary-flux derivative tower of a single moving cell integral ∫_{z_L(θ)}^{z_R(θ)} B dz: Φ(z_R(θ)) − Φ(z_L(θ)), assembled from the two edge towers and the integrand stacks at each edge. The returned tower’s derivative channels are the EXACT moving-boundary contribution to every θ-derivative of the cell integral, to fourth order, with no term hand-omitted. A Fixed (non-moving) edge passes a z_edge whose derivative channels are all zero, contributing nothing, matching the production edge_vel = 0 short-circuit.
- cell_moving_boundary_flux_tower_theta_integrand: Two-edge cell version of moving_limit_boundary_tower_theta_integrand: the exact boundary-flux tower of ∫_{z_L(θ)}^{z_R(θ)} G(z;θ) dz with a θ-dependent integrand, Φ(z_R;θ) − Φ(z_L;θ) minus the pure-θ parts at each frozen edge. A Fixed edge passes a z_edge with zero derivative channels, so its full and interior substitutions coincide and it contributes nothing, matching the production edge_vel = 0 short-circuit.
- derived_fourth_contracted: Mechanically derived row_fourth_contracted channel.
- derived_row_kernel: Mechanically derived row_kernel channel: (nll, ∇, H).
- derived_third_contracted: Mechanically derived row_third_contracted channel.
- digamma_derivative_stack
- evaluate_program: Evaluate a program’s full tower at the current primaries for one row.
- generic_fourth_contracted: Evaluate a generic program at the two-seed scalar crate::jet_scalar::TwoSeed, returning the contracted fourth Σ_{cd} ℓ_{abcd} u_c v_d (the row_fourth_contracted(u, v) channel) WITHOUT materialising the dense t4 tensor.
- generic_full_tower: Evaluate a generic program at the full dense Tower4 scalar, returning every channel (v, g, h, t3, t4) in one pass. Used where the UNCONTRACTED third / fourth tensors are needed (the BMS rigid third_full / fourth_full caches): the dense tensors come from the SAME row_nll_generic expression the order-2 / contracted scalars consume, so there is a single source of truth across every channel.
- generic_row_kernel: Evaluate a generic program at the value/gradient/Hessian scalar crate::jet_scalar::Order2, returning (nll, ∇, H) (the row_kernel channel) WITHOUT materialising any third / fourth tensor.
- generic_third_contracted: Evaluate a generic program at the one-seed scalar crate::jet_scalar::OneSeed, returning the contracted third Σ_c ℓ_{abc} dir_c (the row_third_contracted(dir) channel) WITHOUT materialising the dense t3 tensor. The contraction direction is folded INTO the differentiation by the nilpotent ε seeded with dir.
- implicit_solve: Solve the implicit relation F(a(θ), θ) ≡ 0 for the intercept tower a(θ) over the K primaries θ, given the constraint tower f written over K + 1 variables (slot 0 is the intercept a, slots 1..=K are the primaries θ) evaluated at the SOLVED point, i.e. f.v is the constraint residual at (a₀, θ₀) (≈ 0 from the production Newton solve) and a0 is that solved intercept value.
- ln_gamma_derivative_stack
- moving_limit_boundary_tower: The exact θ-derivative tower of a moving-LIMIT integral’s BOUNDARY contribution: given the edge-position tower z_edge(θ) over the K primaries and the integrand B evaluated-and-differentiated at the edge value as the stack b_stack = [B(z₀), B′(z₀), B″(z₀), B‴(z₀)] (z₀ = z_edge.v), returns the tower of Φ(z_edge(θ)) where Φ′ = B.
- moving_limit_boundary_tower_theta_integrand: Moving-limit boundary tower for a θ-DEPENDENT integrand G(z; θ).
- substitute_intercept: Substitute the intercept tower a(θ) into slot 0 of a constraint written over K + 1 variables, returning the composite tower over the K primaries θ: G(θ) = f(a(θ), θ₁, …, θ_K).
- trigamma_derivative_stack
- unary_derivatives_log1mexp_positive: Stable derivative stack for log(1 - exp(-x)), x > 0, through fourth order.
- unary_derivatives_normal_logcdf: Stable derivative stack for log Phi(x) through fourth order.
- verify_kernel_channels: Channel-by-channel audit of a hand-written kernel against the single-expression tower truth. Returns Err naming the first channel, index, claimed and true values on disagreement; designed as the body of the per-family CI oracle tests (#932 deployment step 2).
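As an illustration of what a stable stack involves, the first two entries for log(1 - exp(-x)) can be sketched as below. log1mexp_value_and_slope is a hypothetical helper, not the body of unary_derivatives_log1mexp_positive; it uses the common branch at ln 2 between expm1 and log1p, and the identity d/dx log(1 − e^{−x}) = 1 / (e^{x} − 1).

```rust
use std::f64::consts::LN_2;

/// Value and first derivative of log(1 - exp(-x)) for x > 0.
fn log1mexp_value_and_slope(x: f64) -> (f64, f64) {
    let value = if x <= LN_2 {
        // Small x: 1 - e^{-x} is computed as -expm1(-x), with no cancellation.
        (-(-x).exp_m1()).ln()
    } else {
        // Large x: e^{-x} is tiny, so ln(1 - e^{-x}) goes through ln_1p.
        (-(-x).exp()).ln_1p()
    };
    // d/dx log(1 - e^{-x}) = e^{-x} / (1 - e^{-x}) = 1 / expm1(x).
    let slope = 1.0 / x.exp_m1();
    (value, slope)
}
```

At x = 1e-20 the direct formula (1.0 - (-x).exp()).ln() returns -inf because exp(-x) rounds to 1, whereas the branch above returns approximately ln(1e-20).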