Module jet_tower


Taylor-jet tower algebra: write each family’s row log-likelihood ONCE, derive the entire RowKernel<K> derivative tower mechanically (#932).

§The object

Tower4<K> is a truncated multivariate Taylor scalar in K primary variables, carrying the value and ALL partial derivatives through fourth order as full (unsymmetrized) tensors:

  v        ℓ
  g[a]     ∂ℓ/∂p_a
  h[a][b]  ∂²ℓ/∂p_a∂p_b
  t3[abc]  ∂³ℓ/∂p_a∂p_b∂p_c
  t4[abcd] ∂⁴ℓ/∂p_a∂p_b∂p_c∂p_d

Arithmetic (+ − × ÷, scalar mixes) propagates the tower by the exact Leibniz rule; unary transcendentals propagate by the exact multivariate Faà di Bruno formula given a [f, f′, f″, f‴, f⁗] stack evaluated at the inner value. This is truncated Taylor ALGEBRA — exact derivatives of the evaluated expression, not finite differences, not an approximation — fully compatible with the exact-REML-only policy.
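To make the two propagation rules concrete, here is a minimal sketch truncated at second order rather than fourth. `MiniTower` and its methods are hypothetical names invented for this illustration and are not this module's API; the real Tower4 carries the t3 and t4 tensors as well, with correspondingly longer Leibniz and Faà di Bruno sums.

```rust
// Hypothetical second-order analogue of the tower (value, gradient, Hessian).
// `MiniTower` and its methods are illustrative names, not this module's API.
#[derive(Clone, Copy, Debug)]
struct MiniTower<const K: usize> {
    v: f64,
    g: [f64; K],
    h: [[f64; K]; K],
}

impl<const K: usize> MiniTower<K> {
    /// Seed primary variable `idx`: unit gradient in its own slot, zero Hessian.
    fn variable(value: f64, idx: usize) -> Self {
        let mut g = [0.0; K];
        g[idx] = 1.0;
        Self { v: value, g, h: [[0.0; K]; K] }
    }

    /// Product by the Leibniz rule, truncated at second order.
    fn mul(self, o: Self) -> Self {
        let mut out = Self { v: self.v * o.v, g: [0.0; K], h: [[0.0; K]; K] };
        for a in 0..K {
            out.g[a] = self.g[a] * o.v + self.v * o.g[a];
            for b in 0..K {
                out.h[a][b] = self.h[a][b] * o.v
                    + self.g[a] * o.g[b]
                    + self.g[b] * o.g[a]
                    + self.v * o.h[a][b];
            }
        }
        out
    }

    /// Faà di Bruno through second order, given [f, f', f''] at the inner value.
    fn compose_unary(self, stack: [f64; 3]) -> Self {
        let [f0, f1, f2] = stack;
        let mut out = Self { v: f0, g: [0.0; K], h: [[0.0; K]; K] };
        for a in 0..K {
            out.g[a] = f1 * self.g[a];
            for b in 0..K {
                out.h[a][b] = f2 * self.g[a] * self.g[b] + f1 * self.h[a][b];
            }
        }
        out
    }

    /// exp is its own derivative, so the stack is [e, e, e].
    fn exp(self) -> Self {
        let e = self.v.exp();
        self.compose_unary([e, e, e])
    }
}

/// l(x, y) = exp(x * y) evaluated at (0.5, 2.0) in one pass.
fn demo() -> MiniTower<2> {
    let x = MiniTower::<2>::variable(0.5, 0);
    let y = MiniTower::<2>::variable(2.0, 1);
    x.mul(y).exp()
}

fn main() {
    let l = demo();
    println!("v = {}, g = {:?}, h = {:?}", l.v, l.g, l.h);
}
```

The Hessian cross term comes out as (1 + xy)·exp(xy) without anyone writing that formula down: the product rule supplies the mixed second derivative of x·y and the composition rule combines it with the gradient outer product.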

One evaluation of a row NLL program at seeded variables yields, in a single pass, every channel the [super::row_kernel::RowKernel] trait demands: row_kernel (value/∇/H), row_third_contracted(dir) (contract t3 with dir), and row_fourth_contracted(u, v) (contract t4 with u and v). The directional cross-channels that hand-written towers drop (#736’s residual gap) cannot be dropped here: there is no separate “channel” to forget — every derivative of the one expression is carried.
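The contracted channels described above reduce to ordinary loops over the stored tensors. The sketch below assumes the full tensors are laid out as nested fixed-size arrays; that layout, and the names `contract_third` and `contract_fourth`, are assumptions made for illustration and may not match how Tower4 stores or exposes its fields.

```rust
/// out[a][b] = sum_c t3[a][b][c] * dir[c]
fn contract_third<const K: usize>(
    t3: &[[[f64; K]; K]; K],
    dir: &[f64; K],
) -> [[f64; K]; K] {
    let mut out = [[0.0; K]; K];
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                out[a][b] += t3[a][b][c] * dir[c];
            }
        }
    }
    out
}

/// out[a][b] = sum_{c,d} t4[a][b][c][d] * u[c] * v[d]
fn contract_fourth<const K: usize>(
    t4: &[[[[f64; K]; K]; K]; K],
    u: &[f64; K],
    v: &[f64; K],
) -> [[f64; K]; K] {
    let mut out = [[0.0; K]; K];
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                for d in 0..K {
                    out[a][b] += t4[a][b][c][d] * u[c] * v[d];
                }
            }
        }
    }
    out
}

/// Test tensors for l(p) = (w . p)^4 / 24 with w = (1, 2) at p = (1, 0),
/// where w . p = 1, so t3 = w (x) w (x) w and t4 = w (x) w (x) w (x) w.
fn demo() -> ([[f64; 2]; 2], [[f64; 2]; 2]) {
    let w: [f64; 2] = [1.0, 2.0];
    let mut t3 = [[[0.0_f64; 2]; 2]; 2];
    let mut t4 = [[[[0.0_f64; 2]; 2]; 2]; 2];
    for a in 0..2 {
        for b in 0..2 {
            for c in 0..2 {
                t3[a][b][c] = w[a] * w[b] * w[c];
                for d in 0..2 {
                    t4[a][b][c][d] = w[a] * w[b] * w[c] * w[d];
                }
            }
        }
    }
    let third = contract_third(&t3, &[1.0, 1.0]);
    let fourth = contract_fourth(&t4, &[1.0, 0.0], &[0.0, 1.0]);
    (third, fourth)
}

fn main() {
    let (third, fourth) = demo();
    println!("third contracted  = {:?}", third);
    println!("fourth contracted = {:?}", fourth);
}
```

Because the contraction runs after differentiation, changing `dir`, `u`, or `v` costs only another pass over the stored tensor, not another evaluation of the row program.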

§Why this exists (the bug genus)

Every family today hand-writes its tower: value in one function, gradient in another, pdfthird_derivative/pdffourth_derivative, entry/exit-specific cross blocks — thousands of lines of calculus that drift. #736 was a sign flip in a hand-written cross-Hessian block, invisible until a new consumer touched it; #948 is a derivative path that is not the derivative of the evaluated row loss (clamped-μ surrogate); the objective↔gradient desync class is the same disease at the criterion level. A tower-derived kernel is exact-by-construction: the value channel IS the production loss expression, so its derivative channels cannot desync from it.

§Relation to jet_partitions::MultiDirJet

The tree already carries a directional jet (bitmask coefficients over distinct seeded directions, heap-allocated, Bell-partition compose) used inside the marginal-slope and latent-survival families. It answers “the derivative along THESE specific directions” and must be re-seeded and re-evaluated per direction tuple (e.g. 10 symmetric (a,b) pairs for a K=4 fourth contraction). Tower4 answers ALL of them from one evaluation: contraction happens AFTER differentiation, as plain linear algebra on the stored tensors. Use MultiDirJet when you need a handful of directions of a huge-K expression; use Tower4 when you need the complete small-K tower — which is exactly the RowKernel<K≤4> shape. The [f64; 5] unary-derivative stacks (unary_derivatives_neglog_phi, …) are signature-compatible with Tower4::compose_unary, so the families’ existing special-function stacks are directly reusable.

§Stability discipline (why this is NOT autodiff)

Differentiating the primal code path inherits its instabilities: a jet pushed through a naive ln(1 + e^η) is garbage in the saturated tail even though the true derivative σ(η) is benign there. This module therefore splits responsibility: humans own primitive stability, the algebra owns combinatorics. Tail-critical special functions enter a program ONLY as hand-certified [f64; 5] derivative stacks through Tower4::compose_unary — the same stacks the families already write (unary_derivatives_neglog_phi and friends, built on erfcx/log_ndtr) — and the tower mechanizes only the Leibniz/Faà di Bruno composition, which is where hand-written towers actually fail (#736 was a composition sign flip, not a primitive error). Program authors must use a stable primitive stack wherever the f64 production loss does; the convenience methods (exp, ln, sqrt, …) are for expressions whose arguments are tame by construction.
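The saturated-tail failure is easy to reproduce. The sketch below contrasts the naive softplus and its mechanically differentiated slope with a hand-stabilised five-entry stack in the [f, f′, f″, f‴, f⁗] layout. `softplus_stack` is a hypothetical example written for this note, not one of the module's certified stacks, and it has not been through the certification the module requires.

```rust
/// Naive value: exp overflows to +inf for large eta.
fn softplus_naive(eta: f64) -> f64 {
    (1.0 + eta.exp()).ln()
}

/// What differentiating the naive code path yields: inf / inf in the tail.
fn sigmoid_naive(eta: f64) -> f64 {
    eta.exp() / (1.0 + eta.exp())
}

/// Hypothetical stable stack [f, f', f'', f''', f''''] for f = ln(1 + e^eta).
/// Only exp of a non-positive argument is ever taken.
fn softplus_stack(eta: f64) -> [f64; 5] {
    let f = if eta > 0.0 {
        eta + (-eta).exp().ln_1p()
    } else {
        eta.exp().ln_1p()
    };
    // s = sigma(eta) and c = 1 - sigma(eta), each computed without cancellation.
    let (s, c) = if eta >= 0.0 {
        let e = (-eta).exp();
        (1.0 / (1.0 + e), e / (1.0 + e))
    } else {
        let e = eta.exp();
        (e / (1.0 + e), 1.0 / (1.0 + e))
    };
    let q = s * c; // f''  = s(1 - s)
    let f3 = q * (c - s); // f''' = s(1 - s)(1 - 2s)
    let f4 = q * (1.0 - 6.0 * q); // f'''' = s(1 - s)(1 - 6s + 6s^2)
    [f, s, q, f3, f4]
}

fn main() {
    let eta = 800.0;
    println!("naive value  = {}", softplus_naive(eta));
    println!("naive slope  = {}", sigmoid_naive(eta));
    println!("stable stack = {:?}", softplus_stack(eta));
}
```

A stack like this is what would be handed to a unary composition step; the composition itself never needs to know how the entries were stabilised.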

§Storage convention

Tensors are stored FULL, not symmetric-packed: t4 for K=4 is 4⁴ = 256 doubles where the C(7,4) = 35 distinct symmetric entries would do. This is deliberate clarity-over-speed for the oracle role: indexing is trivially auditable, contraction loops are obvious, and the redundancy is itself a checked invariant (the algebra only ever writes symmetric values). Symmetric packing is a later, profile-justified optimization behind the same API.
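The 256-versus-35 figure and the symmetry invariant can both be checked by brute force. The helpers below are hypothetical illustrations assuming a nested-array layout; they are not functions of this module.

```rust
/// Entries in a full order-4 tensor over k variables.
fn full_len(k: usize) -> usize {
    k.pow(4)
}

/// Distinct entries under full index symmetry: sorted tuples a <= b <= c <= d.
fn symmetric_len(k: usize) -> usize {
    let mut n = 0;
    for a in 0..k {
        for b in a..k {
            for c in b..k {
                for _d in c..k {
                    n += 1;
                }
            }
        }
    }
    n
}

/// Exact symmetry check. The three adjacent index swaps generate every
/// permutation of four indices, so testing them for all entries suffices.
fn is_symmetric<const K: usize>(t: &[[[[f64; K]; K]; K]; K]) -> bool {
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                for d in 0..K {
                    let x = t[a][b][c][d];
                    if x != t[b][a][c][d] || x != t[a][c][b][d] || x != t[a][b][d][c] {
                        return false;
                    }
                }
            }
        }
    }
    true
}

/// Fourth outer power of w, a symmetric tensor by construction.
fn outer4<const K: usize>(w: &[f64; K]) -> [[[[f64; K]; K]; K]; K] {
    let mut t = [[[[0.0; K]; K]; K]; K];
    for a in 0..K {
        for b in 0..K {
            for c in 0..K {
                for d in 0..K {
                    t[a][b][c][d] = w[a] * w[b] * w[c] * w[d];
                }
            }
        }
    }
    t
}

fn main() {
    println!("full = {}, symmetric = {}", full_len(4), symmetric_len(4));
    println!("outer4 symmetric: {}", is_symmetric(&outer4(&[1.0, 2.0, 3.0, 4.0])));
}
```

The exact `!=` comparison is appropriate only when every permuted entry is produced by bit-identical arithmetic, as with the small integers here; a check on real tower output might need a tolerance instead.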

§Deployment ladder (#932)

  1. This module: the algebra + the program seam + the oracle.
  2. Universal oracle: every hand-written RowKernel gains a CI test asserting channel-by-channel agreement with a RowNllProgram written once — see verify_kernel_channels. This alone would have caught #736 at introduction.
  3. Migrate error-dense / cold towers to derived_row_kernel et al.; keep hand-tuned hot paths, now verified against the single-expression truth instead of being the only definition.
  4. New families (#914/#916/#917 ZI/ordinal/expectile, #921’s location-scale port) implement ONLY RowNllProgram and get an exact fourth-order tower for the price of writing the likelihood.

Structs§

GuardVerdict
The verdict of a per-lane RowJet::guard domain check.
KernelChannels
One row’s worth of hand-written kernel outputs, as claimed by a RowKernel implementation, packaged for verification against the tower truth. Plain data (no trait coupling) so any kernel — whatever its visibility — can be audited from its own test module.
Tower2
Truncated SECOND-order multivariate Taylor scalar in K variables.
Tower3
Truncated THIRD-order multivariate Taylor scalar in K variables.
Tower4
Truncated fourth-order multivariate Taylor scalar in K variables.
Tower3Lane
Lane-batched Tower3 (order-≤3 sibling of Tower4Lane).
Tower4Lane
Lane-batched Tower4: value / gradient / Hessian / 3rd / 4th tensors carried in a SIMD field L. Tower4Lane<f64x4, K> lane i is to_bits-identical to Tower4<K> on row i.

Traits§

RowJet
The shared row-NLL algebra over BOTH the scalar jets and the f64x4 lane towers — the bound that lets ONE single-source row-NLL body SIMD-batch 4 rows/pass without a dual-source copy (module §“The RowJet bridge”).
RowNllProgram
A family’s row negative log-likelihood written ONCE over tower scalars.
RowNllProgramGeneric
A family’s row negative log-likelihood written ONCE over the generic crate::jet_scalar::JetScalar interface, so the SAME expression can be re-instantiated at whatever order / representation a consumer needs (crate::jet_scalar::Order2 for (v, g, H), crate::jet_scalar::OneSeed for the contracted third, crate::jet_scalar::TwoSeed for the contracted fourth, or the full Tower4 for every channel at once).
RowNllProgramRowJet
A family’s row negative log-likelihood written ONCE over the RowJet bridge, so the SAME body instantiates at the scalar jets (for the (v, g, H) and contracted-tensor channels) AND at the f64x4 lane towers (for the 4-rows-per-pass SIMD batch). This is the lane-capable successor to RowNllProgramGeneric: a body written here gets the scalar channels through rowjet_row_kernel / rowjet_third_contracted / rowjet_fourth_contracted and the batched channels through generic_batched_fourth_tower / generic_batched_third_tower, all from a single source.

Functions§

cell_moving_boundary_flux_tower
The boundary-flux derivative tower of a single moving cell integral ∫_{z_L(θ)}^{z_R(θ)} B dz: Φ(z_R(θ)) − Φ(z_L(θ)), assembled from the two edge towers and the integrand stacks at each edge. The returned tower’s derivative channels are the EXACT moving-boundary contribution to every θ-derivative of the cell integral, to fourth order, with no term hand-omitted. A Fixed (non-moving) edge passes a z_edge whose derivative channels are all zero, contributing nothing — matching the production edge_vel = 0 short-circuit.
cell_moving_boundary_flux_tower_theta_integrand
Two-edge cell version of moving_limit_boundary_tower_theta_integrand: the exact boundary-flux tower of ∫_{z_L(θ)}^{z_R(θ)} G(z;θ) dz with a θ-dependent integrand, Φ(z_R;θ) − Φ(z_L;θ) minus the pure-θ parts at each frozen edge. A Fixed edge passes a z_edge with zero derivative channels, so its full and interior substitutions coincide and it contributes nothing — matching the production edge_vel = 0 short-circuit.
derived_fourth_contracted
Mechanically derived row_fourth_contracted channel.
derived_row_kernel
Mechanically derived row_kernel channel: (nll, ∇, H).
derived_third_contracted
Mechanically derived row_third_contracted channel.
digamma_derivative_stack
evaluate_program
Evaluate a program’s full tower at the current primaries for one row.
generic_batched_fourth_tower
Evaluate a RowNllProgramRowJet at the f64x4 lane Tower4Batch, computing the FULL (v, g, H, t3, t4) for FOUR rows in one SIMD pass — the lane twin of generic_full_tower. Each of the four lanes is seeded with its own row’s primaries, so Tower4Batch::lane(i) is to_bits-identical to the scalar generic_full_tower on rows[i].
generic_batched_third_tower
Evaluate a RowNllProgramRowJet at the f64x4 lane Tower3Batch, computing (v, g, H, t3) for FOUR rows in one SIMD pass — the order-≤3 lane twin of generic_full_tower. Tower3Batch::lane(i) is to_bits-identical to the order-≤3 scalar evaluation on rows[i].
generic_fourth_contracted
Evaluate a generic program at the two-seed scalar crate::jet_scalar::TwoSeed, returning the contracted fourth Σ_{cd} ℓ_{abcd} u_c v_d — the row_fourth_contracted(u, v) channel — WITHOUT materialising the dense t4 tensor.
generic_full_tower
Evaluate a generic program at the full dense Tower4 scalar, returning every channel (v, g, h, t3, t4) in one pass. Used where the UNCONTRACTED third / fourth tensors are needed (the BMS rigid third_full / fourth_full caches): the dense tensors come from the SAME row_nll_generic expression the order-2 / contracted scalars consume, so there is a single source of truth across every channel.
generic_row_kernel
Evaluate a generic program at the value/gradient/Hessian scalar crate::jet_scalar::Order2, returning (nll, ∇, H) — the row_kernel channel — WITHOUT materialising any third / fourth tensor.
generic_third_contracted
Evaluate a generic program at the one-seed scalar crate::jet_scalar::OneSeed, returning the contracted third Σ_c ℓ_{abc} dir_c — the row_third_contracted(dir) channel — WITHOUT materialising the dense t3 tensor. The contraction direction is folded INTO the differentiation by the nilpotent ε seeded with dir.
implicit_solve
Solve the implicit relation F(a(θ), θ) ≡ 0 for the intercept tower a(θ) over the K primaries θ, given the constraint tower f written over K + 1 variables (slot 0 is the intercept a, slots 1..=K are the primaries θ) evaluated at the SOLVED point — i.e. f.v is the constraint residual at (a₀, θ₀) (≈ 0 from the production Newton solve) and a0 is that solved intercept value.
ln_gamma_derivative_stack
ln_gamma_derivative_stack_order2
moving_limit_boundary_tower
The exact θ-derivative tower of a moving-LIMIT integral’s BOUNDARY contribution: given the edge-position tower z_edge(θ) over the K primaries and the integrand B evaluated-and-differentiated at the edge value as the stack b_stack = [B(z₀), B′(z₀), B″(z₀), B‴(z₀)] (z₀ = z_edge.v), returns the tower of Φ(z_edge(θ)) where Φ′ = B.
moving_limit_boundary_tower_theta_integrand
Moving-limit boundary tower for a θ-DEPENDENT integrand G(z; θ).
rowjet_fourth_contracted
Evaluate a RowNllProgramRowJet at the two-seed scalar crate::jet_scalar::TwoSeed, returning the contracted fourth Σ_{cd} ℓ_{abcd} u_c v_d — the RowJet twin of generic_fourth_contracted.
rowjet_row_kernel
Evaluate a RowNllProgramRowJet at the value/gradient/Hessian scalar crate::jet_scalar::Order2 (the (v, g, H) inner-Newton channel) — the RowJet twin of generic_row_kernel.
rowjet_third_contracted
Evaluate a RowNllProgramRowJet at the one-seed scalar crate::jet_scalar::OneSeed, returning the contracted third Σ_c ℓ_{abc} dir_c — the RowJet twin of generic_third_contracted.
substitute_intercept
Substitute the intercept tower a(θ) into slot 0 of a constraint written over K + 1 variables, returning the composite tower over the K primaries θ: G(θ) = f(a(θ), θ₁, …, θ_K).
trigamma_derivative_stack
unary_derivatives_log1mexp_positive
Stable derivative stack for log(1 - exp(-x)), x > 0, through fourth order.
unary_derivatives_normal_logcdf
Stable derivative stack for log Phi(x) through fourth order.
verify_kernel_channels
Channel-by-channel audit of a hand-written kernel against the single-expression tower truth. Returns Err naming the first channel, index, claimed and true values on disagreement — designed as the body of the per-family CI oracle tests (#932 deployment step 2).
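The audit described for verify_kernel_channels can be pictured with a reduced sketch that covers only the value, gradient, and Hessian channels. The struct `Channels`, the function `verify`, the tolerance rule, and the error wording are all assumptions made for this illustration; the real KernelChannels fields and the real function signature are not shown on this page.

```rust
/// Hypothetical reduced channel bundle (no third or fourth channels).
struct Channels<const K: usize> {
    v: f64,
    g: [f64; K],
    h: [[f64; K]; K],
}

/// Mixed absolute/relative closeness test (an assumed tolerance rule).
fn close(claimed: f64, truth: f64, rtol: f64) -> bool {
    (claimed - truth).abs() <= rtol * (1.0 + truth.abs())
}

/// Return Err naming the first disagreeing channel, its index, and both values.
fn verify<const K: usize>(
    claimed: &Channels<K>,
    truth: &Channels<K>,
    rtol: f64,
) -> Result<(), String> {
    if !close(claimed.v, truth.v, rtol) {
        return Err(format!("value: claimed {} but tower gives {}", claimed.v, truth.v));
    }
    for a in 0..K {
        if !close(claimed.g[a], truth.g[a], rtol) {
            return Err(format!(
                "gradient[{}]: claimed {} but tower gives {}",
                a, claimed.g[a], truth.g[a]
            ));
        }
    }
    for a in 0..K {
        for b in 0..K {
            if !close(claimed.h[a][b], truth.h[a][b], rtol) {
                return Err(format!(
                    "hessian[{}][{}]: claimed {} but tower gives {}",
                    a, b, claimed.h[a][b], truth.h[a][b]
                ));
            }
        }
    }
    Ok(())
}

/// A hand-written kernel whose cross-Hessian block has a flipped sign.
fn demo() -> Result<(), String> {
    let truth = Channels { v: 1.5, g: [0.2, -0.4], h: [[2.0, 0.7], [0.7, 3.0]] };
    let claimed = Channels { v: 1.5, g: [0.2, -0.4], h: [[2.0, -0.7], [-0.7, 3.0]] };
    verify(&claimed, &truth, 1e-10)
}

fn main() {
    match demo() {
        Ok(()) => println!("all channels agree"),
        Err(msg) => println!("mismatch: {}", msg),
    }
}
```

Reporting the first offending channel and index, instead of a bare pass/fail, is what makes a sign flip in one cross block quick to localise from a CI log.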

Type Aliases§

Tower3Batch
The 4-rows-per-pass batched Tower3 (wide::f64x4 lanes).
Tower4Batch
The 4-rows-per-pass batched Tower4 (wide::f64x4 lanes).