pub struct Tower2<const K: usize> {
pub v: f64,
pub g: [f64; K],
pub h: [[f64; K]; K],
}
Truncated SECOND-order multivariate Taylor scalar in K variables.
This is the value/gradient/Hessian-only sibling of Tower4. Every
channel it carries (v, g, h) is computed by the SAME formulas
Tower4 uses for those orders, so for any program written over both
towers the order-≤2 outputs are bit-identical: the order-2 Leibniz and
Faà-di-Bruno terms read only the order-≤2 channels of their inputs (see
Tower4::mul / Tower4::compose_unary — out.h never touches t3
or t4), so dropping the third/fourth tensors cannot perturb the value,
gradient, or Hessian.
It exists purely for performance: an inner Newton step (and the
value-only ρ-homotopy pre-warm) needs at most curvature, never the
outer-κ/ψ third/fourth derivatives. Evaluating a row likelihood over
Tower2 skips the K⁴ fourth-tensor product/composition arithmetic that
dominates the cold marginal-slope fit, while returning the exact same
(v, g, h).
Fields
v: f64
Value ℓ.
g: [f64; K]
Gradient ∂ℓ/∂p_a.
h: [[f64; K]; K]
Hessian ∂²ℓ/∂p_a∂p_b (symmetric).
Implementations
impl<const K: usize> Tower2<K>
pub fn variable(value: f64, idx: usize) -> Self
The seeded variable p_idx, with its current value given by the value argument:
unit first derivative in slot idx, zero elsewhere and above.
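The seeding rule can be sketched as follows. This is a minimal illustration, not the crate's source: `Tower2Sketch` is a hypothetical stand-in that only copies the field layout shown on this page, and the panic on an out-of-range `idx` is an assumption (the page does not document the bounds policy).

```rust
/// Hypothetical stand-in for the documented struct (same field layout).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tower2Sketch<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
}

impl<const K: usize> Tower2Sketch<K> {
    /// Seed the independent variable p_idx: `value` in `v`, a unit first
    /// derivative in slot `idx`, and a zero (hence symmetric) Hessian.
    /// Assumption: an out-of-range `idx` panics via the array index.
    pub fn variable(value: f64, idx: usize) -> Self {
        let mut g = [0.0; K];
        g[idx] = 1.0;
        Self { v: value, g, h: [[0.0; K]; K] }
    }
}
```

For example, `Tower2Sketch::<3>::variable(2.5, 1)` would carry `v = 2.5`, `g = [0.0, 1.0, 0.0]`, and an all-zero `h`.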
pub fn mul(&self, o: &Self) -> Self
Exact truncated (order ≤ 2) Leibniz product. The v/g/h upper
triangle matches Tower4::mul term-for-term.
Symmetry fast path
The order-≤2 Leibniz Hessian
h[i][j] = a.v·b.h[i][j] + a.g[i]·b.g[j] + a.g[j]·b.g[i] + a.h[i][j]·b.v
is symmetric under i ↔ j whenever the operand Hessians are — which they
always are: constant/variable seed a symmetric (zero) h, and
mul/compose_unary/add/scale each preserve symmetry, so the
invariant holds for every tower a row program can build. We therefore
compute only the upper triangle j ≥ i and mirror it into the lower
triangle. At the K = 9 survival width that is K(K+1)/2 = 45 four-product
entry evaluations instead of K² = 81, and the win is larger in wall-clock
because the 648-byte h (81 f64 entries) spills at K = 9, so halving the expensive
stores/reloads roughly halves the kernel (measured ≈2× on a Tower2<9>
mul-and-read throughput microbench; the dominant mul under every packed
scalar bottoms out here).
The upper-triangle entries are BIT-IDENTICAL to the old rectangular form
(same term/accumulation order). The lower triangle now equals its mirror
exactly, where the rectangular form rounded h[i][j] and h[j][i]
independently (the two cross products accumulate in opposite order) and
left a ≤1-ulp asymmetry; mirroring removes it, so the result is exactly
symmetric — strictly closer to the true symmetric Hessian, not merely a
reordering. Dense-h consumers are all tolerance-gated (rel-tol ≥ 1e-11 ≫
1e-16); the f64/f64x4 lane oracle stays exact because
crate::jet_scalar::Order2Lane::mul mirrors term-for-term.
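The upper-triangle-and-mirror scheme described above can be sketched like this. It is an illustration under stated assumptions, not the crate's implementation: `Tower2Sketch` is a hypothetical stand-in type, the Hessian terms follow the formula quoted in this section, and the term order used for the gradient is a guess.

```rust
/// Hypothetical stand-in for the documented struct (same field layout).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tower2Sketch<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
}

impl<const K: usize> Tower2Sketch<K> {
    /// Seeded variable: unit gradient in slot `idx`, zero Hessian.
    pub fn variable(value: f64, idx: usize) -> Self {
        let mut g = [0.0; K];
        g[idx] = 1.0;
        Self { v: value, g, h: [[0.0; K]; K] }
    }

    /// Order-<=2 Leibniz product. Only entries with j >= i are computed;
    /// each one is then copied into its mirror slot, so the result is
    /// exactly symmetric.
    pub fn mul(&self, o: &Self) -> Self {
        let mut g = [0.0; K];
        let mut h = [[0.0; K]; K];
        for i in 0..K {
            // Gradient term order is an assumption of this sketch.
            g[i] = self.v * o.g[i] + self.g[i] * o.v;
            for j in i..K {
                let e = self.v * o.h[i][j]
                    + self.g[i] * o.g[j]
                    + self.g[j] * o.g[i]
                    + self.h[i][j] * o.v;
                h[i][j] = e; // upper triangle
                h[j][i] = e; // mirrored lower triangle
            }
        }
        Self { v: self.v * o.v, g, h }
    }
}
```

As a quick check, seeding x = 3 in slot 0 and y = 4 in slot 1 and forming (x·y)·(x·y) = x²y² gives v = 144, g = [96, 72], and h = [[32, 48], [48, 18]], which agrees with the analytic derivatives 2y², 4xy, and 2x².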
pub fn compose_unary(&self, d: [f64; 3]) -> Self
Exact (order ≤ 2) multivariate Faà di Bruno composition f ∘ self.
d = [f(u), f′(u), f″(u)] evaluated at u = self.v. The v/g/h
channels match Tower4::compose_unary term-for-term (which uses only
d[0..=2] for those orders), so this is a strict truncation, not an
approximation. The full-order [f64; 5] derivative stacks the families
already produce can be passed by slicing their first three entries.
Codegen
Order-≤2 Faà di Bruno is a tiny closed form, so this evaluates it
directly instead of routing through the generic
jet_algebra::faa_di_bruno set-partition walker (recursion + per-block
closure dispatch). That matters because this is the kernel under EVERY
packed scalar — crate::jet_scalar::Order2 / OneSeed / TwoSeed
composition all bottom out here — so the straight-line form (whose inner
loops auto-vectorise to NEON/SSE 2-wide and which emits zero outlined
walker calls) lifts all of them at once.
The term and accumulation order is BIT-IDENTICAL to the walker it
replaces: each output channel mirrors the walker’s total = 0.0 start
(the explicit acc accumulator), so a signed-zero product collapses to
+0.0 exactly as total += prod does. Proven to_bits-identical on
v/g/h across K ∈ {2,3,4,9}, 5000 random inputs each (incl.
zeroed / sign-varied stacks). The order-≤2 walker partitions are:
g[i] = f′·u_i (single block {i})
h[i][j] = f′·u_ij + (f″·u_i)·u_j (blocks {ij} then {i}{j}),
with f′ = d[1], f″ = d[2], u_* = self.{g,h}.
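A straight-line form of the two partitions above might look like the following sketch. Assumptions: `Tower2Sketch` and `first_three` are hypothetical names introduced here, the loop covers the full K×K Hessian (the page does not say whether the real kernel mirrors a triangle), and the explicit `acc` accumulator starting at 0.0 follows the description in this section.

```rust
/// Hypothetical stand-in for the documented struct (same field layout).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tower2Sketch<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
}

/// Hypothetical helper: take [f, f', f''] from a full-order stack.
pub fn first_three(stack: [f64; 5]) -> [f64; 3] {
    [stack[0], stack[1], stack[2]]
}

impl<const K: usize> Tower2Sketch<K> {
    /// Order-<=2 Faà di Bruno composition f ∘ self, with
    /// d = [f(u), f'(u), f''(u)] evaluated at u = self.v.
    pub fn compose_unary(&self, d: [f64; 3]) -> Self {
        let (f1, f2) = (d[1], d[2]);
        let mut g = [0.0; K];
        let mut h = [[0.0; K]; K];
        for i in 0..K {
            // g[i] = f'·u_i, accumulated from 0.0 so that a -0.0
            // product collapses to +0.0.
            let mut acc = 0.0;
            acc += f1 * self.g[i];
            g[i] = acc;
            let f2_ui = f2 * self.g[i];
            for j in 0..K {
                // h[i][j] = f'·u_ij + (f''·u_i)·u_j, blocks in that order.
                let mut acc = 0.0;
                acc += f1 * self.h[i][j];
                acc += f2_ui * self.g[j];
                h[i][j] = acc;
            }
        }
        Self { v: d[0], g, h }
    }
}
```

Composing a seeded variable with exp (where f = f′ = f″) would return the same number in `v`, `g[0]`, and `h[0][0]`, and zeros elsewhere.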
Trait Implementations
impl<const K: usize> Copy for Tower2<K>
Auto Trait Implementations
impl<const K: usize> Freeze for Tower2<K>
impl<const K: usize> RefUnwindSafe for Tower2<K>
impl<const K: usize> Send for Tower2<K>
impl<const K: usize> Sync for Tower2<K>
impl<const K: usize> Unpin for Tower2<K>
impl<const K: usize> UnsafeUnpin for Tower2<K>
impl<const K: usize> UnwindSafe for Tower2<K>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Read<Exclusive, BecauseExclusive> for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.