pub struct Tower3<const K: usize> {
pub v: f64,
pub g: [f64; K],
pub h: [[f64; K]; K],
pub t3: [[[f64; K]; K]; K],
}
Truncated THIRD-order multivariate Taylor scalar in K variables.
The value/gradient/Hessian/third-derivative sibling of Tower4, standing
between Tower2 and Tower4. Every channel it carries (v, g, h,
t3) is computed by the SAME shared Leibniz / Faà-di-Bruno kernels
Tower4 uses for those orders, and the order-≤3 terms of those kernels
read only the order-≤3 channels of their inputs (the order-3 Faà-di-Bruno
partitions never reach the f⁗ stack slot or the inner t4 tensor — see
Tower4::compose_unary). So for any program written over both towers the
order-≤3 outputs are bit-identical: dropping the fourth tensor cannot
perturb the value, gradient, Hessian, or third derivatives.
It exists purely for performance, exactly like Tower2: a consumer that
needs up to third derivatives (the survival location-scale row kernel reads
g, the diagonal h, and the diagonal t3, but never t4) pays the
K³ third-tensor arithmetic but skips the K⁴ fourth-tensor
product/composition that otherwise dominates the per-row cost.
Fields§
v: f64
Value ℓ.
g: [f64; K]
Gradient ∂ℓ/∂p_a.
h: [[f64; K]; K]
Hessian ∂²ℓ/∂p_a∂p_b (symmetric).
t3: [[[f64; K]; K]; K]
Third derivatives ∂³ℓ/∂p_a∂p_b∂p_c (fully symmetric).
Implementations§
impl<const K: usize> Tower3<K>
pub fn variable(value: f64, idx: usize) -> Self
The seeded variable p_idx with current value value:
unit first derivative in slot idx, zero elsewhere and above.
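The seeding rule can be sketched on a stand-in type. This is an illustration only: the struct below copies the documented field layout, but the body of `variable` is an assumption reconstructed from the description above, not the crate's source.

```rust
/// Stand-in with the documented field layout (not the crate's own type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tower3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Tower3<K> {
    /// Seed the variable p_idx: `value` in `v`, a unit gradient in slot
    /// `idx`, and zero in every second- and third-order channel.
    pub fn variable(value: f64, idx: usize) -> Self {
        let mut g = [0.0; K];
        g[idx] = 1.0; // panics if idx >= K (behaviour assumed for this sketch)
        Self {
            v: value,
            g,
            h: [[0.0; K]; K],
            t3: [[[0.0; K]; K]; K],
        }
    }
}
```

With `K = 3`, `Tower3::<3>::variable(2.5, 1)` carries the gradient `[0.0, 1.0, 0.0]` and all-zero `h` and `t3`.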
pub fn mul(&self, o: &Self) -> Self
Exact truncated (order ≤ 3) Leibniz product. The v/g/h/t3
channels match Tower4::mul term-for-term.
§Codegen
Straight-line per-entry subset sums instead of the
jet_algebra::leibniz_product walker: the order-≤3 sibling of
Tower4::mul (no t4). The loop nest is unchanged, with no unroll over K and
no code bloat, and it auto-vectorises. BIT-IDENTICAL: terms are summed in the
walker’s exact subset order from an acc = 0.0 accumulator start (load-bearing
for the signed-zero leading product on exact-0.0 jet channels). Verified
to_bits-identical on v/g/h/t3 across K ∈ {2,3,4,9}, with 5000
zero/sign-stressed inputs each (these channel formulas are exactly the
g/h/t3 of the Tower4::mul oracle, which passes that stress).
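For orientation, the order-≤3 Leibniz rule can be written out per entry as below. This is a minimal sketch on a stand-in type. It reproduces the mathematics of the truncated product, but the term order and accumulator handling are assumptions here, so it makes no claim to match the crate's bit-identical subset order.

```rust
/// Stand-in with the documented field layout (not the crate's own type).
#[derive(Clone, Copy, Debug)]
pub struct Tower3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Tower3<K> {
    pub fn variable(value: f64, idx: usize) -> Self {
        let mut out = Self { v: value, g: [0.0; K], h: [[0.0; K]; K], t3: [[[0.0; K]; K]; K] };
        out.g[idx] = 1.0;
        out
    }

    /// Truncated (order <= 3) product rule; term order chosen for readability.
    pub fn mul(&self, o: &Self) -> Self {
        let (a, b) = (self, o);
        let mut out = Self { v: a.v * b.v, g: [0.0; K], h: [[0.0; K]; K], t3: [[[0.0; K]; K]; K] };
        for i in 0..K {
            // First order: one factor differentiated.
            out.g[i] = a.g[i] * b.v + a.v * b.g[i];
            for j in 0..K {
                // Second order: split {i, j} between the two factors.
                out.h[i][j] = a.h[i][j] * b.v
                    + a.g[i] * b.g[j]
                    + a.g[j] * b.g[i]
                    + a.v * b.h[i][j];
                for k in 0..K {
                    // Third order: all eight subsets of {i, j, k}.
                    out.t3[i][j][k] = a.t3[i][j][k] * b.v
                        + a.h[i][j] * b.g[k]
                        + a.h[i][k] * b.g[j]
                        + a.h[j][k] * b.g[i]
                        + a.g[i] * b.h[j][k]
                        + a.g[j] * b.h[i][k]
                        + a.g[k] * b.h[i][j]
                        + a.v * b.t3[i][j][k];
                }
            }
        }
        out
    }
}
```

As a check, seeding x = 2 in slot 0 and y = 3 in slot 1 and forming `x.mul(&x).mul(&y)` gives the derivatives of x²y: value 12, gradient [12, 4], ∂²/∂x² = 6, and ∂³/∂x²∂y = 2.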
pub fn add(&self, o: &Self) -> Self
Ref-taking elementwise sum, the by-ref twin of the std::ops::Add
operator (which consumes by value). Mirrors the inherent mul/scale
API so a chain like a.mul(&b).add(&c) reads uniformly without moving
out of the borrowed operands.
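The relationship between the by-ref method and the operator might look like the sketch below. The stand-in type and both bodies are assumptions made for illustration. The operator impl names `std::ops::Add` by its full path so that the trait is not imported into the sketch's scope.

```rust
/// Stand-in with the documented field layout (not the crate's own type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tower3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Tower3<K> {
    /// By-ref elementwise sum: borrows both operands and returns a new tower.
    pub fn add(&self, o: &Self) -> Self {
        let mut out = *self;
        out.v += o.v;
        for i in 0..K {
            out.g[i] += o.g[i];
            for j in 0..K {
                out.h[i][j] += o.h[i][j];
                for k in 0..K {
                    out.t3[i][j][k] += o.t3[i][j][k];
                }
            }
        }
        out
    }
}

// By-value operator twin, forwarding to the inherent by-ref method.
impl<const K: usize> std::ops::Add for Tower3<K> {
    type Output = Self;
    fn add(self, o: Self) -> Self {
        Tower3::add(&self, &o)
    }
}
```

In this sketch `Tower3::add(&a, &b)` and `a + b` produce the same tower. The first only borrows its operands, while the second takes them by value.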
pub fn sub(&self, o: &Self) -> Self
Ref-taking elementwise difference, the by-ref twin of std::ops::Sub.
pub fn compose_unary(&self, d: [f64; 4]) -> Self
Exact (order ≤ 3) multivariate Faà di Bruno composition f ∘ self.
d = [f(u), f′(u), f″(u), f‴(u)] evaluated at u = self.v. The
v/g/h/t3 channels match Tower4::compose_unary term-for-term
(which uses only d[0..=3] for those orders), so this is a strict
truncation, not an approximation. The full-order [f64; 5] derivative
stacks the families already produce can be passed by slicing their first
four entries.
§Codegen
Order-≤3 Faà di Bruno written as a compact closed form instead of the
recursive jet_algebra::faa_di_bruno walker: the order-≤3 sibling of
Tower4::compose_unary, one tensor order shallower. The loop nest is
unchanged, with no unroll over K and no code bloat. Measured on a Tower3<9>
compose-and-read consumer, the new form is faster and SMALLER (asm: 71
walker bl calls → 0, 39.5 KiB → 13.9 KiB, +197 NEON .2d ops).
BIT-IDENTICAL: terms in the walker’s exact partition order, left-associated
block products, acc = 0.0 accumulator start. Verified
to_bits-identical on v/g/h/t3 across K ∈ {2,3,4,9}, with 5000
random inputs each.
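The closed form referred to above is the standard order-≤3 chain rule. A minimal sketch on a stand-in type follows. The formulas are the textbook Faà di Bruno terms, while the term order is an assumption and is not claimed to match the crate's partition order bit-for-bit.

```rust
/// Stand-in with the documented field layout (not the crate's own type).
#[derive(Clone, Copy, Debug)]
pub struct Tower3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Tower3<K> {
    /// f ∘ self with d = [f(u), f'(u), f''(u), f'''(u)] at u = self.v.
    pub fn compose_unary(&self, d: [f64; 4]) -> Self {
        let u = self;
        let mut out = Self { v: d[0], g: [0.0; K], h: [[0.0; K]; K], t3: [[[0.0; K]; K]; K] };
        for i in 0..K {
            // One block: f' * u_i.
            out.g[i] = d[1] * u.g[i];
            for j in 0..K {
                // Partitions {i}{j} and {ij}.
                out.h[i][j] = d[2] * u.g[i] * u.g[j] + d[1] * u.h[i][j];
                for k in 0..K {
                    // Partitions {i}{j}{k}, the three pair-plus-singleton
                    // splits, and {ijk}.
                    out.t3[i][j][k] = d[3] * u.g[i] * u.g[j] * u.g[k]
                        + d[2]
                            * (u.h[i][j] * u.g[k]
                                + u.h[i][k] * u.g[j]
                                + u.h[j][k] * u.g[i])
                        + d[1] * u.t3[i][j][k];
                }
            }
        }
        out
    }
}
```

For example, take the inner jet of w = 1 + x + 2y + xy at the origin (v = 1, g = [1, 2], h[0][1] = h[1][0] = 1) and f(w) = w³ with stack [1, 3, 6, 6]. The sketch then gives ∂³/∂x²∂y = 24 and ∂³/∂x∂y² = 48, which agrees with differentiating w³ by hand.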
pub fn compose_unary_with(&self, stack_fn: impl Fn(f64) -> [f64; 4]) -> Self
Compose with a unary special-function whose [f64; 4] derivative STACK is
built from the base value through stack_fn — the scalar arm of the
generic-over-Lane compose seam (see
Tower3Lane::compose_unary_with). Evaluates stack_fn(self.v) ONCE and
forwards to Self::compose_unary, so it is BIT-IDENTICAL to the explicit
self.compose_unary(stack_fn(self.v)). The order-≤3 sibling of
Tower4::compose_unary_with.
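The forwarding described above can be sketched as follows. The stand-in type, the readability-ordered `compose_unary` body, and the helper `ln_stack` are all assumptions introduced for illustration. Only the "evaluate `stack_fn(self.v)` once, then forward" shape comes from the text.

```rust
/// Stand-in with the documented field layout (not the crate's own type).
#[derive(Clone, Copy, Debug)]
pub struct Tower3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Tower3<K> {
    /// Order <= 3 chain rule (term order chosen for readability).
    pub fn compose_unary(&self, d: [f64; 4]) -> Self {
        let u = self;
        let mut out = Self { v: d[0], g: [0.0; K], h: [[0.0; K]; K], t3: [[[0.0; K]; K]; K] };
        for i in 0..K {
            out.g[i] = d[1] * u.g[i];
            for j in 0..K {
                out.h[i][j] = d[2] * u.g[i] * u.g[j] + d[1] * u.h[i][j];
                for k in 0..K {
                    out.t3[i][j][k] = d[3] * u.g[i] * u.g[j] * u.g[k]
                        + d[2] * (u.h[i][j] * u.g[k] + u.h[i][k] * u.g[j] + u.h[j][k] * u.g[i])
                        + d[1] * u.t3[i][j][k];
                }
            }
        }
        out
    }

    /// Build the stack once from the base value, then forward.
    pub fn compose_unary_with(&self, stack_fn: impl Fn(f64) -> [f64; 4]) -> Self {
        self.compose_unary(stack_fn(self.v))
    }
}

/// Hypothetical helper: [ln u, 1/u, -1/u^2, 2/u^3] for u > 0.
pub fn ln_stack(u: f64) -> [f64; 4] {
    let r = 1.0 / u;
    [u.ln(), r, -r * r, 2.0 * r * r * r]
}
```

Because the closure runs exactly once at `self.v`, `x.compose_unary_with(ln_stack)` and `x.compose_unary(ln_stack(x.v))` perform the same floating-point operations in this sketch.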
pub fn compose_unary_single_slot(&self, d: [f64; 4], slot: usize) -> Self
Single-active-slot fast path for Self::compose_unary, the order-≤3
sibling of Tower4::compose_unary_single_slot. When self carries
derivative support only on the diagonal entries whose indices all equal slot,
every output channel touching an axis ≠ slot collapses to the walker’s
total = 0.0 start (+0.0), so only v, g[slot], h[slot][slot], and
t3[slot][slot][slot] survive. These four are computed as STRAIGHT-LINE
accumulations, each in the EXACT term order of Self::compose_unary’s diagonal
(i = j = k = slot) case, so they are BIT-IDENTICAL to the full path on the
diagonal. Off-slot channels stay at the zero-init +0.0 that the full walk also
yields (verified to_bits across K ∈ {2,3,4,9}). This drops the recursive
set-partition walker the diagonal channels previously routed through,
recovering its measured ~5.9× regression at the K ∈ {2,3} BMS tower
widths. The caller guarantees the single-slot precondition; otherwise use
Self::compose_unary.
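The interplay between the diagonal-only fast path and the +0.0 accumulator start can be sketched as below. Everything here is an assumption built from the description: a stand-in type, a full path that accumulates each entry from `0.0`, and a fast path that repeats the same expressions for the diagonal only. The hypothetical `constant` constructor exists only to build inputs.

```rust
/// Stand-in with the documented field layout (not the crate's own type).
#[derive(Clone, Copy, Debug)]
pub struct Tower3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Tower3<K> {
    /// Hypothetical helper: value `v`, all derivative channels +0.0.
    pub fn constant(v: f64) -> Self {
        Self { v, g: [0.0; K], h: [[0.0; K]; K], t3: [[[0.0; K]; K]; K] }
    }

    /// Full path: every entry is accumulated starting from +0.0.
    pub fn compose_unary(&self, d: [f64; 4]) -> Self {
        let u = self;
        let mut out = Self::constant(d[0]);
        for i in 0..K {
            out.g[i] = 0.0 + d[1] * u.g[i];
            for j in 0..K {
                let mut acc = 0.0;
                acc += d[2] * u.g[i] * u.g[j];
                acc += d[1] * u.h[i][j];
                out.h[i][j] = acc;
                for k in 0..K {
                    let mut acc = 0.0;
                    acc += d[3] * u.g[i] * u.g[j] * u.g[k];
                    acc += d[2] * (u.h[i][j] * u.g[k] + u.h[i][k] * u.g[j] + u.h[j][k] * u.g[i]);
                    acc += d[1] * u.t3[i][j][k];
                    out.t3[i][j][k] = acc;
                }
            }
        }
        out
    }

    /// Fast path: only the i = j = k = slot entries, same term order.
    pub fn compose_unary_single_slot(&self, d: [f64; 4], slot: usize) -> Self {
        let (u1, u2, u3) = (self.g[slot], self.h[slot][slot], self.t3[slot][slot][slot]);
        let mut out = Self::constant(d[0]); // off-slot channels stay +0.0
        out.g[slot] = 0.0 + d[1] * u1;
        let mut acc = 0.0;
        acc += d[2] * u1 * u1;
        acc += d[1] * u2;
        out.h[slot][slot] = acc;
        let mut acc = 0.0;
        acc += d[3] * u1 * u1 * u1;
        acc += d[2] * (u2 * u1 + u2 * u1 + u2 * u1);
        acc += d[1] * u3;
        out.t3[slot][slot][slot] = acc;
        out
    }
}
```

In this sketch the `0.0` start is what keeps off-slot entries at +0.0 in the full path even when a negative stack entry multiplies a zero channel, since `0.0 + (-0.0)` is `+0.0`. That is why the two paths agree under `to_bits` for single-slot inputs with finite stack entries.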
Trait Implementations§
impl<const K: usize> Copy for Tower3<K>
impl<const K: usize> JetScalar<K> for Tower3<K>
The order-≤3 crate::jet_tower::Tower3 is also a JetScalar. It serves
consumers that read .t3 but never .t4, avoiding the fourth-tensor
product/composition work while preserving the lower channels
bit-for-bit against crate::jet_tower::Tower4.
fn variable(x: f64, axis: usize) -> Self
The seeded variable p_axis at value x: unit first derivative in slot
axis, all higher channels zero. (The nilpotent / cross channels of the
directional scalars are seeded zero; callers set ε/δ directions through
the scalar-specific OneSeed::seed_direction / TwoSeed::seed.)
fn compose_unary(&self, d: [f64; 5]) -> Self
Unary composition f ∘ self, given the outer
derivative stack d = [f(u), f′(u), f″(u), f‴(u), f⁗(u)] at
u = self.value().
fn compose_unary_with(&self, stack_fn: impl Fn(f64) -> [f64; 5]) -> Self
Compose with a unary function whose derivative stack is built from the base
value through stack_fn. This is the generic-over-Lane
seam that lets a single-sourced row program instantiate at BOTH the scalar
f64 jets and the SIMD f64x4 batch towers from ONE expression.
fn ln(&self) -> Self
ln(self). Caller guarantees positivity. Same derivative stack
crate::jet_tower::Tower4::ln uses, so any program written over both
matches term-for-term.
fn powf(&self, a: f64) -> Self
self^a for real exponent a. Caller guarantees a positive base.
Mirrors crate::jet_tower::Tower4::powf (falling-factorial stack).
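A falling-factorial stack for u^a has the n-th entry a·(a−1)·…·(a−n+1)·u^(a−n). The helper below is a hypothetical sketch of that formula, not the crate's function, and the five-entry width is chosen only to match the `[f64; 5]` stacks mentioned on this page.

```rust
/// Hypothetical helper: derivatives of u^a of orders 0..=4 at u > 0.
pub fn powf_stack_sketch(u: f64, a: f64) -> [f64; 5] {
    let mut out = [0.0; 5];
    // Falling factorial a * (a - 1) * ... * (a - n + 1); empty product at n = 0.
    let mut coeff = 1.0;
    for n in 0..5 {
        let shift = n as f64;
        out[n] = coeff * u.powf(a - shift);
        coeff *= a - shift;
    }
    out
}
```

For a = 3 and u = 2 this yields approximately [8, 12, 12, 6, 0]: the coefficient reaches zero at the fourth derivative, as expected for a cubic.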
fn ln_gamma(&self) -> Self
ln Γ(self). Caller guarantees a positive argument. Uses the SAME
hand-certified derivative stack crate::jet_tower::Tower4::ln_gamma
consumes (crate::jet_tower::ln_gamma_derivative_stack), so any
program written over both matches term-for-term.
fn digamma(&self) -> Self
ψ(self) = d/dx ln Γ(x) (digamma). Caller guarantees a positive
argument. Same hand-certified stack as
crate::jet_tower::digamma_derivative_stack.
Auto Trait Implementations§
impl<const K: usize> Freeze for Tower3<K>
impl<const K: usize> RefUnwindSafe for Tower3<K>
impl<const K: usize> Send for Tower3<K>
impl<const K: usize> Sync for Tower3<K>
impl<const K: usize> Unpin for Tower3<K>
impl<const K: usize> UnsafeUnpin for Tower3<K>
impl<const K: usize> UnwindSafe for Tower3<K>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Read<Exclusive, BecauseExclusive> for T
where
    T: ?Sized,
impl<const K: usize, S> RowJet<K> for S
where
    S: JetScalar<K>,
fn variable(x: f64, slot: usize) -> S
The seeded variable in slot at value x (unit first derivative in slot),
broadcast to every lane. Per-lane-DISTINCT seeding for the batch path is
done by the lane instantiators (generic_batched_fourth_tower /
generic_batched_third_tower), which build the tower variables directly
from each row’s primaries; this method is for any row-invariant auxiliary
variable a body introduces.
fn neg(&self) -> S
Negation. The default is scale(-1.0); the blanket overrides it
to delegate to crate::jet_scalar::JetScalar::neg.
fn compose_unary_with<const N: usize>( &self, stack_fn: impl Fn(f64) -> [f64; N], ) -> S
Compose with a unary function whose [f64; N]
derivative stack is built from the running base value PER LANE through
stack_fn. This is the SHARED-TRAIT version of the compose_unary_with
inherent method that already exists on both the scalar towers and the lane
towers: on a scalar jet stack_fn is run once at the value; on an f64x4
lane tower it is re-run per lane (the four rows carry four distinct base
values), so lane i is to_bits-identical to the scalar result on row i.
Making it a trait method is precisely what lets a body written once over
R: RowJet<K> instantiate at the batch towers. N is widened/narrowed to
the tower’s native width by resize_stack (N == 5 is a verbatim copy).
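One plausible shape for the widening/narrowing step is sketched below. The name `resize_stack_sketch` and the zero-padding of missing high-order entries are assumptions made for this illustration. The text above only states that the stack is resized to the native width and that N == 5 is a verbatim copy.

```rust
/// Hypothetical sketch: copy the leading entries of an N-wide stack into
/// an M-wide one. Narrowing drops the tail; widening pads with 0.0
/// (the padding value is an assumption of this sketch).
pub fn resize_stack_sketch<const N: usize, const M: usize>(d: [f64; N]) -> [f64; M] {
    let mut out = [0.0; M];
    let n = N.min(M);
    out[..n].copy_from_slice(&d[..n]);
    out
}
```

Narrowing a five-entry stack to four entries keeps `f` through `f‴` unchanged, which is consistent with the order-≤3 channels never reading the fourth derivative.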
fn guard(&self, pred: impl Fn(f64) -> bool) -> GuardVerdict
Evaluate pred on each active lane’s value channel
and report which lanes failed (see GuardVerdict). A scalar jet checks
its one value; a lane tower checks all four. Lets a batched program detect
an out-of-domain row in a 4-group and bail that group to the scalar tail.
fn scale_rows(&self, s: f64) -> S
Scale each row by its own factor s
(Self::Value). On a scalar jet Self::Value = f64, so this is exactly
scale and the scalar call sites stay BIT-IDENTICAL when
.scale(x) is rewritten to .scale_rows(x); on an f64x4 lane tower
Self::Value = [f64; 4] and lane i is multiplied by s[i]. This is the
primitive that lets a batched body carry CONTINUOUS per-row data: the
survival covariance_ones / z_sum / observation-weight wi factors that
enter the jet algebra as .scale(per_row_value) and that the single-f64
scale would broadcast wrongly across the four rows. Build
s from the lane→row map with pack_rows.
fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> f64
Pack per-row values for rows: value_of(r)
is evaluated for each active lane’s row and packed into Self::Value (a
single f64 on a scalar jet, [f64; 4] on an f64x4 lane tower). This is
how a body written once over RowJet feeds per-row CONTINUOUS data (the
arguments to scale_rows) into the batch path without
knowing the concrete representation: the program holds the per-row data and
the caller threads rows (length 1 scalar, length 4 batch) into
RowNllProgramRowJet::row_nll, so the body writes
x.scale_rows(R::pack_rows(rows, |r| self.cov(r))). A multiplicative weight
buried in a compose_unary_with stack is pulled out the same way:
x.compose_unary_with(|u| stack(u, 1.0)).scale_rows(R::pack_rows(rows, |r| self.wi(r))).
(Binary per-row branches such as the event indicator di are kept
lane-uniform by grouping and the guard bail, not packed.)
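The pack-then-scale pattern can be illustrated with a reduced, hypothetical trait that carries only the value channel. The names `RowValueSketch`, `ScalarValue`, `LaneValue4`, and `weighted` are invented for this sketch, and filling inactive lanes with 0.0 is an assumption. The real RowJet trait also carries the derivative tensors.

```rust
/// Hypothetical reduced trait: value channel only, no derivative tensors.
pub trait RowValueSketch: Sized {
    type Value;
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> Self::Value;
    fn scale_rows(&self, s: Self::Value) -> Self;
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ScalarValue(pub f64);

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LaneValue4(pub [f64; 4]);

impl RowValueSketch for ScalarValue {
    type Value = f64;
    // Scalar path: `rows` has length 1, so the pack is a single f64.
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> f64 {
        value_of(rows[0])
    }
    fn scale_rows(&self, s: f64) -> Self {
        ScalarValue(self.0 * s)
    }
}

impl RowValueSketch for LaneValue4 {
    type Value = [f64; 4];
    // Batch path: lane i receives the value of row rows[i].
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> [f64; 4] {
        let mut out = [0.0; 4]; // inactive lanes stay 0.0 (assumption)
        for (lane, &r) in rows.iter().take(4).enumerate() {
            out[lane] = value_of(r);
        }
        out
    }
    fn scale_rows(&self, s: [f64; 4]) -> Self {
        let mut out = self.0;
        for lane in 0..4 {
            out[lane] *= s[lane];
        }
        LaneValue4(out)
    }
}

/// A body written once over the trait, usable on both representations.
pub fn weighted<R: RowValueSketch>(x: &R, rows: &[usize], weights: &[f64]) -> R {
    x.scale_rows(R::pack_rows(rows, |r| weights[r]))
}
```

The same `weighted` body multiplies a scalar by the single row's weight and multiplies each lane of a 4-group by its own row's weight, so no per-row value is broadcast across lanes.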
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its
superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.