pub struct Tower3Lane<L: Lane, const K: usize> {
    pub v: L,
    pub g: [L; K],
    pub h: [[L; K]; K],
    pub t3: [[[L; K]; K]; K],
}
Lane-batched Tower3 (order-≤3 sibling of Tower4Lane).
Fields§
v: L
Value channel.
g: [L; K]
Gradient.
h: [[L; K]; K]
Hessian.
t3: [[[L; K]; K]; K]
Third tensor.
Implementations§
impl<L: Lane, const K: usize> Tower3Lane<L, K>
pub fn scale(&self, s: f64) -> Self
Multiply every channel by the plain scalar s (mirrors Tower3::scale).
pub fn mul(&self, o: &Self) -> Self
Leibniz product self · o, term-for-term lift of Tower3::mul.
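As a rough illustration of the third-order Leibniz rule, the sketch below works on a plain f64 stand-in for a single lane. Jet3 is a hypothetical type introduced here for illustration only; it is not the crate's Tower3 and its layout is assumed from the field list above.

```rust
/// Hypothetical scalar stand-in for one lane of the tower: value, gradient,
/// Hessian and third tensor over K variables. Not the crate's own type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Jet3<K> {
    /// Third-order Leibniz product of `self` and `o`.
    pub fn mul(&self, o: &Self) -> Self {
        let (a, b) = (self, o);
        let mut r = Jet3 {
            v: a.v * b.v,
            g: [0.0; K],
            h: [[0.0; K]; K],
            t3: [[[0.0; K]; K]; K],
        };
        for i in 0..K {
            // First order: (ab)_i = a_i b + a b_i
            r.g[i] = a.g[i] * b.v + a.v * b.g[i];
            for j in 0..K {
                // Second order: a_ij b + a_i b_j + a_j b_i + a b_ij
                r.h[i][j] = a.h[i][j] * b.v
                    + a.g[i] * b.g[j]
                    + a.g[j] * b.g[i]
                    + a.v * b.h[i][j];
                for k in 0..K {
                    // Third order: every split of {i, j, k} between a and b.
                    r.t3[i][j][k] = a.t3[i][j][k] * b.v
                        + a.h[i][j] * b.g[k]
                        + a.h[i][k] * b.g[j]
                        + a.h[j][k] * b.g[i]
                        + a.g[i] * b.h[j][k]
                        + a.g[j] * b.h[i][k]
                        + a.g[k] * b.h[i][j]
                        + a.v * b.t3[i][j][k];
                }
            }
        }
        r
    }
}
```

With K = 1, multiplying the jets of x² and x³ at x = 2 reproduces the derivatives of x⁵. A lane tower would presumably run the same terms with each f64 replaced by a lane value.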
pub fn compose_unary(&self, d: [L; 4]) -> Self
Faà di Bruno composition f ∘ self, term-for-term lift of
Tower3::compose_unary. d = [f, f′, f″, f‴] packed per lane.
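The composition rule can be sketched on a scalar stand-in as follows. Jet3 is again a hypothetical f64 type used only for illustration, and the term layout is the standard third-order Faà di Bruno expansion, not a copy of the crate's code.

```rust
/// Hypothetical scalar stand-in for one lane of the tower (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet3<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
    pub h: [[f64; K]; K],
    pub t3: [[[f64; K]; K]; K],
}

impl<const K: usize> Jet3<K> {
    /// Compose an outer unary function with `self`, given the outer stack
    /// `d = [f, f', f'', f''']` evaluated at `self.v`.
    pub fn compose_unary(&self, d: [f64; 4]) -> Self {
        let u = self;
        let [f0, f1, f2, f3] = d;
        let mut r = Jet3 {
            v: f0,
            g: [0.0; K],
            h: [[0.0; K]; K],
            t3: [[[0.0; K]; K]; K],
        };
        for i in 0..K {
            // First order: f' u_i
            r.g[i] = f1 * u.g[i];
            for j in 0..K {
                // Second order: f'' u_i u_j + f' u_ij
                r.h[i][j] = f2 * u.g[i] * u.g[j] + f1 * u.h[i][j];
                for k in 0..K {
                    // Third order: f''' u_i u_j u_k
                    //   + f'' (u_ij u_k + u_ik u_j + u_jk u_i) + f' u_ijk
                    r.t3[i][j][k] = f3 * u.g[i] * u.g[j] * u.g[k]
                        + f2 * (u.h[i][j] * u.g[k]
                            + u.h[i][k] * u.g[j]
                            + u.h[j][k] * u.g[i])
                        + f1 * u.t3[i][j][k];
                }
            }
        }
        r
    }
}
```

For example, composing exp with the jet of x² at x = 1 (v = 1, g = 2, h = 2, t3 = 0) yields the derivatives of exp(x²) at 1, namely e, 2e, 6e and 20e.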
pub fn compose_unary_with(&self, stack_fn: impl Fn(f64) -> [f64; 4]) -> Self
Compose with a unary special function whose [f64; 4] derivative stack is
built from the base value through stack_fn, evaluated PER LANE. This is the
batch arm of the generic-over-Lane compose seam: the SIMD twin of
Tower3::compose_unary_with and the order-≤3 sibling of
Tower4Lane::compose_unary_with. The scalar stack_fn is run once per lane at
that lane’s own base value (via Lane::unary_with), and the results are packed
into [L; 4] for the existing per-lane Self::compose_unary, so lane i is
to_bits-identical to self.lane(i).compose_unary_with(stack_fn).
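A minimal sketch of the per-lane packing step, assuming a lane can be modeled as a plain [f64; 4]. Lane4, pack_stack and ln_stack are hypothetical names for this example; the real path goes through Lane::unary_with, whose signature is not shown on this page.

```rust
/// Hypothetical stand-in for a 4-wide lane: four independent f64 rows.
pub type Lane4 = [f64; 4];

/// Run the scalar `stack_fn` once per lane, at that lane's own base value,
/// and transpose the results into four lane-wide derivative channels
/// `[f, f', f'', f''']`.
pub fn pack_stack(base: Lane4, stack_fn: impl Fn(f64) -> [f64; 4]) -> [Lane4; 4] {
    let mut d = [[0.0; 4]; 4];
    for lane in 0..4 {
        let s = stack_fn(base[lane]);
        for order in 0..4 {
            d[order][lane] = s[order];
        }
    }
    d
}

/// Example stack: ln(u) and its first three derivatives.
pub fn ln_stack(u: f64) -> [f64; 4] {
    [u.ln(), 1.0 / u, -1.0 / (u * u), 2.0 / (u * u * u)]
}
```

Because each lane calls the same scalar function on its own input, every packed entry has the same bit pattern as the corresponding scalar evaluation, which is the property the to_bits claim above relies on.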
pub fn compose_unary_single_slot(&self, d: [L; 4], slot: usize) -> Self
Single-active-slot fast path, term-for-term lift of
Tower3::compose_unary_single_slot.
Trait Implementations§
impl<L: Clone + Lane, const K: usize> Clone for Tower3Lane<L, K>
fn clone(&self) -> Tower3Lane<L, K>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl<L: Copy + Lane, const K: usize> Copy for Tower3Lane<L, K>
impl<const K: usize> RowJet<K> for Tower3Lane<f64x4, K>
The f64x4 lane Tower3Lane is a RowJet<K> with Value = [f64; 4],
the order-≤3 sibling of the Tower4Lane impl. When a body supplies N == 5
stacks, the unused fourth-derivative entry is dropped here, matching the
scalar Tower3, which also carries nothing beyond the third tensor.
fn constant(c: f64) -> Self
A constant (value c, all derivatives zero), broadcast to every lane.
fn variable(x: f64, slot: usize) -> Self
The variable seeded in slot at value x (unit first derivative in slot),
broadcast to every lane. Per-lane-DISTINCT seeding for the batch path is
done by the lane instantiators (generic_batched_fourth_tower /
generic_batched_third_tower), which build the tower variables directly
from each row’s primaries; this method is for any row-invariant auxiliary
variable a body introduces.
fn compose_unary_with<const N: usize>(&self, stack_fn: impl Fn(f64) -> [f64; N]) -> Self
Compose with a unary special function whose [f64; N]
derivative stack is built from the running base value PER LANE through
stack_fn. This is the SHARED-TRAIT version of the compose_unary_with
inherent method that already exists on both the scalar towers and the lane
towers: on a scalar jet stack_fn is run once at the value; on an f64x4
lane tower it is re-run per lane (the four rows carry four distinct base
values), so lane i is to_bits-identical to the scalar result on row i.
Making it a trait method is precisely what lets a body written once over
R: RowJet<K> instantiate at the batch towers. N is widened/narrowed to
the tower’s native width by resize_stack (N == 5 is a verbatim copy).
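The widening and narrowing step might look like the sketch below. The signature and the zero fill on widening are assumptions made for this example; the crate's own resize_stack is not shown on this page and may differ.

```rust
/// Hypothetical sketch of a stack resize: copy the leading entries and
/// zero-fill any extra slots. The zero fill is an assumption of this sketch.
pub fn resize_stack<const N: usize, const M: usize>(s: [f64; N]) -> [f64; M] {
    let mut out = [0.0; M];
    let n = N.min(M);
    out[..n].copy_from_slice(&s[..n]);
    out
}
```

Under these assumptions, a five-entry stack narrowed to the four entries this tower's compose_unary takes loses only its last (fourth-derivative) entry, and a shorter stack is padded with zeros.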
fn guard(&self, pred: impl Fn(f64) -> bool) -> GuardVerdict
Evaluate the predicate pred on each active lane’s value channel
and report which lanes failed (see GuardVerdict). A scalar jet checks
its one value; a lane tower checks all four. This lets a batched program
detect an out-of-domain row in a 4-group and bail that group to the scalar tail.
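One way to picture the guard is the sketch below. Verdict and guard_lanes are hypothetical stand-ins, and the explicit active count is an assumption; the shape of the real GuardVerdict and the way active lanes are tracked are not shown on this page.

```rust
/// Hypothetical verdict type: either every checked lane passed, or a
/// per-lane mask of the lanes that failed.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    AllPass,
    Failed([bool; 4]),
}

/// Check `pred` on the first `active` lanes of a 4-wide value channel.
/// Inactive lanes are never tested and never reported as failed.
pub fn guard_lanes(values: [f64; 4], active: usize, pred: impl Fn(f64) -> bool) -> Verdict {
    let mut failed = [false; 4];
    for lane in 0..active.min(4) {
        failed[lane] = !pred(values[lane]);
    }
    if failed.iter().any(|&f| f) {
        Verdict::Failed(failed)
    } else {
        Verdict::AllPass
    }
}
```

A caller could then route any group that returns a failure through the scalar path instead of the batched one.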
fn scale_rows(&self, s: [f64; 4]) -> Self
Multiply every channel by the per-row scalar s
(Self::Value). On a scalar jet Self::Value = f64, so this is exactly
scale and the scalar call sites stay BIT-IDENTICAL when
.scale(x) is rewritten to .scale_rows(x); on an f64x4 lane tower
Self::Value = [f64; 4] and lane i is multiplied by s[i]. This is the
primitive that lets a batched body carry CONTINUOUS per-row data: the
survival covariance_ones / z_sum / observation-weight wi factors that
enter the jet algebra as .scale(per_row_value) and that the single-f64
scale would broadcast wrongly across the four rows. Build
s from the lane→row map with pack_rows.
fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> [f64; 4]
Pack per-row values for the lanes in rows: value_of(r)
is evaluated for each active lane’s row and packed into Self::Value (a
single f64 on a scalar jet, [f64; 4] on an f64x4 lane tower). This is
how a body written once over RowJet feeds per-row CONTINUOUS data (the
arguments to scale_rows) into the batch path without
knowing the concrete representation: the program holds the per-row data and
the caller threads rows (length 1 scalar, length 4 batch) into
RowNllProgramRowJet::row_nll, so the body writes
x.scale_rows(R::pack_rows(rows, |r| self.cov(r))). A multiplicative weight
buried in a compose_unary_with stack is pulled out the same way:
x.compose_unary_with(|u| stack(u, 1.0)).scale_rows(R::pack_rows(rows, |r| self.wi(r))).
(Binary per-row branches such as the event indicator di are kept
lane-uniform by grouping and the guard bail, not packed.)
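The interplay of pack_rows and scale_rows can be sketched on a single 4-wide channel. The free functions and the Lane4 alias below are hypothetical stand-ins for the trait methods, and leaving unused lanes at 0.0 is an assumption of this sketch.

```rust
/// Hypothetical 4-lane value channel.
pub type Lane4 = [f64; 4];

/// Evaluate `value_of` at each lane's row index and pack the results in lane
/// order. Lanes beyond `rows.len()` stay at 0.0 (an assumption of this sketch).
pub fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> Lane4 {
    let mut out = [0.0; 4];
    for (lane, &r) in rows.iter().take(4).enumerate() {
        out[lane] = value_of(r);
    }
    out
}

/// Multiply lane `i` of a channel by `s[i]`, instead of broadcasting a
/// single scalar across all four lanes.
pub fn scale_rows(channel: Lane4, s: Lane4) -> Lane4 {
    let mut out = channel;
    for lane in 0..4 {
        out[lane] *= s[lane];
    }
    out
}
```

With a per-row weight table and a lane→row map of [4, 1, 0, 5], pack_rows gathers the four weights in lane order and scale_rows applies each one to its own lane, so no row sees another row's weight.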
fn neg(&self) -> Self
Negation. The default is scale(-1.0); the blanket impl overrides it
to delegate to crate::jet_scalar::JetScalar::neg.
Auto Trait Implementations§
impl<L, const K: usize> Freeze for Tower3Lane<L, K>
where
    L: Freeze,
impl<L, const K: usize> RefUnwindSafe for Tower3Lane<L, K>
where
    L: RefUnwindSafe,
impl<L, const K: usize> Send for Tower3Lane<L, K>
where
    L: Send,
impl<L, const K: usize> Sync for Tower3Lane<L, K>
where
    L: Sync,
impl<L, const K: usize> Unpin for Tower3Lane<L, K>
where
    L: Unpin,
impl<L, const K: usize> UnsafeUnpin for Tower3Lane<L, K>
where
    L: UnsafeUnpin,
impl<L, const K: usize> UnwindSafe for Tower3Lane<L, K>
where
    L: UnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Read<Exclusive, BecauseExclusive> for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.