pub trait RowJet<const K: usize>: Copy {
    type Value: Copy;

    // Required methods
    fn constant(c: f64) -> Self;
    fn variable(x: f64, slot: usize) -> Self;
    fn values(&self) -> Self::Value;
    fn add(&self, o: &Self) -> Self;
    fn sub(&self, o: &Self) -> Self;
    fn mul(&self, o: &Self) -> Self;
    fn scale(&self, s: f64) -> Self;
    fn compose_unary_with<const N: usize>(
        &self,
        stack_fn: impl Fn(f64) -> [f64; N],
    ) -> Self;
    fn guard(&self, pred: impl Fn(f64) -> bool) -> GuardVerdict;
    fn scale_rows(&self, s: Self::Value) -> Self;
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> Self::Value;

    // Provided methods
    fn neg(&self) -> Self { ... }
    fn exp(&self) -> Self { ... }
    fn ln(&self) -> Self { ... }
    fn sqrt(&self) -> Self { ... }
    fn recip(&self) -> Self { ... }
    fn powf(&self, a: f64) -> Self { ... }
    fn ln_gamma(&self) -> Self { ... }
    fn digamma(&self) -> Self { ... }
}
The shared row-NLL algebra over BOTH the scalar jets and the f64x4 lane
towers: the bound that lets ONE row-NLL body, written once, SIMD-batch 4
rows per pass without a dual-source copy (module §“The RowJet bridge”).
Every scalar crate::jet_scalar::JetScalar<K> is a RowJet<K> via the
blanket impl below (Value = f64), bit-identically to its JetScalar
methods; Tower3Lane / Tower4Lane over f64x4 are RowJet<K> with
Value = [f64; 4], routing through their per-lane methods so lane i of a
batched evaluation is to_bits-identical to the scalar evaluation on row i.
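The single-source idea can be illustrated with a deliberately cut-down sketch. Everything below is hypothetical and not this crate's code: MiniRowJet stands in for RowJet (no K, first order only, one variable), Dual stands in for a scalar jet, and Dual4 stands in for an f64x4 lane tower using plain arrays rather than a SIMD type. The point is only that one generic body instantiates at both representations, and that routing the batch through the per-lane scalar ops keeps each lane bit-identical to the scalar result.

```rust
/// Hypothetical, cut-down stand-in for `RowJet` (first order, one variable).
pub trait MiniRowJet: Copy {
    type Value: Copy;
    fn values(&self) -> Self::Value;
    fn add(&self, o: &Self) -> Self;
    fn mul(&self, o: &Self) -> Self;
    fn scale(&self, s: f64) -> Self;
}

/// Scalar jet: value and first derivative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual { pub v: f64, pub d: f64 }

/// Four rows at once, stored as plain arrays (no real SIMD type in this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual4 { pub v: [f64; 4], pub d: [f64; 4] }

impl MiniRowJet for Dual {
    type Value = f64;
    fn values(&self) -> f64 { self.v }
    fn add(&self, o: &Self) -> Self { Dual { v: self.v + o.v, d: self.d + o.d } }
    fn mul(&self, o: &Self) -> Self {
        // Product rule.
        Dual { v: self.v * o.v, d: self.d * o.v + self.v * o.d }
    }
    fn scale(&self, s: f64) -> Self { Dual { v: self.v * s, d: self.d * s } }
}

impl Dual4 {
    fn lane(&self, i: usize) -> Dual { Dual { v: self.v[i], d: self.d[i] } }
}

/// Build a batch by running a scalar op once per lane.
fn lift(f: impl Fn(usize) -> Dual) -> Dual4 {
    let l: [Dual; 4] = std::array::from_fn(f);
    Dual4 { v: l.map(|x| x.v), d: l.map(|x| x.d) }
}

impl MiniRowJet for Dual4 {
    type Value = [f64; 4];
    fn values(&self) -> [f64; 4] { self.v }
    fn add(&self, o: &Self) -> Self { lift(|i| self.lane(i).add(&o.lane(i))) }
    fn mul(&self, o: &Self) -> Self { lift(|i| self.lane(i).mul(&o.lane(i))) }
    fn scale(&self, s: f64) -> Self { lift(|i| self.lane(i).scale(s)) }
}

/// One body, written once: x^2 + 3x.
pub fn body<R: MiniRowJet>(x: &R) -> R {
    x.mul(x).add(&x.scale(3.0))
}
```

Calling body on a Dual4 seeded from four rows and on a Dual seeded from one of those rows runs the same floating-point operations in the same order for that lane, which is why comparing the results with to_bits is a meaningful check in this sketch.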
Required Associated Types§
type Value: Copy
The per-row value representation: f64 on a scalar jet, [f64; 4] on an f64x4 lane tower.
Required Methods§
fn constant(c: f64) -> Self
A constant (value c, all derivatives zero), broadcast to every lane.
fn variable(x: f64, slot: usize) -> Self
The seeded primary slot at value x (unit first derivative in slot),
broadcast to every lane. Per-lane-DISTINCT seeding for the batch path is
done by the lane instantiators (generic_batched_fourth_tower /
generic_batched_third_tower), which build the tower variables directly
from each row’s primaries; this method is for any row-invariant auxiliary
variable a body introduces.
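As a rough illustration of what constant and variable seed, here is a minimal first-order sketch. Grad is a hypothetical gradient-only jet with K slots, not one of this crate's towers; the real towers carry higher-order tensors as well.

```rust
/// Hypothetical gradient-only jet with K primary slots.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Grad<const K: usize> {
    pub v: f64,
    pub g: [f64; K],
}

impl<const K: usize> Grad<K> {
    /// A constant: value c, all derivatives zero.
    pub fn constant(c: f64) -> Self {
        Grad { v: c, g: [0.0; K] }
    }

    /// A seeded primary: value x, unit first derivative in `slot`.
    pub fn variable(x: f64, slot: usize) -> Self {
        let mut g = [0.0; K];
        g[slot] = 1.0;
        Grad { v: x, g }
    }

    /// Product rule, slot by slot.
    pub fn mul(&self, o: &Self) -> Self {
        Grad {
            v: self.v * o.v,
            g: std::array::from_fn(|i| self.g[i] * o.v + self.v * o.g[i]),
        }
    }
}
```

With x = Grad::<2>::variable(3.0, 0), y = Grad::<2>::variable(4.0, 1) and c = Grad::<2>::constant(2.0), the product x.mul(&y).mul(&c) carries the gradient of 2xy with respect to (x, y), while the constant contributes no derivative of its own.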
fn compose_unary_with<const N: usize>(&self, stack_fn: impl Fn(f64) -> [f64; N]) -> Self
Faà di Bruno compose with a unary special function whose [f64; N]
derivative stack is built from the running base value PER LANE through
stack_fn. This is the SHARED-TRAIT version of the compose_unary_with
inherent method that already exists on both the scalar towers and the lane
towers: on a scalar jet stack_fn is run once at the value; on an f64x4
lane tower it is re-run per lane (the four rows carry four distinct base
values), so lane i is to_bits-identical to the scalar result on row i.
Making it a trait method is precisely what lets a body written once over
R: RowJet<K> instantiate at the batch towers. N is widened/narrowed to
the tower’s native width by resize_stack (N == 5 is a verbatim copy).
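A minimal sketch of the per-lane compose, under stated assumptions: Jet2 and Jet2x4 are hypothetical one-variable, second-order jets (native stack width 3, not the width-5 towers described here), chain2 is the second-order Faà di Bruno formula, and this resize_stack is an illustrative stand-in that truncates or zero-pads. The zero-padding behaviour in particular is an assumption, not something the text above states.

```rust
/// Hypothetical stand-in: truncate or zero-pad a derivative stack to width M.
fn resize_stack<const N: usize, const M: usize>(s: [f64; N]) -> [f64; M] {
    std::array::from_fn(|i| if i < N { s[i] } else { 0.0 })
}

/// Second-order Faa di Bruno for h = f(u), one variable:
/// h = f, h' = f' u', h'' = f'' (u')^2 + f' u''.
fn chain2(stack: [f64; 3], d1: f64, d2: f64) -> [f64; 3] {
    let [f0, f1, f2] = stack;
    [f0, f1 * d1, f2 * d1 * d1 + f1 * d2]
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet2 { pub v: f64, pub d1: f64, pub d2: f64 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet2x4 { pub v: [f64; 4], pub d1: [f64; 4], pub d2: [f64; 4] }

impl Jet2 {
    /// Scalar: `stack_fn` runs once, at the jet's value.
    pub fn compose_unary_with<const N: usize>(
        &self,
        stack_fn: impl Fn(f64) -> [f64; N],
    ) -> Self {
        let [v, d1, d2] = chain2(resize_stack(stack_fn(self.v)), self.d1, self.d2);
        Jet2 { v, d1, d2 }
    }
}

impl Jet2x4 {
    /// Batch: `stack_fn` is re-run per lane, since each row has its own base value.
    pub fn compose_unary_with<const N: usize>(
        &self,
        stack_fn: impl Fn(f64) -> [f64; N],
    ) -> Self {
        let mut out = *self;
        for i in 0..4 {
            let stack = resize_stack(stack_fn(self.v[i]));
            let [v, d1, d2] = chain2(stack, self.d1[i], self.d2[i]);
            out.v[i] = v;
            out.d1[i] = d1;
            out.d2[i] = d2;
        }
        out
    }
}
```

For example, a logarithm stack could be passed as |u: f64| [u.ln(), 1.0 / u, -1.0 / (u * u)]. Because both paths call the same chain2 on the same inputs, lane i of the batch result matches the scalar result on row i bit for bit in this sketch.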
fn guard(&self, pred: impl Fn(f64) -> bool) -> GuardVerdict
Per-lane domain guard: evaluate pred on each active lane’s value channel
and report which lanes failed (see GuardVerdict). A scalar jet checks
its one value; a lane tower checks all four. Lets a batched program detect
an out-of-domain row in a 4-group and bail that group to the scalar tail.
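The bail-to-scalar-tail pattern might look roughly like the sketch below. SketchVerdict is a hypothetical bitmask type, not the crate's GuardVerdict (whose layout is not shown here), and route is an illustrative driver that only decides which rows go where; sending remainder rows that do not fill a 4-group to the scalar tail is likewise an assumption of this sketch.

```rust
/// Hypothetical verdict: bit i set means lane i failed the predicate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchVerdict {
    pub failed: u8,
}

impl SketchVerdict {
    pub fn all_ok(self) -> bool {
        self.failed == 0
    }
}

/// Check `pred` on each lane's value (1 lane for a scalar, up to 4 for a batch).
pub fn guard_lanes(values: &[f64], pred: impl Fn(f64) -> bool) -> SketchVerdict {
    let mut failed = 0u8;
    for (i, &v) in values.iter().enumerate() {
        if !pred(v) {
            failed |= 1 << i;
        }
    }
    SketchVerdict { failed }
}

/// Split rows into 4-groups that stay on the batch path (returned by their
/// starting row index) and individual rows that fall back to the scalar tail.
pub fn route(x: &[f64], pred: impl Fn(f64) -> bool) -> (Vec<usize>, Vec<usize>) {
    let mut batch_groups = Vec::new();
    let mut scalar_rows = Vec::new();
    let mut chunks = x.chunks_exact(4);
    for (g, chunk) in chunks.by_ref().enumerate() {
        if guard_lanes(chunk, &pred).all_ok() {
            batch_groups.push(4 * g);
        } else {
            // One bad lane bails the whole group.
            scalar_rows.extend(4 * g..4 * g + 4);
        }
    }
    // Rows that do not fill a complete group also go to the scalar tail.
    let tail_start = x.len() - chunks.remainder().len();
    scalar_rows.extend(tail_start..x.len());
    (batch_groups, scalar_rows)
}
```

With a positivity predicate such as |v: f64| v > 0.0 guarding a logarithm, a group containing a single non-positive row is routed whole to the scalar rows, while clean groups remain batched.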
fn scale_rows(&self, s: Self::Value) -> Self
Per-lane scale: multiply every channel by the per-lane factor s
(Self::Value). On a scalar jet Self::Value = f64, so this is exactly
scale and the scalar call sites stay BIT-IDENTICAL when
.scale(x) is rewritten to .scale_rows(x); on an f64x4 lane tower
Self::Value = [f64; 4] and lane i is multiplied by s[i]. This is the
primitive that lets a batched body carry CONTINUOUS per-row data — the
survival covariance_ones / z_sum / observation-weight wi factors that
enter the jet algebra as .scale(per_row_value) and that the single-f64
scale would broadcast wrongly across the four rows. Build
s from the lane→row map with pack_rows.
fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> Self::Value
Gather a per-lane auxiliary datum from the lane→row map rows: value_of(r)
is evaluated for each active lane’s row and packed into Self::Value (a
single f64 on a scalar jet, [f64; 4] on an f64x4 lane tower). This is
how a body written once over RowJet feeds per-row CONTINUOUS data (the
arguments to scale_rows) into the batch path without
knowing the concrete representation: the program holds the per-row data and
the caller threads rows (length 1 on the scalar path, length 4 on the batch path) into
RowNllProgramRowJet::row_nll, so the body writes
x.scale_rows(R::pack_rows(rows, |r| self.cov(r))). A multiplicative weight
buried in a compose_unary_with stack is pulled out the same way:
x.compose_unary_with(|u| stack(u, 1.0)).scale_rows(R::pack_rows(rows, |r| self.wi(r))).
(Binary per-row branches such as the event indicator di are kept
lane-uniform by grouping and the guard bail, not packed.)
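The pack_rows / scale_rows pairing can be sketched as follows. All names are hypothetical: PackScale is a two-method stand-in for the relevant part of RowJet, Dual and Dual4 are first-order toy jets, and Program with its cov field stands in for whatever holds the per-row data. The lane version assumes all four lanes are active (rows has length 4).

```rust
/// Hypothetical two-method slice of the trait.
pub trait PackScale: Copy {
    type Value: Copy;
    fn scale_rows(&self, s: Self::Value) -> Self;
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> Self::Value;
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual { pub v: f64, pub d: f64 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual4 { pub v: [f64; 4], pub d: [f64; 4] }

impl PackScale for Dual {
    type Value = f64;
    // With Value = f64 this is just an ordinary scale.
    fn scale_rows(&self, s: f64) -> Self {
        Dual { v: self.v * s, d: self.d * s }
    }
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> f64 {
        value_of(rows[0])
    }
}

impl PackScale for Dual4 {
    type Value = [f64; 4];
    // Lane i is multiplied by its own factor s[i].
    fn scale_rows(&self, s: [f64; 4]) -> Self {
        Dual4 {
            v: std::array::from_fn(|i| self.v[i] * s[i]),
            d: std::array::from_fn(|i| self.d[i] * s[i]),
        }
    }
    // Assumes a full group: rows.len() == 4.
    fn pack_rows(rows: &[usize], value_of: impl Fn(usize) -> f64) -> [f64; 4] {
        std::array::from_fn(|i| value_of(rows[i]))
    }
}

/// Hypothetical holder of per-row continuous data.
pub struct Program { pub cov: Vec<f64> }

impl Program {
    /// Written once; works for the scalar path and the batch path.
    pub fn weighted<R: PackScale>(&self, rows: &[usize], x: &R) -> R {
        x.scale_rows(R::pack_rows(rows, |r| self.cov[r]))
    }
}
```

The body never names f64 or [f64; 4]: the associated type carries the representation, so the same weighted call serves a one-row slice and a four-row slice.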
Provided Methods§
fn neg(&self) -> Self
Negate every channel. Defaults to scale(-1.0); the blanket impl overrides it
to delegate to crate::jet_scalar::JetScalar::neg.
Dyn Compatibility§
This trait is not dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".
Implementors§
impl<const K: usize, S: JetScalar<K>> RowJet<K> for S
Blanket: every scalar crate::jet_scalar::JetScalar<K> is a RowJet<K>
with Value = f64. Each op delegates to the identical JetScalar method, so
the existing scalar call sites compile UNCHANGED and bit-identically — the
bridge adds the lane representation without churning the scalar path. (The
concrete lane impls below cannot overlap this: Tower3Lane / Tower4Lane
are local types that do not implement JetScalar, and the orphan rule forbids
any downstream impl, so the coherence checker proves the impls disjoint.)
impl<const K: usize> RowJet<K> for Tower3Lane<f64x4, K>
The f64x4 lane Tower3Lane is a RowJet<K> with Value = [f64; 4],
the order-≤3 sibling of the Tower4Lane impl. A body that uses N == 5
stacks drops the (unused) fourth-derivative entry here, matching the scalar
Tower3 which also carries only up to the third tensor.
impl<const K: usize> RowJet<K> for Tower4Lane<f64x4, K>
The f64x4 lane Tower4Lane is a RowJet<K> with Value = [f64; 4],
routing each op through its existing per-lane method. Lane i of a batched
evaluation is to_bits-identical to the scalar Tower4 evaluation on row
i (the per-lane methods are term-for-term lifts of the scalar tower).