
Struct ArrowBorderSolvePlan 

pub struct ArrowBorderSolvePlan {
    pub n: usize,
    pub k: usize,
    pub d: usize,
    pub cg_iters: usize,
    pub data_fit_rank: usize,
    pub dense_border_rank_deficient: bool,
    pub dense_direct_flops: u128,
    pub reduced_iterative_flops: u128,
    pub recommended: ArrowBorderStrategy,
    pub device_favorable: bool,
}

Cost model + recommendation for the arrow-Schur border solve, a pure function of the joint-system shape (unit-testable, no device required).

This operationalises the finding measured in #1017: the full arrow-Schur Newton solve is dominated by the dense k × k border Cholesky. The on-device dense Direct solve was measured at ~0.94× (a slowdown), because at LLM/SAE border widths the bottleneck is the k³/3 factorization, not the GPU-favourable batched per-row work. The lever the issue calls for is to shrink or factor the dense border so that the batched n-row work dominates; the plan makes that decision inspectable and honest.

Flop model (deliberate, documented approximations)

  • Dense Direct: 2·n·d·k² (assemble the reduced Schur complement: each row contributes a rank-d symmetric update H_βt (H_tt)⁻¹ H_tβ to the k × k border, ≈ 2·d·k² flops) + k³/3 (Cholesky of the dense k × k Schur complement).
  • Reduced iterative: cg_iters · n·(4·d·k + d²) (matrix-free PCG: each matvec costs a forward plus a transpose cross-block GEMV, 4·d·k, and the per-row d × d solve, d², summed over the n row blocks and repeated for cg_iters applications).

Both are dispatch-grade estimates, not exact operation counts. They omit preconditioner setup and lower-order terms symmetrically, so their ratio (the only thing the recommendation consumes) is meaningful, but neither figure should be reused for speedup accounting.
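The two flop formulas can be transcribed directly. The following is a minimal sketch, assuming free functions named `dense_direct_flops` and `reduced_iterative_flops` (hypothetical names chosen to match the fields; the crate may compute these inside its own constructor) and u128 arithmetic to match the field types. Integer division makes k³/3 round down.

```rust
/// Hypothetical free-function form of the documented Dense Direct estimate:
/// 2·n·d·k² (reduced-Schur assembly) + k³/3 (dense border Cholesky).
fn dense_direct_flops(n: usize, k: usize, d: usize) -> u128 {
    let (n, k, d) = (n as u128, k as u128, d as u128);
    // Each of the n rows adds a rank-d update to the k × k border (≈ 2·d·k² flops).
    let assembly = 2 * n * d * k * k;
    // Cholesky of the dense k × k reduced Schur complement (integer division rounds down).
    let cholesky = k * k * k / 3;
    assembly + cholesky
}

/// Hypothetical free-function form of the documented reduced-iterative estimate:
/// cg_iters · n·(4·d·k + d²).
fn reduced_iterative_flops(n: usize, k: usize, d: usize, cg_iters: usize) -> u128 {
    let (n, k, d, cg_iters) = (n as u128, k as u128, d as u128, cg_iters as u128);
    // One matvec: forward + transpose cross-block GEMV (4·d·k) plus the
    // per-row d × d solve (d²), summed over the n row blocks.
    let per_matvec = n * (4 * d * k + d * d);
    // Repeated once per assumed CG iteration.
    cg_iters * per_matvec
}
```

For example, with n = 10, k = 100, d = 4 and cg_iters = 50 these give 1,133,333 dense flops versus 808,000 iterative flops; raising cg_iters to 1000 lifts the iterative estimate to 16,160,000.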

Status

Advisory / diagnostic. It is not wired into the live ArrowSolverMode::automatic selector: replacing the fixed DIRECT_SOLVE_MAX_K cutoff with this shape-driven crossover would change which production fits take the Direct vs. PCG path, so it must be validated on GPU hardware (#1017 Phases 2–4) before it is allowed to change numerics. Today it is consumed by the honest examples/full_color_fit_1017.rs measurement harness (modeled vs. measured) and by the unit tests below.

Fields

n: usize

Number of per-row blocks (SAE observations / latent rows).

k: usize

Border β width (the SAE decoder atom count K × basis width).

d: usize

Per-row latent / active-frame depth (the M dimension).

cg_iters: usize

CG iteration budget assumed for the iterative estimate.

data_fit_rank: usize

Effective rank of the data-fit contribution to the k × k border. It is bounded by Σ_i d_i ≈ n·d and never exceeds k, so it is roughly at most min(n·d, k).

dense_border_rank_deficient: bool

True when n·d < k: the dense k × k Cholesky then spends O(k³) flops factorising a border whose data-fit contribution has rank at most n·d. This is the pathological wide-sparse-border regime (color arm: n·d = 360 ≪ k = 15360).
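The two rank fields reduce to integer comparisons on the shape. A minimal sketch, assuming a uniform per-row depth d (so Σ_i d_i = n·d exactly), taking the documented bound min(n·d, k) as the value, and using hypothetical helper names; the crate's own computation may differ.

```rust
/// Hypothetical helper: effective rank of the data-fit part of the k × k border,
/// taken here as its upper bound. Each row contributes at most d (uniform d assumed),
/// and the rank can never exceed the border width k.
fn data_fit_rank(n: usize, k: usize, d: usize) -> usize {
    // saturating_mul avoids overflow on absurdly large shapes.
    n.saturating_mul(d).min(k)
}

/// Hypothetical helper: true when a dense k × k Cholesky would factorise a
/// border whose data-fit contribution has rank n·d < k.
fn dense_border_rank_deficient(n: usize, k: usize, d: usize) -> bool {
    n.saturating_mul(d) < k
}
```

Only the product n·d enters either check, so for the color arm any split with n·d = 360 and k = 15360 gives data_fit_rank = 360 and a rank-deficient border.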

dense_direct_flops: u128

≈ 2·n·d·k² + k³/3 — reduced-Schur assembly plus dense border Cholesky.

reduced_iterative_flops: u128

≈ cg_iters · n·(4·d·k + d²) — matrix-free PCG matvecs.

recommended: ArrowBorderStrategy

The recommended strategy: ReducedIterative iff the dense factorization path costs strictly more arithmetic than the iterative path at cg_iters.
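The recommendation rule is a strict comparison of the two estimates, so a tie keeps the dense path. A minimal sketch, with a local stand-in for ArrowBorderStrategy that mirrors only the two variant names appearing on this page (the real enum may carry more) and a hypothetical `recommend` function:

```rust
/// Local stand-in for ArrowBorderStrategy with the two variants named in these docs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArrowBorderStrategy {
    DenseDirect,
    ReducedIterative,
}

/// Hypothetical rule: choose ReducedIterative only when the dense path costs
/// strictly more arithmetic; an exact tie stays on DenseDirect.
fn recommend(dense_direct_flops: u128, reduced_iterative_flops: u128) -> ArrowBorderStrategy {
    if dense_direct_flops > reduced_iterative_flops {
        ArrowBorderStrategy::ReducedIterative
    } else {
        ArrowBorderStrategy::DenseDirect
    }
}
```

With the flop model at n = 10, k = 100, d = 4, the dense estimate of 1,133,333 beats neither 808,000 (cg_iters = 50, so ReducedIterative is recommended) nor loses to 16,160,000 (cg_iters = 1000, so DenseDirect is recommended).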

device_favorable: bool

Whether running the recommended strategy on the device is expected to pay off. For ReducedIterative this is reduced_schur_matvec_should_offload. For DenseDirect the device wins only when the batched per-row assembly work (2·n·d·k², GPU-favourable batched GEMM/POTRF) at least matches the border Cholesky (k³/3) and clears the dense flop floor. This is the honest encoding of the measured 0.94× dense-Direct-on-device slowdown.
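The DenseDirect branch compares the two halves of the dense cost. Since 2·n·d·k² ≥ k³/3 is equivalent (for k > 0, ignoring the rounding of k³/3) to 6·n·d ≥ k, a border of width k = 15360 needs n·d ≥ 2560 before the batched assembly outweighs the Cholesky. A minimal sketch under stated assumptions: `dense_flop_floor` is a hypothetical parameter because its value is not documented here, "clears" is read as ≥, and `reduced_offload` stands in for the result of reduced_schur_matvec_should_offload, whose signature is not shown.

```rust
/// Local stand-in for ArrowBorderStrategy with the two variants named in these docs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArrowBorderStrategy {
    DenseDirect,
    ReducedIterative,
}

/// Hypothetical sketch of the device_favorable rule.
/// `dense_flop_floor` (value undocumented) and `reduced_offload` (stand-in for
/// reduced_schur_matvec_should_offload) are assumptions, not the crate's API.
fn device_favorable(
    strategy: ArrowBorderStrategy,
    n: usize,
    k: usize,
    d: usize,
    dense_direct_flops: u128,
    dense_flop_floor: u128,
    reduced_offload: bool,
) -> bool {
    match strategy {
        // The iterative path defers entirely to the offload heuristic.
        ArrowBorderStrategy::ReducedIterative => reduced_offload,
        ArrowBorderStrategy::DenseDirect => {
            let (n, k, d) = (n as u128, k as u128, d as u128);
            // Batched per-row assembly work that maps well onto the GPU.
            let batched_assembly = 2 * n * d * k * k;
            // Dense k × k border factorization.
            let border_cholesky = k * k * k / 3;
            // Both conditions must hold: assembly dominates and the floor is cleared.
            batched_assembly >= border_cholesky && dense_direct_flops >= dense_flop_floor
        }
    }
}
```

At the color-arm shape (n·d = 360, k = 15360) the batched assembly is about 1.70 × 10¹¹ flops against about 1.21 × 10¹² for the border Cholesky, so this sketch reports dense Direct as not device-favourable regardless of the floor.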

Trait Implementations

impl Clone for ArrowBorderSolvePlan

fn clone(&self) -> ArrowBorderSolvePlan

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Copy for ArrowBorderSolvePlan

impl Debug for ArrowBorderSolvePlan

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Eq for ArrowBorderSolvePlan

impl PartialEq for ArrowBorderSolvePlan

fn eq(&self, other: &ArrowBorderSolvePlan) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for ArrowBorderSolvePlan

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Boilerplate for T
where T: Copy + Send + Sync + Debug + PartialEq + 'static,

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V