Struct ArrowBorderSolvePlan


pub struct ArrowBorderSolvePlan {
    pub n: usize,
    pub k: usize,
    pub d: usize,
    pub cg_iters: usize,
    pub data_fit_rank: usize,
    pub dense_border_rank_deficient: bool,
    pub dense_direct_flops: u128,
    pub reduced_iterative_flops: u128,
    pub recommended: ArrowBorderStrategy,
    pub device_favorable: bool,
}


Cost model + recommendation for the arrow-Schur border solve, a pure function of the joint-system shape (unit-testable, no device required).

This operationalises the measured #1017 finding that the full arrow-Schur Newton solve is dominated by the dense k × k border Cholesky. The on-device dense Direct solve was measured at ~0.94×, i.e. a slowdown, because at LLM/SAE border widths the bottleneck is the k³/3 factorization rather than the GPU-favourable batched per-row work. The lever the issue calls for is to shrink or factor the dense border so that the batched n-row work dominates; the plan makes that decision inspectable and honest.

§Flop model (deliberate, documented approximations)

Dense Direct ≈ 2·n·d·k² (assemble the reduced Schur: per row a rank-d symmetric update H_βt (H_tt)⁻¹ H_tβ to the k × k border, ≈ 2·d·k² flops) + k³/3 (Cholesky of the dense k × k Schur).
Reduced iterative ≈ cg_iters · n·(4·d·k + d²) (matrix-free PCG: each matvec costs, per row block, a forward plus transpose cross-block GEMV, 4·d·k, and the per-row d × d solve, d²; this is summed over the n row blocks and repeated for cg_iters operator applies).

Both are dispatch-grade estimates, not exact operation counts; they omit preconditioner setup and lower-order terms symmetrically, so their ratio (the only thing the recommendation consumes) is meaningful while neither figure should be reused for speedup accounting.
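A minimal sketch of the two estimates exactly as written above, assuming plain `u128` arithmetic on the shape parameters (integer division truncates the k³/3 term). The helper names `dense_direct_flops` and `reduced_iterative_flops` are hypothetical and simply mirror the field names; this is not necessarily how the crate fills those fields.

```rust
/// Dense Direct estimate: 2·n·d·k² (reduced-Schur assembly) + k³/3 (border Cholesky).
fn dense_direct_flops(n: usize, k: usize, d: usize) -> u128 {
    let (n, k, d) = (n as u128, k as u128, d as u128);
    let assembly = 2 * n * d * k * k; // n rank-d symmetric updates of the k × k border
    let cholesky = k * k * k / 3; // dense Cholesky of the k × k reduced Schur
    assembly + cholesky
}

/// Reduced iterative estimate: cg_iters · n·(4·d·k + d²).
fn reduced_iterative_flops(n: usize, k: usize, d: usize, cg_iters: usize) -> u128 {
    let (n, k, d, iters) = (n as u128, k as u128, d as u128, cg_iters as u128);
    // Per row block and per matvec: forward + transpose cross-block GEMV (4·d·k)
    // plus the per-row d × d solve (d²).
    iters * n * (4 * d * k + d * d)
}

fn main() {
    // Illustrative shape, not taken from any measured run.
    let (n, k, d, cg_iters) = (1_000, 4_096, 8, 50);
    let dense = dense_direct_flops(n, k, d);
    let iterative = reduced_iterative_flops(n, k, d, cg_iters);
    println!("dense ≈ {dense}, iterative ≈ {iterative}");
}
```

Only the comparison between the two values matters downstream, which is why both functions drop the same kinds of lower-order terms rather than trying to be exact.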

§Status

Advisory / diagnostic. This plan is not wired into the live ArrowSolverMode::automatic selector: replacing the fixed DIRECT_SOLVE_MAX_K cutoff with this shape-driven crossover would change which production fits take the Direct vs PCG path, so it must be validated on GPU hardware (#1017 Phase 2–4) before it is allowed to change numerics. Today it is consumed by the examples/full_color_fit_1017.rs measurement harness (modeled vs measured) and by the unit tests that accompany it in the source.

Fields§

§n: usize

Number of per-row blocks (SAE observations / latent rows).

§k: usize

Border β width (the SAE decoder atom count K × basis width).

§d: usize

Per-row latent / active-frame depth (the M dimension).

§cg_iters: usize

CG iteration budget assumed for the iterative estimate.

§data_fit_rank: usize

Effective rank of the data-fit contribution to the k × k border, bounded by Σ_i d_i ≈ n·d and never more than k.
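A sketch of the bound, assuming a uniform per-row depth d so that Σ_i d_i = n·d exactly. The helper names are hypothetical, and the saturating multiply is an illustrative overflow guard rather than something this page documents. The deficiency flag described in the next field falls out of the same product.

```rust
/// Rank bound on the data-fit part of the k × k border: min(Σ_i d_i, k).
/// Assumes every row block has the same depth d, so Σ_i d_i = n·d.
fn data_fit_rank(n: usize, k: usize, d: usize) -> usize {
    // saturating_mul avoids an overflow panic for extreme shapes; the result is capped at k anyway.
    n.saturating_mul(d).min(k)
}

/// Wide-sparse-border regime: fewer data directions (n·d) than border columns (k).
fn dense_border_rank_deficient(n: usize, k: usize, d: usize) -> bool {
    n.saturating_mul(d) < k
}

fn main() {
    // Color-arm product n·d = 360 with k = 15360 (d = 1 chosen only for illustration).
    let (n, k, d) = (360, 15_360, 1);
    println!(
        "rank = {}, deficient = {}",
        data_fit_rank(n, k, d),
        dense_border_rank_deficient(n, k, d)
    );
}
```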

§dense_border_rank_deficient: bool

True when n·d < k: the dense k × k Cholesky spends O(k³) work factorising a border whose data-fit contribution has rank at most n·d. This is the pathological wide-sparse-border regime (color arm: n·d = 360 ≪ k = 15360).
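Plugging the color-arm shape into the Dense Direct estimate shows how lopsided that regime is. Only the product n·d = 360 is given here, but both Dense Direct terms depend on n and d only through that product:

```latex
\begin{aligned}
2\,(n d)\,k^2 &= 2 \cdot 360 \cdot 15360^2 = 169\,869\,312\,000 \approx 1.7\times 10^{11} \\
\tfrac{k^3}{3} &= \tfrac{15360^3}{3} = 1\,207\,959\,552\,000 \approx 1.2\times 10^{12} \\
\frac{k^3/3}{2\,(n d)\,k^2} &= \frac{k}{6\,n d} = \frac{15360}{2160} \approx 7.1
\end{aligned}
```

Under the model, the border Cholesky is roughly seven times the batched assembly work, even though the data-fit part of the border it factors has rank at most 360.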

§dense_direct_flops: u128

≈ 2·n·d·k² + k³/3 — reduced-Schur assembly plus dense border Cholesky.

§reduced_iterative_flops: u128

≈ cg_iters · n·(4·d·k + d²) — matrix-free PCG matvecs.

§recommended: ArrowBorderStrategy

The recommended strategy: ReducedIterative iff the dense factorization path costs strictly more arithmetic than the iterative path at cg_iters.
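A sketch of the strict-inequality rule, assuming `ArrowBorderStrategy` is a plain two-variant enum with the `DenseDirect` and `ReducedIterative` variants named on this page (its real definition may differ). The function name `recommend` is hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArrowBorderStrategy {
    DenseDirect,
    ReducedIterative,
}

/// ReducedIterative only when the dense path is strictly more expensive;
/// a tie keeps DenseDirect.
fn recommend(dense_direct_flops: u128, reduced_iterative_flops: u128) -> ArrowBorderStrategy {
    if dense_direct_flops > reduced_iterative_flops {
        ArrowBorderStrategy::ReducedIterative
    } else {
        ArrowBorderStrategy::DenseDirect
    }
}

fn main() {
    // Illustrative flop counts, not measurements.
    println!("{:?}", recommend(291_341_948_245, 6_556_800_000));
}
```

Because the rule only compares the two estimates, it consumes their ordering and never their absolute size, consistent with the warning against reusing either figure for speedup accounting.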

§device_favorable: bool

Whether running the recommended strategy on the device is expected to pay off. For ReducedIterative this is the result of reduced_schur_matvec_should_offload. For DenseDirect the device wins only when the batched per-row assembly work (2·n·d·k², GPU-favourable batched GEMM/POTRF) at least matches the border Cholesky (k³/3) and clears the dense flop floor; this is the honest encoding of the measured 0.94× dense-Direct-on-device slowdown.
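A sketch of the DenseDirect branch under stated assumptions: `DENSE_FLOP_FLOOR` is a hypothetical placeholder because the actual floor is not given on this page, "clears the floor" is read as the total dense estimate being at least that value, and the ReducedIterative branch takes the outcome of `reduced_schur_matvec_should_offload` as a precomputed `bool` because that function's signature is not shown here.

```rust
/// Hypothetical placeholder; the crate's actual dense flop floor is not documented here.
const DENSE_FLOP_FLOOR: u128 = 1_000_000_000;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ArrowBorderStrategy {
    DenseDirect,
    ReducedIterative,
}

fn device_favorable(
    strategy: ArrowBorderStrategy,
    n: usize,
    k: usize,
    d: usize,
    matvec_should_offload: bool, // stand-in for reduced_schur_matvec_should_offload(...)
) -> bool {
    match strategy {
        ArrowBorderStrategy::ReducedIterative => matvec_should_offload,
        ArrowBorderStrategy::DenseDirect => {
            let (n, k, d) = (n as u128, k as u128, d as u128);
            let batched_assembly = 2 * n * d * k * k; // GPU-favourable batched per-row work
            let border_cholesky = k * k * k / 3; // dense k × k factorization
            // Assumption: the floor is checked against the total dense estimate.
            batched_assembly >= border_cholesky
                && batched_assembly + border_cholesky >= DENSE_FLOP_FLOOR
        }
    }
}

fn main() {
    // Color arm: only n·d = 360 is known, so d = 1 is chosen for illustration.
    let color = device_favorable(ArrowBorderStrategy::DenseDirect, 360, 15_360, 1, false);
    println!("dense Direct on device favourable for the color arm: {color}");
}
```

Ignoring integer rounding, the first condition reduces to 6·n·d ≥ k, which the color arm fails (6·360 = 2160 < 15360).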


Trait Implementations§

impl Clone for ArrowBorderSolvePlan

fn clone(&self) -> ArrowBorderSolvePlan

fn clone_from(&mut self, source: &Self)

impl Copy for ArrowBorderSolvePlan

impl Debug for ArrowBorderSolvePlan

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Eq for ArrowBorderSolvePlan

impl PartialEq for ArrowBorderSolvePlan

fn eq(&self, other: &ArrowBorderSolvePlan) -> bool

fn ne(&self, other: &Rhs) -> bool

impl StructuralPartialEq for ArrowBorderSolvePlan

Auto Trait Implementations§

impl Freeze for ArrowBorderSolvePlan

impl RefUnwindSafe for ArrowBorderSolvePlan

impl Send for ArrowBorderSolvePlan

impl Sync for ArrowBorderSolvePlan

impl Unpin for ArrowBorderSolvePlan

impl UnsafeUnpin for ArrowBorderSolvePlan

impl UnwindSafe for ArrowBorderSolvePlan

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Boilerplate for T where T: Copy + Send + Sync + Debug + PartialEq + 'static,

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT where ST: ?Sized, DT: ?Sized,

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> DistributionExt for T where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Imply<T> for U where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> Pointable for T

const ALIGN: usize

type Init = T

unsafe fn init(init: <T as Pointable>::Init) -> usize

unsafe fn deref<'a>(ptr: usize) -> &'a T

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

unsafe fn drop(ptr: usize)

impl<T> Read<Exclusive, BecauseExclusive> for T where T: ?Sized,

impl<T> Same for T

type Output = T

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V
