Enum LinalgKind

#[non_exhaustive]
#[repr(u16)]
pub enum LinalgKind {
    Cholesky = 0,
    Lu = 1,
    Qr = 2,
    Svd = 3,
    Inverse = 4,
    Eig = 5,
    Solve = 6,
    LeastSquares = 7,
    MatrixExp = 8,
    BatchedQr = 9,
    BatchedSvd = 10,
    Eigh = 11,
    BatchedSvda = 12,
    BatchedOrmqr = 13,
    BatchedQrMaterialize = 14,
    BatchedOrmqrWy = 15,
}
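Because the enum is #[repr(u16)] with explicit discriminants, a variant converts to its stored u16 with a plain `as` cast. The sketch below mirrors the declaration locally so it compiles on its own. The `ALL` table and the `from_u16` decoder are hypothetical helpers for illustration, not part of the documented API.

```rust
/// Local mirror of the documented enum so this sketch is self-contained.
#[non_exhaustive]
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum LinalgKind {
    Cholesky = 0,
    Lu = 1,
    Qr = 2,
    Svd = 3,
    Inverse = 4,
    Eig = 5,
    Solve = 6,
    LeastSquares = 7,
    MatrixExp = 8,
    BatchedQr = 9,
    BatchedSvd = 10,
    Eigh = 11,
    BatchedSvda = 12,
    BatchedOrmqr = 13,
    BatchedQrMaterialize = 14,
    BatchedOrmqrWy = 15,
}

impl LinalgKind {
    /// Hypothetical table of every current variant.
    pub const ALL: [LinalgKind; 16] = [
        Self::Cholesky, Self::Lu, Self::Qr, Self::Svd,
        Self::Inverse, Self::Eig, Self::Solve, Self::LeastSquares,
        Self::MatrixExp, Self::BatchedQr, Self::BatchedSvd, Self::Eigh,
        Self::BatchedSvda, Self::BatchedOrmqr, Self::BatchedQrMaterialize,
        Self::BatchedOrmqrWy,
    ];

    /// Hypothetical decoder: maps a stored u16 back to its variant, or None
    /// if no variant (yet) uses that discriminant.
    pub fn from_u16(raw: u16) -> Option<Self> {
        Self::ALL.iter().copied().find(|k| *k as u16 == raw)
    }
}
```

Searching the table instead of transmuting keeps the decoder safe: an unknown value written by a newer build simply decodes to None.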

Dense linear-algebra op discriminant, covering the cuSOLVER-backed family first shipped in Milestone 6.3.

Stored as a u16 in crate::KernelSku::op when category == OpCategory::Linalg. The four canonical PyTorch / JAX dense linalg ops were wired first:

Self::Cholesky: A = L · L^T (symmetric positive-definite). Batched via cusolverDnSpotrfBatched / cusolverDnDpotrfBatched.
Self::Lu: P · A = L · U. Batched via cusolverDnSgetrfBatched / cusolverDnDgetrfBatched.
Self::Qr: A = Q · R. cuSOLVER has no batched variant, so this kind is 2-D only; batched QR is the separate Self::BatchedQr kind.
Self::Svd: A = U · diag(S) · V^T. 2-D only; batched SVD is covered by Self::BatchedSvd and Self::BatchedSvda.

For these four factorizations, dtype coverage is f32 and f64, because cuSOLVER’s dense API does not support f16 / bf16 for them. Later milestones wired Inverse and Solve (6.9), LeastSquares, BatchedQr and BatchedSvd (6.11), Eig and Eigh (6.12), BatchedOrmqr and BatchedQrMaterialize (6.14), BatchedSvda (6.15), and BatchedOrmqrWy (6.17). MatrixExp remains reserved.

Variants (Non-exhaustive)

This enum is marked as non-exhaustive.

Non-exhaustive enums could have additional variants added in the future. Therefore, when matching against variants of a non-exhaustive enum outside its defining crate, an extra wildcard arm must be added to account for any future variants.
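A minimal sketch of such a match, using a trimmed local stand-in for the enum (only a few variants) and a hypothetical `batched_sibling` helper that is not part of the documented API:

```rust
/// Trimmed local stand-in for the documented enum (only a few variants).
#[allow(dead_code)]
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LinalgKind {
    Qr = 2,
    Svd = 3,
    Eig = 5,
    BatchedQr = 9,
    BatchedSvd = 10,
}

/// Hypothetical helper: the dedicated Batched* sibling of a 2-D kind, if any.
pub fn batched_sibling(kind: LinalgKind) -> Option<LinalgKind> {
    match kind {
        LinalgKind::Qr => Some(LinalgKind::BatchedQr),
        LinalgKind::Svd => Some(LinalgKind::BatchedSvd),
        // Mandatory in downstream crates: covers every other variant
        // (Cholesky, Lu, Eig, ...) and any variant added later.
        _ => None,
    }
}
```

Inside the defining crate the compiler does not force the `_` arm, but downstream code that omits it fails to compile.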

Cholesky = 0

Cholesky factorization A = L · L^T (lower) or A = U^T · U (upper). Input must be symmetric positive-definite.
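As a CPU reference for the lower case (not the cuSOLVER path), a column-by-column Cholesky recurrence computes L so that A = L · L^T. The function name and the row-major layout are illustrative assumptions; cuSOLVER itself works on column-major storage.

```rust
/// Lower Cholesky factor L (row-major n×n) of a symmetric positive-definite
/// row-major matrix `a`. Returns None when a non-positive pivot shows that
/// the input is not positive-definite.
pub fn cholesky_lower(a: &[f64], n: usize) -> Option<Vec<f64>> {
    assert_eq!(a.len(), n * n);
    let mut l = vec![0.0f64; n * n];
    for j in 0..n {
        // Diagonal: L[j][j] = sqrt(A[j][j] - sum_k L[j][k]^2).
        let mut d = a[j * n + j];
        for k in 0..j {
            d -= l[j * n + k] * l[j * n + k];
        }
        if d <= 0.0 {
            return None;
        }
        let ljj = d.sqrt();
        l[j * n + j] = ljj;
        // Below the diagonal: L[i][j] = (A[i][j] - sum_k L[i][k]·L[j][k]) / L[j][j].
        for i in (j + 1)..n {
            let mut s = a[i * n + j];
            for k in 0..j {
                s -= l[i * n + k] * l[j * n + k];
            }
            l[i * n + j] = s / ljj;
        }
    }
    Some(l)
}
```

For A = [[4, 2], [2, 3]] this yields L = [[2, 0], [1, √2]], while the indefinite [[1, 2], [2, 1]] is rejected at the second pivot.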

Lu = 1

LU factorization with partial pivoting P · A = L · U. Returns the packed LU factors plus an i32 pivot vector.
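The sketch below performs LU with partial pivoting in place: the unit-lower multipliers of L are packed below the diagonal and U sits on and above it, and pivots are recorded as i32 values in the LAPACK getrf convention (row i was swapped with row ipiv[i], 1-based). Row-major storage, the function name, and returning Err on an exactly zero pivot are assumptions for the illustration; LAPACK-style getrf instead reports singularity through an info value after completing the factorization.

```rust
/// In-place LU with partial pivoting (P·A = L·U) of an n×n row-major matrix.
/// On success `a` holds packed L (unit diagonal implied) and U, and the
/// returned vector holds 1-based pivot rows. Err(k) means column k had an
/// exactly zero pivot.
pub fn lu_in_place(a: &mut [f64], n: usize) -> Result<Vec<i32>, usize> {
    assert_eq!(a.len(), n * n);
    let mut ipiv = vec![0i32; n];
    for k in 0..n {
        // Partial pivoting: largest |a[i][k]| at or below the diagonal.
        let p = (k..n)
            .max_by(|&i, &j| a[i * n + k].abs().partial_cmp(&a[j * n + k].abs()).unwrap())
            .unwrap();
        ipiv[k] = (p + 1) as i32;
        if a[p * n + k] == 0.0 {
            return Err(k);
        }
        if p != k {
            for c in 0..n {
                a.swap(k * n + c, p * n + c);
            }
        }
        // Eliminate below the pivot, storing each multiplier in place.
        for i in (k + 1)..n {
            let m = a[i * n + k] / a[k * n + k];
            a[i * n + k] = m;
            for c in (k + 1)..n {
                let u = a[k * n + c];
                a[i * n + c] -= m * u;
            }
        }
    }
    Ok(ipiv)
}
```

For A = [[0, 1], [2, 3]] the first column forces a row swap, giving the packed factors [[2, 3], [0, 1]] and ipiv = [2, 2].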

Qr = 2

QR factorization A = Q · R. Computes full Q ([M, M]) and the upper-triangular R ([M, N]) via geqrf + ormqr.
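A Householder QR sketch that produces the same shapes as this kind (full Q of shape [M, M], R of shape [M, N] with zeros below the diagonal) is shown below. It is a simplification: geqrf keeps the reflectors packed below the diagonal and leaves Q implicit for ormqr to apply, whereas this CPU version accumulates Q = H_1 · H_2 · … directly. Row-major layout and the function name are assumptions.

```rust
/// Householder QR of an m×n row-major matrix. Returns (Q, R): full Q (m×m)
/// and R (m×n, zero below the diagonal), both row-major.
pub fn householder_qr(a: &[f64], m: usize, n: usize) -> (Vec<f64>, Vec<f64>) {
    assert_eq!(a.len(), m * n);
    let mut r = a.to_vec();
    // Q starts as the identity and accumulates Q = H_1 · H_2 · ...
    let mut q = vec![0.0f64; m * m];
    for i in 0..m {
        q[i * m + i] = 1.0;
    }
    for k in 0..m.min(n) {
        // Householder vector for column k, rows k..m.
        let mut v: Vec<f64> = (k..m).map(|i| r[i * n + k]).collect();
        let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
        if norm == 0.0 {
            continue; // column already zero on and below the diagonal
        }
        // alpha = -sign(x0)·||x|| avoids cancellation in v0 = x0 - alpha.
        let alpha = if v[0] >= 0.0 { -norm } else { norm };
        v[0] -= alpha;
        let vtv = v.iter().map(|x| x * x).sum::<f64>();
        // R <- H·R with H = I - 2·v·v^T / (v^T·v).
        for c in 0..n {
            let dot = (k..m).map(|i| v[i - k] * r[i * n + c]).sum::<f64>();
            let f = 2.0 * dot / vtv;
            for i in k..m {
                r[i * n + c] -= f * v[i - k];
            }
        }
        // Column k is now (alpha, 0, ..., 0); store it exactly.
        r[k * n + k] = alpha;
        for i in (k + 1)..m {
            r[i * n + k] = 0.0;
        }
        // Q <- Q·H (only columns k..m change).
        for row in 0..m {
            let dot = (k..m).map(|i| q[row * m + i] * v[i - k]).sum::<f64>();
            let f = 2.0 * dot / vtv;
            for i in k..m {
                q[row * m + i] -= f * v[i - k];
            }
        }
    }
    (q, r)
}
```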

Svd = 3

Singular value decomposition A = U · diag(S) · V^T. cuSOLVER 2-D only; full_matrices controls whether U/V^T are full ([M,M] / [N,N]) or thin ([M,K] / [K,N]) where K = min(M, N).
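The full_matrices shape rule can be written as a small helper. `svd_shapes` is a hypothetical name; it returns [rows, cols] for U and V^T plus the length K of S, which is min(M, N) in both modes.

```rust
/// Output shapes for an m×n SVD: ([rows, cols] of U, length of S,
/// [rows, cols] of V^T).
pub fn svd_shapes(m: usize, n: usize, full_matrices: bool) -> ([usize; 2], usize, [usize; 2]) {
    let k = m.min(n);
    if full_matrices {
        // Full: U is [M, M], V^T is [N, N].
        ([m, m], k, [n, n])
    } else {
        // Thin: U is [M, K], V^T is [K, N].
        ([m, k], k, [k, n])
    }
}
```

For a tall 5 × 3 input the two modes differ only in U ([5, 5] versus [5, 3]); for a wide input they differ only in V^T.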

Inverse = 4

Matrix inverse A^{-1} via getrf + getrs over an identity RHS. Wired in Milestone 6.9.
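Given packed factors and 1-based pivots from a getrf-style factorization, the getrs step over an identity right-hand side looks roughly like the sketch below. The helper names and the row-major layout are assumptions; passing a general B instead of the identity gives the A · X = B solve that the Solve kind performs.

```rust
/// getrs-style solve: `lu` and `ipiv` come from a getrf-style factorization
/// of an n×n matrix (packed unit-lower L and U, 1-based pivots). Overwrites
/// the n×nrhs row-major `b` with X such that A·X = B.
pub fn lu_solve_in_place(lu: &[f64], ipiv: &[i32], n: usize, b: &mut [f64], nrhs: usize) {
    // Apply the row interchanges to B in factorization order.
    for i in 0..n {
        let p = (ipiv[i] - 1) as usize;
        if p != i {
            for c in 0..nrhs {
                b.swap(i * nrhs + c, p * nrhs + c);
            }
        }
    }
    // Forward substitution with the unit-lower L.
    for i in 0..n {
        for k in 0..i {
            let l = lu[i * n + k];
            for c in 0..nrhs {
                let bk = b[k * nrhs + c];
                b[i * nrhs + c] -= l * bk;
            }
        }
    }
    // Back substitution with the upper U.
    for i in (0..n).rev() {
        for k in (i + 1)..n {
            let u = lu[i * n + k];
            for c in 0..nrhs {
                let bk = b[k * nrhs + c];
                b[i * nrhs + c] -= u * bk;
            }
        }
        let d = lu[i * n + i];
        for c in 0..nrhs {
            b[i * nrhs + c] /= d;
        }
    }
}

/// Inverse via getrs over an identity right-hand side.
pub fn inverse_from_lu(lu: &[f64], ipiv: &[i32], n: usize) -> Vec<f64> {
    let mut x = vec![0.0f64; n * n];
    for i in 0..n {
        x[i * n + i] = 1.0;
    }
    lu_solve_in_place(lu, ipiv, n, &mut x, n);
    x
}
```

For example, A = [[0, 1], [2, 3]] factors (row-major) to the packed LU [2, 3, 0, 1] with ipiv = [2, 2], and inverse_from_lu then returns [[-1.5, 0.5], [1, 0]].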

Eig = 5

General (non-symmetric) eigen-decomposition A · v = λ · v. Wired via cusolverDnXgeev in Milestone 6.12. Always emits complex eigenvalues (and optional left / right complex eigenvectors).
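A 2 × 2 example shows why the output is complex even for real input: the characteristic polynomial of a real non-symmetric matrix can have a conjugate pair of roots. The sketch below is a hypothetical closed-form helper (not the geev algorithm) that returns each eigenvalue as a (re, im) pair.

```rust
/// Eigenvalues of the real 2×2 matrix [[a, b], [c, d]] as (re, im) pairs,
/// from lambda = tr/2 ± sqrt((tr/2)^2 - det).
pub fn eig2x2(a: f64, b: f64, c: f64, d: f64) -> [(f64, f64); 2] {
    let half_tr = 0.5 * (a + d);
    let det = a * d - b * c;
    let disc = half_tr * half_tr - det;
    if disc >= 0.0 {
        // Two real eigenvalues (imaginary parts are zero).
        let s = disc.sqrt();
        [(half_tr + s, 0.0), (half_tr - s, 0.0)]
    } else {
        // Complex-conjugate pair.
        let s = (-disc).sqrt();
        [(half_tr, s), (half_tr, -s)]
    }
}
```

For the rotation [[0, -1], [1, 0]] it returns (0, 1) and (0, -1), that is ±i, even though every entry of the input is real.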

Solve = 6

Linear solve A · X = B via getrf + getrs. Wired in Milestone 6.9.

LeastSquares = 7

Least-squares solve min ||A·x - b||² via cuSOLVER’s mixed-precision iterative-refinement _gels routine. Wired in Milestone 6.11.

MatrixExp = 8

Reserved — matrix exponential / matrix functions.

BatchedQr = 9

Batched QR factorization A_b = Q_b · R_b via cusolverDn*geqrfBatched. Wired in Milestone 6.11.

BatchedSvd = 10

Batched SVD via cuSOLVER’s Jacobi-method routine cusolverDn*gesvdjBatched. Wired in Milestone 6.11.

Eigh = 11

Symmetric / Hermitian eigen-decomposition A · v = λ · v (real eigenvalues). Wired via cusolverDn{S,D}syevd / cusolverDn{C,Z}heevd in Milestone 6.12.
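For contrast with Eig, the real symmetric 2 × 2 case has a closed form whose discriminant is a sum of squares, so the eigenvalues are always real. This hypothetical helper returns them in ascending order, the order LAPACK-style syevd uses; it illustrates the property only and is not the divide-and-conquer algorithm behind syevd.

```rust
/// Eigenvalues of the real symmetric 2×2 matrix [[a, b], [b, c]], ascending.
pub fn eigh2x2(a: f64, b: f64, c: f64) -> [f64; 2] {
    let mean = 0.5 * (a + c);
    // r = sqrt(((a - c) / 2)^2 + b^2) is never negative, so no complex roots.
    let r = (0.5 * (a - c)).hypot(b);
    [mean - r, mean + r]
}
```

For [[2, 1], [1, 2]] this gives 1 and 3.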

BatchedSvda = 12

Rectangular batched approximate SVD via cuSOLVER’s gesvdaStridedBatched. Unlike Self::BatchedSvd (square-only Jacobi), this routine accepts an arbitrary m × n matrix per batch slot, uses element strides between slots, and reports per-slot residual Frobenius norms to a host array. Wired in Milestone 6.15.
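The element-stride layout can be made concrete with offset arithmetic. Assuming the usual cuSOLVER column-major convention with leading dimension lda and a per-slot element stride, element (i, j) of slot b sits at b · stride + j · lda + i. The helper names below are hypothetical, and the validity check expresses only the condition under which slots do not overlap.

```rust
/// Offset of element (i, j) of batch slot `slot` in a column-major,
/// element-strided batch: slots start `stride` elements apart and columns
/// within a slot start `lda` elements apart.
pub fn strided_offset(slot: usize, i: usize, j: usize, lda: usize, stride: usize) -> usize {
    slot * stride + j * lda + i
}

/// Slots of an m×n matrix stored with leading dimension `lda` do not overlap
/// when each column fits (lda >= m) and each slot fits (stride >= lda * n).
pub fn layout_is_valid(m: usize, n: usize, lda: usize, stride: usize) -> bool {
    lda >= m && stride >= lda * n
}
```

With lda = 4 and stride = 20, element (1, 3) of slot 2 lives at 2 · 20 + 3 · 4 + 1 = 53.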

BatchedOrmqr = 13

Bespoke batched ormqr: applies the implicit Q from a Self::BatchedQr packed output to a batch of matrices C, with all slots fused into one CUDA launch. cuSOLVER’s ormqr is non-batched, so in the small-matrix regime where batched QR is most useful, per-slot launch latency dominates; this kernel amortizes a single launch over the whole batch. The trailblazer implementation supports Side = Left with op ∈ {N, T}; Right and complex variants are deferred. Wired in Milestone 6.14.
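The reflector-by-reflector scheme can be sketched on the CPU as a single loop over batch slots, standing in for the one fused launch. Each reflector H_i = I - tau_i · v_i · v_i^T is applied to C as a dot product followed by a rank-1 update, which is why this variant runs at GEMV rates. Row-major storage, the argument order, and the function name are assumptions; only Side = Left with op N (trans = false) or T (trans = true) is modeled, matching the trailblazer scope.

```rust
/// Apply Q (trans = false) or Q^T (trans = true) from geqrf-style packed
/// output to C for every slot of a batch (Side = Left). Per slot: `a` is m×n
/// row-major with reflector i stored below the diagonal of column i (unit
/// diagonal implied), `tau` has min(m, n) entries, and C is m×nc row-major.
/// Q = H_0·H_1·...·H_{k-1} with H_i = I - tau_i·v_i·v_i^T.
pub fn batched_ormqr_left(
    a: &[f64], tau: &[f64], c: &mut [f64],
    batch: usize, m: usize, n: usize, nc: usize, trans: bool,
) {
    let k = m.min(n);
    for s in 0..batch {
        let a_s = &a[s * m * n..(s + 1) * m * n];
        let tau_s = &tau[s * k..(s + 1) * k];
        let c_s = &mut c[s * m * nc..(s + 1) * m * nc];
        // Q^T·C applies H_0 first; Q·C applies H_{k-1} first.
        let order: Vec<usize> = if trans { (0..k).collect() } else { (0..k).rev().collect() };
        for &i in &order {
            // v_i = [0, ..., 0, 1, a[i+1][i], ..., a[m-1][i]]
            let v = |r: usize| if r == i { 1.0 } else { a_s[r * n + i] };
            for col in 0..nc {
                // C[:, col] -= tau_i · v_i · (v_i^T · C[:, col])
                let dot: f64 = (i..m).map(|r| v(r) * c_s[r * nc + col]).sum();
                let f = tau_s[i] * dot;
                for r in i..m {
                    c_s[r * nc + col] -= f * v(r);
                }
            }
        }
    }
}
```

Applying the op to an identity C materializes Q itself, which is the idea behind the identity stage of BatchedQrMaterialize.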

BatchedQrMaterialize = 14

Bespoke “materialize dense Q and R from batched-geqrf packed output” op: a tiny upper-triangle-copy kernel produces R, and an identity-stage Self::BatchedOrmqr (Q applied to an identity matrix) produces Q. Wired in Milestone 6.14 as the consumer of BatchedOrmqrPlan.

BatchedOrmqrWy = 15

WY-blocked batched ormqr: applies the implicit Q (or Q^T) from a Self::BatchedQr packed output to a batch of matrices C at GEMM rates. Groups of nb consecutive Householder reflectors are fused into a block reflector (I - V·T·V^T), which is applied via three cuBLAS strided-batched GEMMs per block. It is the sibling of Self::BatchedOrmqr (the reflector-by-reflector, GEMV-rate variant), and callers pick between them by problem size: WY wins decisively for M, N > ~16, while the reflector kernel wins for tiny inputs. The trailblazer implementation supports Side = Left with op ∈ {N, T}. Wired in Milestone 6.17.
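The block reflector can be sketched in two steps: build the nb × nb upper-triangular T in the style of LAPACK’s larft (forward direction, column-wise storage), then apply I - V·T·V^T (or I - V·T^T·V^T for Q^T) with three matrix products that play the role of the three strided-batched GEMMs. The sketch handles one slot and one block; the function names, row-major layout, and plain loops in place of cuBLAS calls are assumptions.

```rust
/// Compact-WY factor: given nb Householder vectors (columns of the m×nb
/// row-major V, with v_j[i] = 0 for i < j and v_j[j] = 1) and scalars tau,
/// build upper-triangular T (nb×nb) with H_0·H_1·...·H_{nb-1} = I - V·T·V^T.
pub fn wy_t_factor(v: &[f64], tau: &[f64], m: usize, nb: usize) -> Vec<f64> {
    let mut t = vec![0.0f64; nb * nb];
    for j in 0..nb {
        t[j * nb + j] = tau[j];
        // w = V[:, 0..j]^T · v_j
        let w: Vec<f64> = (0..j)
            .map(|c| (0..m).map(|r| v[r * nb + c] * v[r * nb + j]).sum::<f64>())
            .collect();
        // T[0..j, j] = -tau_j · T[0..j, 0..j] · w (T is upper-triangular)
        for i in 0..j {
            let s = (i..j).map(|c| t[i * nb + c] * w[c]).sum::<f64>();
            t[i * nb + j] = -tau[j] * s;
        }
    }
    t
}

/// Apply I - V·T·V^T (trans = false) or I - V·T^T·V^T (trans = true) to the
/// m×nc row-major C with three matrix products.
pub fn apply_block_reflector(
    v: &[f64], t: &[f64], c: &mut [f64], m: usize, nb: usize, nc: usize, trans: bool,
) {
    // Product 1: W = V^T · C (nb×nc)
    let mut w = vec![0.0f64; nb * nc];
    for a in 0..nb {
        for col in 0..nc {
            w[a * nc + col] = (0..m).map(|r| v[r * nb + a] * c[r * nc + col]).sum::<f64>();
        }
    }
    // Product 2: W2 = T · W, or T^T · W for the transposed op (nb×nc)
    let mut w2 = vec![0.0f64; nb * nc];
    for a in 0..nb {
        for col in 0..nc {
            w2[a * nc + col] = (0..nb)
                .map(|b| {
                    let tab = if trans { t[b * nb + a] } else { t[a * nb + b] };
                    tab * w[b * nc + col]
                })
                .sum::<f64>();
        }
    }
    // Product 3: C -= V · W2
    for r in 0..m {
        for col in 0..nc {
            let s = (0..nb).map(|a| v[r * nb + a] * w2[a * nc + col]).sum::<f64>();
            c[r * nc + col] -= s;
        }
    }
}
```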

Trait Implementations

impl Clone for LinalgKind

fn clone(&self) -> LinalgKind

fn clone_from(&mut self, source: &Self)

impl Copy for LinalgKind

impl Debug for LinalgKind

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Eq for LinalgKind

impl Hash for LinalgKind

fn hash<__H>(&self, state: &mut __H) where __H: Hasher,

fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher, Self: Sized,

impl PartialEq for LinalgKind

fn eq(&self, other: &LinalgKind) -> bool

fn ne(&self, other: &Rhs) -> bool

impl StructuralPartialEq for LinalgKind

Auto Trait Implementations

impl Freeze for LinalgKind

impl RefUnwindSafe for LinalgKind

impl Send for LinalgKind

impl Sync for LinalgKind

impl Unpin for LinalgKind

impl UnsafeUnpin for LinalgKind

impl UnwindSafe for LinalgKind

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
