#[non_exhaustive]#[repr(u16)]pub enum LinalgKind {
Show 16 variants
Cholesky = 0,
Lu = 1,
Qr = 2,
Svd = 3,
Inverse = 4,
Eig = 5,
Solve = 6,
LeastSquares = 7,
MatrixExp = 8,
BatchedQr = 9,
BatchedSvd = 10,
Eigh = 11,
BatchedSvda = 12,
BatchedOrmqr = 13,
BatchedQrMaterialize = 14,
BatchedOrmqrWy = 15,
}Expand description
Linear-algebra (dense) op discriminant — covers the cuSOLVER family shipped in Milestone 6.3.
Stored as u16 in crate::KernelSku::op when
category == OpCategory::Linalg. Today the four canonical PyTorch /
JAX dense linalg ops are wired:
Self::Cholesky—A = L · L^T(symmetric positive-definite). Batched viacusolverDnSpotrfBatched/cusolverDnDpotrfBatched.Self::Lu—P · A = L · U. Batched viacusolverDnSgetrfBatched/cusolverDnDgetrfBatched.Self::Qr—A = Q · R. cuSOLVER has no batched variant; 2-D only.Self::Svd—A = U · diag(S) · V^T. cuSOLVER 2-D only.
Dtype coverage is f32 + f64 — cuSOLVER’s dense API does not
support f16 / bf16 for these factorizations. Reserved variants
(Inverse, Eig, Solve, LeastSquares, MatrixExp) follow in
future milestones.
Variants (Non-exhaustive)§
This enum is marked as non-exhaustive
Cholesky = 0
Cholesky factorization A = L · L^T (lower) or A = U^T · U
(upper). Input must be symmetric positive-definite.
Lu = 1
LU factorization with partial pivoting P · A = L · U. Returns
the packed LU factors plus an i32 pivot vector.
Qr = 2
QR factorization A = Q · R. Computes full Q ([M, M]) and
the upper-triangular R ([M, N]) via geqrf + ormqr.
Svd = 3
Singular value decomposition A = U · diag(S) · V^T. cuSOLVER
2-D only; full_matrices controls whether U/V^T are full
([M,M] / [N,N]) or thin ([M,K] / [K,N]) where
K = min(M, N).
Inverse = 4
Matrix inverse A^{-1} via getrf + getrs over an identity
RHS. Wired in Milestone 6.9.
Eig = 5
General (non-symmetric) eigen-decomposition A · v = λ · v. Wired
via cusolverDnXgeev in Milestone 6.12. Always emits complex
eigenvalues (and optional left / right complex eigenvectors).
Solve = 6
Linear solve A · X = B via getrf + getrs. Wired in
Milestone 6.9.
LeastSquares = 7
Least-squares solve min ||A·x - b||² via cuSOLVER’s
mixed-precision iterative-refinement _gels routine. Wired in
Milestone 6.11.
MatrixExp = 8
Reserved — matrix exponential / matrix functions.
BatchedQr = 9
Batched QR factorization A_b = Q_b · R_b via
cusolverDn*geqrfBatched. Wired in Milestone 6.11.
BatchedSvd = 10
Batched SVD via Jacobi cusolverDn*gesvdjBatched. Wired in
Milestone 6.11.
Eigh = 11
Symmetric / Hermitian eigen-decomposition A · v = λ · v (real
eigenvalues). Wired via cusolverDn{S,D}syevd /
cusolverDn{C,Z}heevd in Milestone 6.12.
BatchedSvda = 12
Rectangular batched approximate-SVD via cuSOLVER’s
gesvdaStridedBatched. Unlike Self::BatchedSvd (which is
square-only Jacobi), this routine accepts arbitrary m × n per
batch slot, uses element-strides between slots, and reports per-
slot residual Frobenius norms to a host array. Wired in
Milestone 6.15.
BatchedOrmqr = 13
Bespoke batched-ormqr — applies the implicit Q from a
Self::BatchedQr packed output to a batch of matrices C,
all slots fused into one CUDA launch. cuSOLVER’s ormqr is
non-batched, so in the small-matrix regime where batched-QR is
most useful the per-slot launch latency dominates; this bespoke
kernel amortizes one launch over the whole batch. Side = Left,
op ∈ {N, T} in the trailblazer (Right + complex variants
deferred). Wired in Milestone 6.14.
BatchedQrMaterialize = 14
Bespoke “materialize dense Q and R from batched-geqrf packed
output”. Tiny upper-triangle-copy kernel for R; identity-stage
Self::BatchedOrmqrfor Q. Wired in Milestone 6.14 as the consumer ofBatchedOrmqrPlan.
BatchedOrmqrWy = 15
WY-blocked batched-ormqr — applies the implicit Q (or Q^T)
from a Self::BatchedQr packed output to a batch of matrices
C at GEMM-rates by fusing groups of nb consecutive Householder
reflectors into a block reflector (I - V·T·V^T) and applying it
via three cuBLAS strided-batched GEMMs per block. Sibling to
Self::BatchedOrmqr (the reflector-by-reflector GEMV-rates
variant); callers pick by problem size — WY wins decisively for
M, N > ~16, the reflector kernel wins for tiny inputs.
Side = Left, op ∈ {N, T} in the trailblazer. Wired in
Milestone 6.17.
Trait Implementations§
Source§impl Clone for LinalgKind
impl Clone for LinalgKind
Source§fn clone(&self) -> LinalgKind
fn clone(&self) -> LinalgKind
1.0.0 (const: unstable) · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read moreimpl Copy for LinalgKind
Source§impl Debug for LinalgKind
impl Debug for LinalgKind
impl Eq for LinalgKind
Source§impl Hash for LinalgKind
impl Hash for LinalgKind
Source§impl PartialEq for LinalgKind
impl PartialEq for LinalgKind
Source§fn eq(&self, other: &LinalgKind) -> bool
fn eq(&self, other: &LinalgKind) -> bool
self and other values to be equal, and is used by ==.