GPU-ready sparse matrix formats and iterative solvers with data-oriented layouts.
Provides CSR, ELLPACK, Hybrid (ELL+COO), and Block-CSR formats along with iterative solvers (CG, BiCGSTAB, preconditioned CG) suitable for GPU offload.
Structs
- `BlockCsrMatrix`: Block-CSR matrix where every stored entry is a `block_size × block_size` dense tile.
- `CsrMatrix`: Compressed Sparse Row (CSR) matrix stored as plain `f64` arrays.
- `EllMatrix`: ELLPACK-format sparse matrix: rows padded to `max_nnz_per_row`.
- `HybridMatrix`: Hybrid ELL+COO matrix: regular rows stored in ELL, overflow in COO.
- `SparseTriplet`: Coordinate (COO) format sparse matrix for incremental assembly.
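To make the CSR and ELLPACK layouts concrete, here is a minimal standalone sketch. It does not use this crate's types: `Csr`, `Ell`, `csr_spmv`, `to_ell`, `ell_spmv`, and `example_laplacian_3` are hypothetical names chosen for illustration, and the field layout is an assumption rather than the actual definition of `CsrMatrix` or `EllMatrix`.

```rust
/// Hypothetical CSR container (illustration only, not the crate's `CsrMatrix`).
struct Csr {
    n_rows: usize,
    row_ptr: Vec<usize>, // length n_rows + 1; row i occupies row_ptr[i]..row_ptr[i + 1]
    col_idx: Vec<usize>, // column index of each stored value
    values: Vec<f64>,    // stored non-zero values
}

/// Hypothetical ELLPACK container: every row is padded to `width` slots.
struct Ell {
    n_rows: usize,
    width: usize,
    col_idx: Vec<usize>, // n_rows * width entries, row-major; padding points at column 0
    values: Vec<f64>,    // n_rows * width entries; padding holds 0.0
}

/// y = A x for a CSR matrix, one independent dot product per row.
fn csr_spmv(a: &Csr, x: &[f64]) -> Vec<f64> {
    (0..a.n_rows)
        .map(|i| {
            (a.row_ptr[i]..a.row_ptr[i + 1])
                .map(|k| a.values[k] * x[a.col_idx[k]])
                .sum::<f64>()
        })
        .collect()
}

/// Convert CSR to ELLPACK by padding each row to the longest row's length.
fn to_ell(a: &Csr) -> Ell {
    let width = (0..a.n_rows)
        .map(|i| a.row_ptr[i + 1] - a.row_ptr[i])
        .max()
        .unwrap_or(0);
    let mut col_idx = vec![0usize; a.n_rows * width];
    let mut values = vec![0.0f64; a.n_rows * width];
    for i in 0..a.n_rows {
        for (slot, k) in (a.row_ptr[i]..a.row_ptr[i + 1]).enumerate() {
            col_idx[i * width + slot] = a.col_idx[k];
            values[i * width + slot] = a.values[k];
        }
    }
    Ell { n_rows: a.n_rows, width, col_idx, values }
}

/// y = A x for an ELLPACK matrix; padded slots contribute 0.0 * x[0].
fn ell_spmv(a: &Ell, x: &[f64]) -> Vec<f64> {
    (0..a.n_rows)
        .map(|i| {
            (0..a.width)
                .map(|s| a.values[i * a.width + s] * x[a.col_idx[i * a.width + s]])
                .sum::<f64>()
        })
        .collect()
}

/// 3 × 3 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal.
fn example_laplacian_3() -> Csr {
    Csr {
        n_rows: 3,
        row_ptr: vec![0, 2, 5, 7],
        col_idx: vec![0, 1, 0, 1, 2, 1, 2],
        values: vec![2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
    }
}
```

For the 3 × 3 example, CSR stores 7 values while ELLPACK stores 9 (3 rows × width 3). The fixed row width is what makes ELLPACK regular enough for coalesced GPU access, at the cost of padding; a hybrid layout limits that cost by moving unusually long rows into a COO overflow list.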
Functions
- `assemble_1d_laplacian`: Assemble a 1D Laplacian matrix of size `n × n` (tridiagonal: 2 on diag, -1 off-diag).
- `axpy`: `y + alpha * x` (AXPY).
- `bicgstab_solve`: BiCGSTAB solver for general (possibly non-symmetric) systems `A x = b`.
- `cg_solve`: Conjugate Gradient solver for symmetric positive-definite systems `A x = b`.
- `compute_nnz_per_row`: Compute the number of non-zeros per row for a CSR matrix.
- `csr_to_ell`: Convert a CSR matrix to ELLPACK format (convenience wrapper).
- `dot`: Dot product of two vectors.
- `extract_diagonal`: Extract the main diagonal of a CSR matrix.
- `frobenius_norm`: Compute the Frobenius norm of a sparse matrix: `sqrt(sum(a_ij^2))`.
- `jacobi_preconditioned_cg`: Conjugate Gradient with Jacobi (diagonal) preconditioner for `A x = b` (SPD).
- `norm2`: Euclidean norm of a vector.
- `optimal_ell_row_width`: Choose the ELLPACK row width (max non-zeros per row) to minimize padding waste.
- `scale_vec`: Scale every element of `x` by `s`.
- `simulate_spmv_throughput`: Estimate SpMV throughput in GFLOPS given matrix dimensions and nnz count.
- `sparse_lower_triangular_solve`: Forward-substitution solve `L x = b` where `L` is lower-triangular (CSR).
- `sparse_upper_triangular_solve`: Back-substitution solve `U x = b` where `U` is upper-triangular (CSR).
- `spmv_segmented`: Segmented SpMV: processes each row independently to prepare for GPU-style parallel execution. Functionally identical to `CsrMatrix::spmv` but structured for row-parallel dispatch.
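As a rough illustration of how the vector kernels and the Conjugate Gradient solver listed above fit together, the sketch below solves a small 1D Laplacian system. It is self-contained and does not call this crate: the local `dot` and `axpy` are stand-ins with assumed signatures, and `laplacian_1d_apply` and `cg` are hypothetical helpers. The matrix is applied as a stencil instead of being stored, which is a simplification for the example.

```rust
/// Dot product of two equally sized slices (local stand-in, assumed signature).
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// In-place update y <- y + alpha * x (local stand-in, assumed signature).
fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

/// Apply the 1D Laplacian (2 on the diagonal, -1 off the diagonal) as a stencil.
fn laplacian_1d_apply(x: &[f64]) -> Vec<f64> {
    let n = x.len();
    (0..n)
        .map(|i| {
            let left = if i > 0 { x[i - 1] } else { 0.0 };
            let right = if i + 1 < n { x[i + 1] } else { 0.0 };
            2.0 * x[i] - left - right
        })
        .collect()
}

/// Unpreconditioned CG for an SPD operator, starting from x = 0.
/// Returns the approximate solution and the number of iterations performed.
fn cg(
    apply_a: impl Fn(&[f64]) -> Vec<f64>,
    b: &[f64],
    tol: f64,
    max_iter: usize,
) -> (Vec<f64>, usize) {
    let mut x = vec![0.0; b.len()];
    let mut r = b.to_vec(); // residual r = b - A * 0 = b
    let mut p = r.clone(); // first search direction
    let mut rs_old = dot(&r, &r);
    for it in 0..max_iter {
        if rs_old.sqrt() <= tol {
            return (x, it); // residual 2-norm is small enough
        }
        let ap = apply_a(&p);
        let alpha = rs_old / dot(&p, &ap);
        axpy(alpha, &p, &mut x); // x <- x + alpha * p
        axpy(-alpha, &ap, &mut r); // r <- r - alpha * A p
        let rs_new = dot(&r, &r);
        let beta = rs_new / rs_old;
        for (pi, ri) in p.iter_mut().zip(&r) {
            *pi = ri + beta * *pi; // p <- r + beta * p
        }
        rs_old = rs_new;
    }
    (x, max_iter)
}
```

Calling `cg(laplacian_1d_apply, &b, 1e-10, 100)` with `b = A x_true` should recover `x_true` up to rounding. A Jacobi-preconditioned variant would additionally divide the residual by the matrix diagonal before forming each search direction; for this particular matrix the diagonal is constant, so that preconditioner would not change the iteration count.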