Module sparse_gpu

GPU-ready sparse matrix formats and iterative solvers with data-oriented layouts.

Provides CSR, ELLPACK, Hybrid (ELL+COO), and Block-CSR formats along with iterative solvers (CG, BiCGSTAB, preconditioned CG) suitable for GPU offload.
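
As a rough illustration of the CSR layout mentioned above, the sketch below stores a matrix as three flat arrays and computes a sparse matrix-vector product row by row. The type `SketchCsr`, its field names, and `laplacian_3x3` are hypothetical names chosen for this example; they are not this module's `CsrMatrix` API, whose actual fields and method signatures are not shown on this page.

```rust
/// Hypothetical CSR container for illustration only; not the crate's `CsrMatrix`.
struct SketchCsr {
    n_rows: usize,
    row_ptr: Vec<usize>, // length n_rows + 1; row i owns entries row_ptr[i]..row_ptr[i + 1]
    col_idx: Vec<usize>, // length nnz; column index of each stored entry
    values: Vec<f64>,    // length nnz; value of each stored entry
}

impl SketchCsr {
    /// Compute y = A * x. Each row is an independent dot product,
    /// which is what makes CSR SpMV easy to parallelize by row.
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        (0..self.n_rows)
            .map(|i| {
                (self.row_ptr[i]..self.row_ptr[i + 1])
                    .map(|k| self.values[k] * x[self.col_idx[k]])
                    .sum::<f64>()
            })
            .collect()
    }
}

/// Build the 3 x 3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
fn laplacian_3x3() -> SketchCsr {
    SketchCsr {
        n_rows: 3,
        row_ptr: vec![0, 2, 5, 7],
        col_idx: vec![0, 1, 0, 1, 2, 1, 2],
        values: vec![2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
    }
}
```

Calling `laplacian_3x3().spmv(&[1.0, 2.0, 3.0])` multiplies the small Laplacian by the vector (1, 2, 3). Flat `Vec` fields keep the data contiguous, which is the property a data-oriented layout relies on when buffers are copied to a device.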

Structs§

BlockCsrMatrix
Block-CSR matrix where every stored entry is a block_size × block_size dense tile.
CsrMatrix
Compressed Sparse Row (CSR) matrix stored as plain f64 arrays.
EllMatrix
ELLPACK-format sparse matrix: rows padded to max_nnz_per_row.
HybridMatrix
Hybrid ELL+COO matrix: regular rows stored in ELL, overflow in COO.
SparseTriplet
Coordinate (COO) format sparse matrix for incremental assembly.
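
The ELLPACK padding described above can be sketched as follows: every row is widened to the longest row's length, and unused slots hold a zero value. `SketchEll` and `csr_to_ell_sketch` are hypothetical names for this example and are not the signatures of `EllMatrix` or `csr_to_ell`. The sketch assumes a non-empty `row_ptr` and a row-major padded layout; GPU implementations often use a column-major layout instead so that neighbouring threads read neighbouring memory.

```rust
/// Hypothetical ELLPACK container for illustration only; not the crate's `EllMatrix`.
struct SketchEll {
    n_rows: usize,
    width: usize,        // padded row width (maximum non-zeros in any row)
    col_idx: Vec<usize>, // n_rows * width entries, row-major in this sketch
    values: Vec<f64>,    // n_rows * width entries; padding slots hold 0.0
}

/// Convert raw CSR arrays into the padded ELLPACK sketch.
/// Assumes `row_ptr` has at least one element (n_rows + 1 entries).
fn csr_to_ell_sketch(row_ptr: &[usize], col_idx: &[usize], values: &[f64]) -> SketchEll {
    let n_rows = row_ptr.len() - 1;
    let width = (0..n_rows)
        .map(|i| row_ptr[i + 1] - row_ptr[i])
        .max()
        .unwrap_or(0);
    // Padding slots keep column 0 and value 0.0, so they add nothing to a product.
    let mut ell_cols = vec![0usize; n_rows * width];
    let mut ell_vals = vec![0.0f64; n_rows * width];
    for i in 0..n_rows {
        for (slot, k) in (row_ptr[i]..row_ptr[i + 1]).enumerate() {
            ell_cols[i * width + slot] = col_idx[k];
            ell_vals[i * width + slot] = values[k];
        }
    }
    SketchEll { n_rows, width, col_idx: ell_cols, values: ell_vals }
}

impl SketchEll {
    /// Compute y = A * x with a fixed trip count per row (no row_ptr lookups).
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        let mut y = vec![0.0; self.n_rows];
        for i in 0..self.n_rows {
            for s in 0..self.width {
                let k = i * self.width + s;
                y[i] += self.values[k] * x[self.col_idx[k]];
            }
        }
        y
    }
}
```

The fixed inner trip count is the appeal of ELLPACK for regular matrices. The cost is wasted storage when a few rows are much longer than the rest, which is the case a hybrid ELL+COO split is meant to handle.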

Functions§

assemble_1d_laplacian
Assemble the n × n 1D Laplacian matrix (tridiagonal: 2 on the diagonal, -1 on the off-diagonals).
axpy
AXPY operation: computes y + alpha * x.
bicgstab_solve
BiCGSTAB solver for general (possibly non-symmetric) systems A x = b.
cg_solve
Conjugate Gradient solver for symmetric positive-definite systems A x = b.
compute_nnz_per_row
Compute the number of non-zeros per row for a CSR matrix.
csr_to_ell
Convert a CSR matrix to ELLPACK format (convenience wrapper).
dot
Dot product of two vectors.
extract_diagonal
Extract the main diagonal of a CSR matrix.
frobenius_norm
Compute the Frobenius norm of a sparse matrix: sqrt(sum(a_ij^2)).
jacobi_preconditioned_cg
Conjugate Gradient with Jacobi (diagonal) preconditioner for A x = b (SPD).
norm2
Euclidean norm of a vector.
optimal_ell_row_width
Choose the ELLPACK row width (max non-zeros per row) to minimize padding waste.
scale_vec
Scale every element of x by s.
simulate_spmv_throughput
Estimate SpMV throughput in GFLOPS given matrix dimensions and nnz count.
sparse_lower_triangular_solve
Forward-substitution solve L x = b where L is lower-triangular (CSR).
sparse_upper_triangular_solve
Back-substitution solve U x = b where U is upper-triangular (CSR).
spmv_segmented
Segmented SpMV: processes each row independently to prepare for GPU-style parallel execution. Functionally identical to CsrMatrix::spmv but structured for row-parallel dispatch.
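
To make the Conjugate Gradient entry above more concrete, here is a minimal, unpreconditioned CG sketch applied to the 1D Laplacian (2 on the diagonal, -1 on the off-diagonals). The names `dot_sketch`, `laplacian_1d_apply`, and `cg_sketch`, the closure-based matrix argument, and the `(solution, iterations)` return value are all assumptions made for this example; the real `cg_solve` signature is not shown on this page and may differ.

```rust
/// Dot product of two equal-length slices (hypothetical helper).
fn dot_sketch(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Apply the n x n 1D Laplacian to x without storing the matrix.
fn laplacian_1d_apply(x: &[f64]) -> Vec<f64> {
    let n = x.len();
    (0..n)
        .map(|i| {
            let left = if i > 0 { x[i - 1] } else { 0.0 };
            let right = if i + 1 < n { x[i + 1] } else { 0.0 };
            2.0 * x[i] - left - right
        })
        .collect()
}

/// Textbook CG for a symmetric positive-definite operator `apply_a`.
/// Starts from x = 0 and stops when the residual 2-norm is <= tol
/// or after max_iter iterations. Returns (x, iterations performed).
fn cg_sketch<F>(apply_a: F, b: &[f64], tol: f64, max_iter: usize) -> (Vec<f64>, usize)
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let mut x = vec![0.0; b.len()];
    let mut r = b.to_vec(); // residual b - A*x with x = 0
    let mut p = r.clone(); // initial search direction
    let mut rs_old = dot_sketch(&r, &r);
    for iter in 0..max_iter {
        if rs_old.sqrt() <= tol {
            return (x, iter);
        }
        let ap = apply_a(&p);
        let alpha = rs_old / dot_sketch(&p, &ap);
        for i in 0..x.len() {
            x[i] += alpha * p[i]; // step along the search direction
            r[i] -= alpha * ap[i]; // update the residual recursively
        }
        let rs_new = dot_sketch(&r, &r);
        let beta = rs_new / rs_old;
        for i in 0..p.len() {
            p[i] = r[i] + beta * p[i]; // next A-conjugate direction
        }
        rs_old = rs_new;
    }
    (x, max_iter)
}
```

A quick check is to build `b` with `laplacian_1d_apply(&vec![1.0; 8])` and call `cg_sketch(laplacian_1d_apply, &b, 1e-10, 100)`; the returned vector should be close to all ones. Each iteration uses one operator application, two dot products, and three vector updates, which is why SpMV, `dot`, and `axpy`-style kernels are the pieces worth offloading.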