pub unsafe extern "C" fn cusparseScsrgeam2(
    handle: cusparseHandle_t,
    m: c_int,
    n: c_int,
    alpha: *const f32,
    descrA: cusparseMatDescr_t,
    nnzA: c_int,
    csrSortedValA: *const f32,
    csrSortedRowPtrA: *const c_int,
    csrSortedColIndA: *const c_int,
    beta: *const f32,
    descrB: cusparseMatDescr_t,
    nnzB: c_int,
    csrSortedValB: *const f32,
    csrSortedRowPtrB: *const c_int,
    csrSortedColIndB: *const c_int,
    descrC: cusparseMatDescr_t,
    csrSortedValC: *mut f32,
    csrSortedRowPtrC: *mut c_int,
    csrSortedColIndC: *mut c_int,
    pBuffer: *mut c_void,
) -> cusparseStatus_t
This function performs the following matrix-matrix operation:

$$C = \alpha A + \beta B$$

where A, B, and C are $m \times n$ sparse matrices (defined in CSR storage format by the three arrays csrValA|csrValB|csrValC, csrRowPtrA|csrRowPtrB|csrRowPtrC, and csrColIndA|csrColIndB|csrColIndC respectively), and $\alpha$ and $\beta$ are scalars. Because A and B generally have different sparsity patterns, cuSPARSE adopts a two-step approach to build the sparse matrix C. In the first step, the user allocates csrRowPtrC with m+1 elements and calls cusparseXcsrgeam2Nnz to determine csrRowPtrC and the total number of nonzero elements. In the second step, the user obtains nnzC (the number of nonzero elements of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC(m)-csrRowPtrC(0)), allocates csrValC and csrColIndC with nnzC elements each, and finally calls cusparse\[S|D|C|Z\]csrgeam2() to complete matrix C.
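
The two-step pattern described above can be illustrated on the CPU. The following is a minimal sketch in Rust, not the cuSPARSE implementation: the `Csr` struct and the helpers `geam_nnz`, `geam_fill`, and `add_example` are hypothetical names introduced here, and the sketch assumes zero-based indexing with column indices sorted within each row.

```rust
/// Minimal CSR container (assumption: zero-based, sorted columns per row).
#[derive(Debug, Clone, PartialEq)]
pub struct Csr {
    pub row_ptr: Vec<usize>,
    pub col_ind: Vec<usize>,
    pub val: Vec<f32>,
}

/// Column index at position `p`, or a sentinel once the row (ending at `end`) is exhausted.
fn col_or_max(col_ind: &[usize], p: usize, end: usize) -> usize {
    if p < end { col_ind[p] } else { usize::MAX }
}

/// Step 1 (CPU stand-in for the nnz pass): fill `row_ptr_c` (m + 1 entries)
/// with the row offsets of the union pattern and return nnzC.
pub fn geam_nnz(m: usize, a: &Csr, b: &Csr, row_ptr_c: &mut [usize]) -> usize {
    row_ptr_c[0] = 0;
    for i in 0..m {
        let (mut p, pe) = (a.row_ptr[i], a.row_ptr[i + 1]);
        let (mut q, qe) = (b.row_ptr[i], b.row_ptr[i + 1]);
        let mut count = 0;
        while p < pe || q < qe {
            let ca = col_or_max(&a.col_ind, p, pe);
            let cb = col_or_max(&b.col_ind, q, qe);
            // Advance whichever side holds the smaller column (both on a tie).
            if ca <= cb { p += 1; }
            if cb <= ca { q += 1; }
            count += 1;
        }
        row_ptr_c[i + 1] = row_ptr_c[i] + count;
    }
    row_ptr_c[m]
}

/// Step 2 (CPU stand-in for the value pass): fill `c.col_ind` and `c.val`,
/// assuming `c.row_ptr` was produced by `geam_nnz` and both arrays hold nnzC entries.
pub fn geam_fill(m: usize, alpha: f32, a: &Csr, beta: f32, b: &Csr, c: &mut Csr) {
    for i in 0..m {
        let (mut p, pe) = (a.row_ptr[i], a.row_ptr[i + 1]);
        let (mut q, qe) = (b.row_ptr[i], b.row_ptr[i + 1]);
        let mut k = c.row_ptr[i];
        while p < pe || q < qe {
            let ca = col_or_max(&a.col_ind, p, pe);
            let cb = col_or_max(&b.col_ind, q, qe);
            let col = ca.min(cb);
            let mut v = 0.0f32;
            if ca == col { v += alpha * a.val[p]; p += 1; }
            if cb == col { v += beta * b.val[q]; q += 1; }
            c.col_ind[k] = col;
            c.val[k] = v;
            k += 1;
        }
    }
}

/// Usage: C = 2*A + 1*B for A = [[1,0,2],[0,3,0]] and B = [[0,4,5],[6,0,0]].
pub fn add_example() -> Csr {
    let m = 2;
    let a = Csr { row_ptr: vec![0, 2, 3], col_ind: vec![0, 2, 1], val: vec![1.0, 2.0, 3.0] };
    let b = Csr { row_ptr: vec![0, 2, 3], col_ind: vec![1, 2, 0], val: vec![4.0, 5.0, 6.0] };
    let mut row_ptr_c = vec![0usize; m + 1]; // allocated before step 1
    let nnz_c = geam_nnz(m, &a, &b, &mut row_ptr_c);
    // csrValC and csrColIndC can only be sized once nnzC is known.
    let mut c = Csr { row_ptr: row_ptr_c, col_ind: vec![0; nnz_c], val: vec![0.0; nnz_c] };
    geam_fill(m, 2.0, &a, 1.0, &b, &mut c);
    c
}
```

In this sketch the counting pass never reads the values, so the pattern of C is the row-wise union of the patterns of A and B regardless of the scalars.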
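The general procedure is as follows: allocate csrRowPtrC with m+1 elements, query the work buffer size with cusparse\[S|D|C|Z\]csrgeam2_bufferSizeExt() and allocate pBuffer, call cusparseXcsrgeam2Nnz to fill csrRowPtrC and obtain nnzC, allocate csrValC and csrColIndC with nnzC elements each, then call cusparse\[S|D|C|Z\]csrgeam2().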
Several comments on csrgeam2():
- The other three combinations, NT, TN, and TT (transposed `A` and/or `B`), are not supported by cuSPARSE. To perform any of the three, the user should use the routine `csr2csc()` to convert $A$ | $B$ to $A^{T}$ | $B^{T}$.
- Only `cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL` is supported. If either `A` or `B` is symmetric or Hermitian, the user must extend the matrix to a full one and reconfigure the `MatrixType` field of the descriptor to `cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL`.
- If the sparsity pattern of matrix `C` is known, the user can skip the call to `cusparseXcsrgeam2Nnz`. For example, suppose the user has an iterative algorithm that updates `A` and `B` in each iteration but keeps their sparsity patterns. The user can call `cusparseXcsrgeam2Nnz` once to set up the sparsity pattern of `C`, then call only `cusparse[S|D|C|Z]csrgeam2()` in each iteration.
- The pointers `alpha` and `beta` must be valid.
- When `alpha` or `beta` is zero, cuSPARSE does not treat it as a special case. The sparsity pattern of `C` is independent of the values of `alpha` and `beta`. If the user wants $C = 0 \times A + 1 \times B^{T}$, then `csr2csc()` is better than `csrgeam2()`.
- `csrgeam2()` is the same as `csrgeam()` except that `csrgeam2()` needs an explicit buffer, whereas `csrgeam()` allocates the buffer internally.
- This function requires temporary extra storage that is allocated internally.
- The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available.
- The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available.
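
The comment above about symmetric inputs asks the user to extend the matrix to a full one before calling the routine. A minimal CPU-side sketch of that expansion follows, assuming a real symmetric matrix stored as its lower triangle in zero-based CSR; the function names `symmetric_lower_to_general` and `expand_example` are hypothetical, and a Hermitian matrix would additionally need the mirrored entries conjugated.

```rust
/// Expand a real symmetric matrix stored as its lower triangle (CSR,
/// zero-based) into full general CSR storage with sorted columns.
/// Returns (row_ptr, col_ind, val) of the full matrix.
pub fn symmetric_lower_to_general(
    n: usize,
    row_ptr: &[usize],
    col_ind: &[usize],
    val: &[f32],
) -> (Vec<usize>, Vec<usize>, Vec<f32>) {
    // Gather every stored entry plus its mirror image across the diagonal.
    let mut rows: Vec<Vec<(usize, f32)>> = vec![Vec::new(); n];
    for i in 0..n {
        for k in row_ptr[i]..row_ptr[i + 1] {
            let j = col_ind[k];
            rows[i].push((j, val[k]));
            if j != i {
                rows[j].push((i, val[k])); // mirrored entry (j, i)
            }
        }
    }

    // Emit rows in order, sorting each row by column index.
    let mut out_ptr = vec![0usize; n + 1];
    let mut out_col = Vec::new();
    let mut out_val = Vec::new();
    for (i, row) in rows.iter_mut().enumerate() {
        row.sort_by_key(|&(j, _)| j);
        for &(j, v) in row.iter() {
            out_col.push(j);
            out_val.push(v);
        }
        out_ptr[i + 1] = out_col.len();
    }
    (out_ptr, out_col, out_val)
}

/// Usage: lower triangle of [[1,2,0],[2,3,4],[0,4,5]].
pub fn expand_example() -> (Vec<usize>, Vec<usize>, Vec<f32>) {
    let row_ptr = [0, 1, 3, 5];
    let col_ind = [0, 0, 1, 1, 2];
    let val = [1.0, 2.0, 3.0, 4.0, 5.0];
    symmetric_lower_to_general(3, &row_ptr, &col_ind, &val)
}
```

After such an expansion the matrix descriptor would still need its type set to `CUSPARSE_MATRIX_TYPE_GENERAL`, as the comment above requires.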