

Function cusparseDcsrgeam2 

pub unsafe extern "C" fn cusparseDcsrgeam2(
    handle: cusparseHandle_t,
    m: c_int,
    n: c_int,
    alpha: *const f64,
    descrA: cusparseMatDescr_t,
    nnzA: c_int,
    csrSortedValA: *const f64,
    csrSortedRowPtrA: *const c_int,
    csrSortedColIndA: *const c_int,
    beta: *const f64,
    descrB: cusparseMatDescr_t,
    nnzB: c_int,
    csrSortedValB: *const f64,
    csrSortedRowPtrB: *const c_int,
    csrSortedColIndB: *const c_int,
    descrC: cusparseMatDescr_t,
    csrSortedValC: *mut f64,
    csrSortedRowPtrC: *mut c_int,
    csrSortedColIndC: *mut c_int,
    pBuffer: *mut c_void,
) -> cusparseStatus_t

This function performs the following matrix-matrix operation:

$C = \alpha A + \beta B$

where A, B, and C are $m \times n$ sparse matrices (defined in CSR storage format by the three arrays csrValA|csrValB|csrValC, csrRowPtrA|csrRowPtrB|csrRowPtrC, and csrColIndA|csrColIndB|csrColIndC respectively), and $\alpha$ and $\beta$ are scalars. Since A and B may have different sparsity patterns, cuSPARSE adopts a two-step approach to compute the sparse matrix C. In the first step, the user allocates csrRowPtrC of m+1 elements and calls cusparseXcsrgeam2Nnz to determine csrRowPtrC and the total number of nonzero elements. In the second step, the user obtains nnzC (the number of nonzero elements of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC(m)-csrRowPtrC(0)), allocates csrValC and csrColIndC of nnzC elements each, and finally calls cusparse[S|D|C|Z]csrgeam2() to complete matrix C.

The general procedure is as follows:
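The two steps can be illustrated with a CPU-only sketch in Rust. It is not the cuSPARSE implementation and makes no GPU calls: `geam_nnz` plays the role that cusparseXcsrgeam2Nnz plays on the device (it produces csrRowPtrC), and `geam_fill` plays the role of csrgeam2() (it produces csrColIndC and csrValC). The `Csr` type and both function names are hypothetical helpers introduced for this illustration, and the sketch assumes zero-based indexing with column indices sorted within each row.

```rust
/// Minimal CSR container (zero-based, column indices sorted within each row).
/// Hypothetical helper for this sketch; not part of cuSPARSE or these bindings.
struct Csr {
    row_ptr: Vec<usize>,
    col_ind: Vec<usize>,
    val: Vec<f64>,
}

/// Step 1: compute csrRowPtrC (m + 1 entries) for the union of the
/// sparsity patterns of A and B. nnzC is row_ptr_c[m] - row_ptr_c[0].
fn geam_nnz(m: usize, a: &Csr, b: &Csr) -> Vec<usize> {
    let mut row_ptr_c = vec![0usize; m + 1];
    for i in 0..m {
        let (mut p, end_a) = (a.row_ptr[i], a.row_ptr[i + 1]);
        let (mut q, end_b) = (b.row_ptr[i], b.row_ptr[i + 1]);
        let mut count = 0;
        // Merge the two sorted column lists, counting each distinct column once.
        while p < end_a && q < end_b {
            let (ca, cb) = (a.col_ind[p], b.col_ind[q]);
            if ca <= cb {
                p += 1;
            }
            if cb <= ca {
                q += 1;
            }
            count += 1;
        }
        // Whatever is left in either row has no counterpart in the other.
        count += (end_a - p) + (end_b - q);
        row_ptr_c[i + 1] = row_ptr_c[i] + count;
    }
    row_ptr_c
}

/// Step 2: allocate csrColIndC and csrValC with nnzC entries and fill them
/// with the entries of C = alpha * A + beta * B.
fn geam_fill(m: usize, alpha: f64, a: &Csr, beta: f64, b: &Csr, row_ptr_c: Vec<usize>) -> Csr {
    let nnz_c = row_ptr_c[m] - row_ptr_c[0];
    let mut col_ind = vec![0usize; nnz_c];
    let mut val = vec![0.0f64; nnz_c];
    for i in 0..m {
        let (mut p, end_a) = (a.row_ptr[i], a.row_ptr[i + 1]);
        let (mut q, end_b) = (b.row_ptr[i], b.row_ptr[i + 1]);
        let mut k = row_ptr_c[i];
        while p < end_a || q < end_b {
            // usize::MAX marks an exhausted row so the other side always wins.
            let ca = if p < end_a { a.col_ind[p] } else { usize::MAX };
            let cb = if q < end_b { b.col_ind[q] } else { usize::MAX };
            let col = ca.min(cb);
            let mut v = 0.0;
            if ca == col {
                v += alpha * a.val[p];
                p += 1;
            }
            if cb == col {
                v += beta * b.val[q];
                q += 1;
            }
            col_ind[k] = col;
            val[k] = v;
            k += 1;
        }
    }
    Csr { row_ptr: row_ptr_c, col_ind, val }
}
```

For the $2 \times 3$ matrices A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 4, 5], [6, 0, 0]], `geam_nnz` returns the row pointer [0, 3, 5], so nnzC = 5, and `geam_fill` with alpha = 2 and beta = 3 yields column indices [0, 1, 2, 0, 1] and values [2, 12, 19, 18, 6].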

Several comments on csrgeam2():

  • Only the non-transposed combination (NN) is supported. The other three combinations, NT, TN, and TT, are not supported by cuSPARSE. In order to do any one of the three, the user should use the routine csr2csc() to convert $A$ | $B$ to $A^{T}$ | $B^{T}$.
  • Only cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL is supported. If either A or B is symmetric or Hermitian, then the user must extend the matrix to a full one and reconfigure the MatrixType field of the descriptor to cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL.
  • If the sparsity pattern of matrix C is known, the user can skip the call to function cusparseXcsrgeam2Nnz. For example, suppose that the user has an iterative algorithm which updates the values of A and B in each iteration but keeps their sparsity patterns. The user can call function cusparseXcsrgeam2Nnz once to set up the sparsity pattern of C, then call only function cusparse[S|D|C|Z]csrgeam2() in each iteration.
  • The pointers alpha and beta must be valid.
  • cuSPARSE does not treat a zero alpha or beta as a special case. The sparsity pattern of C is independent of the values of alpha and beta. If the user wants $C = 0 \times A + 1 \times B^{T}$, then csr2csc() is better than csrgeam2().
  • csrgeam2() is the same as csrgeam() except that csrgeam2() needs an explicit buffer, whereas csrgeam() allocates the buffer internally.
  • This function requires temporary extra storage that is allocated internally.
  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available.
  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available.
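
Two of the comments above, reusing a known sparsity pattern of C and the pattern being independent of alpha and beta, can be illustrated with a small CPU-only sketch in Rust. It is not the cuSPARSE implementation: the `Csr` type and the functions `entry` and `refill_values` are hypothetical helpers introduced here, and the sketch assumes zero-based indexing, column indices sorted within each row, and a pattern of C that is the union of the patterns of A and B.

```rust
/// Minimal CSR container (zero-based, column indices sorted within each row).
/// Hypothetical helper for this sketch; not part of cuSPARSE or these bindings.
struct Csr {
    row_ptr: Vec<usize>,
    col_ind: Vec<usize>,
    val: Vec<f64>,
}

/// Value stored at (i, col), or 0.0 if that position is not in the pattern.
fn entry(mat: &Csr, i: usize, col: usize) -> f64 {
    let (start, end) = (mat.row_ptr[i], mat.row_ptr[i + 1]);
    match mat.col_ind[start..end].binary_search(&col) {
        Ok(k) => mat.val[start + k],
        Err(_) => 0.0,
    }
}

/// Recompute only the values of C = alpha * A + beta * B.
/// c.row_ptr and c.col_ind are left untouched, which mirrors skipping the
/// pattern-discovery step once the sparsity pattern of C is known.
fn refill_values(alpha: f64, a: &Csr, beta: f64, b: &Csr, c: &mut Csr) {
    let m = c.row_ptr.len() - 1;
    for i in 0..m {
        for k in c.row_ptr[i]..c.row_ptr[i + 1] {
            let col = c.col_ind[k];
            // A zero alpha or beta is not special-cased: the position stays
            // in the pattern and simply holds an explicit zero if needed.
            c.val[k] = alpha * entry(a, i, col) + beta * entry(b, i, col);
        }
    }
}
```

With A = [[1, 0, 2], [0, 3, 0]], B = [[0, 4, 5], [6, 0, 0]], and the union pattern of C (row pointer [0, 3, 5], column indices [0, 1, 2, 0, 1]), calling `refill_values` with alpha = 0 and beta = 1 gives values [0, 4, 5, 6, 0]: C still has five stored entries, two of them explicit zeros. Changing the values of A to [10, 20, 30] and calling it again with alpha = beta = 1 gives [10, 4, 25, 6, 30] without recomputing the pattern.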