Function cusparseDbsrmm 

pub unsafe extern "C" fn cusparseDbsrmm(
    handle: cusparseHandle_t,
    dirA: cusparseDirection_t,
    transA: cusparseOperation_t,
    transB: cusparseOperation_t,
    mb: c_int,
    n: c_int,
    kb: c_int,
    nnzb: c_int,
    alpha: *const f64,
    descrA: cusparseMatDescr_t,
    bsrSortedValA: *const f64,
    bsrSortedRowPtrA: *const c_int,
    bsrSortedColIndA: *const c_int,
    blockSize: c_int,
    B: *const f64,
    ldb: c_int,
    beta: *const f64,
    C: *mut f64,
    ldc: c_int,
) -> cusparseStatus_t

This function performs one of the following matrix-matrix operations:

$$ C = \alpha \operatorname{op}(A) \cdot \operatorname{op}(B) + \beta C $$

A is a sparse matrix of $mb \times kb$ blocks, each of size blockSize × blockSize, defined in BSR storage format by the three arrays bsrSortedValA, bsrSortedRowPtrA, and bsrSortedColIndA; B and C are dense matrices; $\alpha$ and $\beta$ are scalars; and: $$ \operatorname{op}(A) = \begin{cases} A & \text{if } transA = \text{CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ A^T & \text{if } transA = \text{CUSPARSE\_OPERATION\_TRANSPOSE (not supported)} \\ A^H & \text{if } transA = \text{CUSPARSE\_OPERATION\_CONJUGATE\_TRANSPOSE (not supported)} \end{cases} $$

and: $$ \operatorname{op}(B) = \begin{cases} B & \text{if } transB = \text{CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ B^T & \text{if } transB = \text{CUSPARSE\_OPERATION\_TRANSPOSE} \\ B^H & \text{if } transB = \text{CUSPARSE\_OPERATION\_CONJUGATE\_TRANSPOSE (not supported)} \end{cases} $$
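
To make the operation and the BSR arrays concrete, here is a minimal CPU reference sketch in Rust. It is not how the GPU routine is implemented. It assumes zero-based indices, column-major B and C, and dense blocks stored contiguously in the value array, with the element order inside each block chosen by a direction flag that plays the role of dirA. The names `BlockDir` and `bsrmm_reference` are hypothetical and not part of any binding.

```rust
/// Element order inside each dense block (plays the role of `dirA`).
#[derive(Clone, Copy)]
pub enum BlockDir {
    Row,
    Column,
}

/// CPU reference for C = alpha * A * op(B) + beta * C with A in BSR format.
/// B and C are column-major with leading dimensions `ldb` and `ldc`.
/// `trans_b == true` means op(B) = B^T, i.e. `b` holds an n x k matrix.
pub fn bsrmm_reference(
    dir: BlockDir,
    trans_b: bool,
    mb: usize,
    n: usize,
    alpha: f64,
    val: &[f64],
    row_ptr: &[usize],
    col_ind: &[usize],
    bs: usize,
    b: &[f64],
    ldb: usize,
    beta: f64,
    c: &mut [f64],
    ldc: usize,
) {
    // C := beta * C. This sketch treats beta == 0 as "overwrite C"
    // (an assumption following the usual BLAS convention).
    for j in 0..n {
        for i in 0..mb * bs {
            let e = &mut c[i + j * ldc];
            *e = if beta == 0.0 { 0.0 } else { beta * *e };
        }
    }
    for ib in 0..mb {
        // Blocks of block row `ib` occupy positions row_ptr[ib]..row_ptr[ib + 1].
        for p in row_ptr[ib]..row_ptr[ib + 1] {
            let jb = col_ind[p];
            let block = &val[p * bs * bs..(p + 1) * bs * bs];
            for r in 0..bs {
                for s in 0..bs {
                    let a = match dir {
                        BlockDir::Row => block[r * bs + s],
                        BlockDir::Column => block[r + s * bs],
                    };
                    let (row, col) = (ib * bs + r, jb * bs + s);
                    for j in 0..n {
                        // op(B)(col, j): read B(col, j) or B^T stored as (j, col).
                        let bv = if trans_b { b[j + col * ldb] } else { b[col + j * ldb] };
                        c[row + j * ldc] += alpha * a * bv;
                    }
                }
            }
        }
    }
}
```

Calling `bsrmm_reference` on a small matrix and comparing against the GPU result is one way to check that the arrays, direction, and leading dimensions passed to the real routine are consistent.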

The function has the following limitations:

  • only cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL matrix type is supported
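  • only blockSize > 1 is supported (the limits below refer to the same parameter as blockDim)
  • if blockDim ≤ 4, then max(mb)/max(n) = 524,272
  • if 4 < blockDim ≤ 8, then max(mb) = 524,272, max(n) = 262,136
  • if blockDim > 8, then m < 65,535 and max(n) = 262,136, where m = mb × blockDim is the number of rows of A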
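
The size limits above can be checked before calling the routine. The following is a hedged sketch only: `check_bsrmm_limits` is a hypothetical helper, not part of the bindings. It assumes that "max(mb)/max(n) = 524,272" means both mb and n are capped at 524,272, and that m is the row count mb × blockDim.

```rust
/// Hypothetical pre-flight check of the documented size limits.
/// Assumption: for block_size <= 4, both mb and n are capped at 524,272.
pub fn check_bsrmm_limits(mb: u64, n: u64, block_size: u64) -> Result<(), String> {
    if block_size <= 1 {
        return Err(format!("block size must be > 1, got {}", block_size));
    }
    // The cap on n depends only on which block-size range applies.
    let max_n: u64 = if block_size <= 4 { 524_272 } else { 262_136 };
    if n > max_n {
        return Err(format!("n = {} exceeds {}", n, max_n));
    }
    if block_size <= 8 {
        if mb > 524_272 {
            return Err(format!("mb = {} exceeds 524272", mb));
        }
    } else {
        // For large blocks the limit is on the element row count m.
        let m = mb.saturating_mul(block_size);
        if m >= 65_535 {
            return Err(format!("m = {} must be less than 65535", m));
        }
    }
    Ok(())
}
```

For instance, `check_bsrmm_limits(4096, 100, 16)` is rejected under these assumptions because m = 65,536.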

The purpose of supporting transpose(B) is to improve the memory access pattern for matrix B. The computational pattern of A*transpose(B) with matrix B in column-major order is equivalent to A*B with matrix B in row-major order.
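
The layout equivalence can be seen from the index arithmetic alone. This small Rust sketch (all names hypothetical) reads the same buffer two ways: as the transpose of a column-major n × k matrix, and as a row-major k × n matrix with row stride n.

```rust
/// Element (row, col) of a column-major matrix with leading dimension `ld`.
pub fn col_major_at(buf: &[f64], ld: usize, row: usize, col: usize) -> f64 {
    buf[row + col * ld]
}

/// Element (row, col) of a row-major matrix with row stride `ld`.
pub fn row_major_at(buf: &[f64], ld: usize, row: usize, col: usize) -> f64 {
    buf[row * ld + col]
}

/// True if `buf`, read as transpose(column-major n x k), matches `buf`
/// read as row-major k x n, for every element (p, q).
pub fn layouts_agree(buf: &[f64], k: usize, n: usize) -> bool {
    (0..k).all(|p| (0..n).all(|q| row_major_at(buf, n, p, q) == col_major_at(buf, n, q, p)))
}
```

Both accessors resolve to the same offset `p * n + q`, so consecutive elements of a row of op(B) are contiguous in memory.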

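In practice, no operation in an iterative solver or eigenvalue solver uses A*transpose(B). However, we can perform A*transpose(transpose(B)), which is the same as A*B. For example, suppose A has mb*kb blocks, B is k*n, and C is m*n, where m = mb*blockSize and k = kb*blockSize. A direct call to cusparseDbsrmm with transB = CUSPARSE_OPERATION_NON_TRANSPOSE computes C := alpha*A*B + beta*C, with ldb = k and ldc = m when B and C are stored without padding.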
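
The sizes in this example can be derived from the block counts. The sketch below is only an illustration: `BsrmmDims` and `bsrmm_dims` are hypothetical names, and the leading dimensions are the minimal (unpadded) values for column-major storage.

```rust
/// Element dimensions and minimal leading dimensions for one bsrmm call.
#[derive(Debug, PartialEq)]
pub struct BsrmmDims {
    pub m: usize,     // rows of A and C
    pub k: usize,     // columns of A, rows of op(B)
    pub ldb: usize,   // leading dimension of the stored B buffer
    pub ldc: usize,   // leading dimension of C
    pub b_len: usize, // number of f64 elements in the B buffer
    pub c_len: usize, // number of f64 elements in the C buffer
}

/// Derives sizes from block counts. With `trans_b` the stored matrix is
/// n x k (leading dimension n); otherwise it is k x n (leading dimension k).
pub fn bsrmm_dims(mb: usize, kb: usize, n: usize, block_size: usize, trans_b: bool) -> BsrmmDims {
    let m = mb * block_size;
    let k = kb * block_size;
    let ldb = if trans_b { n } else { k };
    let ldc = m;
    BsrmmDims { m, k, ldb, ldc, b_len: n * k, c_len: ldc * n }
}
```

The `ldb` and `ldc` fields correspond to the arguments of the same name in the signature above; the buffer lengths are what a caller would allocate on the device.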

Instead of computing A*B directly, our proposal is to first transpose B into Bt by calling cublas<t>geam(), and then to perform A*transpose(Bt) by passing Bt with transB = CUSPARSE_OPERATION_TRANSPOSE.
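
The transposition step can be illustrated on the CPU. The function below is a hypothetical stand-in for what the geam call produces, not a binding to cuBLAS: it turns a column-major k × n matrix B into a column-major n × k matrix Bt with leading dimension n.

```rust
/// Out-of-place transpose: `b` is column-major k x n with leading
/// dimension `ldb`; the result is column-major n x k with leading dimension n.
pub fn transpose_col_major(b: &[f64], k: usize, n: usize, ldb: usize) -> Vec<f64> {
    let mut bt = vec![0.0; n * k];
    for j in 0..n {
        for i in 0..k {
            // Bt(j, i) = B(i, j)
            bt[j + i * n] = b[i + j * ldb];
        }
    }
    bt
}
```

The resulting buffer would then be passed as B with ldb = n, so that op(B) = transpose(Bt) equals the original B.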

bsrmm() has the following properties:

  • The routine requires no extra storage.
  • The routine supports asynchronous execution.
  • The routine supports CUDA graph capture.