pub unsafe extern "C" fn cusparseZbsrmm(
handle: cusparseHandle_t,
dirA: cusparseDirection_t,
transA: cusparseOperation_t,
transB: cusparseOperation_t,
mb: c_int,
n: c_int,
kb: c_int,
nnzb: c_int,
alpha: *const cuDoubleComplex,
descrA: cusparseMatDescr_t,
bsrSortedValA: *const cuDoubleComplex,
bsrSortedRowPtrA: *const c_int,
bsrSortedColIndA: *const c_int,
blockSize: c_int,
B: *const cuDoubleComplex,
ldb: c_int,
beta: *const cuDoubleComplex,
C: *mut cuDoubleComplex,
ldc: c_int,
) -> cusparseStatus_t
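This function performs the following matrix-matrix operation:

$$C = \alpha \cdot \text{op}(A) \cdot \text{op}(B) + \beta \cdot C$$

where A is an $mb \times kb$ sparse block matrix defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA (the bsrSortedValA, bsrSortedRowPtrA, and bsrSortedColIndA parameters); B and C are dense matrices; $\alpha$ and $\beta$ are scalars; and

$$\text{op}(A) = \begin{cases} A & \text{if transA == CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ A^T & \text{if transA == CUSPARSE\_OPERATION\_TRANSPOSE (not supported)} \\ A^H & \text{if transA == CUSPARSE\_OPERATION\_CONJUGATE\_TRANSPOSE (not supported)} \end{cases}$$

and

$$\text{op}(B) = \begin{cases} B & \text{if transB == CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ B^T & \text{if transB == CUSPARSE\_OPERATION\_TRANSPOSE} \\ B^H & \text{if transB == CUSPARSE\_OPERATION\_CONJUGATE\_TRANSPOSE (not supported)} \end{cases}$$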
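To make the semantics concrete, here is a minimal CPU reference sketch of the non-transposed case, $C := \alpha \cdot A \cdot B + \beta \cdot C$. It assumes zero-based indexing (CUSPARSE_INDEX_BASE_ZERO), column-major B and C, and that dirA selects whether the entries inside each block are stored row by row or column by column. It also uses real f64 values in place of cuDoubleComplex. The names bsrmm_ref and BlockDir are hypothetical and are not part of cuSPARSE or this crate.

```rust
/// Storage order of the entries inside each blockDim x blockDim block
/// (plays the role of dirA; this enum is hypothetical, not the crate's type).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum BlockDir {
    Row,
    Column,
}

/// CPU reference for C := alpha * A * B + beta * C with op(A) = A, op(B) = B.
/// A is an mb x kb block matrix in zero-based BSR format; B (k x n) and
/// C (m x n) are dense and column-major, with m = mb * bd and k = kb * bd.
/// Real f64 values stand in for cuDoubleComplex to keep the sketch short.
#[allow(clippy::too_many_arguments)]
pub fn bsrmm_ref(
    dir: BlockDir,
    mb: usize,
    n: usize,
    block_dim: usize,
    alpha: f64,
    val: &[f64],
    row_ptr: &[usize],
    col_ind: &[usize],
    b: &[f64],
    ldb: usize,
    beta: f64,
    c: &mut [f64],
    ldc: usize,
) {
    let bd = block_dim;
    for j in 0..n {
        for bi in 0..mb {
            for r in 0..bd {
                // Dot product of scalar row (bi * bd + r) of A with column j of B.
                let mut acc = 0.0;
                for p in row_ptr[bi]..row_ptr[bi + 1] {
                    let bj = col_ind[p]; // block-column index of block p
                    let block = &val[p * bd * bd..(p + 1) * bd * bd];
                    for cc in 0..bd {
                        let a = match dir {
                            BlockDir::Row => block[r * bd + cc],
                            BlockDir::Column => block[cc * bd + r],
                        };
                        acc += a * b[(bj * bd + cc) + j * ldb];
                    }
                }
                let idx = (bi * bd + r) + j * ldc;
                c[idx] = alpha * acc + beta * c[idx];
            }
        }
    }
}
```

The loops favor readability over performance; they only illustrate which elements of bsrValA, B, and C take part in each product.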
The function has the following limitations:
- only the cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL matrix type is supported
- only blockDim > 1 is supported (blockDim is the blockSize parameter)
- if blockDim ≤ 4, then max(mb)/max(n) = 524,272
- if 4 < blockDim ≤ 8, then max(mb) = 524,272 and max(n) = 262,136
- if blockDim > 8, then m < 65,535 and max(n) = 262,136
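The limits above can be encoded as a pre-flight check before the call. This is a hypothetical helper (check_bsrmm_limits is not part of cuSPARSE or this crate). It rests on two interpretations that the text leaves open: that "max(mb)/max(n) = 524,272" caps both mb and n, and that the m in "m < 65,535" refers to the block-row count mb.

```rust
/// Hypothetical pre-flight check mirroring the documented bsrmm limits.
/// Assumptions: "max(mb)/max(n) = 524,272" caps both mb and n, and the
/// "m < 65,535" bound for blockDim > 8 applies to mb.
pub fn check_bsrmm_limits(general: bool, block_dim: i32, mb: i32, n: i32) -> Result<(), String> {
    if !general {
        return Err("only CUSPARSE_MATRIX_TYPE_GENERAL is supported".into());
    }
    if block_dim <= 1 {
        return Err(format!("blockDim must be > 1, got {block_dim}"));
    }
    // Pick the bounds for the block-size bracket.
    let (mb_ok, max_n) = if block_dim <= 4 {
        (mb <= 524_272, 524_272)
    } else if block_dim <= 8 {
        (mb <= 524_272, 262_136)
    } else {
        (mb < 65_535, 262_136)
    };
    if !mb_ok {
        return Err(format!("mb = {mb} exceeds the limit for blockDim = {block_dim}"));
    }
    if n > max_n {
        return Err(format!("n = {n} exceeds {max_n} for blockDim = {block_dim}"));
    }
    Ok(())
}
```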
The motivation for supporting transpose(B) is to improve the memory access pattern for matrix B: computing A*transpose(B) with B stored in column-major order follows the same computational pattern as computing A*B with B stored in row-major order.
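This equivalence can be checked on the CPU. Reading op(B) = transpose(B) from an n×k column-major buffer with leading dimension ldb touches exactly the same element as reading B from a k×n row-major buffer with row stride ldb. The minimal sketch below uses a dense column-major A in place of the BSR matrix, and all function names are hypothetical.

```rust
/// Element (i, j) of op(B) = transpose(B), where the stored B is n x k,
/// column-major, with leading dimension ldb: stored B(j, i) lives at j + i * ldb.
fn op_b_transposed_col_major(buf: &[f64], ldb: usize, i: usize, j: usize) -> f64 {
    buf[j + i * ldb]
}

/// Element (i, j) of a k x n row-major matrix with row stride ldb.
fn row_major(buf: &[f64], ldb: usize, i: usize, j: usize) -> f64 {
    buf[i * ldb + j]
}

/// Dense C = A * X for column-major A (m x k); X (k x n) is read through `x`.
/// Returns C as an m x n column-major vector.
fn matmul_with<F: Fn(usize, usize) -> f64>(a: &[f64], m: usize, k: usize, n: usize, x: F) -> Vec<f64> {
    let mut c = vec![0.0; m * n];
    for j in 0..n {
        for p in 0..k {
            let xv = x(p, j);
            for i in 0..m {
                c[i + j * m] += a[i + p * m] * xv;
            }
        }
    }
    c
}
```

Both accessors compute the same index, `i * ldb + j`, which is why the two products agree element for element.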
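In practice, no operation in an iterative solver or eigenvalue solver uses A*transpose(B). However, A*transpose(transpose(B)) is the same as A*B, so the transposed mode can still be used by passing transpose(B) as the stored operand. For example, suppose A is mb×kb (in blocks), B is k×n, and C is m×n, where m = mb·blockDim and k = kb·blockDim. The direct way to compute C := α·A·B + β·C is a cusparseZbsrmm call with transA = transB = CUSPARSE_OPERATION_NON_TRANSPOSE, ldb = k, and ldc = m.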
Instead of using A*B, our proposal is to transpose B to Bt by first calling cublasZgeam(), and then to perform A*transpose(Bt).
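cublasZgeam() computes C = α·op(A) + β·op(B), so calling it with α = 1, β = 0, and a transposed first operand yields Bt = transpose(B). The sketch below emulates that step on the CPU and checks that A*transpose(Bt) reproduces A*B. It uses dense real f64 matrices instead of a complex BSR A, and transpose_col_major and gemm_ref are hypothetical stand-ins, not cuBLAS or cuSPARSE calls.

```rust
/// CPU stand-in for cublasZgeam with alpha = 1, beta = 0, transa = T:
/// returns Bt (n x k, column-major, ld = n) from B (k x n, column-major, ld = k).
fn transpose_col_major(b: &[f64], k: usize, n: usize) -> Vec<f64> {
    let mut bt = vec![0.0; n * k];
    for j in 0..n {
        for i in 0..k {
            // B(i, j) at i + j * k  ->  Bt(j, i) at j + i * n
            bt[j + i * n] = b[i + j * k];
        }
    }
    bt
}

/// Dense C = A * op(X) with A m x k column-major. If `trans`, X is stored
/// n x k (leading dimension ldx) and op(X) = X^T; otherwise X is k x n.
fn gemm_ref(a: &[f64], m: usize, k: usize, n: usize, x: &[f64], ldx: usize, trans: bool) -> Vec<f64> {
    let mut c = vec![0.0; m * n];
    for j in 0..n {
        for p in 0..k {
            // op(X)(p, j): read X(j, p) when transposed, X(p, j) otherwise.
            let xv = if trans { x[j + p * ldx] } else { x[p + j * ldx] };
            for i in 0..m {
                c[i + j * m] += a[i + p * m] * xv;
            }
        }
    }
    c
}
```

On the GPU, the extra transpose costs one pass over B, in exchange for the row-major-like access pattern during the sparse product.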
cusparseZbsrmm() has the following properties:
- The routine requires no extra storage.
- The routine supports asynchronous execution.
- The routine supports CUDA graph capture.