pub unsafe extern "C" fn cusparseZbsrmm(
handle: cusparseHandle_t,
dirA: cusparseDirection_t,
transA: cusparseOperation_t,
transB: cusparseOperation_t,
mb: c_int,
n: c_int,
kb: c_int,
nnzb: c_int,
alpha: *const cuDoubleComplex,
descrA: cusparseMatDescr_t,
bsrSortedValA: *const cuDoubleComplex,
bsrSortedRowPtrA: *const c_int,
bsrSortedColIndA: *const c_int,
blockSize: c_int,
B: *const cuDoubleComplex,
ldb: c_int,
beta: *const cuDoubleComplex,
C: *mut cuDoubleComplex,
ldc: c_int,
) -> cusparseStatus_t
This function performs one of the following matrix-matrix operations:
$$
C = \alpha \operatorname{op}(A) \operatorname{op}(B) + \beta C
$$
A is an $mb \times kb$ sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA; B and C are dense matrices; $\alpha$ and $\beta$ are scalars; and:
$$
\operatorname{op}(A) =
\begin{cases}
A & \text{if } transA = \text{CUSPARSE_OPERATION_NON_TRANSPOSE} \\
A^T & \text{if } transA = \text{CUSPARSE_OPERATION_TRANSPOSE (not supported)} \\
A^H & \text{if } transA = \text{CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)}
\end{cases}
$$
and:
$$
\operatorname{op}(B) =
\begin{cases}
B & \text{if } transB = \text{CUSPARSE_OPERATION_NON_TRANSPOSE} \\
B^T & \text{if } transB = \text{CUSPARSE_OPERATION_TRANSPOSE} \\
B^H & \text{if } transB = \text{CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)}
\end{cases}
$$
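To make the operation concrete, the following is a minimal CPU sketch of the supported cases, not the library's implementation. The function `bsrmm_reference` and its parameter names are hypothetical. It assumes `f64` values as a stand-in for `cuDoubleComplex`, zero-based indices, row-major storage inside each block, op(A) = A, and column-major dense matrices with leading dimensions `ldb` and `ldc`.

```rust
/// Hypothetical CPU reference for C = alpha * A * op(B) + beta * C.
/// Assumptions (not taken from cuSPARSE): f64 stands in for cuDoubleComplex,
/// indices are zero-based, each block is stored row-major, op(A) = A, and
/// B and C are column-major with leading dimensions ldb and ldc.
fn bsrmm_reference(
    trans_b: bool,
    mb: usize,
    n: usize,
    alpha: f64,
    val: &[f64],
    row_ptr: &[usize],
    col_ind: &[usize],
    bs: usize,
    b: &[f64],
    ldb: usize,
    beta: f64,
    c: &mut [f64],
    ldc: usize,
) {
    // Scale the (mb * bs) x n output by beta first.
    for j in 0..n {
        for i in 0..mb * bs {
            c[i + j * ldc] *= beta;
        }
    }
    // Accumulate alpha * A * op(B), one stored block at a time.
    for ib in 0..mb {
        for p in row_ptr[ib]..row_ptr[ib + 1] {
            let jb = col_ind[p];
            let block = &val[p * bs * bs..(p + 1) * bs * bs];
            for r in 0..bs {
                for s in 0..bs {
                    let a = block[r * bs + s];
                    let (row, k) = (ib * bs + r, jb * bs + s);
                    for j in 0..n {
                        // op(B)[k, j] is B[k, j], or B[j, k] when B is transposed.
                        let bkj = if trans_b { b[j + k * ldb] } else { b[k + j * ldb] };
                        c[row + j * ldc] += alpha * a * bkj;
                    }
                }
            }
        }
    }
}
```

With `trans_b = true`, the buffer passed as `b` is expected to hold the n-by-k transpose, so `ldb` is at least `n` rather than `k`.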
The function has the following limitations:
- only `cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL` matrix type is supported
- only `blockDim > 1` is supported
- if `blockDim` ≤ 4, then max(mb)/max(n) = 524,272
- if 4 < `blockDim` ≤ 8, then max(mb) = 524,272, max(n) = 262,136
- if `blockDim` > 8, then m < 65,535 and max(n) = 262,136
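The size limits above can be checked on the host before calling the routine. The helper below is a hypothetical sketch, not part of the bindings. It assumes that m means mb * blockDim and that "max(mb)/max(n) = 524,272" caps both mb and n at 524,272; neither reading is confirmed by the text above.

```rust
/// Hypothetical pre-flight check mirroring the documented size limits.
/// Assumptions: m = mb * block_dim, and for block_dim <= 4 both mb and n
/// are capped at 524,272.
fn check_bsrmm_limits(mb: i64, n: i64, block_dim: i64) -> Result<(), String> {
    if block_dim <= 1 {
        return Err("only blockDim > 1 is supported".to_string());
    }
    // The cap on n depends only on which blockDim range applies.
    let max_n = if block_dim <= 4 { 524_272 } else { 262_136 };
    if n > max_n {
        return Err(format!("n = {} exceeds {} for blockDim = {}", n, max_n, block_dim));
    }
    if block_dim <= 8 {
        if mb > 524_272 {
            return Err(format!("mb = {} exceeds 524272 for blockDim = {}", mb, block_dim));
        }
    } else if mb * block_dim >= 65_535 {
        // For blockDim > 8 the limit is stated on m rather than on mb.
        return Err(format!("m = {} is not below 65535", mb * block_dim));
    }
    Ok(())
}
```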
The motivation for transpose(B) is to improve memory access to matrix B. The computational pattern of A*transpose(B) with matrix B in column-major order is equivalent to A*B with matrix B in row-major order.
In practice, no operation in an iterative solver or eigenvalue solver uses A*transpose(B). However, we can perform A*transpose(transpose(B)), which is the same as A*B. For example, suppose A is mb*kb, B is k*n, and C is m*n, and the product A*B would otherwise be computed by a direct call to cusparseDbsrmm.
Instead of using A*B, our proposal is to transpose B to Bt by first calling cublas<t>geam(), and then to perform A*transpose(Bt).
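The layout argument behind this proposal can be checked on the CPU. In the sketch below, `transpose_col_major` is a hypothetical stand-in for the out-of-place transpose that the text delegates to cublas<t>geam(); it is not a cuBLAS binding, and it uses `f64` for brevity. It shows that Bt, stored column-major with leading dimension n, holds exactly the elements of B in row-major order, and that reading transpose(Bt) recovers B.

```rust
/// Hypothetical CPU stand-in for the device-side transpose (not a cuBLAS call).
/// `b` is k x n, column-major, with leading dimension ldb >= k.
/// Returns Bt, n x k, column-major, with leading dimension n.
fn transpose_col_major(b: &[f64], k: usize, n: usize, ldb: usize) -> Vec<f64> {
    let ldbt = n;
    let mut bt = vec![0.0; ldbt * k];
    for j in 0..n {
        for i in 0..k {
            // Bt[j, i] = B[i, j]
            bt[j + i * ldbt] = b[i + j * ldb];
        }
    }
    bt
}

/// Element (i, j) of transpose(Bt), i.e. what op(B) = B^T reads from Bt.
fn op_transpose_at(bt: &[f64], ldbt: usize, i: usize, j: usize) -> f64 {
    bt[j + i * ldbt]
}
```

For a 3-by-2 matrix B with columns (1, 2, 3) and (4, 5, 6), the returned buffer is 1, 4, 2, 5, 3, 6, which is B written row by row. Consecutive elements of a row of B are therefore adjacent in memory when the routine is called with transB set to the transpose operation and Bt as the dense input.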
bsrmm() has the following properties:
- The routine requires no extra storage.
- The routine supports asynchronous execution.
- The routine supports CUDA graph capture.