Function cusparseSpMMOp

pub unsafe extern "C" fn cusparseSpMMOp(
    plan: cusparseSpMMOpPlan_t,
    externalBuffer: *mut c_void,
) -> cusparseStatus_t

NOTE 1: NVRTC and nvJitLink are not currently available on Arm64 Android platforms.

NOTE 2: The routine does not support Android and Tegra platforms except Judy (sm87).

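Experimental: The function performs the multiplication of a sparse matrix matA and a dense matrix matB with custom operators:

$$ C'_{ij} = \text{epilogue}\left( \sum_{k}^{\oplus} \operatorname{op}(A)_{ik} \otimes \operatorname{op}(B)_{kj},\; C_{ij} \right) $$

where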

op(A) is a sparse matrix of size $m \times k$
op(B) is a dense matrix of size $k \times n$
C is a dense matrix of size $m \times n$
$\oplus$, $\otimes$, and $\text{epilogue}$ are custom add, mul, and epilogue operators respectively.

Also, for matrices A and B:

$$ \operatorname{op}(A) = \begin{cases} A & \text{if } op(A) = \text{CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^T & \text{if } op(A) = \text{CUSPARSE_OPERATION_TRANSPOSE} \end{cases} $$

$$ \operatorname{op}(B) = \begin{cases} B & \text{if } op(B) = \text{CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^T & \text{if } op(B) = \text{CUSPARSE_OPERATION_TRANSPOSE} \end{cases} $$

Only opA == CUSPARSE_OPERATION_NON_TRANSPOSE is currently supported.
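To make the role of the three custom operators concrete, here is a minimal CPU reference sketch in plain Rust. It is not how cuSPARSE implements the routine. The `Csr` struct, the function names, and the `identity` seed for the reduction are all hypothetical helpers introduced for illustration, and the sketch assumes op(A) = A, row-major dense matrices, and a reduction over stored entries only.

```rust
/// Minimal CSR container (hypothetical helper type, not part of cuSPARSE).
pub struct Csr<T> {
    pub rows: usize,
    pub cols: usize,
    pub row_offsets: Vec<usize>, // length rows + 1
    pub col_indices: Vec<usize>, // length nnz, may be unsorted within a row
    pub values: Vec<T>,          // length nnz
}

/// CPU reference for C'[i][j] = epilogue(reduce_k(mul_op(A[i][k], B[k][j])), C[i][j]).
/// `b` is k x n and `c` is m x n, both dense row-major.
/// `identity` seeds the add_op reduction (an assumption of this sketch).
pub fn spmm_op_reference<T: Copy>(
    a: &Csr<T>,
    b: &[T],
    c: &mut [T],
    n: usize,
    identity: T,
    add_op: impl Fn(T, T) -> T,
    mul_op: impl Fn(T, T) -> T,
    epilogue: impl Fn(T, T) -> T,
) {
    assert_eq!(a.row_offsets.len(), a.rows + 1);
    assert_eq!(b.len(), a.cols * n);
    assert_eq!(c.len(), a.rows * n);
    for i in 0..a.rows {
        for j in 0..n {
            let mut acc = identity;
            // Visit only the stored entries of row i.
            for p in a.row_offsets[i]..a.row_offsets[i + 1] {
                let k = a.col_indices[p];
                acc = add_op(acc, mul_op(a.values[p], b[k * n + j]));
            }
            // Combine the reduced value with the previous content of C.
            c[i * n + j] = epilogue(acc, c[i * n + j]);
        }
    }
}

/// Usage example: a min-plus product where add_op = min, mul_op = +,
/// and the epilogue keeps the smaller of the new and old value.
pub fn min_plus_example() -> Vec<f32> {
    let a = Csr {
        rows: 2,
        cols: 3,
        row_offsets: vec![0, 2, 3],
        col_indices: vec![2, 0, 1], // row 0 lists column 2 before column 0
        values: vec![1.0_f32, 4.0, 2.0],
    };
    let b = vec![0.0_f32, 5.0, 1.0, 1.0, 3.0, 0.0]; // 3 x 2, row-major
    let mut c = vec![10.0_f32, 0.5, 10.0, 10.0]; // 2 x 2, row-major
    spmm_op_reference(&a, &b, &mut c, 2, f32::INFINITY, f32::min, |x, y| x + y, f32::min);
    c
}
```

Passing ordinary addition and multiplication with an `identity` of zero gives the familiar sparse-dense product, with the epilogue deciding how the result is merged into the existing C.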

The function cusparseSpMMOp_createPlan() returns the size of the workspace and the compiled kernel needed by cusparseSpMMOp().
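Because the binding is an `unsafe extern "C"` function that returns a status code, callers will usually want a thin wrapper that maps the status into a `Result`. The sketch below is one way to do that under stated assumptions: `PlanOpaque`, `PlanHandle`, `Status`, `SpMMOpFn`, and `run_spmm_op` are hypothetical stand-ins so the example compiles without the CUDA toolkit, and the executor is passed as a function pointer so a mock can replace the real symbol. A real program would pass the binding's `cusparseSpMMOp` and use the crate's own `cusparseSpMMOpPlan_t` and `cusparseStatus_t`.

```rust
use std::os::raw::{c_int, c_void};

// Hypothetical stand-ins so the sketch compiles on its own.
#[repr(C)]
pub struct PlanOpaque {
    _private: [u8; 0],
}
pub type PlanHandle = *mut PlanOpaque;
pub type Status = c_int;
pub const STATUS_SUCCESS: Status = 0; // CUSPARSE_STATUS_SUCCESS is 0 in cusparse.h

/// Same shape as `cusparseSpMMOp`: a plan plus an external buffer.
pub type SpMMOpFn = unsafe extern "C" fn(PlanHandle, *mut c_void) -> Status;

/// Calls `exec` and turns any non-success status into `Err`.
///
/// # Safety
/// `plan` must come from a successful plan creation, and `external_buffer`
/// must point to a workspace of at least the size that call reported.
pub unsafe fn run_spmm_op(
    exec: SpMMOpFn,
    plan: PlanHandle,
    external_buffer: *mut c_void,
) -> Result<(), Status> {
    let status = unsafe { exec(plan, external_buffer) };
    if status == STATUS_SUCCESS {
        Ok(())
    } else {
        Err(status)
    }
}

// Mock executors standing in for the real FFI symbol, so the wrapper can be
// exercised without a GPU. They never dereference their arguments.
unsafe extern "C" fn mock_ok(_plan: PlanHandle, _buf: *mut c_void) -> Status {
    0
}
unsafe extern "C" fn mock_fail(_plan: PlanHandle, _buf: *mut c_void) -> Status {
    3
}

/// Usage example: run the wrapper against both mocks.
pub fn demo_wrapper() -> (Result<(), Status>, Result<(), Status>) {
    let plan: PlanHandle = std::ptr::null_mut();
    let buf: *mut c_void = std::ptr::null_mut();
    unsafe {
        (
            run_spmm_op(mock_ok, plan, buf),
            run_spmm_op(mock_fail, plan, buf),
        )
    }
}
```

Taking the executor as a parameter keeps the status handling testable on machines without CUDA; the null pointers are only acceptable here because the mocks ignore them.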

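The operators must have the following signature and return type:

__device__ <computetype> add_op(<computetype> value1, <computetype> value2);
__device__ <computetype> mul_op(<computetype> value1, <computetype> value2);
__device__ <computetype> epilogue(<computetype> value1, <computetype> value2);

<computetype> is one of float, double, cuComplex, cuDoubleComplex, or int.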

cusparseSpMMOp supports the following sparse matrix formats:

cusparseFormat_t::CUSPARSE_FORMAT_CSR

cusparseSpMMOp supports the following index type for representing the sparse matrix matA:

32-bit indices (cusparseIndexType_t::CUSPARSE_INDEX_32I)
64-bit indices (cusparseIndexType_t::CUSPARSE_INDEX_64I)

cusparseSpMMOp supports the following data types:

Uniform-precision computation:

`A`/`B`/`C`/`computeType`
`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_64F`
`cudaDataType_t::CUDA_C_32F`
`cudaDataType_t::CUDA_C_64F`

Mixed-precision computation:

`A`/`B`	`C`	`computeType`
`cudaDataType_t::CUDA_R_8I`	`cudaDataType_t::CUDA_R_32I`	`cudaDataType_t::CUDA_R_32I`
`cudaDataType_t::CUDA_R_8I`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_32F`
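The mixed-precision rows pair narrow input types with a wider compute type. As a rough CPU illustration of why that matters, the following sketch compares a dot product of `i8` values accumulated in `i32` (analogous to 8-bit integer inputs with a 32-bit integer compute type) with the same product kept entirely in `i8`. The function names are hypothetical and nothing here calls cuSPARSE.

```rust
/// Dot product with i8 inputs, widened to i32 before multiplying and summing.
pub fn dot_i8_accumulate_i32(a: &[i8], b: &[i8]) -> i32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum()
}

/// Same dot product computed entirely in i8; returns None as soon as a
/// product or partial sum no longer fits.
pub fn dot_i8_checked(a: &[i8], b: &[i8]) -> Option<i8> {
    a.iter().zip(b).try_fold(0i8, |acc, (&x, &y)| {
        x.checked_mul(y).and_then(|p| acc.checked_add(p))
    })
}
```

For inputs such as `[100, 100]` and `[100, 100]` the widened version returns 20000, while the narrow version overflows on the first product and returns `None`.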

cusparseSpMMOp supports the following algorithms:

Algorithm	Notes
`CUSPARSE_SPMM_OP_ALG_DEFAULT`	Default algorithm for any sparse matrix format

Performance notes:

Row-major layout provides higher performance than column-major.
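The page does not say why row-major is faster, so the sketch below only shows how the two layouts address the same dense matrix. The helper names are hypothetical, and `ld` is the leading dimension (the number of columns for a packed row-major matrix, the number of rows for a packed column-major one).

```rust
/// Linear offset of element (row, col) in a row-major matrix.
pub fn row_major_index(row: usize, col: usize, ld: usize) -> usize {
    row * ld + col
}

/// Linear offset of element (row, col) in a column-major matrix.
pub fn col_major_index(row: usize, col: usize, ld: usize) -> usize {
    col * ld + row
}

/// Offsets of every element in one row, for either layout.
pub fn row_offsets_of(row: usize, n_cols: usize, ld: usize, row_major: bool) -> Vec<usize> {
    (0..n_cols)
        .map(|col| {
            if row_major {
                row_major_index(row, col, ld)
            } else {
                col_major_index(row, col, ld)
            }
        })
        .collect()
}
```

For row 1 of a packed 3 x 4 matrix, the row-major offsets are 4, 5, 6, 7 (contiguous), while the column-major offsets are 1, 4, 7, 10 (strided by the leading dimension).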

cusparseSpMMOp() has the following properties:

The routine requires extra storage
The routine supports asynchronous execution
Provides deterministic (bit-wise) results for each run
The routine allows the indices of matA to be unsorted

cusparseSpMMOp() supports the following optimizations:

CUDA graph capture
Hardware Memory Compression

For a complete code example, see cuSPARSE Library Samples - cusparseSpMMOp.
