Function cusparseSDDMM

pub unsafe extern "C" fn cusparseSDDMM(
    handle: cusparseHandle_t,
    opA: cusparseOperation_t,
    opB: cusparseOperation_t,
    alpha: *const c_void,
    matA: cusparseConstDnMatDescr_t,
    matB: cusparseConstDnMatDescr_t,
    beta: *const c_void,
    matC: cusparseSpMatDescr_t,
    computeType: cudaDataType,
    alg: cusparseSDDMMAlg_t,
    externalBuffer: *mut c_void,
) -> cusparseStatus_t

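This function multiplies matA and matB, then multiplies the result element-wise with the sparsity pattern of matC. Formally, it performs the following operation:

$$\mathbf{C} = \alpha \left( op(\mathbf{A}) \cdot op(\mathbf{B}) \right) \circ {spy}\left( \mathbf{C} \right) + \beta \mathbf{C}$$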

where

op(A) is a dense matrix of size $m \times k$
op(B) is a dense matrix of size $k \times n$
C is a sparse matrix of size $m \times n$
$\alpha$ and $\beta$ are scalars
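$\circ$ denotes the Hadamard (entry-wise) matrix product, and ${spy}\left( \mathbf{C} \right)$ is the structural sparsity pattern matrix of C defined as:

$${spy}\left( \mathbf{C} \right)_{ij} = \begin{cases} 0 & \text{if } \mathbf{C}_{ij} = 0 \\ 1 & \text{otherwise} \end{cases}$$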

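Also, for matrices A and B:

$$op(\mathbf{A}) = \begin{cases} \mathbf{A} & \text{if opA is CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ \mathbf{A}^{T} & \text{if opA is CUSPARSE\_OPERATION\_TRANSPOSE} \end{cases}$$

$$op(\mathbf{B}) = \begin{cases} \mathbf{B} & \text{if opB is CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ \mathbf{B}^{T} & \text{if opB is CUSPARSE\_OPERATION\_TRANSPOSE} \end{cases}$$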
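To make the operation concrete, here is a minimal CPU reference sketch in plain Rust. It is not how cuSPARSE implements the routine. The function name `sddmm_csr_reference` and its parameters are hypothetical. The sketch assumes matC is stored in CSR form, that matA and matB are row-major and non-transposed, and that every stored entry of matC belongs to the sparsity pattern. For each stored entry it computes alpha times the matching entry of the dense product plus beta times the existing value.

```rust
/// CPU reference for the SDDMM update on a CSR matrix C.
/// Assumptions for this sketch: A is m x k and B is k x n, both row-major and
/// non-transposed, and every stored entry of C is part of the sparsity pattern.
fn sddmm_csr_reference(
    k: usize,
    n: usize,
    alpha: f32,
    a: &[f32],
    b: &[f32],
    beta: f32,
    c_row_offsets: &[usize],
    c_col_indices: &[usize],
    c_values: &mut [f32],
) {
    // A CSR row-offset array has m + 1 entries.
    let m = c_row_offsets.len().saturating_sub(1);
    for row in 0..m {
        for idx in c_row_offsets[row]..c_row_offsets[row + 1] {
            let col = c_col_indices[idx];
            // Dot product of row `row` of A with column `col` of B.
            let mut dot = 0.0f32;
            for p in 0..k {
                dot += a[row * k + p] * b[p * n + col];
            }
            // Only stored entries of C are ever touched.
            c_values[idx] = alpha * dot + beta * c_values[idx];
        }
    }
}
```

A reference like this can be handy for checking GPU results on small inputs, since only the stored entries of matC are read or written.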

The function cusparseSDDMM_bufferSize returns the size of the workspace needed by cusparseSDDMM or cusparseSDDMM_preprocess.

Calling cusparseSDDMM_preprocess is optional. It may accelerate subsequent calls to cusparseSDDMM. It is useful when cusparseSDDMM is called multiple times with the same sparsity pattern (matC).

Calling cusparseSDDMM_preprocess with buffer makes that buffer “active” for matC SDDMM calls. Subsequent calls to cusparseSDDMM with matC and the active buffer must use the same values for all parameters as the call to cusparseSDDMM_preprocess. The exceptions are: alpha, beta, matA, matB, and the values (but not indices) of matC may be different. Importantly, the buffer contents must be unmodified since the call to cusparseSDDMM_preprocess. When cusparseSDDMM is called with matC and its active buffer, it may read acceleration data from the buffer.

Calling cusparseSDDMM_preprocess again with matC and a new buffer will make the new buffer active, forgetting about the previously-active buffer and making it inactive. For cusparseSDDMM, there can only be one active buffer per sparse matrix at a time. To get the effect of multiple active buffers for a single sparse matrix, create multiple matrix handles that all point to the same index and value buffers, and call cusparseSDDMM_preprocess once per handle with different workspace buffers.

Calling cusparseSDDMM with an inactive buffer is always permitted. However, there may be no acceleration from the preprocessing in that case.
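The active-buffer rules can be pictured with a small toy model. The sketch below is pure Rust bookkeeping, not cuSPARSE code: `ActiveBufferModel`, `MatHandleId`, and `BufferId` are hypothetical names standing in for sparse matrix handles and workspace pointers. It only illustrates that each handle remembers at most one buffer and that a later preprocess call replaces it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for a sparse matrix handle and a workspace pointer.
type MatHandleId = u32;
type BufferId = u32;

/// Toy model of the "one active buffer per sparse matrix handle" rule.
#[derive(Default)]
struct ActiveBufferModel {
    active: HashMap<MatHandleId, BufferId>,
}

impl ActiveBufferModel {
    /// Models cusparseSDDMM_preprocess: `buffer` becomes active for `mat`,
    /// replacing any previously active buffer.
    fn preprocess(&mut self, mat: MatHandleId, buffer: BufferId) {
        self.active.insert(mat, buffer);
    }

    /// Models whether a cusparseSDDMM call may read acceleration data.
    /// A call with an inactive buffer is still valid; it just may not be accelerated.
    fn may_use_acceleration(&self, mat: MatHandleId, buffer: BufferId) -> bool {
        self.active.get(&mat) == Some(&buffer)
    }
}
```

In this model, two handles that describe the same matrix data can each keep their own active buffer, which mirrors the multiple-handle workaround described above.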

For the purposes of thread safety, cusparseSDDMM_preprocess writes to the internal state of matC.

Currently supported sparse matrix formats:

cusparseFormat_t::CUSPARSE_FORMAT_CSR
cusparseFormat_t::CUSPARSE_FORMAT_BSR

cusparseSDDMM supports the following index types for representing the sparse matrix matC:

32-bit indices (cusparseIndexType_t::CUSPARSE_INDEX_32I)
64-bit indices (cusparseIndexType_t::CUSPARSE_INDEX_64I)

The data type combinations currently supported for cusparseSDDMM are listed below:

Uniform-precision computation:

`A`/`B`/`C`/`computeType`
`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_64F`
`cudaDataType_t::CUDA_C_32F`
`cudaDataType_t::CUDA_C_64F`

Mixed-precision computation:

`A`/`B`	`C`	`computeType`
`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_32F`

cusparseSDDMM for cusparseFormat_t::CUSPARSE_FORMAT_BSR also supports the following mixed-precision computation:

`A`/`B`	`C`	`computeType`
`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_32F`

NOTE: The cudaDataType_t::CUDA_R_16F and cudaDataType_t::CUDA_R_16BF data types always imply mixed-precision computation.
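The tables above can be encoded as a small pre-flight check. The following is a sketch only: the `DataType` and `Format` enums and the function `is_supported_combination` are hypothetical local names, not types from the bindings. It assumes CUDA_R_32F is the computeType for every mixed-precision row, which matches the note that 16-bit types always imply mixed-precision computation.

```rust
/// Hypothetical local mirror of the cudaDataType values used in the tables.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DataType {
    R16F,
    R16BF,
    R32F,
    R64F,
    C32F,
    C64F,
}

/// Hypothetical local mirror of the sparse formats accepted for matC.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Format {
    Csr,
    Bsr,
}

/// Returns true if (A/B type, C type, computeType) appears in the tables for `format`.
fn is_supported_combination(format: Format, ab: DataType, c: DataType, compute: DataType) -> bool {
    match (ab, c, compute) {
        // Uniform precision: A, B, C and computeType all share one type.
        (DataType::R32F, DataType::R32F, DataType::R32F)
        | (DataType::R64F, DataType::R64F, DataType::R64F)
        | (DataType::C32F, DataType::C32F, DataType::C32F)
        | (DataType::C64F, DataType::C64F, DataType::C64F) => true,
        // Mixed precision: half-precision inputs with 32-bit float computation.
        (DataType::R16F, DataType::R32F, DataType::R32F)
        | (DataType::R16F, DataType::R16F, DataType::R32F) => true,
        // bfloat16 inputs are listed for the BSR format only.
        (DataType::R16BF, DataType::R32F, DataType::R32F)
        | (DataType::R16BF, DataType::R16BF, DataType::R32F) => format == Format::Bsr,
        _ => false,
    }
}
```

Checking the combination before building descriptors gives an earlier and clearer failure than waiting for an error status from the library call.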

cusparseSDDMM for cusparseFormat_t::CUSPARSE_FORMAT_BSR supports block sizes of 2, 4, 8, 16, 32, 64 and 128.
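The listed block sizes are exactly the powers of two from 2 to 128, so a validation helper can be very short. The function name below is hypothetical and the sketch only restates the list above.

```rust
/// Returns true for the BSR block sizes listed above: 2, 4, 8, 16, 32, 64, 128.
fn is_supported_bsr_block_size(block_size: u32) -> bool {
    block_size.is_power_of_two() && (2..=128).contains(&block_size)
}
```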

cusparseSDDMM supports the following algorithms:

Algorithm	Notes
`cusparseSDDMMAlg_t::CUSPARSE_SDDMM_ALG_DEFAULT`	Default algorithm. It supports batched computation.

Performance notes: cusparseSDDMM for cusparseFormat_t::CUSPARSE_FORMAT_CSR provides the best performance when matA and matB satisfy:

matA:

matA is in row-major order and opA is cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE, or
matA is in col-major order and opA is not cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE

matB:

matB is in col-major order and opB is cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE, or
matB is in row-major order and opB is not cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE

cusparseSDDMM for cusparseFormat_t::CUSPARSE_FORMAT_BSR provides the best performance when matA and matB satisfy:

matA:

matA is in row-major order and opA is cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE, or
matA is in col-major order and opA is not cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE

matB:

matB is in row-major order and opB is cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE, or
matB is in col-major order and opB is not cusparseOperation_t::CUSPARSE_OPERATION_NON_TRANSPOSE
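The layout preferences above differ between CSR and BSR only in the rule for matB. A minimal sketch of that decision follows. The `Order`, `Op`, and `SparseFormat` enums and the function `is_preferred_layout` are hypothetical local names used for illustration, and `Op::Transpose` stands for any operation other than non-transpose.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum Order {
    Row,
    Col,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum Op {
    NonTranspose,
    Transpose,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum SparseFormat {
    Csr,
    Bsr,
}

/// Returns true when the dense layouts match the best-performance cases listed above.
fn is_preferred_layout(
    format: SparseFormat,
    order_a: Order,
    op_a: Op,
    order_b: Order,
    op_b: Op,
) -> bool {
    let a_non_transposed = op_a == Op::NonTranspose;
    let b_non_transposed = op_b == Op::NonTranspose;
    // matA (CSR and BSR): row-major without transpose, or col-major with transpose.
    let a_ok = (order_a == Order::Row) == a_non_transposed;
    // matB: CSR prefers col-major without transpose, BSR prefers row-major without transpose.
    let b_ok = match format {
        SparseFormat::Csr => (order_b == Order::Col) == b_non_transposed,
        SparseFormat::Bsr => (order_b == Order::Row) == b_non_transposed,
    };
    a_ok && b_ok
}
```

Other combinations are still valid inputs; according to the notes above they are simply not the best-performing ones.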

cusparseSDDMM supports the following batch modes:

$C_{i} = (A \cdot B) \circ C_{i}$
$C_{i} = \left( A_{i} \cdot B \right) \circ C_{i}$
$C_{i} = \left( A \cdot B_{i} \right) \circ C_{i}$
$C_{i} = \left( A_{i} \cdot B_{i} \right) \circ C_{i}$

The number of batches and their strides can be set by using cusparseCsrSetStridedBatch and cusparseDnMatSetStridedBatch. The maximum number of batches for cusparseSDDMM is 65,535.
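Strided batching amounts to offset arithmetic on one contiguous buffer. The sketch below shows that arithmetic together with the stated batch limit. The names `MAX_SDDMM_BATCHES` and `batch_offset` are hypothetical, and the sketch assumes that a stride of 0 stands for a matrix shared by all batches (the non-indexed A or B in the modes above); check the strided-batch setter documentation before relying on that.

```rust
/// Maximum number of batches stated above for cusparseSDDMM.
const MAX_SDDMM_BATCHES: usize = 65_535;

/// Element offset of batch `batch_index` in a strided buffer, or None if the
/// batch configuration is out of range.
/// Assumption for this sketch: a stride of 0 makes every batch alias the same data.
fn batch_offset(batch_index: usize, batch_count: usize, batch_stride: usize) -> Option<usize> {
    if batch_count == 0 || batch_count > MAX_SDDMM_BATCHES || batch_index >= batch_count {
        return None;
    }
    Some(batch_index * batch_stride)
}
```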

cusparseSDDMM has the following properties:

The routine requires no extra storage
Provides deterministic (bit-wise) results for each run
The routine supports asynchronous execution
The routine allows the indices of matC to be unsorted

cusparseSDDMM supports the following optimizations:

CUDA graph capture
Hardware Memory Compression

Please visit cuSPARSE Library Samples - cusparseSDDMM for a code example. For batched computation please visit cusparseSDDMM CSR Batched.

§Parameters

handle: Handle to the cuSPARSE library context.
opA: Operation op(A).
opB: Operation op(B).
alpha: $\alpha$ scalar used for multiplication of type computeType.
matA: Dense matrix matA.
matB: Dense matrix matB.
beta: $\beta$ scalar used for multiplication of type computeType.
matC: Sparse matrix matC.
computeType: Datatype in which the computation is executed.
alg: Algorithm for the computation.
externalBuffer: Pointer to a workspace buffer of at least bufferSize bytes, where bufferSize is the size returned by cusparseSDDMM_bufferSize.
