Function cusparseSpMV

pub unsafe extern "C" fn cusparseSpMV(
    handle: cusparseHandle_t,
    opA: cusparseOperation_t,
    alpha: *const c_void,
    matA: cusparseConstSpMatDescr_t,
    vecX: cusparseConstDnVecDescr_t,
    beta: *const c_void,
    vecY: cusparseDnVecDescr_t,
    computeType: cudaDataType,
    alg: cusparseSpMVAlg_t,
    externalBuffer: *mut c_void,
) -> cusparseStatus_t

This function performs the multiplication of a sparse matrix matA and a dense vector vecX:

$$ Y = \alpha \operatorname{op}(A) \cdot X + \beta Y $$

where

op(A) is a sparse matrix of size $m \times k$
X is a dense vector of size $k$
Y is a dense vector of size $m$
$\alpha$ and $\beta$ are scalars

Also, for matrix A: $$ \operatorname{op}(A) = \begin{cases} A & \text{if } \text{opA} = \text{CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ A^T & \text{if } \text{opA} = \text{CUSPARSE\_OPERATION\_TRANSPOSE} \\ A^H & \text{if } \text{opA} = \text{CUSPARSE\_OPERATION\_CONJUGATE\_TRANSPOSE} \end{cases} $$
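To make the semantics concrete, the following is a minimal CPU reference sketch of $Y = \alpha \operatorname{op}(A) \cdot X + \beta Y$ for a real-valued CSR matrix. The names `Op` and `spmv_csr_ref` are hypothetical helpers introduced here for illustration; they are not part of cuSPARSE or of these bindings, and the sketch says nothing about how the GPU kernels are implemented. The conjugate-transpose case is omitted because it coincides with the transpose for real data.

```rust
/// Which form of A to apply (real-valued data only, so A^H == A^T).
/// Hypothetical type for illustration; not part of the bindings.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Op {
    NonTranspose,
    Transpose,
}

/// CPU reference: y = alpha * op(A) * x + beta * y, with A stored in CSR.
/// `rows` is the number of rows of A (not of op(A)).
/// Hypothetical helper; it only mirrors the documented formula.
pub fn spmv_csr_ref(
    op: Op,
    alpha: f64,
    rows: usize,
    row_offsets: &[usize], // length rows + 1
    col_indices: &[usize], // length nnz
    values: &[f64],        // length nnz
    x: &[f64],
    beta: f64,
    y: &mut [f64],
) {
    // Scale the existing output first: y <- beta * y.
    for yi in y.iter_mut() {
        *yi *= beta;
    }
    // Accumulate alpha * op(A) * x one stored entry at a time.
    for i in 0..rows {
        for p in row_offsets[i]..row_offsets[i + 1] {
            let j = col_indices[p];
            match op {
                // Entry A[i][j] contributes to y[i].
                Op::NonTranspose => y[i] += alpha * values[p] * x[j],
                // Entry A[i][j] is (A^T)[j][i], so it contributes to y[j].
                Op::Transpose => y[j] += alpha * values[p] * x[i],
            }
        }
    }
}
```

Note how the transpose case swaps the roles of the row and column index: X must then have as many entries as A has rows, and Y as many as A has columns, which matches the $m \times k$ sizing of op(A) given above.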

The function cusparseSpMV_bufferSize returns the size of the workspace needed by cusparseSpMV_preprocess and cusparseSpMV.

The sparse matrix formats currently supported are listed below:

cusparseFormat_t::CUSPARSE_FORMAT_COO
cusparseFormat_t::CUSPARSE_FORMAT_CSR
cusparseFormat_t::CUSPARSE_FORMAT_CSC
cusparseFormat_t::CUSPARSE_FORMAT_BSR
cusparseFormat_t::CUSPARSE_FORMAT_SLICED_ELL

cusparseSpMV supports the following index types for representing the sparse matrix matA:

32-bit indices (cusparseIndexType_t::CUSPARSE_INDEX_32I)
64-bit indices (cusparseIndexType_t::CUSPARSE_INDEX_64I)

cusparseSpMV supports the following data types:

Uniform-precision computation:

`A`/`X`/`Y`/`computeType`
`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_64F`
`cudaDataType_t::CUDA_C_32F`
`cudaDataType_t::CUDA_C_64F`

Mixed-precision computation:

`A`/`X`	`Y`	`computeType`	Notes
`cudaDataType_t::CUDA_R_8I`	`cudaDataType_t::CUDA_R_32I`	`cudaDataType_t::CUDA_R_32I`
`cudaDataType_t::CUDA_R_8I`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_16F`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_16BF`	`cudaDataType_t::CUDA_R_32F`
`cudaDataType_t::CUDA_C_32F`	`cudaDataType_t::CUDA_C_32F`	`cudaDataType_t::CUDA_C_32F`
`cudaDataType_t::CUDA_C_16F`	`cudaDataType_t::CUDA_C_16F`	`cudaDataType_t::CUDA_C_32F`	[DEPRECATED]
`cudaDataType_t::CUDA_C_16BF`	`cudaDataType_t::CUDA_C_16BF`	`cudaDataType_t::CUDA_C_32F`	[DEPRECATED]

`A`	`X`/`Y`/`computeType`
`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_R_64F`

Mixed Regular/Complex computation:

`A`	`X`/`Y`/`computeType`
`cudaDataType_t::CUDA_R_32F`	`cudaDataType_t::CUDA_C_32F`
`cudaDataType_t::CUDA_R_64F`	`cudaDataType_t::CUDA_C_64F`

NOTE: cudaDataType_t::CUDA_R_16F, cudaDataType_t::CUDA_R_16BF, cudaDataType_t::CUDA_C_16F, and cudaDataType_t::CUDA_C_16BF data types always imply mixed-precision computation.
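As an illustration of why a wider `computeType` matters, the sketch below compares accumulating one sparse row in 32-bit integers (the 8-bit input, 32-bit compute pairing listed above) against accumulating in the 8-bit input type. Both functions are hypothetical CPU helpers written for this example; they are not cuSPARSE calls and do not describe the library's kernels.

```rust
/// Dot product of one sparse row with x, widening each i8 operand to i32
/// before multiplying and accumulating (hypothetical helper).
pub fn row_dot_i8_acc_i32(values: &[i8], col_indices: &[usize], x: &[i8]) -> i32 {
    values
        .iter()
        .zip(col_indices)
        .map(|(&a, &j)| i32::from(a) * i32::from(x[j]))
        .sum()
}

/// The same dot product computed entirely in i8.
/// Returns None as soon as a product or partial sum overflows i8.
pub fn row_dot_i8_acc_i8(values: &[i8], col_indices: &[usize], x: &[i8]) -> Option<i8> {
    let mut acc: i8 = 0;
    for (&a, &j) in values.iter().zip(col_indices) {
        let prod = a.checked_mul(x[j])?;
        acc = acc.checked_add(prod)?;
    }
    Some(acc)
}
```

With row values `[100, 100]` at columns `[0, 1]` and `x = [2, 2]`, the 32-bit accumulation yields 400, while the 8-bit version overflows on the first product and returns `None`.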

cusparseSpMV supports the following algorithms:

Algorithm	Notes
`cusparseSpMVAlg_t::CUSPARSE_SPMV_ALG_DEFAULT`	Default algorithm for any sparse matrix format.
`cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG1`	Default algorithm for COO sparse matrix format. May produce slightly different results during different runs with the same input parameters.
`cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG2`	Provides deterministic (bit-wise) results for each run. If `opA != CUSPARSE_OPERATION_NON_TRANSPOSE`, it is identical to `cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG1`.
`cusparseSpMVAlg_t::CUSPARSE_SPMV_CSR_ALG1`	Default algorithm for CSR/CSC sparse matrix format. May produce slightly different results during different runs with the same input parameters.
`cusparseSpMVAlg_t::CUSPARSE_SPMV_CSR_ALG2`	Provides deterministic (bit-wise) results for each run. If `opA != CUSPARSE_OPERATION_NON_TRANSPOSE`, it is identical to `cusparseSpMVAlg_t::CUSPARSE_SPMV_CSR_ALG1`.
`cusparseSpMVAlg_t::CUSPARSE_SPMV_SELL_ALG1`	Default algorithm for Sliced Ellpack sparse matrix format. Provides deterministic (bit-wise) results for each run.
`cusparseSpMVAlg_t::CUSPARSE_SPMV_BSR_ALG1`	Default algorithm for BSR sparse matrix format. Provides deterministic (bit-wise) results for each run. Supports only `opA == CUSPARSE_OPERATION_NON_TRANSPOSE`. Supports both row-major and column-major block layouts in `A`.
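A common reason for run-to-run differences in parallel reductions is that floating-point addition is not associative, so the order in which partial products are summed can change the rounded result. Whether this is the exact mechanism inside the non-deterministic algorithms is an assumption; the source only states that results may differ slightly. The hypothetical helper below shows the effect on the CPU with three `f32` terms.

```rust
/// Sum f32 terms strictly left to right (hypothetical helper).
pub fn sum_in_order(terms: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &t in terms {
        acc += t;
    }
    acc
}

/// The same three terms in two different orders.
pub fn demo_orders() -> (f32, f32) {
    // 1.0e8 + 1.0 rounds back to 1.0e8 in f32, so the 1.0 is lost.
    let a = sum_in_order(&[1.0e8, 1.0, -1.0e8]);
    // Cancelling the large terms first preserves the 1.0.
    let b = sum_in_order(&[1.0e8, -1.0e8, 1.0]);
    (a, b)
}
```

Here `demo_orders()` returns `(0.0, 1.0)`: identical inputs, different summation order, different result. A deterministic algorithm fixes the order of accumulation, which is what "bit-wise" reproducibility requires.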

Calling cusparseSpMV_preprocess is optional. It may accelerate subsequent calls to cusparseSpMV. It is useful when cusparseSpMV is called multiple times with the same sparsity pattern (matA).

Calling cusparseSpMV_preprocess with buffer makes that buffer “active” for matA SpMV calls. Subsequent calls to cusparseSpMV with matA and the active buffer must use the same values for all parameters as the call to cusparseSpMV_preprocess. The exceptions are: alpha, beta, vecX, vecY, and the values (but not indices) of matA may be different. Importantly, the buffer contents must be unmodified since the call to cusparseSpMV_preprocess. When cusparseSpMV is called with matA and its active buffer, it may read acceleration data from the buffer.

Calling cusparseSpMV_preprocess again with matA and a new buffer will make the new buffer active, forgetting about the previously-active buffer and making it inactive. For cusparseSpMV, there can only be one active buffer per sparse matrix at a time. To get the effect of multiple active buffers for a single sparse matrix, create multiple matrix handles that all point to the same index and value buffers, and call cusparseSpMV_preprocess once per handle with different workspace buffers.

Calling cusparseSpMV with an inactive buffer is always permitted. However, there may be no acceleration from the preprocessing in that case.

For the purposes of thread safety, cusparseSpMV_preprocess writes to the internal state of matA.

Performance notes:

cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG1 and cusparseSpMVAlg_t::CUSPARSE_SPMV_CSR_ALG1 provide higher performance than cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG2 and cusparseSpMVAlg_t::CUSPARSE_SPMV_CSR_ALG2.
In general, opA == CUSPARSE_OPERATION_NON_TRANSPOSE is 3x faster than opA != CUSPARSE_OPERATION_NON_TRANSPOSE.
Using cusparseSpMV_preprocess helps improve the performance of cusparseSpMV for the CSR format. It is beneficial when cusparseSpMV is run multiple times with the same matrix (cusparseSpMV_preprocess is executed only once).

cusparseSpMV has the following properties:

The routine requires extra storage for CSR/CSC format (all algorithms) and for COO format with cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG2 algorithm.
Provides deterministic (bit-wise) results for each run only for cusparseSpMVAlg_t::CUSPARSE_SPMV_COO_ALG2, cusparseSpMVAlg_t::CUSPARSE_SPMV_CSR_ALG2 and cusparseSpMVAlg_t::CUSPARSE_SPMV_BSR_ALG1 algorithms, and opA == CUSPARSE_OPERATION_NON_TRANSPOSE.
The routine supports asynchronous execution.
compute-sanitizer could report false race conditions for this routine when beta == 0. This is for optimization purposes and does not affect the correctness of the computation.
The routine allows the indices of matA to be unsorted.
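The last property means that the stored entries of matA need not be ordered by row or column. A minimal CPU sketch for the COO format shows why ordering does not affect the mathematical result: each stored entry contributes independently to one element of Y. The function `spmv_coo_ref` is a hypothetical helper for illustration, not a cuSPARSE call, and it covers only $\alpha = 1$, $\beta = 0$ with no transpose.

```rust
/// CPU reference for y = A * x with A in COO form (hypothetical helper).
/// Entries may appear in any order.
pub fn spmv_coo_ref(
    rows: usize,
    row_idx: &[usize],
    col_idx: &[usize],
    vals: &[f64],
    x: &[f64],
) -> Vec<f64> {
    let mut y = vec![0.0; rows];
    for k in 0..vals.len() {
        // Entry k is A[row_idx[k]][col_idx[k]] = vals[k].
        y[row_idx[k]] += vals[k] * x[col_idx[k]];
    }
    y
}
```

Passing the entries of a 2 x 3 matrix in row order or in shuffled order produces the same output vector in this example, because the small integer values involved are summed exactly.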

cusparseSpMV supports the following optimizations:

CUDA graph capture
Hardware Memory Compression

Please visit cuSPARSE Library Samples - cusparseSpMV CSR and cusparseSpMV COO for a code example.

§Parameters

handle: Handle to the cuSPARSE library context.
opA: Operation op(A).
alpha: $\alpha$ scalar used for multiplication of type computeType.
matA: Sparse matrix A.
vecX: Dense vector X.
beta: $\beta$ scalar used for multiplication of type computeType.
vecY: Dense vector Y.
computeType: Datatype in which the computation is executed.
alg: Algorithm for the computation.
externalBuffer: Pointer to a workspace buffer of at least bufferSize bytes, where bufferSize is the size returned by cusparseSpMV_bufferSize.
