Skip to main content

cusparseSbsrsv2_solve

Function cusparseSbsrsv2_solve 

Source
pub unsafe extern "C" fn cusparseSbsrsv2_solve(
    handle: cusparseHandle_t,
    dirA: cusparseDirection_t,
    transA: cusparseOperation_t,
    mb: c_int,
    nnzb: c_int,
    alpha: *const f32,
    descrA: cusparseMatDescr_t,
    bsrSortedValA: *const f32,
    bsrSortedRowPtrA: *const c_int,
    bsrSortedColIndA: *const c_int,
    blockDim: c_int,
    info: bsrsv2Info_t,
    f: *const f32,
    x: *mut f32,
    policy: cusparseSolvePolicy_t,
    pBuffer: *mut c_void,
) -> cusparseStatus_t
Expand description

This function performs the solve phase of bsrsv2, a new sparse triangular linear system op(A)*y =$\alpha$x.

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); x and y are the right-hand-side and the solution vectors; $\alpha$ is a scalar; and: $$ \operatorname{op}(A) = \begin{cases} A & \text{if } trans = \text{CUSPARSE_OPERATION_NON_TRANSPOSE} \ A^T & \text{if } trans = \text{CUSPARSE_OPERATION_TRANSPOSE} \ A^H & \text{if } trans = \text{CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \end{cases} $$

The block in BSR format is of size blockDim*blockDim, stored as column-major or row-major as determined by parameter dirA, which is either cusparseDirection_t::CUSPARSE_DIRECTION_COLUMN or cusparseDirection_t::CUSPARSE_DIRECTION_ROW. The matrix type must be cusparseMatrixType_t::CUSPARSE_MATRIX_TYPE_GENERAL, and the fill mode and diagonal type are ignored. Function bsrsv02_solve() can support an arbitrary blockDim.

This function may be executed multiple times for a given matrix and a particular operation type.

This function requires a buffer size returned by bsrsv2_bufferSize(). The address of pBuffer must be multiple of 128 bytes. If it is not, cusparseStatus_t::CUSPARSE_STATUS_INVALID_VALUE is returned.

Although bsrsv2_solve() can be done without level information, the user still needs to be aware of consistency. If bsrsv2_analysis() is called with policy cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_USE_LEVEL, bsrsv2_solve() can be run with or without levels. On the other hand, if bsrsv2_analysis() is called with cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_NO_LEVEL, bsrsv2_solve() can only accept cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_NO_LEVEL; otherwise, cusparseStatus_t::CUSPARSE_STATUS_INVALID_VALUE is returned.

The level information may not improve the performance, but may spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_NO_LEVEL performs better than cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_USE_LEVEL. If the user has an iterative solver, the best approach is to do bsrsv2_analysis() with cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_USE_LEVEL once. Then do bsrsv2_solve() with cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_NO_LEVEL in the first run, and with cusparseSolvePolicy_t::CUSPARSE_SOLVE_POLICY_USE_LEVEL in the second run, and pick the fastest one to perform the remaining iterations.

Function bsrsv02_solve() has the same behavior as csrsv02_solve(). That is, bsr2csr(bsrsv02(A)) = csrsv02(bsr2csr(A)). The numerical zero of csrsv02_solve() means there exists some zero A(j,j). The numerical zero of bsrsv02_solve() means there exists some block A(j,j) that is not invertible.

Function bsrsv2_solve() reports the first numerical zero, including a structural zero. No numerical zero is reported if cusparseDiagType_t::CUSPARSE_DIAG_TYPE_UNIT is specified, even if A(j,j) is not invertible for some j. The user needs to call cusparseXbsrsv2_zeroPivot to know where the numerical zero is.

The function supports the following properties if pBuffer != NULL:

  • The routine requires no extra storage.
  • The routine supports asynchronous execution.
  • The routine supports CUDA graph capture.

For example, suppose L is a lower triangular matrix with unit diagonal, then the following code solves L*y=x by level information.