pub unsafe extern "C" fn cusolverDnIRSParamsSetSolverPrecisions(
params: cusolverDnIRSParams_t,
solver_main_precision: cusolverPrecType_t,
solver_lowest_precision: cusolverPrecType_t,
) -> cusolverStatus_t
This function sets both the main and the lowest precision for the Iterative Refinement Solver (IRS). The main precision is the precision of the input and output data type. The lowest precision is the lowest computational precision the solver is allowed to use during the LU factorization process. Both precisions must be set before the first call to the IRS solver, because they are NOT set by default when the params structure is created: they depend on the input/output data type and on the user's request. This function is a wrapper around both cusolverDnIRSParamsSetSolverMainPrecision and cusolverDnIRSParamsSetSolverLowestPrecision. All possible combinations of main/lowest precision are described in the table below.

Usually the lowest precision determines the speedup that can be achieved. The ratio of the performance of the lowest precision to that of the main precision (i.e., the input/output data type) defines the upper bound of the speedup that can be obtained. More precisely, the speedup depends on many factors, but for large matrix sizes it is the ratio of the matrix-matrix rank-k product performance (e.g., GEMM where K is 256 and M=N=size of the matrix) that defines the possible speedup. For instance, if the input/output precision is real double precision CUSOLVER_R_64F and the lowest precision is CUSOLVER_R_32F, then a speedup of at most 2X can be expected for large problem sizes. If the lowest precision is CUSOLVER_R_16F, then 3X-4X can be expected. A reasonable strategy should take the number of right-hand sides, the size of the matrix, and the convergence rate into account.
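The upper bound described above is simple arithmetic, sketched below. The function name `speedup_upper_bound` and its throughput arguments are hypothetical and not part of cuSOLVER or this binding. The GEMM throughput figures must be measured on the target GPU; the values in the usage note are placeholders.

```rust
/// Rough upper bound on the IRS speedup, taken as the ratio of GEMM
/// throughput in the lowest precision to GEMM throughput in the main
/// precision. Both inputs are assumed to be measured by the caller
/// (for example in TFLOP/s for a rank-k update with K = 256).
///
/// Returns `None` when either throughput is not a positive finite number.
fn speedup_upper_bound(main_gemm_tflops: f64, lowest_gemm_tflops: f64) -> Option<f64> {
    let valid = |x: f64| x.is_finite() && x > 0.0;
    if valid(main_gemm_tflops) && valid(lowest_gemm_tflops) {
        Some(lowest_gemm_tflops / main_gemm_tflops)
    } else {
        None
    }
}
```

With placeholder throughputs of 10.0 for the main precision and 20.0 for the lowest precision, `speedup_upper_bound(10.0, 20.0)` yields `Some(2.0)`, matching the "at most 2X" case in the text. The achieved speedup is typically lower, since it also depends on the convergence rate and the number of right-hand sides.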
Supported Inputs/Outputs data type and lower precision for the IRS solver
| Inputs/Outputs Data Type (e.g., main precision) | Supported values for the lowest precision |
|---|---|
| cusolverPrecType_t::CUSOLVER_C_64F | CUSOLVER_C_64F, CUSOLVER_C_32F, CUSOLVER_C_16F, CUSOLVER_C_16BF, CUSOLVER_C_TF32 |
| cusolverPrecType_t::CUSOLVER_C_32F | CUSOLVER_C_32F, CUSOLVER_C_16F, CUSOLVER_C_16BF, CUSOLVER_C_TF32 |
| cusolverPrecType_t::CUSOLVER_R_64F | CUSOLVER_R_64F, CUSOLVER_R_32F, CUSOLVER_R_16F, CUSOLVER_R_16BF, CUSOLVER_R_TF32 |
| cusolverPrecType_t::CUSOLVER_R_32F | CUSOLVER_R_32F, CUSOLVER_R_16F, CUSOLVER_R_16BF, CUSOLVER_R_TF32 |
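
The table above can be encoded as a small check that runs before the parameters are passed to the solver. This is a minimal sketch: `PrecType` is a local stand-in for the binding's cusolverPrecType_t (with the CUSOLVER_ prefix dropped), and `is_supported_pair` is a hypothetical helper, not a cuSOLVER function.

```rust
/// Local stand-in for the binding's `cusolverPrecType_t`; illustrative only.
/// Variant names mirror the CUSOLVER_* constants listed in the table.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PrecType {
    R_64F,
    R_32F,
    R_16F,
    R_16BF,
    R_TF32,
    C_64F,
    C_32F,
    C_16F,
    C_16BF,
    C_TF32,
}

/// Returns true when `lowest` is listed in the table as a supported
/// lowest precision for the given `main` (input/output) precision.
fn is_supported_pair(main: PrecType, lowest: PrecType) -> bool {
    use PrecType::*;
    match main {
        C_64F => matches!(lowest, C_64F | C_32F | C_16F | C_16BF | C_TF32),
        C_32F => matches!(lowest, C_32F | C_16F | C_16BF | C_TF32),
        R_64F => matches!(lowest, R_64F | R_32F | R_16F | R_16BF | R_TF32),
        R_32F => matches!(lowest, R_32F | R_16F | R_16BF | R_TF32),
        // Reduced precisions are not listed as main precisions in the table.
        _ => false,
    }
}
```

For example, `is_supported_pair(PrecType::R_64F, PrecType::R_16F)` is true, while `is_supported_pair(PrecType::R_32F, PrecType::R_64F)` is false because the lowest precision cannot exceed the main precision, and mixing real and complex types is likewise rejected.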
§Parameters
- params: The cusolverDnIRSParams_t Params structure.
- solver_main_precision: Allowed Inputs/Outputs datatype (for example CUSOLVER_R_64F for real double precision data). See the table above for the supported precisions.
- solver_lowest_precision: Allowed lowest compute type (for example CUSOLVER_R_16F for half precision computation). See the table above for the supported precisions.
§Return value
- cusolverStatus_t::CUSOLVER_STATUS_IRS_PARAMS_NOT_INITIALIZED: The Params structure was not created.
- cusolverStatus_t::CUSOLVER_STATUS_SUCCESS: The operation completed successfully.