Module blas

Device BLAS surface for the cudarc-backed dense kernels.

The public surface here is the lowest level of the GPU dispatch stack: it takes ndarray views, copies them to a device buffer, calls a cuBLAS / kernel routine, and returns the result to the host. The cudarc-backed implementations always compile, because cudarc loads libcuda dynamically at runtime via the fallback-dynamic-loading feature. Dispatch is gated at runtime on super::device_runtime::GpuRuntime::global(): when the probe finds no device, the status enum reports CudaUnavailable and callers fall back to CPU.
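The runtime gate described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: `BackendStatus`, its `Available` variant, `ProbedRuntime`, `backend_status`, and `dispatch` are hypothetical names; only `CudaUnavailable` appears in the documentation.

```rust
/// Hypothetical status enum. Only `CudaUnavailable` is named in the docs;
/// `Available` is an assumed variant for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendStatus {
    Available,
    CudaUnavailable,
}

/// Hypothetical stand-in for the result of a device probe.
pub struct ProbedRuntime {
    pub device_count: usize,
}

/// Report the status from an optional probe result: no runtime, or a
/// runtime with zero devices, both count as unavailable.
pub fn backend_status(rt: Option<&ProbedRuntime>) -> BackendStatus {
    match rt {
        Some(r) if r.device_count > 0 => BackendStatus::Available,
        _ => BackendStatus::CudaUnavailable,
    }
}

/// Run the GPU closure only when the status allows it; otherwise run the
/// CPU closure. Exactly one of the two closures is called.
pub fn dispatch<T>(
    status: BackendStatus,
    gpu: impl FnOnce() -> T,
    cpu: impl FnOnce() -> T,
) -> T {
    match status {
        BackendStatus::Available => gpu(),
        BackendStatus::CudaUnavailable => cpu(),
    }
}
```

Because the check happens at runtime rather than behind a Cargo feature, the same binary can run on machines with and without a CUDA driver.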

The implementations route through super::device_runtime::cuda_context_for and the cudarc 0.19 cuBLAS API. Any transient backend failure (OOM, launch error, …) is converted to None so the auto-dispatch shim in super::linalg falls back to the CPU fast path without disturbing numerics.
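The error-to-None convention can be illustrated with a small sketch. The names `BackendError`, `try_device`, and `auto_dispatch` are hypothetical and chosen only for this example; the real shim lives in super::linalg and its signature is not shown here.

```rust
/// Hypothetical error type standing in for transient backend failures.
#[derive(Debug)]
pub enum BackendError {
    OutOfMemory,
    LaunchFailed,
}

/// Collapse any backend failure into `None`, discarding the error detail.
pub fn try_device<T>(result: Result<T, BackendError>) -> Option<T> {
    result.ok()
}

/// Try the device path first; if it yields `None`, compute on the CPU.
/// The CPU closure runs only when the device path failed.
pub fn auto_dispatch<T>(
    gpu: impl FnOnce() -> Result<T, BackendError>,
    cpu: impl FnOnce() -> T,
) -> T {
    try_device(gpu()).unwrap_or_else(cpu)
}
```

Returning `Option` rather than `Result` keeps the caller's control flow simple: a `None` always means "use the CPU path", whatever the underlying cause was.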

Functions

blas_backend_status
xt_diag_x_cuda
xt_diag_y_cuda
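The signatures of these functions are not shown on this page. Assuming, from the names alone, that they compute the weighted products XᵀDX and XᵀDy with D = diag(w), a plain-Rust CPU reference might look like the sketch below. The function names, the row-major slice layout, and the f64 element type are all assumptions made for illustration.

```rust
/// Reference for Xᵀ diag(w) X (assumed meaning of `xt_diag_x`).
/// `x` is an n x p matrix in row-major order; the result is p x p, row-major.
pub fn xt_diag_x_ref(x: &[f64], n: usize, p: usize, w: &[f64]) -> Vec<f64> {
    assert_eq!(x.len(), n * p);
    assert_eq!(w.len(), n);
    let mut out = vec![0.0; p * p];
    for i in 0..n {
        let row = &x[i * p..(i + 1) * p];
        for a in 0..p {
            // Scale once per (row, column) pair, then accumulate the outer product.
            let wa = w[i] * row[a];
            for b in 0..p {
                out[a * p + b] += wa * row[b];
            }
        }
    }
    out
}

/// Reference for Xᵀ diag(w) y (assumed meaning of `xt_diag_y`).
/// Returns a vector of length p.
pub fn xt_diag_y_ref(x: &[f64], n: usize, p: usize, w: &[f64], y: &[f64]) -> Vec<f64> {
    assert_eq!(x.len(), n * p);
    assert_eq!(w.len(), n);
    assert_eq!(y.len(), n);
    let mut out = vec![0.0; p];
    for i in 0..n {
        let wy = w[i] * y[i];
        for a in 0..p {
            out[a] += x[i * p + a] * wy;
        }
    }
    out
}
```

A reference of this kind is useful mainly as a baseline for checking that a device result and the CPU fallback agree.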