Module gpu_ops


GPU-accelerated matrix operations exposed to Python.

This module provides a GPU-dispatch API with a pure-CPU fallback that is always available. The API surface is identical regardless of whether GPU hardware is present, so Python callers need no conditional logic.

§GPU path (future cuda_bridge feature)

When the cuda_bridge feature is enabled and cudarc/candle are linked, the functions below dispatch to GPU kernels instead of the CPU path. The feature gate is already wired up, but the cudarc integration itself is deferred until GPU hardware is available in CI. See TODO.md L149/L151.

§CPU path (default, pure Rust)

The CPU implementations use plain Vec<f64> arithmetic and are correct, tested, and zero-dependency.
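As a rough illustration of the CPU path's semantics (a pure-Python sketch, not the actual Rust implementation), a row-major matrix multiply over a flat buffer might look like:

```python
def cpu_matmul(a, b, m, k, n):
    """Illustrative analogue of the CPU matmul path: multiply
    row-major A (m x k) by B (k x n) into row-major C (m x n)."""
    if len(a) != m * k or len(b) != k * n:
        raise ValueError("shape mismatch")
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c

# Multiplying by the 2x2 identity leaves the matrix unchanged.
print(cpu_matmul([1.0, 0.0, 0.0, 1.0], [1.0, 2.0, 3.0, 4.0], 2, 2, 2))
```

The flat `Vec<f64>`-style layout means callers pass explicit `m`, `k`, `n` dimensions rather than nested lists.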

Functions§

cuda_tensor_matmul
Multiply two PyTorch/JAX tensors via the DLPack protocol.
gpu_device_info
Return a string describing the active compute device.
gpu_elementwise
Apply an element-wise activation to every element of data.
gpu_frobenius_norm
Compute the Frobenius norm of a flat matrix.
gpu_matmul
Multiply two row-major matrices: C = A (m×k) × B (k×n).
gpu_matrix_add
Add two row-major matrices element-wise.
gpu_matrix_scale
Scale a row-major matrix by a scalar.
register_gpu_module
Register all GPU-dispatch functions in the parent Python module.
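To make the expected semantics of the simpler helpers concrete, here is a hedged pure-Python sketch of what gpu_frobenius_norm, gpu_matrix_add, and gpu_matrix_scale compute over flat row-major buffers (the function names echo this page; the bodies are illustrative equivalents, not the module's Rust code):

```python
import math

def frobenius_norm(data):
    # Square root of the sum of squares over the flat matrix.
    return math.sqrt(sum(x * x for x in data))

def matrix_add(a, b):
    # Element-wise sum; buffer lengths must match.
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return [x + y for x, y in zip(a, b)]

def matrix_scale(a, s):
    # Multiply every element by the scalar s.
    return [x * s for x in a]

print(frobenius_norm([3.0, 4.0]))          # -> 5.0
print(matrix_add([1.0, 2.0], [3.0, 4.0]))  # -> [4.0, 6.0]
print(matrix_scale([1.0, 2.0], 2.0))       # -> [2.0, 4.0]
```

Because these operate on flat buffers, the matrix shape is irrelevant to add, scale, and the norm; only gpu_matmul needs explicit dimensions.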