GPU-accelerated matrix operations exposed to Python.
This module provides a GPU-dispatch API with a pure-CPU fallback that is always available. The API surface is identical regardless of whether GPU hardware is present, so Python callers need no conditional logic.
§GPU path (future `cuda_bridge` feature)
When the `cuda_bridge` feature is enabled and `cudarc`/`candle` are linked, the functions below dispatch to the GPU kernel instead of the CPU path. The feature gate is wired up, but the cudarc integration itself is deferred until GPU hardware is available in CI. See TODO.md L149/L151.
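A compile-time feature gate like the one described can be sketched as follows; this is an illustrative pattern, not the crate's actual internals, and the helper name `active_device` is a hypothetical stand-in:

```rust
// Hypothetical sketch of feature-gated backend dispatch, assuming a
// `cuda_bridge` cargo feature. Only one of the two definitions is
// compiled into the binary, so callers see a single function.

/// GPU path: compiled only when the `cuda_bridge` feature is enabled.
#[cfg(feature = "cuda_bridge")]
fn active_device() -> &'static str {
    // Would query cudarc for the real device name here.
    "cuda"
}

/// CPU fallback: always available when the feature is off.
#[cfg(not(feature = "cuda_bridge"))]
fn active_device() -> &'static str {
    "cpu"
}
```

Because the switch happens at compile time, the exported API surface is identical on both paths, which is what lets Python callers skip conditional logic.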
§CPU path (default, pure Rust)
The CPU implementations use plain `Vec<f64>` arithmetic and are correct, tested, and zero-dependency.
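As a sketch of what such a pure-`Vec<f64>` path can look like for the matmul case (the free function name `cpu_matmul` is illustrative, not the module's internal name):

```rust
// Illustrative CPU sketch: multiply two row-major matrices,
// C = A (m x k) * B (k x n), using plain slices and a Vec<f64> output.
// Panics via the asserts if the buffer lengths do not match the shapes.
fn cpu_matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k, "A must be m*k elements");
    assert_eq!(b.len(), k * n, "B must be k*n elements");
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    c
}
```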
Functions§
- `cuda_tensor_matmul` - Multiply two PyTorch/JAX tensors via the DLPack protocol.
- `gpu_device_info` - Return a string describing the active compute device.
- `gpu_elementwise` - Apply an element-wise activation to every element of `data`.
- `gpu_frobenius_norm` - Compute the Frobenius norm of a flat matrix.
- `gpu_matmul` - Multiply two row-major matrices: C = A (m×k) × B (k×n).
- `gpu_matrix_add` - Add two row-major matrices element-wise.
- `gpu_matrix_scale` - Scale a row-major matrix by a scalar.
- `register_gpu_module` - Register all GPU-dispatch functions in the parent Python module.
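Several of these functions reduce to a single pass over the flat buffer on the CPU path. For instance, a minimal sketch of a Frobenius norm (the free function name here is a hypothetical stand-in, not the exported `gpu_frobenius_norm`):

```rust
// Illustrative CPU sketch: Frobenius norm of a flat row-major matrix,
// i.e. the square root of the sum of squares of all elements. Shape is
// irrelevant here because the norm treats the matrix as one long vector.
fn frobenius_norm(data: &[f64]) -> f64 {
    data.iter().map(|x| x * x).sum::<f64>().sqrt()
}
```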