GPU-accelerated matrix operations exposed to Python.
This module provides a GPU-dispatch API with a pure-CPU fallback that is always available. The API surface is identical regardless of whether GPU hardware is present, so Python callers need no conditional logic.
§GPU path (future `cuda_bridge` feature)
When the `cuda_bridge` feature is enabled and `cudarc`/`candle` are linked, the functions below dispatch to the GPU kernel instead of the CPU path. The feature gate is wired up, but the cudarc integration itself is deferred until GPU hardware is available in CI. See TODO.md L149/L151.
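A compile-time feature gate like the one described can be sketched as follows; this is an illustrative pattern, not the crate's actual internals, and the helper name `active_device` is a hypothetical stand-in:

```rust
// Hypothetical sketch of feature-gated backend dispatch, assuming a
// `cuda_bridge` cargo feature. Only one of the two definitions is
// compiled into the binary, so callers see a single function.

/// GPU path: compiled only when the `cuda_bridge` feature is enabled.
#[cfg(feature = "cuda_bridge")]
fn active_device() -> &'static str {
    // Would query cudarc for the real device name here.
    "cuda"
}

/// CPU fallback: always available when the feature is off.
#[cfg(not(feature = "cuda_bridge"))]
fn active_device() -> &'static str {
    "cpu"
}
```

Because the switch happens at compile time, the exported API surface is identical on both paths, which is what lets Python callers skip conditional logic.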
§CPU path (default, pure Rust)
The CPU implementations use plain `Vec<f64>` arithmetic and are correct, tested, and zero-dependency.
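As a sketch of what such a pure-`Vec<f64>` path can look like for the matmul case (the free function name `cpu_matmul` is illustrative, not the module's internal name):

```rust
// Illustrative CPU sketch: multiply two row-major matrices,
// C = A (m x k) * B (k x n), using plain slices and a Vec<f64> output.
// Panics via the asserts if the buffer lengths do not match the shapes.
fn cpu_matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k, "A must be m*k elements");
    assert_eq!(b.len(), k * n, "B must be k*n elements");
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    c
}
```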
Functions§
- `cuda_tensor_matmul` - Multiply two PyTorch/JAX tensors via the DLPack protocol.
- `gpu_device_info` - Return a string describing the active compute device.
- `gpu_elementwise` - Apply an element-wise activation to every element of `data`.
- `gpu_frobenius_norm` - Compute the Frobenius norm of a flat matrix.
- `gpu_matmul` - Multiply two row-major matrices: C = A (m×k) × B (k×n).
- `gpu_matrix_add` - Add two row-major matrices element-wise.
- `gpu_matrix_scale` - Scale a row-major matrix by a scalar.
- `register_gpu_module` - Register all GPU-dispatch functions in the parent Python module.
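Several of these functions reduce to a single pass over the flat buffer on the CPU path. For instance, a minimal sketch of a Frobenius norm (the free function name here is a hypothetical stand-in, not the exported `gpu_frobenius_norm`):

```rust
// Illustrative CPU sketch: Frobenius norm of a flat row-major matrix,
// i.e. the square root of the sum of squares of all elements. Shape is
// irrelevant here because the norm treats the matrix as one long vector.
fn frobenius_norm(data: &[f64]) -> f64 {
    data.iter().map(|x| x * x).sum::<f64>().sqrt()
}
```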