pub fn cuda_tensor_matmul<'py>(
_py: Python<'py>,
tensor_a: &Bound<'py, PyAny>,
_tensor_b: &Bound<'py, PyAny>,
) -> PyResult<Py<PyAny>>
Multiply two PyTorch/JAX tensors via the DLPack protocol.
This is the entry point for the zero-copy GPU path. When the cuda_bridge
Cargo feature is enabled (and cudarc is linked), the function accepts any
Python object implementing __dlpack__ and dispatches directly to a CUDA
GEMM kernel.
In the current CPU-only build this function returns a PyNotImplementedError
with a message directing callers to gpu_matmul().
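The documented behavior suggests a simple dispatch pattern on the caller's side. The helper below is hypothetical (not part of the scirs2 API); the two callables stand in for scirs2.cuda_tensor_matmul and a CPU path such as gpu_matmul():

```python
def matmul_with_fallback(a, b, gpu_fn, cpu_fn):
    """Try the GPU entry point; fall back to the CPU path when the
    extension was built without the cuda_bridge feature."""
    try:
        return gpu_fn(a, b)
    except NotImplementedError:
        return cpu_fn(a, b)

# Illustration with stand-in callables:
def gpu_stub(a, b):
    raise NotImplementedError("built without cuda_bridge; use gpu_matmul()")

result = matmul_with_fallback([[1.0]], [[2.0]], gpu_stub, lambda a, b: "cpu")
```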
§Python example
import torch, scirs2
a = torch.randn(512, 512, device='cuda')
b = torch.randn(512, 512, device='cuda')
# GPU path (when cuda_bridge feature is enabled):
c = scirs2.cuda_tensor_matmul(a, b)
# CPU fallback for all tensor sizes:
c_data = scirs2.gpu_matmul(a.flatten().tolist(), 512, 512, b.flatten().tolist(), 512)
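For reference, a pure-Python model of what a matmul over flattened arguments in the gpu_matmul() style computes. The row-major layout is an assumption based on the flatten() calls above; the scirs2 call remains the real entry point:

```python
def matmul_flat(a, m, k, b, n):
    """Multiply an m-by-k matrix by a k-by-n matrix, both given as
    row-major flattened lists; returns the m-by-n product, flattened."""
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            out[i * n + j] = acc
    return out

# [[1, 2], [3, 4]] @ [[5, 6], [7, 8]]
small = matmul_flat([1, 2, 3, 4], 2, 2, [5, 6, 7, 8], 2)
```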