
Module dlpack_cuda

GPU tensor passthrough via DLPack without CPU roundtrip.

Provides device-aware dispatch: CPU tensors are viewed zero-copy as ndarray, while CUDA/ROCm/Metal tensors are returned as dlpack_cuda::CudaTensorInfo without touching device memory.

When a DLPack capsule contains a tensor resident on CUDA (or another non-CPU device), naively trying to consume it as a CPU ndarray would either panic or silently trigger an unacceptable host-device copy.

This module provides:

  • CudaTensorInfo — metadata extracted from a CUDA DLPack tensor without triggering any data copy.
  • cuda_tensor_info_from_dltensor — pure-Rust function operating directly on a DLTensor; no Python runtime needed.
  • dlpack_auto_dispatch_f32 / dlpack_auto_dispatch_f64 — device-aware dispatch that returns an ndarray view for CPU tensors, or CudaTensorInfo for GPU tensors, with no host-device copy.
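As a sketch, a caller might consume the dispatch result like this. The variant and field names below mirror the descriptions above but are assumptions, not the module's exact definitions; in particular, the real CPU variant holds a zero-copy ndarray view rather than the `Vec<f32>` stand-in used here.

```rust
/// Illustrative mirror of the metadata-only GPU result (field names assumed).
struct CudaTensorInfo {
    device_id: i32,
    shape: Vec<i64>,
    byte_offset: u64,
}

/// Illustrative mirror of DLPackDispatchResult (variant names assumed).
enum DLPackDispatchResult {
    /// CPU tensor: host-visible data (Vec stands in for the zero-copy view).
    Cpu(Vec<f32>),
    /// GPU tensor: metadata only; no host-device copy was performed.
    Gpu(CudaTensorInfo),
}

/// A caller branches on the device without ever touching GPU memory.
fn describe(result: &DLPackDispatchResult) -> String {
    match result {
        DLPackDispatchResult::Cpu(data) => {
            format!("cpu tensor, {} elements", data.len())
        }
        DLPackDispatchResult::Gpu(info) => {
            format!("gpu tensor on device {}, shape {:?}", info.device_id, info.shape)
        }
    }
}

fn main() {
    let gpu = DLPackDispatchResult::Gpu(CudaTensorInfo {
        device_id: 0,
        shape: vec![2, 3],
        byte_offset: 0,
    });
    assert_eq!(describe(&gpu), "gpu tensor on device 0, shape [2, 3]");

    let cpu = DLPackDispatchResult::Cpu(vec![1.0, 2.0]);
    assert_eq!(describe(&cpu), "cpu tensor, 2 elements");
}
```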

§Design

The DLPack standard defines device type codes:

Code  Device
 1    CPU
 2    CUDA
 3    CUDA pinned host
 4    OpenCL
 7    Vulkan
 8    Metal
10    ROCm

For CPU tensors (type 1) the existing zero-copy array_from_dlpack_f32/f64 functions are used directly. For CUDA tensors (type 2) we extract shape, dtype, device_id, and byte_offset without touching the data pointer.
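The device-code split above can be sketched as a plain match over the standard DLPack codes. `DeviceClass` and `classify_device` are illustrative names, not part of this module's API; how it handles pinned-host, OpenCL, and Vulkan tensors is not stated above, so they are classified as unsupported here.

```rust
/// How a DLPack tensor should be consumed, based on its device type code.
#[derive(Debug, PartialEq)]
enum DeviceClass {
    /// Safe to view zero-copy as an ndarray.
    Cpu,
    /// Metadata-only extraction; the data pointer must not be dereferenced.
    Gpu,
    /// Device codes whose handling the docs above do not specify.
    Unsupported,
}

/// Map a DLPack device type code (kDLCPU = 1, kDLCUDA = 2, kDLMetal = 8,
/// kDLROCM = 10, ...) to a consumption strategy.
fn classify_device(device_type: i32) -> DeviceClass {
    match device_type {
        1 => DeviceClass::Cpu,            // CPU: zero-copy ndarray view
        2 | 8 | 10 => DeviceClass::Gpu,   // CUDA, Metal, ROCm: metadata only
        _ => DeviceClass::Unsupported,    // incl. pinned host (3), OpenCL (4), Vulkan (7)
    }
}

fn main() {
    assert_eq!(classify_device(1), DeviceClass::Cpu);
    assert_eq!(classify_device(2), DeviceClass::Gpu);
    assert_eq!(classify_device(10), DeviceClass::Gpu);
    assert_eq!(classify_device(4), DeviceClass::Unsupported);
    println!("device classification ok");
}
```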

§CUDA runtime linkage

Full GPU-to-GPU processing (e.g. copying the tensor buffer to a cudarc-managed allocation) requires the cuda_special cargo feature and CUDA runtime linkage, which is deliberately kept out of default features to preserve the Pure Rust build. With default features only the metadata extraction path is available.
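A manifest fragment showing how such an opt-in might look; the `cuda_special` feature name comes from the paragraph above, while the crate name is a placeholder, not the crate's actual published name.

```toml
[dependencies]
# Default: pure-Rust build, metadata-extraction path only.
dlpack-cuda-crate = { version = "0.1" }

# Opt in to CUDA runtime linkage for full GPU-to-GPU processing:
# dlpack-cuda-crate = { version = "0.1", features = ["cuda_special"] }
```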

Structs§

CudaTensorInfo
Metadata extracted from a CUDA-resident DLPack tensor.

Enums§

DLPackDispatchResult
The result of auto-dispatching a DLPack tensor based on its device.

Functions§

cuda_tensor_info
Extract CudaTensorInfo from a Python DLPack capsule object.
cuda_tensor_info_from_dltensor
Extract CudaTensorInfo from a raw DLTensor pointer.
dlpack_auto_dispatch_f32
Dispatch an f32 DLPack tensor to CPU or GPU path without a CPU roundtrip.
dlpack_auto_dispatch_f64
Dispatch an f64 DLPack tensor to CPU or GPU path without a CPU roundtrip.
get_cuda_tensor_info
Python-facing function: extract GPU tensor metadata from a DLPack capsule.
register_dlpack_cuda_module
Register the get_cuda_tensor_info function into a PyO3 module.