GPU tensor passthrough via DLPack without a CPU roundtrip.
Provides device-aware dispatch: CPU tensors are viewed zero-copy as
`ndarray`, while CUDA/ROCm/Metal tensors are returned as `dlpack_cuda::CudaTensorInfo`
without touching device memory.
When a DLPack capsule contains a tensor resident on CUDA (or another
non-CPU device), naively trying to consume it as a CPU ndarray would
either panic or silently trigger an unacceptable host-device copy.
This module provides:
- `CudaTensorInfo` — metadata extracted from a CUDA DLPack tensor without triggering any data copy.
- `cuda_tensor_info_from_dltensor` — pure-Rust function operating directly on a `DLTensor`; no Python runtime needed.
- `dlpack_auto_dispatch_f32` / `dlpack_auto_dispatch_f64` — device-aware dispatch that returns an `ndarray` view for CPU tensors, or `CudaTensorInfo` for GPU tensors, with no host-device copy.
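The shape of that dispatch result can be sketched with stand-in types. The names below (`DispatchResult`, `CpuView`, and the `CudaTensorInfo` fields) are illustrative assumptions, not this module's actual definitions; the real result type is `DLPackDispatchResult`.

```rust
/// Hypothetical metadata-only view of a GPU-resident tensor (no data copy).
#[derive(Debug)]
#[allow(dead_code)]
struct CudaTensorInfo {
    shape: Vec<i64>,
    device_id: i32,
    byte_offset: u64,
}

/// Sketch of what device-aware dispatch hands back to the caller.
enum DispatchResult {
    /// CPU tensor: a zero-copy host view (flattened to a Vec for this sketch).
    CpuView(Vec<f32>),
    /// GPU tensor: metadata only; the device pointer is never dereferenced.
    Gpu(CudaTensorInfo),
}

/// A caller matches on the variant instead of assuming host memory.
fn describe(result: &DispatchResult) -> String {
    match result {
        DispatchResult::CpuView(v) => format!("cpu view, {} elements", v.len()),
        DispatchResult::Gpu(info) => format!("gpu metadata, device {}", info.device_id),
    }
}

fn main() {
    let gpu = DispatchResult::Gpu(CudaTensorInfo {
        shape: vec![2, 3],
        device_id: 0,
        byte_offset: 0,
    });
    assert_eq!(describe(&gpu), "gpu metadata, device 0");
}
```

The key design point this illustrates: the GPU variant carries only metadata, so consuming it can never trigger the host-device copy that a blind `ndarray` conversion would.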
§Design
The DLPack standard defines device type codes:
| Code | Device |
|---|---|
| 1 | CPU |
| 2 | CUDA |
| 3 | CUDA pinned host |
| 4 | OpenCL |
| 7 | Vulkan |
| 8 | Metal |
| 10 | ROCm |
For CPU tensors (type 1) the existing zero-copy `array_from_dlpack_f32`/`f64`
functions are used directly. For CUDA tensors (type 2) we extract shape,
dtype, device_id, and byte_offset without touching the data pointer.
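The classification step can be sketched as a match over the numeric codes from the table above. The codes come from DLPack's `DLDeviceType` enum; the helper names here (`DeviceClass`, `classify_device`) are hypothetical, not part of this module's API.

```rust
/// Coarse classification of a DLPack device type code (see the table above).
#[derive(Debug, PartialEq)]
enum DeviceClass {
    /// Code 1: plain CPU memory, safe to view zero-copy as `ndarray`.
    Cpu,
    /// Code 3: CUDA pinned host memory is still host-addressable.
    CudaPinnedHost,
    /// Codes 2, 4, 7, 8, 10: device-resident memory, metadata-only path.
    Device(i32),
    /// Any code this sketch does not recognize.
    Unknown(i32),
}

fn classify_device(device_type: i32) -> DeviceClass {
    match device_type {
        1 => DeviceClass::Cpu,
        3 => DeviceClass::CudaPinnedHost,
        2 | 4 | 7 | 8 | 10 => DeviceClass::Device(device_type),
        other => DeviceClass::Unknown(other),
    }
}

fn main() {
    assert_eq!(classify_device(1), DeviceClass::Cpu);   // CPU: zero-copy view
    assert_eq!(classify_device(2), DeviceClass::Device(2));  // CUDA: metadata only
    assert_eq!(classify_device(10), DeviceClass::Device(10)); // ROCm: metadata only
}
```

Matching on the raw code with an explicit catch-all mirrors the safety goal: an unrecognized device falls through to an error path rather than being treated as host memory.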
§CUDA runtime linkage
Full GPU-to-GPU processing (e.g. copying the tensor buffer to a
`cudarc`-managed allocation) requires the `cuda_special` cargo feature
and CUDA runtime linkage, which is deliberately kept out of the default
features to preserve the pure-Rust build. With default features, only
the metadata extraction path is available.
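The feature gate can be sketched with `cfg` attributes. `cuda_special` is the feature named above; the function names and return strings below are placeholders, not this module's real internals.

```rust
/// Metadata-only extraction: always compiled, pure Rust, no CUDA linkage.
fn extract_metadata_only() -> &'static str {
    "metadata path (default features)"
}

/// Full GPU path (e.g. copying into a `cudarc`-managed allocation):
/// only compiled when the `cuda_special` feature is enabled.
#[cfg(feature = "cuda_special")]
fn copy_to_cudarc_allocation() -> &'static str {
    "full GPU path (cuda_special)"
}

/// Dispatch: the GPU branch exists only in feature-gated builds, so a
/// default build carries no CUDA runtime dependency at all.
fn process() -> &'static str {
    #[cfg(feature = "cuda_special")]
    {
        return copy_to_cudarc_allocation();
    }
    extract_metadata_only()
}

fn main() {
    // Built without `cuda_special`, only the metadata path is reachable.
    assert_eq!(process(), "metadata path (default features)");
}
```

Gating at compile time (rather than a runtime check) is what keeps the default build linkage-free: the CUDA symbols are simply never referenced.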
Structs§
- `CudaTensorInfo` — Metadata extracted from a CUDA-resident DLPack tensor.
Enums§
- `DLPackDispatchResult` — The result of auto-dispatching a DLPack tensor based on its device.
Functions§
- `cuda_tensor_info` — Extract `CudaTensorInfo` from a Python DLPack capsule object.
- `cuda_tensor_info_from_dltensor` — Extract `CudaTensorInfo` from a raw `DLTensor` pointer.
- `dlpack_auto_dispatch_f32` ⚠ — Dispatch an `f32` DLPack tensor to the CPU or GPU path without a CPU roundtrip.
- `dlpack_auto_dispatch_f64` ⚠ — Dispatch an `f64` DLPack tensor to the CPU or GPU path without a CPU roundtrip.
- `get_cuda_tensor_info` — Python-facing function: extract GPU tensor metadata from a DLPack capsule.
- `register_dlpack_cuda_module` — Register the `get_cuda_tensor_info` function into a PyO3 module.