pyo3-dlpack 0.1.0

Zero-copy DLPack tensor interop for PyO3.

This crate provides a safe and ergonomic way to exchange tensor data between Rust and Python ML frameworks (PyTorch, JAX, TensorFlow, CuPy, etc.) using the DLPack protocol.

Features

  • Zero-copy: Tensors are shared directly without copying data
  • PyO3 0.27+: Uses the modern API (no deprecation warnings)
  • Bidirectional: Import tensors from Python and export tensors to Python
  • Device-agnostic: Works with CPU, CUDA, ROCm, and other devices

Installation

Add to your Cargo.toml:

[dependencies]
pyo3-dlpack = "0.1"
pyo3 = "0.27"

Usage

Importing a tensor from Python

use pyo3::prelude::*;
use pyo3_dlpack::PyTensor;

#[pyfunction]
fn process_tensor(py: Python<'_>, obj: &Bound<'_, PyAny>) -> PyResult<()> {
    let tensor = PyTensor::from_pyany(py, obj)?;

    println!("Shape: {:?}", tensor.shape());
    println!("Device: {:?}", tensor.device());
    println!("Dtype: {:?}", tensor.dtype());

    if tensor.device().is_cpu() {
        // Safe to access data on CPU
        let ptr = tensor.data_ptr() as *const f32;
        // ... process the data
    }

    Ok(())
}
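Once a tensor is confirmed to be on the CPU (and contiguous), the raw pointer can be viewed as a Rust slice. A minimal, self-contained sketch of that pattern, using a plain `Vec<f32>` to stand in for the tensor buffer — `view_f32` is an illustrative helper, not part of this crate's API:

```rust
/// View a raw pointer to `len` contiguous f32 values as a slice.
///
/// Safety: the pointer must be valid and properly aligned, and the
/// memory must stay alive and unmutated for the slice's lifetime.
unsafe fn view_f32<'a>(ptr: *const f32, len: usize) -> &'a [f32] {
    std::slice::from_raw_parts(ptr, len)
}

fn main() {
    // Stand-in for a contiguous CPU tensor buffer of shape [2, 3].
    let buffer: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let shape = [2usize, 3];
    let numel: usize = shape.iter().product();

    // In real code, `ptr` would come from tensor.data_ptr().
    let ptr = buffer.as_ptr();
    let data = unsafe { view_f32(ptr, numel) };

    let sum: f32 = data.iter().sum();
    println!("sum = {sum}"); // sum = 21
}
```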

Exporting a tensor to Python

use pyo3::prelude::*;
use pyo3_dlpack::{IntoDLPack, TensorInfo, cuda_device, dtype_f32};
use std::ffi::c_void;

struct MyGpuTensor {
    device_ptr: u64,
    shape: Vec<i64>,
    device_id: i32,
}

impl IntoDLPack for MyGpuTensor {
    fn tensor_info(&self) -> TensorInfo {
        TensorInfo::contiguous(
            self.device_ptr as *mut c_void,
            cuda_device(self.device_id),
            dtype_f32(),
            self.shape.clone(),
        )
    }
}

#[pyfunction]
fn create_tensor(py: Python<'_>) -> PyResult<Py<PyAny>> {
    let tensor = MyGpuTensor {
        device_ptr: 0x12345678, // your actual device pointer
        shape: vec![2, 3],
        device_id: 0,
    };
    tensor.into_dlpack(py)
}
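`TensorInfo::contiguous` presumably derives row-major strides from the shape; for reference, the strides a C-contiguous tensor carries (in elements, as DLPack expects) can be computed like this. A standalone sketch, not the crate's internals:

```rust
/// Row-major (C-contiguous) strides in elements for a given shape.
fn contiguous_strides(shape: &[i64]) -> Vec<i64> {
    let mut strides = vec![1i64; shape.len()];
    // Walk from the innermost dimension outward: each stride is the
    // product of all dimension sizes to its right.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

fn main() {
    // For shape [2, 3] (as in MyGpuTensor above): strides are [3, 1].
    assert_eq!(contiguous_strides(&[2, 3]), vec![3, 1]);
    // A 4-D example: shape [2, 3, 4, 5] -> strides [60, 20, 5, 1].
    assert_eq!(contiguous_strides(&[2, 3, 4, 5]), vec![60, 20, 5, 1]);
    println!("ok");
}
```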

Python side:

import torch

# Call the Rust function above (exposed through your extension module);
# it returns a DLPack capsule
capsule = create_tensor()

# Convert to PyTorch tensor (zero-copy)
tensor = torch.from_dlpack(capsule)

Supported Data Types

  • Float: f16, f32, f64, bf16
  • Integer: i8, i16, i32, i64
  • Unsigned: u8, u16, u32, u64
  • Boolean
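DLPack describes each of these as a `DLDataType` triple: a type code (per the DLPack header: 0 = int, 1 = uint, 2 = float, 4 = bfloat, 6 = bool), a bit width, and a lane count (1 for ordinary scalars). A sketch of that encoding — this mirrors `dlpack.h`, not this crate's `dtype_*` helpers:

```rust
/// Mirror of DLPack's DLDataType: (type code, bits, lanes).
#[derive(Debug, PartialEq)]
struct DLDataType {
    code: u8,
    bits: u8,
    lanes: u16,
}

// Type codes as defined in the DLPack header (dlpack.h).
#[allow(dead_code)] const DL_INT: u8 = 0;
#[allow(dead_code)] const DL_UINT: u8 = 1;
const DL_FLOAT: u8 = 2;
#[allow(dead_code)] const DL_BFLOAT: u8 = 4;
#[allow(dead_code)] const DL_BOOL: u8 = 6;

fn f32_dtype() -> DLDataType {
    DLDataType { code: DL_FLOAT, bits: 32, lanes: 1 }
}

fn main() {
    // An f32 tensor advertises (code = 2, bits = 32, lanes = 1).
    assert_eq!(f32_dtype(), DLDataType { code: 2, bits: 32, lanes: 1 });
    // Bytes per element follow from bits * lanes / 8.
    let bytes = (f32_dtype().bits as usize * f32_dtype().lanes as usize) / 8;
    assert_eq!(bytes, 4);
    println!("ok");
}
```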

Supported Devices

  • CPU
  • CUDA
  • CUDA Host (pinned memory)
  • ROCm
  • Metal
  • Vulkan
  • And more (see DLDeviceType)
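Each entry above corresponds to a `DLDevice` in the capsule: an integer device-type code plus a device ordinal. A sketch of the codes as they appear in the DLPack header — again mirroring `dlpack.h`, not this crate's own `DLDeviceType` definition:

```rust
/// Device type codes from the DLPack header (dlpack.h).
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(i32)]
enum DLDeviceType {
    Cpu = 1,
    Cuda = 2,
    CudaHost = 3, // pinned host memory
    Vulkan = 7,
    Metal = 8,
    Rocm = 10,
}

/// Mirror of DLPack's DLDevice: a device type plus an ordinal id.
#[allow(dead_code)]
struct DLDevice {
    device_type: DLDeviceType,
    device_id: i32,
}

fn main() {
    // cuda_device(0) would describe device type 2, ordinal 0.
    let dev = DLDevice { device_type: DLDeviceType::Cuda, device_id: 0 };
    assert_eq!(dev.device_type as i32, 2);
    assert_eq!(DLDeviceType::Cpu as i32, 1);
    println!("ok");
}
```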

Performance

DLPack enables true zero-copy tensor sharing. Benchmark results on Apple M3:

Operation                        Time      vs Copy
DLPack capsule export (1M f32)   8.3 µs    7.3x faster
DLPack capsule import (1M f32)   7.9 µs    7.7x faster
Vec clone baseline (1M f32)      60.9 µs   -

The DLPack overhead is constant regardless of tensor size: only metadata (shape, strides, device, dtype) is processed, never the data itself. This makes it ideal for large tensors, where copying would be expensive.

# Rust criterion benchmarks (cargo bench)
export_capsule_1k       time:   [155.44 ns 159.74 ns 166.84 ns]
export_capsule_1m       time:   [7.71 µs 8.26 µs 8.89 µs]
import_capsule_1m       time:   [7.44 µs 7.89 µs 8.41 µs]
vec_clone_1m            time:   [60.45 µs 60.90 µs 61.38 µs]
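The numbers above reflect that an export touches only per-dimension metadata. A back-of-the-envelope comparison for the 1M-element case (a sketch of the sizes involved, not a benchmark):

```rust
use std::mem::size_of;

fn main() {
    let shape: Vec<i64> = vec![1_000_000]; // 1M f32 elements, 1-D
    let numel: i64 = shape.iter().product();

    // Bytes a copy would have to move:
    let data_bytes = numel as usize * size_of::<f32>();
    // Bytes of per-dimension metadata a DLPack export processes:
    // one i64 each for shape and strides (plus a fixed-size header).
    let metadata_bytes = 2 * shape.len() * size_of::<i64>();

    assert_eq!(data_bytes, 4_000_000);
    assert_eq!(metadata_bytes, 16);
    println!("data: {data_bytes} B, metadata: {metadata_bytes} B");
}
```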

Run benchmarks yourself:

  • make bench-rust - Rust criterion benchmarks
  • make bench-python - Python benchmarks
  • make bench - All benchmarks

Testing

Validate correctness and zero-copy behavior:

  • make test - Unit + integration tests (105 tests)
  • Tests verify data pointers are preserved across transfers

Python environment

The test module is built with maturin against the same interpreter that runs the tests. Override it with PYTHON=/path/to/python if needed (e.g., to use a venv). The default test extras include PyTorch (pip install -e ".[test]"); for CI or lightweight runs, use pip install -e ".[test-lite]".

License

Licensed under the MIT license. See LICENSE for details.