# ndrs
ndrs is a NumPy‑like tensor library for Rust, providing multi‑dimensional array (tensor) operations with optional GPU acceleration via CUDA. It emphasizes zero‑copy views, efficient strided operations, and a flexible ownership model.
## ✨ Features
- N‑dimensional tensors – shape, strides, and byte‑level data storage.
- View‑based operations – slicing, broadcasting, transposing, and reshaping without copying data.
- Efficient strided copy – fast data movement between non‑contiguous layouts.
- Thread‑local and thread‑safe variants – `Rc<RefCell<Tensor>>` for single‑threaded speed; `Arc<ReentrantMutex<RefCell<Tensor>>>` for multi‑threading and Python bindings.
- GPU acceleration – transparent CPU ↔ GPU transfer, CUDA kernels for strided element‑wise addition.
- Operator overloading – `+` and `+=` for tensors with broadcastable shapes.
- Python‑like slicing – intuitive `s!` macro: `s![1..4:2, ..]`.
- Broadcasting – automatic shape expansion for arithmetic.
- Python bindings – use ndrs from Python via PyO3 (optional).
## 🚀 Quick Start
Add this to your `Cargo.toml`:

```toml
[dependencies]
ndrs = "0.1"
```
### Basic CPU usage
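A minimal sketch of what CPU usage might look like, built from the `new_cpu_from_f32` constructor and the `s!` macro mentioned elsewhere in this README; the exact import paths and signatures are assumptions:

```rust
use ndrs::{s, Tensor};

// Constructor and method names are assumptions based on identifiers that
// appear elsewhere in this README; exact signatures may differ.
let a = Tensor::new_cpu_from_f32(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);
let b = Tensor::new_cpu_from_f32(&[6.0, 5.0, 4.0, 3.0, 2.0, 1.0], &[2, 3]);

// All operations are defined on views, not on tensors directly.
let sum = a.view() + b.view();         // element-wise addition
let row = a.view().slice(s![0, ..])?;  // zero-copy slice of the first row
```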
### GPU usage (CUDA)
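A hedged sketch of GPU usage. `Device::Cuda(id)` and the strided CUDA add kernel are documented below; the `to_device` method name and the constructor signature are assumptions:

```rust
use ndrs::{Device, Tensor};

// Method names here are assumptions; this README only guarantees
// transparent CPU <-> GPU transfer and a strided CUDA add kernel.
let a = Tensor::new_cpu_from_f32(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let a_gpu = a.view().to_device(Device::Cuda(0))?; // host -> device copy
let doubled = &a_gpu + &a_gpu;                    // runs the CUDA add kernel
let back = doubled.to_device(Device::Cpu)?;       // device -> host copy
```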
> Note: GPU operations require the `cuda` feature and a CUDA‑capable device.
## 🧠 Core Concepts
### Tensor
The raw data container. It owns a contiguous byte buffer (either on CPU or GPU) and stores shape, strides, data type, and device information. It does not implement operations directly – use TensorView for that.
### TensorView
A view into a Tensor with an optional offset, shape, and strides. All mathematical operations (addition, slicing, broadcasting, device transfer) are defined on views.
- `RcTensorView` – thread‑local variant using `Rc<RefCell<Tensor>>`. Fast and lightweight for single‑threaded code.
- `ArcTensorView` – thread‑safe variant using `Arc<ReentrantMutex<RefCell<Tensor>>>`. Required for multi‑threaded environments and Python bindings.
### Slice macro `s!`

Creates a slice descriptor for the `.slice()` method. Supports ranges, steps, single indices, and `..` (select all).
```rust
// Macro arguments reconstructed from the comments; exact syntax may differ.
let sub = view.slice(s![1..4, 2..6])?;        // rows 1..4, cols 2..6
let row = view.slice(s![1, ..])?;             // single row (dimension reduced)
let col = view.slice(s![.., 2])?;             // single column
let every_other = view.slice(s![..:2, ..])?;  // every second row
```
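Zero-copy slicing works by adjusting a view's offset, shape, and strides while leaving the buffer untouched. A self-contained sketch of that mechanism in plain Rust (the `View` type here is illustrative, not ndrs's actual implementation):

```rust
// A minimal strided 2-D "view": offset + per-dimension strides into a flat buffer.
struct View<'a> {
    data: &'a [f32],
    offset: usize,
    shape: [usize; 2],
    strides: [usize; 2], // in elements, not bytes
}

impl<'a> View<'a> {
    fn get(&self, i: usize, j: usize) -> f32 {
        self.data[self.offset + i * self.strides[0] + j * self.strides[1]]
    }

    // Slicing only adjusts offset and shape; the buffer is never copied.
    fn slice(&self, rows: std::ops::Range<usize>, cols: std::ops::Range<usize>) -> View<'a> {
        View {
            data: self.data,
            offset: self.offset + rows.start * self.strides[0] + cols.start * self.strides[1],
            shape: [rows.end - rows.start, cols.end - cols.start],
            strides: self.strides,
        }
    }
}

fn slice_demo() -> (f32, f32) {
    // 3x4 row-major matrix where element (r, c) = 10*r + c.
    let data: Vec<f32> = (0..12).map(|k| (10 * (k / 4) + k % 4) as f32).collect();
    let full = View { data: &data, offset: 0, shape: [3, 4], strides: [4, 1] };
    let sub = full.slice(1..3, 2..4); // rows 1..3, cols 2..4
    assert_eq!(sub.shape, [2, 2]);
    (sub.get(0, 0), sub.get(1, 1))   // elements (1,2) and (2,3) of the original
}

fn main() {
    assert_eq!(slice_demo(), (12.0, 23.0));
    println!("ok");
}
```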
### Broadcasting

Use `broadcast_shapes` to compute the target shape for two tensors, then `broadcast_to` to expand a view to that shape.
```rust
use ndrs::broadcast_shapes; // exact import path may differ

// Constructor arguments were lost; the shapes [3, 1] and [1, 4] are
// illustrative values consistent with the [3, 4] result below.
let a = new_cpu_from_f32(/* data with shape [3, 1] */);
let b = new_cpu_from_f32(/* data with shape [1, 4] */);
let target = broadcast_shapes(a_view.shape(), b_view.shape()).unwrap(); // [3, 4]
let a_bcast = a_view.broadcast_to(&target)?;
let b_bcast = b_view.broadcast_to(&target)?;
```
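The rule behind `broadcast_shapes` is the NumPy one: align shapes from the right, and each dimension pair must be equal or contain a 1, with the result taking the larger of the two. A self-contained sketch of how it can be implemented (illustrative, not the crate's actual code):

```rust
// NumPy-style shape broadcasting: returns None when the shapes are incompatible.
fn broadcast_shapes(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = vec![0; n];
    for i in 0..n {
        // Missing leading dimensions are treated as 1.
        let da = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let db = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        out[i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // e.g. 3 vs 4: incompatible
        };
    }
    Some(out)
}

fn main() {
    assert_eq!(broadcast_shapes(&[3, 1], &[1, 4]), Some(vec![3, 4]));
    assert_eq!(broadcast_shapes(&[2, 3], &[3]), Some(vec![2, 3]));
    assert_eq!(broadcast_shapes(&[2, 3], &[4]), None);
    println!("ok");
}
```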
### Device management

- `Device::Cpu` – host memory.
- `Device::Cuda(id)` – CUDA device with the given index.
- `set_current_device(id)` – sets the default device for context creation.
- `get_device_count()` – returns the number of CUDA‑capable devices.
## 📦 Cargo Features

- `default` – CPU only.
- `cuda` – enables GPU support (requires the CUDA toolkit and `cudarc`).
## 🧪 Testing

Run all tests (CPU only; GPU tests are skipped by default when no device is present). Running the GPU tests requires a CUDA device.
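The exact commands were lost from this section; assuming the standard Cargo workflow, they are likely:

```sh
# CPU-only test run
cargo test

# include GPU tests (requires the CUDA toolkit and a CUDA device)
cargo test --features cuda
```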
## 🐍 Python Bindings
The ndrs-python crate provides Python bindings using PyO3. Install from source:
Then in Python:
```python
import ndrs  # module and method names below are illustrative; the
             # original snippet's identifiers were lost

# Create a tensor from a nested list (auto‑detects dtype)
a = ndrs.tensor([[2.0, 3.0], [4.0, 5.0]])

# Move to GPU and add
a_gpu = a.cuda()
b = a_gpu + a_gpu

# Convert back to NumPy (requires `numpy` installed)
print(b.numpy())
# [[4. 6.]
#  [8. 10.]]
```
The Python API mirrors the Rust API: slicing, broadcasting, and arithmetic operators are supported.
## ⚙️ Custom Data Types
You can register your own data types for tensor operations:
```rust
use ndrs::{register_add_op, register_dtype, DType}; // exact import paths may differ
use std::sync::Arc;

const DTYPE_MY_TYPE: DType = 1000;

// The original arguments were lost; placeholders mark what each call expects.
register_dtype(/* descriptor for the custom type */);
register_add_op(/* element-wise add implementation for DTYPE_MY_TYPE */);
```
## 📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
## 🤝 Contributing
Contributions are welcome! Please open an issue or pull request on GitHub. For major changes, please discuss first.
## 🙏 Acknowledgments

- Inspired by NumPy, PyTorch, and the `ndarray` crate.
- Uses `cudarc` for CUDA bindings and `bytemuck` for safe byte casts.