numr
THE foundational numerical computing library for Rust.
numr provides dense tensors, linear algebra, FFT, statistics, advanced random number generation, and automatic differentiation—with the same API and algorithms across CPU, CUDA, and WebGPU backends.
Why numr?
The Rust numerical computing ecosystem is fragmented. You need one library for tensors (ndarray), another for linear algebra (nalgebra/faer), another for FFT (rustfft), another for random numbers, another for statistics. They don't interoperate. They don't have GPU support. They're not optimized together.
numr consolidates everything:
| Task | Old Ecosystem | numr |
|---|---|---|
| Tensors | ndarray | Tensor |
| Linear algebra | nalgebra / faer | numr::linalg |
| FFT | rustfft | numr::fft |
| Sparse | sprs / ndsparse | numr::sparse (feature-gated) |
| Statistics | statrs | numr::statistics |
| Random numbers | rand + manual distributions | numr::random + multivariate |
| GPU support | None | CPU, CUDA, WebGPU |
| Automatic differentiation | None | numr::autograd |
A Rust developer should never need to look elsewhere for numerical computing.
Architecture
numr is designed with a simple principle: same code, any backend.
```
┌──────────────────────────────────────────────────────────────┐
│                       Your Application                       │
│                  (any backend-agnostic code)                 │
└──────────────────────────────┬───────────────────────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
     ┌────▼────┐          ┌────▼────┐          ┌────▼────┐
     │   CPU   │          │  CUDA   │          │ WebGPU  │
     │ Runtime │          │ Runtime │          │ Runtime │
     └────┬────┘          └────┬────┘          └────┬────┘
          │                    │                    │
     ┌────▼────────────────────▼────────────────────▼────┐
     │   Trait implementations: same algorithms,         │
     │   backend-specific kernel code, one per device    │
     └───────────────────────────────────────────────────┘
```
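The "same code, any backend" principle boils down to ordinary Rust generics: application code is written against a runtime trait, and each backend supplies its own kernels. A standalone sketch of the pattern (the trait shape and method names here are invented for illustration, not numr's actual API):

```rust
// A backend is anything that can run the kernels an algorithm needs.
trait Runtime {
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32>;
}

// A CPU backend with a plain scalar kernel.
struct CpuRuntime;
impl Runtime for CpuRuntime {
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

// Backend-agnostic application code: compiles once, runs on any runtime.
fn elementwise_sum<R: Runtime>(rt: &R, a: &[f32], b: &[f32]) -> Vec<f32> {
    rt.add(a, b)
}

fn main() {
    let rt = CpuRuntime;
    let out = elementwise_sum(&rt, &[1.0, 2.0], &[3.0, 4.0]);
    println!("{:?}", out); // [4.0, 6.0]
}
```

A `CudaRuntime` or `WgpuRuntime` would slot into the same generic function with device kernels behind the trait.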
Operations
numr implements a comprehensive set of tensor operations across CPU, CUDA, and WebGPU:
Core Arithmetic
- UnaryOps: neg, abs, sqrt, exp, log, sin, cos, tan, sinh, cosh, tanh, floor, ceil, round, and more
- BinaryOps: add, sub, mul, div, pow, maximum, minimum (all with NumPy-style broadcasting)
- ScalarOps: tensor-scalar arithmetic
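NumPy-style broadcasting aligns shapes from the trailing dimension: two sizes are compatible when they are equal or one of them is 1. A standalone sketch of the shape rule (illustrative, not numr's internal code):

```rust
// Compute the broadcast shape of two shapes, NumPy-style:
// align from the right; each pair of sizes must be equal or contain a 1.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Missing leading dimensions are treated as size 1.
        let x = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let y = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        match (x, y) {
            _ if x == y => out.push(x),
            (1, _) => out.push(y),
            (_, 1) => out.push(x),
            _ => return None, // incompatible sizes
        }
    }
    Some(out)
}

fn main() {
    // (3, 1, 5) with (4, 5) broadcasts to (3, 4, 5).
    assert_eq!(broadcast_shape(&[3, 1, 5], &[4, 5]), Some(vec![3, 4, 5]));
    // (3,) with (4,) is incompatible.
    assert_eq!(broadcast_shape(&[3], &[4]), None);
}
```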
Shape and Data Movement
- ShapeOps: cat, stack, split, chunk, repeat, pad, roll
- IndexingOps: gather, scatter, index_select, masked_select, masked_fill, embedding_lookup
- SortingOps: sort, argsort, topk, unique, nonzero, searchsorted
Reductions
- ReduceOps: sum, mean, max, min, prod (with precision variants)
- CumulativeOps: cumsum, cumprod, logsumexp
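The logsumexp listed above is typically computed with the max-subtraction trick to avoid overflow: log Σ exp(xᵢ) = m + log Σ exp(xᵢ − m) with m = max xᵢ. A standalone sketch (illustrative, not numr's kernel):

```rust
// Numerically stable log-sum-exp: shift by the max so exp never overflows.
fn logsumexp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    m + xs.iter().map(|&x| (x - m).exp()).sum::<f64>().ln()
}

fn main() {
    // A naive exp(1000.0) overflows to infinity; the shifted form stays finite.
    let v = logsumexp(&[1000.0, 1000.0]);
    assert!((v - (1000.0 + 2.0f64.ln())).abs() < 1e-9);
}
```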
Comparisons and Logical
- CompareOps: eq, ne, lt, le, gt, ge
- LogicalOps: logical_and, logical_or, logical_xor, logical_not
- ConditionalOps: where (ternary conditional)
Neural Network Operations
- ActivationOps: relu, sigmoid, silu, gelu, leaky_relu, elu, softmax
- NormalizationOps: rms_norm, layer_norm
Linear Algebra
- MatmulOps: matmul, matmul_bias (fused GEMM+bias)
- LinalgOps: solve, lstsq, pinverse, inverse, det, trace, matrix_rank, diag, matrix_norm, kron, khatri_rao
Statistics and Probability
- StatisticalOps: var, std, skew, kurtosis, quantile, percentile, median, cov, corrcoef
- RandomOps: rand, randn, randint, multinomial, bernoulli, poisson, binomial, beta, gamma, exponential, chi_squared, student_t, f_distribution
- MultivariateRandomOps: multivariate_normal, wishart, dirichlet
- QuasirandomOps: Sobol, Halton sequences
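Quasirandom sequences such as Halton cover the unit interval more evenly than pseudorandom draws; element n in base b is the radical inverse of n. A standalone sketch of the 1-D case (illustrative, not numr's implementation):

```rust
// Halton radical inverse: reflect the base-b digits of n about the radix point.
fn halton(mut n: u64, base: u64) -> f64 {
    let mut result = 0.0;
    let mut f = 1.0 / base as f64;
    while n > 0 {
        result += (n % base) as f64 * f;
        n /= base;
        f /= base as f64;
    }
    result
}

fn main() {
    // Base-2 Halton: 1/2, 1/4, 3/4, 1/8, ...
    let seq: Vec<f64> = (1u64..5).map(|i| halton(i, 2)).collect();
    assert_eq!(seq, vec![0.5, 0.25, 0.75, 0.125]);
}
```

Multidimensional Halton points pair coprime bases (2, 3, 5, ...) per coordinate.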
Distance Metrics
- DistanceOps: euclidean, manhattan, cosine, hamming, jaccard, minkowski, chebyshev, correlation
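As a concrete example of the metrics above, cosine distance is one minus the cosine of the angle between two vectors. A standalone sketch (illustrative; numr's kernel operates on tensors):

```rust
// Cosine distance: 1 - (a . b) / (|a| |b|).
fn cosine_distance(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    // Orthogonal vectors are at distance 1; parallel vectors at distance 0.
    assert!((cosine_distance(&[1.0, 0.0], &[0.0, 1.0]) - 1.0).abs() < 1e-12);
    assert!(cosine_distance(&[2.0, 0.0], &[5.0, 0.0]).abs() < 1e-12);
}
```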
Algorithm Modules
Linear Algebra (numr::linalg):
- Decompositions: LU, QR, Cholesky, SVD, Schur, full eigendecomposition, generalized eigenvalues
- Solvers: solve, lstsq, pinverse
- Matrix functions: exp, log, sqrt, sign
- Utilities: det, trace, rank, matrix norms
Fast Fourier Transform (numr::fft):
- FFT/IFFT (1D, 2D, ND) - Stockham algorithm
- Real FFT (RFFT/IRFFT)
Matrix Multiplication (numr::matmul):
- Tiled GEMM with register blocking
- Bias fusion support
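Tiling splits the output matrix into blocks small enough for the inner loops to reuse data while it is hot in cache. A much-simplified standalone sketch of a tiled GEMM (numr's real kernels add register blocking, SIMD, and packing):

```rust
// C (m x n) += A (m x k) * B (k x n), row-major, processed in T x T tiles.
const T: usize = 4; // tile size; real kernels tune this per architecture

fn tiled_gemm(m: usize, n: usize, k: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    for i0 in (0..m).step_by(T) {
        for j0 in (0..n).step_by(T) {
            for p0 in (0..k).step_by(T) {
                // Micro-kernel over one tile of the output.
                for i in i0..(i0 + T).min(m) {
                    for p in p0..(p0 + T).min(k) {
                        let aip = a[i * k + p];
                        for j in j0..(j0 + T).min(n) {
                            c[i * n + j] += aip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]].
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    let mut c = [0.0; 4];
    tiled_gemm(2, 2, 2, &a, &b, &mut c);
    assert_eq!(c, [19.0, 22.0, 43.0, 50.0]);
}
```

Bias fusion folds the `+ bias` into the same pass over C instead of a second traversal.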
Special Functions (numr::special):
- Gamma functions: gamma, lgamma, digamma, polygamma
- Error functions: erf, erfc, erfcinv
- Bessel functions: J0, J1, Jn, Y0, Y1, Yn
Sparse Tensors (numr::sparse, feature-gated):
- Formats: CSR, CSC, COO
- Operations: SpGEMM (sparse-sparse matrix multiplication), SpMV (sparse matrix-vector product), DSMM (dense-sparse matrix multiplication)
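In CSR, a matrix is stored as row pointers, column indices, and values, and SpMV walks each row's slice of nonzeros. A standalone sketch (illustrative, not numr's sparse API):

```rust
// Compressed Sparse Row storage: row_ptr delimits each row's nonzeros.
struct Csr {
    row_ptr: Vec<usize>, // len = rows + 1
    col_idx: Vec<usize>,
    values: Vec<f64>,
}

// y = A * x: accumulate each stored nonzero against its column of x.
fn spmv(a: &Csr, x: &[f64]) -> Vec<f64> {
    let rows = a.row_ptr.len() - 1;
    let mut y = vec![0.0; rows];
    for r in 0..rows {
        for i in a.row_ptr[r]..a.row_ptr[r + 1] {
            y[r] += a.values[i] * x[a.col_idx[i]];
        }
    }
    y
}

fn main() {
    // [[1, 0], [0, 2], [3, 0]] stored sparsely, applied to x = [10, 20].
    let a = Csr {
        row_ptr: vec![0, 1, 2, 3],
        col_idx: vec![0, 1, 0],
        values: vec![1.0, 2.0, 3.0],
    };
    assert_eq!(spmv(&a, &[10.0, 20.0]), vec![10.0, 40.0, 30.0]);
}
```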
Dtypes
numr supports a wide range of numeric types:
| Type | Size | CPU | CUDA | WebGPU | Feature |
|---|---|---|---|---|---|
| f64 | 8B | ✓ | ✓ | ✗ | - |
| f32 | 4B | ✓ | ✓ | ✓ | - |
| f16 | 2B | ✓ | ✓ | ✓ | f16 |
| bf16 | 2B | ✓ | ✓ | ✗ | f16 |
| fp8e4m3 | 1B | ✓ | ✓ | ✗ | fp8 |
| fp8e5m2 | 1B | ✓ | ✓ | ✗ | fp8 |
| i64 | 8B | ✓ | ✓ | ✗ | - |
| i32 | 4B | ✓ | ✓ | ✓ | - |
| i16 | 2B | ✓ | ✓ | ✗ | - |
| i8 | 1B | ✓ | ✓ | ✗ | - |
| u64 | 8B | ✓ | ✓ | ✗ | - |
| u32 | 4B | ✓ | ✓ | ✓ | - |
| u16 | 2B | ✓ | ✓ | ✗ | - |
| u8 | 1B | ✓ | ✓ | ✓ | - |
| bool | 1B | ✓ | ✓ | ✓ | - |
Every operation supports every compatible dtype. No hardcoded f32-only kernels.
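Of the reduced-precision types above, bf16 is the simplest to picture: it keeps f32's 8-bit exponent and truncates the mantissa to 7 bits, so the bit pattern is just the top half of an f32. A standalone sketch of the layout (numr's actual dtype machinery is more involved, e.g. rounding rather than truncating):

```rust
// bf16 = top 16 bits of an f32: sign (1) | exponent (8) | mantissa (7).
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    // 1.0f32 is 0x3F80_0000, so its bf16 pattern is 0x3F80.
    assert_eq!(f32_to_bf16_bits(1.0), 0x3F80);
    // Round-tripping a value representable in bf16 is exact.
    assert_eq!(bf16_bits_to_f32(f32_to_bf16_bits(1.5)), 1.5);
}
```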
Backends
All backends implement identical algorithms with native kernels—no cuBLAS, MKL, or vendor library dependencies.
| Hardware | Backend | Feature | Status | Notes |
|---|---|---|---|---|
| CPU (x86-64) | CPU | cpu (default) | ✓ | AVX-512/AVX2 SIMD |
| CPU (ARM) | CPU | cpu | Planned | NEON SIMD |
| NVIDIA GPU | CUDA | cuda | ✓ | Native PTX kernels |
| AMD GPU | WebGPU | wgpu | ✓ | WGSL shaders |
| Intel GPU | WebGPU | wgpu | ✓ | WGSL shaders |
| Apple GPU | WebGPU | wgpu | ✓ | WGSL shaders |
| AMD GPU | ROCm | - | Planned | Native HIP kernels |
Why Native Kernels?
- Fewer dependencies: No 2GB+ CUDA toolkit, no MKL installation
- Portability: Same code on CPU, NVIDIA, AMD, Intel, Apple
- Transparency: Understand exactly what code runs on your hardware
- Maintainability: Your code doesn't break when a vendor ships a breaking update
- Performance: Kernels can be tuned for your workloads, not a vendor's generic cases
Quick Start
(Module paths in the snippets below are illustrative and may not match the crate's exact layout; see the API documentation.)
CPU Example

```rust
use numr::prelude::*;
use numr::runtime::cpu::CpuRuntime;
```

GPU Example (CUDA)

```rust
use numr::prelude::*;
use numr::runtime::cuda::CudaRuntime;
```

Backend-Generic Code

```rust
use numr::prelude::*;
use numr::runtime::Runtime;
use numr::tensor::Tensor;

// Works on CPU, CUDA, or WebGPU:
// use the same function on different hardware.
```

Linear Algebra

```rust
use numr::prelude::*;
use numr::linalg::*;
```

FFT

```rust
use numr::prelude::*;
use numr::fft::FftOps;
```

Statistics and Distributions

```rust
use numr::prelude::*;
```
Installation
CPU-only (default)

```toml
[dependencies]
numr = "*"
```

With GPU Support

```toml
# NVIDIA CUDA (requires CUDA 12.0+)
[dependencies]
numr = { version = "*", features = ["cuda"] }
```

```toml
# Cross-platform GPU (NVIDIA, AMD, Intel, Apple)
[dependencies]
numr = { version = "*", features = ["wgpu"] }
```

With Optional Features

```toml
[dependencies]
numr = { version = "*", features = [
    "cuda",   # NVIDIA GPU support
    "wgpu",   # Cross-platform GPU (WebGPU)
    "f16",    # Half-precision (F16, BF16)
    "fp8",    # 8-bit floating point
    "sparse", # Sparse tensors
] }
```
Feature Flags
| Feature | Description | Default |
|---|---|---|
| cpu | CPU backend (AVX-512/AVX2 on x86-64, NEON planned) | ✓ |
| cuda | NVIDIA CUDA backend | ✗ |
| wgpu | Cross-platform GPU (WebGPU) | ✗ |
| rayon | Multi-threaded CPU via Rayon | ✓ |
| f16 | Half-precision floats (F16, BF16) | ✗ |
| fp8 | 8-bit floats (FP8E4M3, FP8E5M2) | ✗ |
| sparse | Sparse tensor support (CSR, CSC, COO) | ✗ |
Building from Source

```sh
# CPU only
cargo build --release

# With CUDA
cargo build --release --features cuda

# With WebGPU
cargo build --release --features wgpu

# With all features
cargo build --release --all-features

# Run tests
cargo test

# Run benchmarks
cargo bench
```
How numr Fits in the Stack
numr is the foundation that everything else builds on:
```
┌────────────────────────────────────┐
│  Applications (oxidizr, blazr)     │
│  Your domain-specific code         │
└────────────────┬───────────────────┘
                 │
┌────────────────▼───────────────────┐
│  boostr - ML Framework             │
│  (neural networks, attention)      │
│  Builds on numr ops                │
└────────────────┬───────────────────┘
                 │
┌────────────────▼───────────────────┐
│  solvr - Scientific Computing      │
│  (optimization, ODE, interpolation)│
│  Builds on numr ops and linalg     │
└────────────────┬───────────────────┘
                 │
┌────────────────▼───────────────────┐
│  numr - Foundations                │
│  (tensors, linalg, FFT, random)    │
│  Native CPU, CUDA, WebGPU kernels  │
└────────────────────────────────────┘
```
When numr's kernels improve, everything above improves automatically.
Kernels and Extensibility
numr provides default kernels for all operations. You can also:
- Use default kernels: All operations work out of the box with optimized SIMD (CPU), PTX (CUDA), and WGSL (WebGPU) kernels
- Replace specific kernels: Swap in your own optimized kernels for performance-critical paths
- Add new operations: Define new traits and implement kernels for all backends
For detailed guidance on writing custom kernels, adding new operations, and backend-specific optimization techniques, see docs/extending-numr.md.
License
Apache-2.0