Multi-Target High-Performance Compute Library
trueno (Spanish: "thunder") provides unified compute primitives across CPU SIMD, GPU, and WebAssembly.
Features
- CPU SIMD: x86 (SSE2/AVX/AVX2/AVX-512), ARM (NEON), WASM (SIMD128)
- GPU: Vulkan/Metal/DX12/WebGPU via wgpu (matrix multiply only)
- Auto-dispatch: Runtime selection of optimal backend
- Zero unsafe in public API: Safety via type system
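The auto-dispatch idea can be sketched with nothing but the standard library's CPU feature detection. This is a minimal illustration, not trueno's actual implementation: the `Backend` enum and the `select_backend` and `backend` functions are hypothetical names, and the priority order (widest instruction set first) is an assumption.

```rust
use std::sync::OnceLock;

/// Hypothetical backend tags for illustration; not trueno's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Sse2,
    Avx2,
    Avx512,
    Neon,
    WasmSimd128,
}

/// Probe the running CPU and return the widest SIMD backend it reports.
/// Falls back to `Scalar` when no known extension is detected.
#[allow(unreachable_code)]
pub fn select_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Runtime checks: the binary may run on a CPU older than the build host.
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Backend::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return Backend::Sse2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // WebAssembly has no runtime probe here; SIMD128 is a compile-time choice.
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    {
        return Backend::WasmSimd128;
    }
    Backend::Scalar
}

/// Cache the probe result so detection runs once per process.
pub fn backend() -> Backend {
    static BACKEND: OnceLock<Backend> = OnceLock::new();
    *BACKEND.get_or_init(select_backend)
}
```

Caching the result in a `OnceLock` keeps the per-call cost of dispatch to a single load, which matters when the dispatched operation itself is only a few microseconds long.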
Installation
[dependencies]
trueno = "0.8"
# Optional: GPU support for large matrices
trueno = { version = "0.8", features = ["gpu"] }
Quick Start
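use trueno::{Matrix, SymmetricEigen, Vector};
// Vector operations - auto-selects best SIMD backend
let a = Vector::from_slice(&[1.0, 2.0, 3.0, 4.0]);
let b = Vector::from_slice(&[5.0, 6.0, 7.0, 8.0]);
let sum = a.add(&b).unwrap(); // [6.0, 8.0, 10.0, 12.0]
let dot = a.dot(&b).unwrap(); // 70.0
let activated = a.relu().unwrap(); // ReLU activation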
// Matrix operations
let m = Matrix::from_vec(2, 2, vec![1.0, 2.0, 3.0, 4.0]).unwrap(); // rows, cols, row-major data
let product = m.matmul(&m).unwrap(); // Matrix multiplication
let transposed = m.transpose(); // Transpose
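// Batched matmul for transformers (Q @ K^T pattern)
let batch = 2; let heads = 4; let seq = 8; let dim = 64;
let q: Vec<f32> = vec![0.1; batch * heads * seq * dim]; // [batch, heads, seq, dim]
let kt: Vec<f32> = vec![0.1; batch * heads * dim * seq]; // [batch, heads, dim, seq]
// Argument order assumed here: (a, b, batch, heads, m, k, n)
let attn = Matrix::batched_matmul_4d(&q, &kt, batch, heads, seq, dim, seq).unwrap();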
// Eigendecomposition (PCA, spectral analysis)
let cov = Matrix::from_vec(2, 2, vec![3.0, 1.0, 1.0, 3.0]).unwrap();
let eigen = SymmetricEigen::new(&cov).unwrap();
let eigenvalues = eigen.eigenvalues(); // [4.0, 2.0]
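To make the Q @ K^T pattern in the Quick Start concrete, here is a naive reference for a batched 4-D matmul over flat buffers. It is a sketch under stated assumptions rather than the library's kernel: the function name `batched_matmul_4d_ref`, the row-major layouts, and the argument order are all hypothetical.

```rust
/// Naive reference for a batched 4-D matmul over flat row-major buffers.
/// Assumed layouts (not confirmed by the trueno docs):
///   a:   [batch, heads, m, k]
///   b:   [batch, heads, k, n]
///   out: [batch, heads, m, n]
pub fn batched_matmul_4d_ref(
    a: &[f32],
    b: &[f32],
    batch: usize,
    heads: usize,
    m: usize,
    k: usize,
    n: usize,
) -> Result<Vec<f32>, String> {
    if a.len() != batch * heads * m * k {
        return Err(format!("a has {} elements, expected {}", a.len(), batch * heads * m * k));
    }
    if b.len() != batch * heads * k * n {
        return Err(format!("b has {} elements, expected {}", b.len(), batch * heads * k * n));
    }
    let mut out = vec![0.0f32; batch * heads * m * n];
    // Each (batch, head) pair is an independent m x k by k x n product.
    for bh in 0..batch * heads {
        let a_mat = &a[bh * m * k..(bh + 1) * m * k];
        let b_mat = &b[bh * k * n..(bh + 1) * k * n];
        let o_mat = &mut out[bh * m * n..(bh + 1) * m * n];
        for i in 0..m {
            for p in 0..k {
                let a_ip = a_mat[i * k + p];
                // Inner loop walks a row of b and a row of out contiguously.
                for j in 0..n {
                    o_mat[i * n + j] += a_ip * b_mat[p * n + j];
                }
            }
        }
    }
    Ok(out)
}
```

For attention scores, `m` and `n` are both the sequence length and `k` is the head dimension, so the output holds `batch * heads * seq * seq` values. A reference like this is mainly useful as a correctness oracle when testing an optimized backend.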
Performance
| Operation | SIMD Speedup | Notes |
|---|---|---|
| Dot product | 6-17x | AVX-512 for compute-bound |
| Matrix multiply | 2-10x | GPU for 500x500+ |
| Reductions (sum, max, min) | 3-12x | AVX-512 optimal |
| Element-wise (add, mul) | 1-2x | Memory-bound |
GPU Note: GPU acceleration benefits matrix multiply only. Element-wise operations use CPU SIMD (GPU transfer overhead exceeds compute time).
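A back-of-envelope model shows why the GPU note holds. The sketch below compares arithmetic intensity (FLOPs per byte moved) for the two cases. It assumes `f32` data, that both inputs are uploaded and the result is read back, and that transfers do not overlap with compute. The function names are hypothetical and the numbers are a model, not trueno measurements.

```rust
/// FLOPs per byte transferred for an n x n f32 matrix multiply.
/// Work: 2 * n^3 FLOPs. Traffic: three n x n matrices of 4-byte elements.
pub fn matmul_intensity(n: usize) -> f64 {
    let n = n as f64;
    let flops = 2.0 * n * n * n;
    let bytes = 3.0 * n * n * 4.0;
    flops / bytes // simplifies to n / 6
}

/// FLOPs per byte transferred for an element-wise add over `len` f32 values.
/// Work: one FLOP per element. Traffic: three vectors of 4-byte elements.
/// `len` must be non-zero.
pub fn elementwise_intensity(len: usize) -> f64 {
    let len = len as f64;
    let flops = len;
    let bytes = 3.0 * len * 4.0;
    flops / bytes // constant 1 / 12, independent of len
}
```

Under this model a 500x500 matmul performs about 83 FLOPs per byte transferred and the ratio keeps growing with `n`, while element-wise add stays at roughly 0.083 FLOPs per byte at any length, so larger inputs never amortize the transfer.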
Operations
Vector: add, sub, mul, div, dot, sum, min, max, argmin, argmax, norm_l1, norm_l2, normalize
Activations: relu, leaky_relu, elu, sigmoid, tanh, gelu, swish, softmax, log_softmax
Matrix: matmul, batched_matmul, batched_matmul_4d, transpose, matvec, convolve2d
Statistics: mean, variance, stddev, covariance, correlation, zscore
Eigen: symmetric eigendecomposition (Jacobi algorithm)
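For readers unfamiliar with the Jacobi algorithm named above, the following is a textbook cyclic Jacobi sketch in plain Rust that returns eigenvalues only. It is not taken from trueno's source: the function name `jacobi_eigenvalues`, its parameters, and the convergence test are illustrative assumptions.

```rust
/// Eigenvalues of a symmetric n x n matrix (row-major) via cyclic Jacobi rotations.
/// Returns the eigenvalues sorted in descending order.
pub fn jacobi_eigenvalues(a: &[f64], n: usize, max_sweeps: usize, tol: f64) -> Vec<f64> {
    assert_eq!(a.len(), n * n, "expected an n x n matrix");
    let mut a = a.to_vec();
    for _ in 0..max_sweeps {
        // The sum of squared off-diagonal entries measures distance from diagonal form.
        let mut off = 0.0;
        for i in 0..n {
            for j in 0..n {
                if i != j {
                    off += a[i * n + j] * a[i * n + j];
                }
            }
        }
        if off < tol {
            break;
        }
        for p in 0..n {
            for q in (p + 1)..n {
                let apq = a[p * n + q];
                if apq == 0.0 {
                    continue;
                }
                // Choose the rotation (c, s) that zeroes a[p][q].
                let theta = (a[q * n + q] - a[p * n + p]) / (2.0 * apq);
                let t = theta.signum() / (theta.abs() + (theta * theta + 1.0).sqrt());
                let c = 1.0 / (t * t + 1.0).sqrt();
                let s = t * c;
                // A <- A * J: mix columns p and q.
                for k in 0..n {
                    let akp = a[k * n + p];
                    let akq = a[k * n + q];
                    a[k * n + p] = c * akp - s * akq;
                    a[k * n + q] = s * akp + c * akq;
                }
                // A <- J^T * A: mix rows p and q.
                for k in 0..n {
                    let apk = a[p * n + k];
                    let aqk = a[q * n + k];
                    a[p * n + k] = c * apk - s * aqk;
                    a[q * n + k] = s * apk + c * aqk;
                }
            }
        }
    }
    // After convergence the diagonal holds the eigenvalues.
    let mut eig: Vec<f64> = (0..n).map(|i| a[i * n + i]).collect();
    eig.sort_by(|x, y| y.partial_cmp(x).unwrap());
    eig
}
```

Calling `jacobi_eigenvalues(&[3.0, 1.0, 1.0, 3.0], 2, 50, 1e-20)` yields values within floating-point error of 4.0 and 2.0, since the 2x2 matrix with 3 on the diagonal and 1 off it has exactly those eigenvalues. Each rotation is an orthogonal similarity transform, so the trace and the eigenvalues are preserved throughout.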
Development
Ecosystem
Part of the Pragmatic AI Labs stack:
- trueno-gpu - Pure Rust PTX generation
- trueno-db - GPU-first analytics database
- trueno-graph - Graph algorithms
- trueno-rag - RAG pipeline
License
MIT - see LICENSE