# rumus

Core crate for the RUMUS native-Rust deep learning framework.
## What's Inside
| Module | Description |
|---|---|
| `tensor` | `StorageHandle` (CPU `Vec` or GPU `wgpu::Buffer` via `parking_lot::RwLock<StorageData>`), `Layout`, `AutogradState`, `DType` (F32/F16/Q8), N-dimensional broadcasting, `to_dtype()` cast, `quantize()`/`dequantize()`, and all tensor operations (add, mul, matmul, relu, sigmoid, tanh, gelu, leaky_relu, dropout, im2col, flatten, max_pool2d, batch_norm_2d, adaptive_avg_pool2d, bmm, softmax, layer_norm, embedding_forward, cross_entropy_loss, broadcast_add/sub/mul, etc.) |
| `autograd` | Thread-local `Tape`, `GradientStore`, Kahn's-algorithm backward engine, `no_grad()` RAII guard, `VersionSnapshot` with `Weak` references, 31 concrete `BackwardOp` variants (incl. `Cast`) |
| `backend` | `Backend` trait (CPU) + feature-gated `gpu` module: `GpuContext` singleton (`supports_f16`), `BufferPool`, `PipelineCache` (35+ F32 pipelines + 30 F16 + cast + Q8 quantize/dequantize/matmul pipelines), WGSL metaprogramming via `alias scalar` |
| `nn` | `Parameter`, `Module` trait, `#[derive(Module)]` (re-exported from `rumus-macros`), `Linear`, `Conv2d`, `ConvTranspose2d`, `MaxPool2d`, `AdaptiveAvgPool2d`, `Flatten`, `Dropout`, `BatchNorm2d`, `LayerNorm`, `Embedding`, `MultiheadAttention`, `TransformerBlock`, activations (relu, sigmoid, tanh, gelu, leaky_relu), `mse_loss`, `cross_entropy_loss`, safetensors IO |
| `optim` | `Optimizer` trait (`step` + `set_lr`/`get_lr`), `SGD`, `Adam`, `AdamW`, all with CPU + GPU dual-path dispatch. `LRScheduler` trait with `StepLR` and `CosineAnnealingLR`. `clip_grad_norm_` with a 3-pass non-stalling GPU strategy |
| `data` | `Dataset` trait, `DataItem`, `DataLoader` with multithreaded prefetching (`std::thread` + bounded mpsc), Fisher-Yates shuffle, deadlock-free `Drop` teardown |
| `onnx` | (feature-gated) Thread-local `Tracer`, `TracedGraph`, `export_onnx()`: graph tracing + Protobuf serialization to `.onnx` files |
| `train` | `Trainer<O: Optimizer>`, a closure-based `train_step()` orchestrator |
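The `autograd` row above mentions a Kahn's-algorithm backward engine. As a rough illustration of that idea (not rumus's actual types — `topo_order` and the adjacency layout here are hypothetical), Kahn's algorithm produces an ordering of tape nodes in which every node is visited only after all of its inputs, which is exactly the dependency discipline a backward pass needs:

```rust
use std::collections::VecDeque;

/// Minimal "tape": node `i` depends on the nodes listed in `inputs[i]`.
/// Kahn's algorithm yields an order where every node appears after all
/// of its inputs; a backward engine walks such an order (from the loss
/// outward) so each gradient is complete before it is propagated.
fn topo_order(inputs: &[Vec<usize>]) -> Vec<usize> {
    let n = inputs.len();
    // out_edges[i] = nodes that consume node i's output
    let mut out_edges = vec![Vec::new(); n];
    let mut in_degree = vec![0usize; n];
    for (node, deps) in inputs.iter().enumerate() {
        in_degree[node] = deps.len();
        for &dep in deps {
            out_edges[dep].push(node);
        }
    }
    // Seed with nodes that have no dependencies (leaf tensors).
    let mut queue: VecDeque<usize> =
        (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(node) = queue.pop_front() {
        order.push(node);
        for &next in &out_edges[node] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                queue.push_back(next);
            }
        }
    }
    order
}

fn main() {
    // Nodes 0 and 1 are leaves; 2 = f(0, 1); 3 = g(2).
    let tape = vec![vec![], vec![], vec![0, 1], vec![2]];
    let order = topo_order(&tape);
    let pos = |x: usize| order.iter().position(|&n| n == x).unwrap();
    // Node 2 comes after both inputs and before its consumer.
    assert!(pos(2) > pos(0) && pos(2) > pos(1) && pos(2) < pos(3));
    println!("topological order: {:?}", order);
}
```

A queue-based topological sort like this also detects cycles for free: if `order.len() < n`, the graph was not a DAG.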
## Features
- `default`: CPU-only build. No external GPU dependencies.
- `gpu`: enables the WGPU compute backend (`wgpu` + `pollster`). All tensor ops auto-dispatch to WGSL shaders when data is GPU-resident.
- `onnx`: enables ONNX model export (`prost` + `prost-build`). Trace a forward pass and serialize it to `.onnx` for ONNX Runtime / TensorRT.
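The `no_grad()` RAII guard listed under `autograd` follows the standard thread-local-flag pattern: flip the flag off on construction, restore the previous value on `Drop`. The sketch below is conceptual, with hypothetical names (`GRAD_ENABLED`, `NoGradGuard`), not the crate's actual implementation:

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread flag that ops consult before recording onto the tape.
    static GRAD_ENABLED: Cell<bool> = Cell::new(true);
}

struct NoGradGuard {
    prev: bool,
}

/// Disable gradient recording for the current thread until the returned
/// guard is dropped. Nesting works because each guard remembers the
/// state it replaced.
fn no_grad() -> NoGradGuard {
    let prev = GRAD_ENABLED.with(|f| f.replace(false));
    NoGradGuard { prev }
}

impl Drop for NoGradGuard {
    fn drop(&mut self) {
        // Restore the previous state even on early return or panic.
        let prev = self.prev;
        GRAD_ENABLED.with(|f| f.set(prev));
    }
}

fn grad_enabled() -> bool {
    GRAD_ENABLED.with(|f| f.get())
}

fn main() {
    assert!(grad_enabled());
    {
        let _guard = no_grad();
        assert!(!grad_enabled()); // ops skip tape recording here
    }
    assert!(grad_enabled()); // restored when the guard drops
}
```

Tying the flag's lifetime to a guard value means the "re-enable" step can never be forgotten, even if the scope exits via `?` or a panic.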
## Quick Start
```rust
// NOTE: import paths and constructor arguments below are reconstructed
// for illustration; the original snippet elided them. See the module
// docs for the exact API. `MLP` is a user-defined model implementing
// `Module` (e.g. via #[derive(Module)]).
use rumus::train::Trainer;
use rumus::tensor::Tensor;

let model = MLP::new();
let mut trainer = Trainer::new(model, /* optimizer */);

// One training step:
let loss = trainer.train_step(/* closure: forward pass + loss */).unwrap();
```
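The `DataLoader`'s multithreaded prefetching described under `data` (`std::thread` + bounded mpsc) boils down to a producer thread feeding batches through a bounded channel while the training loop consumes them. This is a std-only sketch of the pattern, not rumus's API; `prefetch_sum` is a made-up stand-in for a training loop:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Spawn a producer that loads `n_batches` batches ahead of the consumer
/// through a bounded channel; return the sum of everything consumed.
fn prefetch_sum(n_batches: usize, batch_len: usize) -> f32 {
    // Bounded capacity: the worker blocks once 2 batches are buffered,
    // so prefetching stays ahead without unbounded memory growth.
    let (tx, rx) = sync_channel::<Vec<f32>>(2);

    let producer = thread::spawn(move || {
        for i in 0..n_batches {
            // Stand-in for "load and collate one batch".
            let batch = vec![i as f32; batch_len];
            // send() fails once the receiver is dropped; that is the
            // deadlock-free teardown signal for the worker.
            if tx.send(batch).is_err() {
                break;
            }
        }
        // Dropping `tx` closes the channel, ending the consumer loop.
    });

    let mut total = 0.0f32;
    for batch in rx {
        total += batch.iter().sum::<f32>();
    }
    producer.join().unwrap();
    total
}

fn main() {
    let total = prefetch_sum(5, 4);
    assert_eq!(total, 40.0); // (0 + 1 + 2 + 3 + 4) * 4
    println!("consumed all batches, total = {total}");
}
```

The "deadlock-free `Drop` teardown" the table mentions falls out of the same design: dropping the receiver makes the next `send` fail, so the worker exits instead of blocking forever on a full channel.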
## Dependencies
- `rumus-macros`: `#[derive(Module)]` proc macro
- `safetensors`: model persistence
- `bytemuck`: safe f32/u8 casts
- `parking_lot`: mapped `RwLock` guards for `StorageData`
- `wgpu` + `pollster`: optional, behind the `gpu` feature
## License
Licensed under either of
at your option.