# Volta ⚡
Volta is a minimal deep learning and automatic differentiation library built from scratch in pure Rust, heavily inspired by PyTorch. It provides a dynamic computation graph, NumPy-style broadcasting, and common neural network primitives.
This project is an educational endeavor to demystify the inner workings of modern autograd engines. It prioritizes correctness, clarity, and a clean API over raw performance, while still providing hooks for hardware acceleration.
## Key Features
- Dynamic Computation Graph: Build and backpropagate through graphs on the fly, just like PyTorch.
- Reverse-Mode Autodiff: A powerful `backward()` method for efficient end-to-end gradient calculation.
- Rich Tensor Operations: A comprehensive set of unary, binary, reduction, and matrix operations via an ergonomic `TensorOps` trait.
- Broadcasting: Full NumPy-style broadcasting support for arithmetic operations.
- Neural Network Layers: `Linear`, `Conv2d`, `MaxPool2d`, `Flatten`, `ReLU`, `Sigmoid`, `Tanh`.
- Optimizers: `SGD` (with momentum), `Adam` (with bias correction), and `Muon` (momentum orthogonalization).
- IO System: Save and load model weights (state dicts) via `bincode`.
- BLAS Acceleration (macOS): Optional performance boost for matrix multiplication via Apple's Accelerate framework.
- Validation-Focused: Includes a robust numerical gradient checker to verify the correctness of every implemented operation.
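A numerical gradient checker works by comparing analytic gradients against central finite differences. Here is a minimal, self-contained sketch of the idea (illustrative only, not Volta's actual implementation):

```rust
/// Central-difference numerical gradient of a scalar function `f` at `x`.
/// A gradient checker perturbs each input coordinate by ±eps and compares
/// the resulting slope against the analytically derived gradient.
fn numerical_grad<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], eps: f64) -> Vec<f64> {
    let mut grad = vec![0.0; x.len()];
    let mut xp = x.to_vec();
    for i in 0..x.len() {
        xp[i] = x[i] + eps;
        let fp = f(&xp);
        xp[i] = x[i] - eps;
        let fm = f(&xp);
        xp[i] = x[i]; // restore the coordinate
        grad[i] = (fp - fm) / (2.0 * eps);
    }
    grad
}

fn main() {
    // f(x) = x0^2 + 3*x1: the analytic gradient is (2*x0, 3).
    let f = |x: &[f64]| x[0] * x[0] + 3.0 * x[1];
    let grad = numerical_grad(f, &[2.0, 5.0], 1e-5);
    assert!((grad[0] - 4.0).abs() < 1e-6);
    assert!((grad[1] - 3.0).abs() < 1e-6);
    println!("numerical gradient: {grad:?}");
}
```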
## Project Status
This library is functional for training MLPs and CNNs on CPU. It features a verified autograd engine and correctly implemented im2col convolutions.
- ✅ What's Working: Autograd, Conv2d/Linear layers, Optimizers (including Muon), DataLoaders, Serialization.
- ⚠️ What's in Progress: Performance is not yet a primary focus. While BLAS acceleration is available for macOS matrix multiplication, most operations use naive loops.
- ❌ What's Missing:
  - GPU Support: Currently CPU-only.
## Installation
Add Volta to your `Cargo.toml`:

```toml
[dependencies]
volta = "0.1.0"
```
### Enabling BLAS on macOS
For a significant performance boost in matrix multiplication on macOS, enable the accelerate feature:
```toml
[dependencies]
volta = { version = "0.1.0", features = ["accelerate"] }
```
## Examples
### Training an MLP
An MLP is a `Sequential` stack of `Linear` layers and activations. Training one on synthetic data consists of a `forward()` pass, a `backward()` call on the loss, and an optimizer `step()`, after which the model's state dict can be saved.
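Training an MLP boils down to a forward pass, a backward pass, and a parameter update. As a dependency-free illustration of what that loop computes, the sketch below trains a tiny 1-8-1 MLP with hand-written backpropagation on synthetic data (plain Rust only; a conceptual sketch, not Volta's API — Volta's autograd derives these gradients for you):

```rust
// Conceptual sketch: a 1-8-1 MLP with tanh hidden units, fit to
// y = x^2 on [-1, 1] by full-batch gradient descent with manually
// derived gradients (MSE loss).

const H: usize = 8;   // hidden width
const LR: f64 = 0.05; // learning rate

struct Mlp {
    w1: [f64; H],
    b1: [f64; H],
    w2: [f64; H],
    b2: f64,
}

impl Mlp {
    fn new() -> Self {
        // Deterministic spread-out init instead of random weights.
        let mut w1 = [0.0; H];
        for (j, w) in w1.iter_mut().enumerate() {
            *w = 0.5 * (j as f64 - 3.5) / 3.5;
        }
        Mlp { w1, b1: [0.0; H], w2: [0.1; H], b2: 0.0 }
    }

    fn forward(&self, x: f64) -> ([f64; H], f64) {
        let mut h = [0.0; H];
        let mut y = self.b2;
        for j in 0..H {
            h[j] = (self.w1[j] * x + self.b1[j]).tanh();
            y += self.w2[j] * h[j];
        }
        (h, y)
    }

    /// One full-batch gradient-descent step on MSE; returns the loss.
    fn train_step(&mut self, data: &[(f64, f64)]) -> f64 {
        let n = data.len() as f64;
        let (mut gw1, mut gb1, mut gw2, mut gb2) = ([0.0; H], [0.0; H], [0.0; H], 0.0);
        let mut loss = 0.0;
        for &(x, t) in data {
            let (h, y) = self.forward(x);
            let err = y - t;
            loss += err * err / n;
            let dy = 2.0 * err / n; // dL/dy
            gb2 += dy;
            for j in 0..H {
                gw2[j] += dy * h[j];
                let dz = dy * self.w2[j] * (1.0 - h[j] * h[j]); // through tanh
                gw1[j] += dz * x;
                gb1[j] += dz;
            }
        }
        for j in 0..H {
            self.w1[j] -= LR * gw1[j];
            self.b1[j] -= LR * gb1[j];
            self.w2[j] -= LR * gw2[j];
        }
        self.b2 -= LR * gb2;
        loss
    }
}

fn main() {
    // Synthetic dataset: (x, x^2) on a grid over [-1, 1].
    let data: Vec<(f64, f64)> = (0..21)
        .map(|i| { let x = -1.0 + 0.1 * i as f64; (x, x * x) })
        .collect();
    let mut mlp = Mlp::new();
    let first = mlp.train_step(&data);
    let mut last = first;
    for _ in 0..500 { last = mlp.train_step(&data); }
    println!("loss: {first:.4} -> {last:.4}");
    assert!(last < first); // training reduces the loss
}
```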
### LeNet-style CNN training on CPU
A LeNet-style CNN composes `Conv2d`, `ReLU`, `MaxPool2d`, `Flatten`, and `Linear` layers through the `Module` trait and trains on CPU with the same loop as the MLP above.
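Under the hood, `Conv2d` uses the im2col trick: each K×K input patch is unrolled into a row of a matrix, turning the whole convolution into one matrix multiplication. A self-contained single-channel sketch of the idea (illustrative only, not Volta's actual code):

```rust
/// Unroll every kxk patch of an hxw image (valid padding, stride 1)
/// into one row of an (h_out * w_out) x (k * k) matrix.
fn im2col(img: &[f64], h: usize, w: usize, k: usize) -> Vec<Vec<f64>> {
    let (h_out, w_out) = (h - k + 1, w - k + 1);
    let mut rows = Vec::with_capacity(h_out * w_out);
    for i in 0..h_out {
        for j in 0..w_out {
            let mut row = Vec::with_capacity(k * k);
            for di in 0..k {
                for dj in 0..k {
                    row.push(img[(i + di) * w + (j + dj)]);
                }
            }
            rows.push(row);
        }
    }
    rows
}

/// Convolution as a matrix product: each output pixel is the dot
/// product of one im2col row with the flattened kernel.
fn conv2d_im2col(img: &[f64], h: usize, w: usize, kernel: &[f64], k: usize) -> Vec<f64> {
    im2col(img, h, w, k)
        .iter()
        .map(|row| row.iter().zip(kernel).map(|(a, b)| a * b).sum())
        .collect()
}

fn main() {
    // 3x3 image, 2x2 averaging kernel -> 2x2 output.
    let img = [1.0, 2.0, 3.0,
               4.0, 5.0, 6.0,
               7.0, 8.0, 9.0];
    let kernel = [0.25; 4];
    let out = conv2d_im2col(&img, 3, 3, &kernel, 2);
    assert_eq!(out, vec![3.0, 4.0, 6.0, 7.0]);
    println!("{out:?}");
}
```

With multiple filters, the flattened kernels stack into a weight matrix, so the whole layer becomes a single matmul that can be handed to BLAS.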
## API Overview
The library is designed around a few core concepts:
- `Tensor`: The central data structure, an `Rc<RefCell<RawTensor>>` holding data, shape, and gradient information. This shared, interior-mutable structure is what allows the computation graph to be built dynamically.
- `TensorOps`: A trait implemented for `Tensor` that provides the ergonomic, user-facing API for all operations (e.g., `tensor.add(&other)`, `tensor.matmul(&weights)`).
- `nn::Module`: A trait for building neural network layers (`Linear`, `ReLU`) and composing them into larger models (`Sequential`). It standardizes the `forward()` pass and parameter collection.
- Optimizers (`Adam`, `SGD`, `Muon`): Structures that take a list of model parameters and update their weights based on computed gradients during `step()`.
- Vision Support: `Conv2d` and `MaxPool2d` layers enable building and training Convolutional Neural Networks (CNNs).
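The `Rc<RefCell<...>>` design can be illustrated with a scalar-valued miniature: each node shares ownership of its parents, mutates gradients through `RefCell`, and `backward()` sweeps the graph in reverse. This is a conceptual sketch of the pattern, not Volta's actual types:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Scalar analogue of the `Rc<RefCell<RawTensor>>` design: each node
// stores its value, accumulated gradient, and (parent, local-derivative)
// pairs recorded when the node was created.
struct RawNode {
    value: f64,
    grad: f64,
    parents: Vec<(Node, f64)>,
}

#[derive(Clone)]
struct Node(Rc<RefCell<RawNode>>);

impl Node {
    fn new(value: f64) -> Self {
        Node(Rc::new(RefCell::new(RawNode { value, grad: 0.0, parents: vec![] })))
    }
    fn value(&self) -> f64 { self.0.borrow().value }
    fn grad(&self) -> f64 { self.0.borrow().grad }

    fn add(&self, other: &Node) -> Node {
        let out = Node::new(self.value() + other.value());
        out.0.borrow_mut().parents = vec![(self.clone(), 1.0), (other.clone(), 1.0)];
        out
    }
    fn mul(&self, other: &Node) -> Node {
        let out = Node::new(self.value() * other.value());
        out.0.borrow_mut().parents =
            vec![(self.clone(), other.value()), (other.clone(), self.value())];
        out
    }

    /// Reverse-mode sweep: seed d(out)/d(out) = 1 and push gradients to
    /// the leaves via the chain rule. (A real engine visits nodes in
    /// topological order; this naive recursion is correct here because
    /// the only shared node, `x`, is a leaf.)
    fn backward(&self) {
        self.0.borrow_mut().grad = 1.0;
        self.backprop();
    }
    fn backprop(&self) {
        let (grad, parents) = {
            let n = self.0.borrow();
            (n.grad, n.parents.clone())
        };
        for (p, local) in parents {
            p.0.borrow_mut().grad += grad * local;
            p.backprop();
        }
    }
}

fn main() {
    // z = x * y + x, so dz/dx = y + 1 and dz/dy = x.
    let x = Node::new(3.0);
    let y = Node::new(4.0);
    let z = x.mul(&y).add(&x);
    z.backward();
    assert_eq!(z.value(), 15.0);
    assert_eq!(x.grad(), 5.0); // y + 1
    assert_eq!(y.grad(), 3.0); // x
    println!("dz/dx = {}, dz/dy = {}", x.grad(), y.grad());
}
```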
## Running the Test Suite
Volta has an extensive test suite that validates the correctness of every operation and its gradient. To run the tests:

```sh
cargo test
```

To run tests with BLAS acceleration enabled (on macOS):

```sh
cargo test --features accelerate
```
Note: One test, `misc_tests::test_adam_vs_sgd`, is known to be flaky as it depends on the random seed and convergence speed. It may occasionally fail.
## Roadmap
The next major steps for Volta focus on expanding its capabilities to handle more complex models and improving performance.

- GPU Acceleration: Integrate a backend for GPU computation (e.g., `wgpu` for cross-platform support or direct `metal` bindings for macOS) to drastically speed up training.
- Performance Optimization: Implement SIMD for element-wise operations and further integrate optimized BLAS routines.
## Outstanding Issues
- Device Argument Ignored: The `Device::GPU` enum variant exists in `src/device.rs`, but passing it to `to_device` in `src/tensor.rs` causes a panic/unimplemented error.
- Serialization Fragility: `Sequential` relies on string-key matching for `state_dict` (e.g., `"0.weight"`). Renaming layers or changing architecture depth will break loading without helpful error messages.
- Performance: The `im2col` implementation in `src/nn/layers/conv.rs` materializes the entire matrix in memory. Large batch sizes or high-resolution images will easily OOM even on high-end machines.
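To make the im2col memory cost concrete: for a stride-1 K×K convolution, the materialized matrix has N·H_out·W_out rows and C_in·K·K columns. A quick back-of-the-envelope calculation (the example shapes below are illustrative, not taken from Volta):

```rust
/// Size in bytes of a fully materialized f32 im2col matrix for a
/// stride-1 KxK convolution: one row per output position, one column
/// per kernel element.
fn im2col_bytes(n: usize, c_in: usize, h_out: usize, w_out: usize, k: usize) -> usize {
    let rows = n * h_out * w_out;
    let cols = c_in * k * k;
    rows * cols * std::mem::size_of::<f32>()
}

fn main() {
    // Batch of 64, 64 channels, 112x112 output map, 3x3 kernel:
    let bytes = im2col_bytes(64, 64, 112, 112, 3);
    println!("{:.2} GB", bytes as f64 / 1e9); // ≈ 1.85 GB for one layer
}
```

Scaling up the batch size or resolution multiplies this linearly, which is why the current eager materialization can exhaust memory quickly.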
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
## License
This project is licensed under the MIT License - see the LICENSE file for details.