# Volta ⚡
[Build Status](https://github.com/rlarson20/volta/actions) · [crates.io](https://crates.io/crates/volta) · [License: MIT](https://opensource.org/licenses/MIT)
Volta is a minimal deep learning and automatic differentiation library built from scratch in pure Rust, heavily inspired by PyTorch. It provides a dynamic computation graph, NumPy-style broadcasting, and common neural network primitives.
This project is an educational endeavor to demystify the inner workings of modern autograd engines. It prioritizes correctness, clarity, and a clean API over raw performance, while still providing hooks for hardware acceleration.
## Key Features
- **Dynamic Computation Graph:** Build and backpropagate through graphs on the fly, just like PyTorch.
- **Reverse-Mode Autodiff:** A powerful `backward()` method for efficient end-to-end gradient calculation.
- **Rich Tensor Operations:** A comprehensive set of unary, binary, reduction, and matrix operations via an ergonomic `TensorOps` trait.
- **Broadcasting:** Full NumPy-style broadcasting support for arithmetic operations (see the sketch after this list).
- **Neural Network Layers:** `Linear`, `Conv2d`, `MaxPool2d`, `Flatten`, `ReLU`, `Sigmoid`, `Tanh`.
- **Optimizers:** `SGD` (w/ Momentum), `Adam` (w/ bias correction), and `Muon` (Momentum Orthogonal).
- **IO System:** Save and load model weights (state dicts) via `bincode`.
- **BLAS Acceleration (macOS):** Optional performance boost for matrix multiplication via Apple's Accelerate framework.
- **Validation-Focused:** Includes a robust numerical gradient checker to ensure the correctness of all implemented operations.
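As a quick illustration of broadcasting, here is a minimal sketch using the `new_tensor` constructor and `TensorOps` trait from the examples below; the row-major output layout shown in the comment is an assumption:

```rust
use volta::{new_tensor, TensorOps};

fn main() {
    // A [4, 2] matrix plus a [2] vector: the vector is broadcast
    // across all 4 rows, exactly as NumPy would do it.
    let a = new_tensor(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], &[4, 2], false);
    let b = new_tensor(vec![10.0, 100.0], &[2], false);
    let c = a.add(&b); // shape [4, 2]
    println!("{:?}", c.borrow().data); // [11.0, 102.0, 13.0, 104.0, 15.0, 106.0, 17.0, 108.0]
}
```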
## Project Status
This library is functional for training MLPs and CNNs on CPU. It features a verified autograd engine and correctly implemented `im2col` convolutions.
- ✅ **What's Working:** Autograd, Conv2d/Linear layers, Optimizers (including Muon), DataLoaders, Serialization.
- ⚠️ **What's in Progress:** Performance is not yet a primary focus. While BLAS acceleration is available for macOS matrix multiplication, most operations use naive loops.
- ❌ **What's Missing:**
- **GPU Support:** Currently CPU-only.
## Installation
Add Volta to your `Cargo.toml`:
```toml
[dependencies]
volta = "0.1.0"
```
### Enabling BLAS on macOS
For a significant performance boost in matrix multiplication on macOS, enable the `accelerate` feature:
```toml
[dependencies]
volta = { version = "0.1.0", features = ["accelerate"] }
```
## Examples
### Training an MLP
Here's how to define a simple Multi-Layer Perceptron (MLP), train it on synthetic data, and save the model.
```rust
use volta::{nn::*, tensor::*, Adam, Sequential, TensorOps, io};
fn main() {
    // 1. Define a simple model: 2 -> 8 -> 1
    let model = Sequential::new(vec![
        Box::new(Linear::new(2, 8, true)),
        Box::new(ReLU),
        Box::new(Linear::new(8, 1, true)),
    ]);

    // 2. Create synthetic data (the XOR truth table)
    let x_data = vec![0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0];
    let x = new_tensor(x_data, &[4, 2], false); // Batch size 4, 2 features
    let y_data = vec![0.0, 1.0, 1.0, 0.0];
    let y = new_tensor(y_data, &[4], false); // Flattened targets

    // 3. Set up the optimizer
    let params = model.parameters();
    let mut optimizer = Adam::new(params, 0.1, (0.9, 0.999), 1e-8);

    // 4. Training loop
    println!("Training a simple MLP to learn XOR...");
    for epoch in 0..=300 {
        optimizer.zero_grad();
        let pred = model.forward(&x).reshape(&[4]); // match the [4] target shape
        let loss = mse_loss(&pred, &y);
        if epoch % 20 == 0 {
            println!("Epoch {}: loss = {:.6}", epoch, loss.borrow().data[0]);
        }
        loss.backward();
        optimizer.step();
    }

    // 5. Save and load the state dict
    let state = model.state_dict();
    io::save_state_dict(&state, "model.bin").expect("Failed to save");

    // Verify loading into a freshly constructed model
    let mut new_model = Sequential::new(vec![
        Box::new(Linear::new(2, 8, true)),
        Box::new(ReLU),
        Box::new(Linear::new(8, 1, true)),
    ]);
    let loaded_state = io::load_state_dict("model.bin").expect("Failed to load");
    new_model.load_state_dict(&loaded_state);
}
```
### LeNet-style CNN training on CPU
The following uses the current API to define a small LeNet-style CNN and run one training step on random data.
```rust
use volta::{Sequential, Conv2d, MaxPool2d, Flatten, Linear, ReLU, new_tensor, Adam};
use volta::nn::Module;
fn main() {
    // 1. Define the model
    let model = Sequential::new(vec![
        // Input: 1x28x28
        Box::new(Conv2d::new(1, 6, 5, 1, 2, true)), // 5x5 kernel, stride 1, padding 2
        Box::new(ReLU),
        Box::new(MaxPool2d::new(2, 2, 0)),
        // Feature map size here: 6x14x14
        Box::new(Flatten::new()),
        Box::new(Linear::new(6 * 14 * 14, 10, true)),
    ]);

    // 2. Data & optimizer
    let input = volta::randn(&[4, 1, 28, 28]); // Batch of 4
    let target = volta::randn(&[4, 10]); // Dummy targets
    let params = model.parameters();
    let mut optim = Adam::new(params, 1e-3, (0.9, 0.999), 1e-8);

    // 3. One training step
    optim.zero_grad();
    let output = model.forward(&input);
    let loss = volta::mse_loss(&output, &target);
    loss.backward();
    optim.step();
    println!("Loss: {:?}", loss.borrow().data[0]);
}
```
## API Overview
The library is designed around a few core concepts:
- **`Tensor`**: The central data structure, an `Rc<RefCell<RawTensor>>`, which holds data, shape, and gradient information. It allows for a mutable, shared structure to build the computation graph.
- **`TensorOps`**: A trait implemented for `Tensor` that provides the ergonomic, user-facing API for all operations (e.g., `tensor.add(&other)`, `tensor.matmul(&weights)`); a minimal sketch follows this list.
- **`nn::Module`**: A trait for building neural network layers (`Linear`, `ReLU`) and composing them into larger models (`Sequential`). It standardizes the `forward()` pass and parameter collection.
- **Optimizers (`Adam`, `SGD`, `Muon`)**: Structures that take a list of model parameters and update their weights based on computed gradients during `step()`.
- **Vision layers:** `Conv2d` and `MaxPool2d` enable building and training Convolutional Neural Networks (CNNs).
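Putting the first two concepts together, here is a minimal sketch of building a graph and backpropagating through it; the `mul`/`sum` method names and the `grad` field are assumptions inferred from the `TensorOps` trait and the examples above:

```rust
use volta::{new_tensor, TensorOps};

fn main() {
    // Leaf tensors with requires_grad = true.
    let w = new_tensor(vec![2.0, -1.0], &[2], true);
    let x = new_tensor(vec![3.0, 4.0], &[2], true);

    // Each operation extends the dynamic computation graph.
    let y = w.mul(&x).sum(); // y = w·x = 2*3 + (-1)*4 = 2

    // Reverse-mode autodiff: dy/dw = x and dy/dx = w.
    y.backward();
    println!("{:?}", w.borrow().grad); // expect [3.0, 4.0]
}
```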
## Running the Test Suite
Volta has an extensive test suite that validates the correctness of every operation and its gradient. To run the tests:
```bash
cargo test -- --nocapture
```
To run tests with BLAS acceleration enabled (on macOS):
```bash
cargo test --features accelerate -- --nocapture
```
_Note: One test, `misc_tests::test_adam_vs_sgd`, is known to be flaky as it depends on the random seed and convergence speed. It may occasionally fail._
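The gradient checks are built on a standard idea: compare each analytic gradient against central finite differences. A generic sketch of that idea (not Volta's actual test helper):

```rust
/// Central finite differences: grad_i ≈ (f(x + h·e_i) - f(x - h·e_i)) / (2h).
fn numerical_grad(f: &dyn Fn(&[f64]) -> f64, x: &[f64], h: f64) -> Vec<f64> {
    let mut probe = x.to_vec();
    let mut grad = vec![0.0; x.len()];
    for i in 0..x.len() {
        probe[i] = x[i] + h;
        let f_plus = f(&probe);
        probe[i] = x[i] - h;
        let f_minus = f(&probe);
        probe[i] = x[i]; // restore before moving on
        grad[i] = (f_plus - f_minus) / (2.0 * h);
    }
    grad
}

fn main() {
    // f(x) = x0^2 + 3*x1 has analytic gradient [2*x0, 3].
    let f = |x: &[f64]| x[0] * x[0] + 3.0 * x[1];
    println!("{:?}", numerical_grad(&f, &[1.5, -2.0], 1e-5)); // ≈ [3.0, 3.0]
}
```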
## Roadmap
The next major steps for Volta are focused on expanding its capabilities to handle more complex models and improving performance.
1. **GPU Acceleration:** Integrate a backend for GPU computation (e.g., `wgpu` for cross-platform support or direct `metal` bindings for macOS) to drastically speed up training.
2. **Performance Optimization:** Implement SIMD for element-wise operations and further integrate optimized BLAS routines.
### Outstanding Issues
- **Device Argument Ignored**: The `Device::GPU` enum variant exists in `src/device.rs`, but passing it to `to_device` in `src/tensor.rs` currently panics with an unimplemented error.
- **Serialization Fragility**: `Sequential` matches `state_dict` entries by string key (e.g., "0.weight"). Renaming layers or changing the architecture depth breaks loading without helpful error messages; see the illustration after this list.
- **Performance**: The `im2col` implementation in `src/nn/layers/conv.rs` materializes the entire column matrix in memory: for example, a batch of 64 RGB 224×224 images with a 3×3 kernel (stride 1, padding 1) produces roughly 64 · 224 · 224 · 27 ≈ 87M entries, about 350 MB as `f32`. Large batch sizes or high-resolution images can exhaust memory even on high-end machines.
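To make the serialization failure mode concrete, here is a hypothetical illustration of the positional key scheme; the exact keys beyond "0.weight" are assumptions:

```rust
use std::collections::HashSet;

fn main() {
    // Keys a 3-entry Sequential writes (index 1 is ReLU, which has no parameters).
    let saved: HashSet<&str> = HashSet::from(["0.weight", "0.bias", "2.weight", "2.bias"]);

    // Keys the model expects after inserting one more Linear layer at index 1:
    // every index past the insertion point shifts, so most lookups miss.
    let expected = ["0.weight", "0.bias", "1.weight", "1.bias", "3.weight", "3.bias"];
    for key in expected {
        if !saved.contains(key) {
            println!("checkpoint is missing {key}");
        }
    }
}
```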
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/rlarson20/volta/issues).
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.