# NNL - Neural Network Library

A high-performance neural network library for Rust with CPU and Vulkan GPU support.
## Features
- 🚀 Dual Backend Support: Optimized CPU execution and Vulkan compute shaders
- 🎯 Automatic Hardware Detection: Seamlessly selects between CPU and Vulkan GPU
- 🧠 Advanced Optimizers: Adam, SGD, and other optimization algorithms
- 🏗️ Flexible Architecture: Dense layers, CNN, batch normalization, dropout, and custom layers
- 💾 Model Persistence: Save/load models with metadata in multiple formats (Binary, JSON, MessagePack)
- ⚡ Production Ready: SIMD optimizations, parallel processing, and zero-copy operations
- 🔧 Comprehensive Training: Learning rate scheduling, early stopping, metrics tracking
- 🎛️ Fine-grained Control: Custom loss functions, weight initialization, and gradient computation
## Quick Start

Add this to your `Cargo.toml`:

```toml
[dependencies]
nnl = "0.1.0"
```
### Basic XOR Example

```rust
use nnl::prelude::*; // assuming the crate exposes a prelude module
```
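A hedged sketch of what such an XOR example might look like, pieced together from the builder API shown later in this README. Names like `NetworkBuilder` and `TrainingConfig::default()` are assumptions, and the layer/loss/optimizer arguments are left as placeholders; treat `examples/xor.rs` in the repository as the authoritative version.

```rust
use nnl::prelude::*; // assumed prelude module

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // XOR truth table: four 2-element inputs and their expected outputs
    let inputs = Tensor::from_slice(&[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0], &[4, 2])?;
    let targets = Tensor::from_slice(&[0.0, 1.0, 1.0, 0.0], &[4, 1])?;

    // Small 2-4-1 network; exact layer/optimizer types are not shown in this README
    let mut network = NetworkBuilder::new()
        .add_layer(/* dense 2 -> 4, e.g. ReLU activation */)
        .add_layer(/* dense 4 -> 1, e.g. sigmoid activation */)
        .loss(/* mean squared error */)
        .optimizer(/* Adam */)
        .device(Device::auto_select()?) // prefers GPU, falls back to CPU
        .build()?;

    let config = TrainingConfig::default(); // assumed to implement Default
    network.train(&inputs, &targets, &config)?;
    Ok(())
}
```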
## Installation

### CPU-only

```toml
[dependencies]
nnl = "0.1.0"
```

### With OpenBLAS optimization

```toml
[dependencies]
nnl = { version = "0.1.0", features = ["cpu-optimized"] }
```

### With Intel MKL optimization

```toml
[dependencies]
nnl = { version = "0.1.0", features = ["intel-mkl"] }
```
## System Requirements
- Rust: 1.70 or later (edition 2024)
- CPU: Any modern x86_64 or ARM64 processor
- GPU (optional): Any Vulkan 1.2+ compatible GPU (AMD, Intel, NVIDIA)
- OS: Linux, Windows, macOS
### GPU Support
NNL uses Vulkan compute shaders for GPU acceleration, which works on:
- AMD GPUs: Radeon RX 400 series and newer
- NVIDIA GPUs: GTX 900 series and newer
- Intel GPUs: Arc series and modern integrated graphics
## Examples

Run the included examples to see the library in action:

```bash
# Basic XOR problem (CPU)
cargo run --example xor

# XOR with GPU acceleration (if Vulkan GPU available)
cargo run --example xor_gpu

# MNIST digit classification
cargo run --example mnist

# MNIST with GPU
cargo run --example mnist_gpu

# Convolutional Neural Network
cargo run --example simple_cnn

# CNN with GPU support
cargo run --example simple_cnn_gpu

# Small MNIST examples for testing
cargo run --example mnist_small
cargo run --example mnist_small_gpu
```
### Available Examples

- `xor.rs` - Solve XOR problem with a simple neural network (CPU)
- `xor_gpu.rs` - XOR with Vulkan GPU acceleration
- `mnist.rs` - MNIST handwritten digit classification (CPU)
- `mnist_gpu.rs` - MNIST with GPU acceleration
- `mnist_small.rs` - Smaller MNIST dataset for testing (CPU)
- `mnist_small_gpu.rs` - Small MNIST with GPU
- `simple_cnn.rs` - Convolutional neural network (CPU)
- `simple_cnn_gpu.rs` - CNN with GPU acceleration
## Core Concepts

### Device Management

```rust
// Automatic device selection (prefers GPU if available, falls back to CPU)
let device = Device::auto_select()?;

// Specific device types
let cpu_device = Device::cpu()?;
let vulkan_device = Device::vulkan()?; // May fail if no Vulkan GPU available

// Check device capabilities (assumes Device implements Debug)
println!("{:?}", cpu_device);
println!("{:?}", vulkan_device);
```
### Tensors

```rust
// Shapes and exact argument lists below are illustrative.

// Create tensors (uses auto-selected device)
let zeros = Tensor::zeros(&[2, 3])?;
let ones = Tensor::ones(&[2, 3])?;
let from_data = Tensor::from_slice(&[1.0, 2.0, 3.0, 4.0], &[2, 2])?;

// Create tensors on a specific device
let device = Device::vulkan()?;
let gpu_tensor = Tensor::from_slice_on_device(&[1.0, 2.0, 3.0, 4.0], &[2, 2], &device)?;

// Tensor operations
let a = Tensor::randn(&[3, 3])?;
let b = Tensor::randn(&[3, 3])?;
let result = a.add(&b)?; // Element-wise addition
let matmul = a.matmul(&b)?; // Matrix multiplication
```
### Network Architecture

```rust
// The builder method names below are from the original example; the builder
// type name and the layer/loss/optimizer arguments are elided placeholders.
let network = NetworkBuilder::new()
    .add_layer(/* input layer */)
    .add_layer(/* hidden layer */)
    .add_layer(/* output layer */)
    .loss(/* loss function */)
    .optimizer(/* optimizer */)
    .device(/* device */) // Automatically choose best device
    .build()?;
```
### Training with Advanced Features

```rust
// Field and argument lists are elided in this snippet; see the examples for
// complete training setups with scheduling, early stopping, and metrics.
let config = TrainingConfig { /* epochs, batch size, early stopping, ... */ };
let history = network.train(/* inputs, targets, &config */)?;
println!("{:?}", history);
```
### Model Persistence

```rust
use nnl::io::{load_model, save_model, ModelMetadata};

// Save model (path argument illustrative)
save_model(&network, "model.bin")?;

// Load model
let loaded_network = load_model("model.bin")?;

// Save with metadata (fields and exact signature elided)
let metadata = ModelMetadata { /* name, version, description, ... */ };
save_model(/* &network, "model.bin", metadata */)?;
```
## Performance

### Benchmarks

Performance comparison on common tasks (Intel i7-10700K, RTX 3060 via Vulkan):
| Task | CPU (8 threads) | Vulkan GPU | Speedup |
|---|---|---|---|
| Dense 1000x1000 MatMul | 12.5ms | 3.2ms | 3.9x |
| Conv2D 224x224x64 | 145ms | 28ms | 5.2x |
| MNIST Training (60k samples) | 45s | 18s | 2.5x |
Note: Performance varies significantly based on GPU model and driver quality. Vulkan performance on NVIDIA may be lower than native CUDA.
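The speedup column is simply the ratio of CPU time to GPU time, e.g. for the dense matmul row:

```rust
// Speedup = CPU time / GPU time, using the benchmark table above.
fn speedup(cpu_ms: f64, gpu_ms: f64) -> f64 {
    cpu_ms / gpu_ms
}

fn main() {
    println!("{:.1}x", speedup(12.5, 3.2)); // Dense 1000x1000 MatMul -> 3.9x
}
```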
### Optimization Tips

- Use appropriate batch sizes: 32-128 for GPU, 8-32 for CPU
- Enable CPU optimizations: Use `features = ["cpu-optimized"]` for OpenBLAS
- Intel CPUs: Use `features = ["intel-mkl"]` for maximum CPU performance
- Memory management: Call `network.zero_grad()` regularly to free unused memory
- Data loading: Use parallel data loading for large datasets
- GPU memory: Monitor GPU memory usage, reduce batch size if running out
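The parallel data-loading tip can be illustrated with a small, library-independent sketch using scoped std threads; the `normalize` preprocessing step here is a hypothetical stand-in for whatever per-batch work your pipeline does, not part of nnl's API.

```rust
use std::thread;

// Scale every value in a batch to [0, 1] by the batch maximum.
fn normalize(batch: &mut [f32]) {
    let max = batch.iter().cloned().fold(f32::MIN, f32::max);
    if max > 0.0 {
        for x in batch.iter_mut() {
            *x /= max;
        }
    }
}

// Preprocess fixed-size batches of a dataset in parallel, one scoped thread
// per batch; chunks_mut hands each thread a disjoint mutable slice.
fn normalize_batches(data: &mut [f32], batch_size: usize) {
    thread::scope(|s| {
        for batch in data.chunks_mut(batch_size) {
            s.spawn(move || normalize(batch));
        }
    });
}

fn main() {
    let mut data: Vec<f32> = (0..8).map(|i| i as f32).collect();
    normalize_batches(&mut data, 4);
    println!("{:?}", data);
}
```

For real workloads you would typically bound the thread count (e.g. via a pool such as `rayon`, which the library already depends on) rather than spawning one thread per batch.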
## Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| `default` | CPU-optimized + examples | `["cpu-optimized", "examples"]` |
| `cpu-optimized` | OpenBLAS acceleration | `openblas-src` |
| `intel-mkl` | Intel MKL acceleration | `intel-mkl-src` |
| `examples` | Example binaries and utilities | `clap`, `image` |
Note: Vulkan support is always enabled and does not require a feature flag.
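Because `default` already pulls in `cpu-optimized` (OpenBLAS), selecting MKL without also linking OpenBLAS presumably requires opting out of default features. A sketch, assuming standard Cargo feature semantics and that nnl's features are independently selectable:

```toml
[dependencies]
# Opt out of the default "cpu-optimized" (OpenBLAS) feature and select MKL
# instead; re-add "examples" if you still want the example binaries.
nnl = { version = "0.1.0", default-features = false, features = ["intel-mkl", "examples"] }
```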
## Troubleshooting

### Common Issues
**Vulkan not available**

```bash
# Install Vulkan drivers and loader
# Ubuntu/Debian:
sudo apt install vulkan-tools mesa-vulkan-drivers

# Verify Vulkan works:
vulkaninfo

# For NVIDIA GPUs, ensure latest drivers are installed
# For AMD GPUs on Linux, ensure AMDGPU driver is loaded
```
**Slow CPU performance**

```toml
# Enable OpenBLAS optimizations
nnl = { version = "0.1.0", features = ["cpu-optimized"] }

# Or for Intel CPUs, use MKL:
nnl = { version = "0.1.0", features = ["intel-mkl"] }
```
**Out of memory on GPU**

- Reduce batch size in `TrainingConfig`
- Use smaller model architectures
- Monitor GPU memory usage with `nvidia-smi` or similar tools
**Compilation errors with MKL**

```toml
# Ensure Intel MKL is properly installed
# Or switch to OpenBLAS:
nnl = { version = "0.1.0", features = ["cpu-optimized"] }
```
**Poor GPU performance**

- Ensure you're using `Device::vulkan()` or `Device::auto_select()`
- Check that Vulkan drivers are up to date
- Some operations may not be optimized for GPU yet
- Consider using CPU with optimizations for small models
## API Documentation

For detailed API documentation, see [docs.rs/nnl](https://docs.rs/nnl).

Key modules:

- `tensor` - Tensor operations and data structures
- `network` - Neural network building and training
- `layers` - Layer implementations and configurations
- `optimizers` - Optimization algorithms
- `device` - Device management and backend selection
- `io` - Model saving and loading
## Contributing

We welcome contributions! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run `cargo test` and `cargo clippy`
5. Submit a pull request
For major changes, please open an issue first to discuss the proposed changes.
### Development Setup

```bash
# Build and run the test suite
cargo build
cargo test

# Test GPU functionality (requires Vulkan)
cargo run --example xor_gpu
```
## Roadmap
- CUDA Support: Native NVIDIA CUDA backend for better performance
- ROCm Support: AMD ROCm backend for compute-focused workloads
- Distributed Training: Multi-GPU support
- Mobile Deployment: ARM optimization and model quantization
- Web Assembly: Browser-based inference
- Model Zoo: Pre-trained models for common tasks
- Auto-ML: Neural architecture search
- Graph Optimization: Operator fusion and memory optimization
## Limitations
- CUDA: Not yet supported (Vulkan used for NVIDIA GPUs)
- ROCm: Not yet supported (Vulkan used for AMD GPUs)
- Distributed Training: Single device only
- Model Formats: Limited compared to PyTorch/TensorFlow
- Layer Types: Growing but not comprehensive
- Performance: Vulkan overhead may impact small models
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Built on excellent Rust ecosystem crates: `ndarray`, `rayon`, `vulkano`
- Inspired by PyTorch and TensorFlow APIs
- Thanks to the Rust ML community and all contributors
Questions? Open an issue on GitHub.