NNL - Neural Network Library

A high-performance neural network library for Rust with comprehensive GPU and CPU support.

Features

🚀 Multi-backend Support: NVIDIA CUDA, AMD ROCm/Vulkan, and optimized CPU execution
🎯 Automatic Hardware Detection: Seamlessly selects the best available compute backend
🧠 Advanced Optimizers: Adam, SGD, AdaGrad, RMSprop, AdamW, LBFGS, and more
🏗️ Flexible Architecture: Dense layers, CNN, batch normalization, dropout, and custom layers
💾 Model Persistence: Save/load models with metadata in multiple formats (Binary, JSON, MessagePack)
⚡ Production Ready: SIMD optimizations, parallel processing, and zero-copy operations
🔧 Comprehensive Training: Learning rate scheduling, early stopping, metrics tracking
🎛️ Fine-grained Control: Custom loss functions, weight initialization, and gradient computation

Quick Start

Add this to your Cargo.toml:

[dependencies]
nnl = "0.1.0"

Basic XOR Example

use nnl::prelude::*;

fn main() -> Result<()> {
    // Create a simple neural network
    let mut network = NetworkBuilder::new()
        .add_layer(LayerConfig::Dense {
            input_size: 2,
            output_size: 4,
            activation: Activation::ReLU,
            use_bias: true,
            weight_init: WeightInit::Xavier,
        })
        .add_layer(LayerConfig::Dense {
            input_size: 4,
            output_size: 1,
            activation: Activation::Sigmoid,
            use_bias: true,
            weight_init: WeightInit::Xavier,
        })
        .loss(LossFunction::BinaryCrossEntropy)
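        // Struct variants need every field; the values besides learning_rate
        // mirror the ones used under "Network Architecture".
        .optimizer(OptimizerConfig::Adam {
            learning_rate: 0.01,
            beta1: 0.9,
            beta2: 0.999,
            epsilon: 1e-8,
            weight_decay: None,
            amsgrad: false,
        })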
        .build()?;

    // Training data for XOR problem
    let inputs = Tensor::from_slice(&[
        0.0, 0.0,  // XOR(0,0) = 0
        0.0, 1.0,  // XOR(0,1) = 1
        1.0, 0.0,  // XOR(1,0) = 1
        1.0, 1.0,  // XOR(1,1) = 0
    ], &[4, 2])?;

    let targets = Tensor::from_slice(&[0.0, 1.0, 1.0, 0.0], &[4, 1])?;

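    // Train the network for 1000 epochs. `train` takes a `TrainingConfig`
    // (see "Training with Advanced Features"); this line assumes that
    // `TrainingConfig` implements `Default` for the remaining fields.
    let config = TrainingConfig { epochs: 1000, ..Default::default() };
    network.train(&inputs, &targets, &config)?;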

    // Make predictions
    let test_input = Tensor::from_slice(&[1.0, 0.0], &[1, 2])?;
    let prediction = network.forward(&test_input)?;
    println!("XOR(1,0) = {:.4}", prediction.to_vec()?[0]);

    Ok(())
}
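
The comments in the example imply that the flat slice passed to `Tensor::from_slice` is read row by row, so shape `[4, 2]` means four samples of two values each. The following standalone sketch (plain Rust, no `nnl` types) illustrates that layout and checks it against the XOR truth table. The helper names `sample_row` and `xor` are hypothetical and exist only for this illustration.

```rust
/// Returns row `index` of a flat, row-major buffer with `cols` columns.
fn sample_row(data: &[f32], cols: usize, index: usize) -> &[f32] {
    &data[index * cols..(index + 1) * cols]
}

/// Reference XOR for inputs encoded as 0.0 / 1.0.
fn xor(a: f32, b: f32) -> f32 {
    if (a > 0.5) != (b > 0.5) { 1.0 } else { 0.0 }
}

fn main() {
    // Same data as the example above: 4 samples x 2 features, row-major.
    let inputs: [f32; 8] = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0];
    let targets: [f32; 4] = [0.0, 1.0, 1.0, 0.0];

    for i in 0..4 {
        let sample = sample_row(&inputs, 2, i);
        // Each target must match the XOR of its row.
        assert_eq!(xor(sample[0], sample[1]), targets[i]);
        println!("row {}: {:?} -> {}", i, sample, targets[i]);
    }
}
```

If the targets and the input rows were misaligned, the assertion inside the loop would fail, which makes this a cheap sanity check before training.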

Installation

CPU-only (default)

[dependencies]
nnl = "0.1.0"

With GPU Support

[dependencies]
# Pick exactly one of the following lines:
nnl = { version = "0.1.0", features = ["cuda"] }            # NVIDIA CUDA
# nnl = { version = "0.1.0", features = ["vulkan"] }        # Vulkan (AMD/Intel/NVIDIA)
# nnl = { version = "0.1.0", features = ["all-backends"] }  # All GPU backends

System Requirements

Rust: 1.70 or later
CPU: Any modern x86_64 or ARM64 processor
GPU (optional):
- CUDA: NVIDIA GPU with compute capability 3.5+, CUDA 11.0+
- Vulkan: Any Vulkan 1.2+ compatible GPU
- ROCm: AMD GPU with ROCm 4.0+ (experimental)

Examples

Run the included examples to see the library in action:

# Basic XOR problem (CPU)
cargo run --example xor

# XOR with GPU acceleration
cargo run --example xor_gpu --features cuda

# MNIST digit classification
cargo run --example mnist

# Convolutional Neural Network
cargo run --example simple_cnn

# CNN with GPU support
cargo run --example simple_cnn_gpu --features cuda

Available Examples

xor.rs - Solve the XOR problem with a simple neural network
mnist.rs - MNIST handwritten digit classification
simple_cnn.rs - Convolutional neural network example
*_gpu.rs - GPU-accelerated variants of the examples above

Core Concepts

Device Management

// Automatic device selection (CPU/GPU)
let device = Device::auto_select()?;

// Specific device types
let cpu_device = Device::cpu()?;
let cuda_device = Device::cuda(0)?;  // GPU 0
let vulkan_device = Device::vulkan()?;
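
To make the idea of automatic selection concrete, here is a minimal standalone sketch of a preference-ordered fallback. It is not the implementation behind `Device::auto_select`: the `Backend` enum, the `pick_backend` function, and the CUDA > Vulkan > CPU order are all assumptions made for illustration, and the two boolean flags stand in for real runtime probes.

```rust
/// Hypothetical backend tag, used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cuda,
    Vulkan,
    Cpu,
}

/// Picks the first available backend from an assumed preference order.
/// `cuda_ok` and `vulkan_ok` are placeholders for real availability checks.
fn pick_backend(cuda_ok: bool, vulkan_ok: bool) -> Backend {
    if cuda_ok {
        Backend::Cuda
    } else if vulkan_ok {
        Backend::Vulkan
    } else {
        // The CPU is always available, so it is the final fallback.
        Backend::Cpu
    }
}

fn main() {
    println!("{:?}", pick_backend(false, true));
    println!("{:?}", pick_backend(false, false));
}
```

The point of the pattern is that selection never fails outright: code that requests a specific device has to handle an error, while the automatic path can always fall back to the CPU.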

Tensors

// Create tensors
let zeros = Tensor::zeros(&[3, 4])?;
let ones = Tensor::ones(&[2, 2])?;
let from_data = Tensor::from_slice(&[1.0, 2.0, 3.0], &[3])?;

// Tensor operations
let a = Tensor::randn(&[2, 3])?;
let b = Tensor::randn(&[2, 3])?;
let result = a.add(&b)?;  // Element-wise addition
let matmul = a.matmul(&b.transpose(&[1, 0])?)?;  // Matrix multiplication
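
The transpose in the last line is needed because two `[2, 3]` tensors cannot be multiplied directly: the inner dimensions must match, so `[2, 3]` times `[3, 2]` gives `[2, 2]`. The sketch below reproduces that shape arithmetic on plain row-major `Vec<f32>` buffers. It does not use `nnl`, and the free functions `transpose` and `matmul` are hypothetical stand-ins written for this illustration.

```rust
/// Transposes a row-major `rows x cols` matrix into `cols x rows`.
fn transpose(m: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut out = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            out[c * rows + r] = m[r * cols + c];
        }
    }
    out
}

/// Multiplies row-major `a` (n x k) by `b` (k x m), giving n x m.
fn matmul(a: &[f32], b: &[f32], n: usize, k: usize, m: usize) -> Vec<f32> {
    assert_eq!(a.len(), n * k, "left operand has the wrong size");
    assert_eq!(b.len(), k * m, "right operand has the wrong size");
    let mut out = vec![0.0; n * m];
    for i in 0..n {
        for j in 0..m {
            for p in 0..k {
                out[i * m + j] += a[i * k + p] * b[p * m + j];
            }
        }
    }
    out
}

fn main() {
    let a: [f32; 6] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2 x 3
    let b: [f32; 6] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]; // 2 x 3
    let bt = transpose(&b, 2, 3); // 3 x 2
    let c = matmul(&a, &bt, 2, 3, 2); // 2 x 2
    println!("{:?}", c);
}
```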

Network Architecture

let network = NetworkBuilder::new()
    .add_layer(LayerConfig::Dense {
        input_size: 784,
        output_size: 128,
        activation: Activation::ReLU,
        use_bias: true,
        weight_init: WeightInit::Xavier,
    })
    .add_layer(LayerConfig::Dropout { dropout_rate: 0.2 })
    .add_layer(LayerConfig::Dense {
        input_size: 128,
        output_size: 10,
        activation: Activation::Softmax,
        use_bias: true,
        weight_init: WeightInit::Xavier,
    })
    .loss(LossFunction::CategoricalCrossEntropy)
    .optimizer(OptimizerConfig::Adam {
        learning_rate: 0.001,
        beta1: 0.9,
        beta2: 0.999,
        epsilon: 1e-8,
        weight_decay: Some(1e-4),
        amsgrad: false,
    })
    .build()?;
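
As a rough size check for the architecture above: a dense layer with bias holds `input_size * output_size` weights plus `output_size` biases, and dropout adds no trainable parameters. The standalone sketch below works through that arithmetic. `dense_params` is a hypothetical helper, not part of `nnl`.

```rust
/// Trainable parameters of a fully connected layer:
/// one weight per input/output pair, plus one bias per output if enabled.
fn dense_params(input_size: usize, output_size: usize, use_bias: bool) -> usize {
    input_size * output_size + if use_bias { output_size } else { 0 }
}

fn main() {
    let hidden = dense_params(784, 128, true); // 784 * 128 + 128
    let output = dense_params(128, 10, true); // 128 * 10 + 10
    // Dropout contributes nothing, so the total is just the two dense layers.
    println!("hidden: {hidden}, output: {output}, total: {}", hidden + output);
}
```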

Training with Advanced Features

let config = TrainingConfig {
    epochs: 100,
    batch_size: 32,
    verbose: true,
    early_stopping_patience: 10,
    early_stopping_threshold: 1e-4,
    lr_schedule: Some(LearningRateSchedule::StepLR {
        step_size: 30,
        gamma: 0.1
    }),
    validation_split: 0.2,
    shuffle: true,
    random_seed: Some(42),
};

let history = network.train(&train_data, &train_labels, &config)?;
println!("Best accuracy: {:.4}", history.best_accuracy());
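
Two of the settings above are easier to reason about with the arithmetic written out. The sketch below assumes `StepLR` follows the usual step-decay rule (multiply the rate by `gamma` once every `step_size` epochs) and that early stopping triggers after `patience` consecutive epochs without an improvement larger than the threshold. Both interpretations are assumptions about `nnl`, and `step_lr` and `early_stop_epoch` are hypothetical helpers in plain Rust.

```rust
/// Step decay: lr = base_lr * gamma^(epoch / step_size), with integer division.
fn step_lr(base_lr: f64, gamma: f64, step_size: u32, epoch: u32) -> f64 {
    base_lr * gamma.powi((epoch / step_size) as i32)
}

/// Returns the epoch index at which training would stop, or `None` if it
/// runs through every recorded loss. A loss counts as an improvement only
/// if it beats the best value so far by more than `threshold`.
fn early_stop_epoch(losses: &[f64], patience: usize, threshold: f64) -> Option<usize> {
    let mut best = f64::INFINITY;
    let mut stale = 0;
    for (epoch, &loss) in losses.iter().enumerate() {
        if best - loss > threshold {
            best = loss;
            stale = 0;
        } else {
            stale += 1;
            if stale >= patience {
                return Some(epoch);
            }
        }
    }
    None
}

fn main() {
    // With step_size = 30 and gamma = 0.1, the rate drops tenfold at epochs 30, 60, 90.
    for epoch in [0, 29, 30, 60, 90] {
        println!("epoch {epoch}: lr = {:e}", step_lr(0.001, 0.1, 30, epoch));
    }
    // Validation loss plateaus after the second epoch.
    let losses = [1.0, 0.5, 0.5, 0.5, 0.5];
    println!("stop at: {:?}", early_stop_epoch(&losses, 2, 1e-4));
}
```

Under these assumptions, a 100-epoch run with `step_size: 30` spends its last ten epochs at one thousandth of the initial rate, which is worth keeping in mind when choosing `early_stopping_patience`.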

Model Persistence

// Save model
save_model(&network, "my_model.bin", ModelFormat::Binary)?;

// Load model
let loaded_network = load_model("my_model.bin")?;

// Save with metadata
let metadata = ModelMetadata {
    name: "MNIST Classifier".to_string(),
    description: "CNN for digit classification".to_string(),
    training_info: Some(training_info),
    ..Default::default()
};
save_model_with_metadata(&network, "model_with_meta.json", ModelFormat::Json, &metadata)?;

Performance

Benchmarks

Performance comparison on common tasks (Intel i7-10700K, RTX 3080):

| Task | CPU (8 threads) | CUDA GPU | Speedup |
|------|-----------------|----------|---------|
| Dense 1000x1000 MatMul | 12.5 ms | 0.8 ms | 15.6x |
| Conv2D 224x224x64 | 145 ms | 8.2 ms | 17.7x |
| MNIST Training (60k samples) | 45 s | 3.2 s | 14.1x |
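
The Speedup column is consistent with dividing the CPU time by the GPU time for each row. The short sketch below recomputes it from the figures in the table. `speedup` is a hypothetical helper, and the numbers are copied from the table rather than measured.

```rust
/// Speedup of the GPU run over the CPU run (both times in the same unit).
fn speedup(cpu_time: f64, gpu_time: f64) -> f64 {
    cpu_time / gpu_time
}

fn main() {
    // (task, CPU time, GPU time), taken from the benchmark table.
    let rows = [
        ("Dense 1000x1000 MatMul (ms)", 12.5, 0.8),
        ("Conv2D 224x224x64 (ms)", 145.0, 8.2),
        ("MNIST Training (s)", 45.0, 3.2),
    ];
    for (task, cpu, gpu) in rows {
        println!("{task}: {:.3}x", speedup(cpu, gpu));
    }
}
```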

Optimization Tips

Use appropriate batch sizes: 32-256 for GPU, 8-32 for CPU
Enable CPU optimizations: Use features = ["cpu-optimized"] for Intel MKL/OpenBLAS acceleration
Memory management: Call network.zero_grad() regularly to free unused memory
Data loading: Use parallel data loading for large datasets
Mixed precision: Enable f16 on supported GPUs for up to a 2x speedup

Feature Flags

| Feature | Description | Example |
|---------|-------------|---------|
| `default` | CPU-optimized backend | `nnl = "0.1.0"` |
| `cuda` | NVIDIA CUDA support | `features = ["cuda"]` |
| `vulkan` | Vulkan compute support | `features = ["vulkan"]` |
| `rocm` | AMD ROCm support (experimental) | `features = ["rocm"]` |
| `cpu-optimized` | Intel MKL/OpenBLAS acceleration | `features = ["cpu-optimized"]` |
| `all-backends` | All GPU backends | `features = ["all-backends"]` |
| `examples` | Example binaries | `features = ["examples"]` |

Troubleshooting

Common Issues

CUDA not found

# Install CUDA toolkit 11.0+
# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Vulkan not available

# Install Vulkan drivers
sudo apt install vulkan-tools vulkan-loader-dev  # Ubuntu/Debian
# Verify: vulkaninfo

Slow CPU performance

# Enable CPU optimizations
nnl = { version = "0.1.0", features = ["cpu-optimized"] }

Out of memory on GPU

Reduce batch size
Use gradient accumulation
Enable mixed precision training
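
Gradient accumulation works because a mean gradient over a large batch equals the sum of per-sample gradients over several smaller micro-batches, divided once by the total sample count. Only one micro-batch has to be resident on the GPU at a time. The sketch below checks that equivalence for a one-parameter model `y = w * x` with squared error. It is plain Rust with hypothetical helper names and does not show how `nnl` exposes accumulation.

```rust
/// Sum of per-sample gradients of (w * x - y)^2 with respect to w.
fn grad_sum(w: f64, xs: &[f64], ys: &[f64]) -> f64 {
    xs.iter().zip(ys).map(|(x, y)| 2.0 * (w * x - y) * x).sum()
}

/// Mean gradient computed over the whole batch at once.
fn full_batch_grad(w: f64, xs: &[f64], ys: &[f64]) -> f64 {
    grad_sum(w, xs, ys) / xs.len() as f64
}

/// Same gradient, accumulated over micro-batches of `micro` samples
/// and averaged once at the end.
fn accumulated_grad(w: f64, xs: &[f64], ys: &[f64], micro: usize) -> f64 {
    let mut acc = 0.0;
    for (cx, cy) in xs.chunks(micro).zip(ys.chunks(micro)) {
        acc += grad_sum(w, cx, cy);
    }
    acc / xs.len() as f64
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    let ys = [2.0, 4.0, 6.0, 8.0];
    let full = full_batch_grad(1.5, &xs, &ys);
    let acc = accumulated_grad(1.5, &xs, &ys, 2);
    println!("full batch: {full}, accumulated: {acc}");
}
```

Note that the division uses the total number of samples, not the micro-batch size. Averaging each micro-batch separately and then summing would scale the gradient by the number of micro-batches.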

API Documentation

For detailed API documentation, see docs.rs/nnl.

Key modules:

tensor - Tensor operations and data structures
network - Neural network building and training
layers - Layer implementations and configurations
optimizers - Optimization algorithms
device - Device management and backend selection

Contributing

We welcome contributions! Please:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes with tests
4. Run cargo test and cargo clippy
5. Submit a pull request

For major changes, please open an issue first to discuss the proposed changes.

Development Setup

git clone https://github.com/hotplugindev/NNL.git
cd NNL
cargo build
cargo test
cargo run --example xor

Roadmap

Distributed Training: Multi-GPU and multi-node support
Mobile Deployment: ARM optimization and model quantization
WebAssembly: Browser-based inference
Model Zoo: Pre-trained models for common tasks
Auto-ML: Neural architecture search
Graph Optimization: Operator fusion and memory optimization

License

This project is dual-licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Acknowledgments

Inspired by PyTorch and TensorFlow APIs
Built on excellent Rust ecosystem crates: ndarray, rayon, vulkano, cudarc
Thanks to the Rust ML community and all contributors

Questions? Check out our FAQ or open an issue.
