SciRS2 Autograd
Production-Ready Automatic Differentiation for Rust (v0.1.0)
A high-performance automatic differentiation library for SciRS2, providing functionality comparable to PyTorch/TensorFlow's autograd systems with native Rust performance and safety guarantees.
⚠️ SciRS2 POLICY Migration: This module is currently being updated to follow the SciRS2 POLICY: the migration from direct `rand::` and `ndarray::` usage to `scirs2-core` abstractions is in progress.
⨠Features
Core Automatic Differentiation
- Reverse-mode AD: Efficient gradient computation for machine learning workloads
- Dynamic Graphs: Runtime graph construction with flexible control flow support
- Higher-order Derivatives: Second and higher-order gradients with numerical stability
- Memory Optimization: Gradient checkpointing, memory pooling, and smart caching
Mathematical Operations
- Comprehensive Linear Algebra: Matrix decompositions (QR, LU, SVD, Cholesky) with gradients
- Matrix Functions: Inverse, determinant, exponential, logarithm, power operations
- Numerically Stable Implementations: Robust gradient computation for large matrices
- Broadcasting: NumPy-style tensor broadcasting for element-wise operations
Neural Network Infrastructure
- Activation Functions: ReLU variants, Sigmoid, Tanh, Softmax, Swish, GELU, Mish
- Loss Functions: MSE, cross-entropy, sparse categorical cross-entropy
- Convolution Layers: 2D convolutions, transposed convolutions, pooling operations
- Optimization: SGD, Adam, AdaGrad, AdamW with learning rate scheduling
Performance & Integration
- SIMD Acceleration: Vectorized operations for enhanced performance
- Parallel Processing: Multi-threaded computation with work-stealing thread pool
- BLAS Support: Optional acceleration with OpenBLAS, Intel MKL
- SciRS2 Integration: Seamless interoperability with the broader SciRS2 ecosystem
📦 Installation
Add to your Cargo.toml:
```toml
[dependencies]
scirs2-autograd = "0.1.0"
```
Optional Features
Enable performance optimizations and additional backends:
```toml
[dependencies]
scirs2-autograd = { version = "0.1.0", features = ["blas", "simd"] }
```
Available Features:
- `blas` - BLAS acceleration for linear algebra operations
- `openblas` - OpenBLAS backend
- `intel-mkl` - Intel MKL backend for maximum performance
- `simd` - SIMD acceleration for element-wise operations
🚀 Quick Start
Basic Automatic Differentiation
```rust
use scirs2_autograd as ag;
// Compute gradients of z = 2x² + 3y + 1 (fuller sketch below)
ag::run(|ctx: &mut ag::Context<f64>| {
    // build the graph and call tensor_ops::grad here
});
```
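A fuller version of the same computation is sketched below. It assumes the crate mirrors the rust-autograd-style interface used throughout this README (placeholders, `tensor_ops::grad`, `ctx.evaluator()`, and an `ag::ndarray` re-export); treat the exact calls as a sketch rather than a verified API.

```rust
use scirs2_autograd as ag;
use ag::tensor_ops as T;

ag::run(|ctx: &mut ag::Context<f64>| {
    let x = ctx.placeholder("x", &[]);
    let y = ctx.placeholder("y", &[]);
    let z = 2.0 * x * x + 3.0 * y + 1.0;

    // dz/dy = 3 (constant derivative, no feed required)
    let gy = &T::grad(&[z], &[y])[0];
    println!("{:?}", gy.eval(ctx));

    // dz/dx = 4x, evaluated at x = 2 by feeding the placeholder
    let gx = &T::grad(&[z], &[x])[0];
    let feed = ag::ndarray::arr0(2.0);
    println!("{:?}", ctx.evaluator().push(gx).feed(x, feed.view()).run());

    // d²z/dx² = 4 (higher-order derivative)
    let ggx = &T::grad(&[gx], &[x])[0];
    println!("{:?}", ggx.eval(ctx));
});
```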
Neural Network Training
```rust
use scirs2_autograd as ag;
use ag::optimizers::adam::Adam;
use ag::{ndarray_ext as array, tensor_ops as T};

// Build a 2-layer MLP for classification
let mut env = ag::VariableEnvironment::new();
let mut rng = array::ArrayRng::<f32>::default();

// Initialize network parameters (shapes shown are illustrative)
// Note: Use ndarray_ext::zeros for NdArray, not tensor_ops::zeros which returns Tensor
env.name("w1").set(rng.glorot_uniform(&[28 * 28, 128]));
env.name("b1").set(array::zeros(&[1, 128]));
env.name("w2").set(rng.glorot_uniform(&[128, 10]));
env.name("b2").set(array::zeros(&[1, 10]));

// Setup Adam optimizer
let adam = Adam::default("adam", env.default_namespace().current_var_ids(), &mut env);

// Training loop
for epoch in 0..100 {
    // one training step per epoch (see the sketch below)
}
```
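The loop body is elided above. A hedged sketch of one training step is shown below; it assumes the rust-autograd-style API (`ctx.placeholder`, `ctx.variable`, `tensor_ops`, `ag::Feeder`) and pre-loaded `x_batch` / `y_batch` `NdArray`s holding the current mini-batch, so treat the names and signatures as illustrative rather than verified.

```rust
for epoch in 0..100 {
    env.run(|ctx| {
        // Mini-batch placeholders: inputs and integer class labels
        let x = ctx.placeholder("x", &[-1, 28 * 28]);
        let y = ctx.placeholder("y", &[-1]);
        let (w1, b1) = (ctx.variable("w1"), ctx.variable("b1"));
        let (w2, b2) = (ctx.variable("w2"), ctx.variable("b2"));

        // Forward pass and mean cross-entropy loss
        let h = T::relu(T::matmul(x, w1) + b1);
        let logits = T::matmul(h, w2) + b2;
        let loss = T::reduce_mean(T::sparse_softmax_cross_entropy(logits, y), &[0], false);

        // Gradients w.r.t. all parameters, followed by an Adam step
        let grads = T::grad(&[loss], &[w1, b1, w2, b2]);
        let mut feeder = ag::Feeder::new();
        feeder.push(x, x_batch.view());
        feeder.push(y, y_batch.view());
        adam.update(&[w1, b1, w2, b2], &grads, ctx, feeder);
    });
}
```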
⚠️ Important API Notes
ndarray_ext vs tensor_ops Functions
There are two sets of array creation functions with different return types:
| Module | Function | Returns | Use For |
|---|---|---|---|
| `ndarray_ext::zeros` | `zeros(&[shape])` | `NdArray<T>` | Variable initialization, data storage |
| `tensor_ops::zeros` | `zeros(&shape, ctx)` | `Tensor<F>` | Computation graph operations |
Common Mistake:

```rust
// ❌ WRONG: tensor_ops::zeros returns a Tensor, but .set() expects an NdArray
use scirs2_autograd::tensor_ops::*;
env.name("w1").set(zeros(&[1, 128], ctx)); // Type error!

// ✅ CORRECT: use ndarray_ext::zeros for variable initialization
use scirs2_autograd::ndarray_ext;
env.name("w1").set(ndarray_ext::zeros(&[1, 128]));
```
Rule of Thumb:
- Use `ndarray_ext::*` for data (variable initialization, feeding values)
- Use `tensor_ops::*` for computations (inside `env.run(|ctx| { ... })`)
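To make the split concrete, here is a small hedged sketch: data is created as `NdArray`s with `ndarray_ext`, while everything inside `env.run` builds `Tensor`s with `tensor_ops`. The `ctx.variable` and `reduce_sum` calls follow the rust-autograd-style API assumed elsewhere in this README and are illustrative.

```rust
use scirs2_autograd as ag;
use ag::{ndarray_ext as array, tensor_ops as T};

let mut env = ag::VariableEnvironment::new();

// Data side: NdArray values created with ndarray_ext
env.name("w").set(array::zeros::<f32>(&[4, 2]));

// Computation side: Tensors built with tensor_ops inside env.run
env.run(|ctx| {
    let w = ctx.variable("w");
    let col_sums = T::reduce_sum(w, &[0], false); // Tensor, not NdArray
    println!("{:?}", col_sums.eval(ctx));
});
```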
🎯 Advanced Features
Mathematical Robustness
- Higher-Order Derivatives: Efficient Hessian computation for advanced optimization
- Numerical Stability: Carefully implemented gradients for matrix decompositions
- Large Matrix Support: Optimized algorithms for high-dimensional computations
- Custom Operations: Extensible framework for user-defined differentiable operations
Performance Engineering
- Memory Management: Smart gradient checkpointing reduces memory usage by 50-80%
- Computation Graph Optimization: Automatic fusion and simplification
- SIMD & Parallelization: Multi-core acceleration with work-stealing scheduler
- Zero-Copy Operations: Tensor views and in-place operations minimize allocations
Developer Experience
- Comprehensive Testing: 404+ tests ensure reliability and correctness
- Rich Debugging: Graph visualization and execution tracing tools
- Flexible APIs: Support for both eager and graph-based execution models
- SciRS2 Integration: Seamless interoperability across the scientific computing stack
Gradient Checkpointing
Gradient checkpointing is a memory optimization technique that trades additional computation time for reduced memory usage during backpropagation. This is especially useful for training large models under memory constraints.
How It Works
During standard backpropagation, all intermediate activations must be stored to compute gradients, which can lead to high memory usage in deep networks. Gradient checkpointing selectively discards intermediate activations during the forward pass and recomputes them during the backward pass as needed.
Benefits
- Significantly reduced memory usage (typically 50-80% reduction)
- Enables training of deeper/larger models that would otherwise not fit in memory
- Flexible strategies to balance memory usage vs. computation time
Usage Options
```rust
use scirs2_autograd as ag;
ag::run(|ctx| {
    // wrap memory-heavy subgraphs in checkpoint operations (sketch below)
});
```
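The checkpointing entry points themselves are not spelled out here, so the following is a hypothetical sketch: it assumes a `checkpoint`-style wrapper (stubbed out below as a placeholder, not a confirmed API) that marks a tensor's subgraph for recomputation instead of storage, and reuses the rust-autograd-style ops from the earlier examples. Consult the crate documentation for the real function names.

```rust
use scirs2_autograd as ag;
use ag::tensor_ops as T;
use ag::Tensor;

// Hypothetical stand-in for the crate's checkpoint operation: a real
// implementation would mark `t` so its value is dropped after the forward
// pass and recomputed on demand during backprop.
fn checkpoint<'g>(t: &Tensor<'g, f32>) -> Tensor<'g, f32> {
    *t
}

ag::run(|ctx: &mut ag::Context<f32>| {
    let x = ctx.placeholder("x", &[-1, 1024]);
    let w1 = T::zeros(&[1024, 1024], ctx); // dummy weights for the sketch
    let w2 = T::zeros(&[1024, 10], ctx);

    // Checkpoint the intermediate activation: instead of being stored, it is
    // recomputed from `x` when the backward pass needs it.
    let h = checkpoint(&T::relu(T::matmul(x, w1)));
    let y = T::matmul(h, w2);

    // Gradients flow through checkpointed nodes as usual.
    let _grads = T::grad(&[T::reduce_sum(y, &[0, 1], false)], &[w1, w2]);
});
```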
Profiling Checkpoint Performance
You can measure the memory savings and performance impact of your checkpointing strategy:
```rust
// Start tracking memory usage
start_tracking();

// Your model with checkpointing
// ... (model code with checkpoint operations)

// Evaluate performance: print the tracked statistics,
// e.g. memory saved and recomputation overhead
// println!("{:?}", ...);
// println!("{:?}", ...);

// Reset for next test
reset_statistics();
```
Optimization Strategies
- Basic Strategy: Checkpoint every N layers (e.g., every other layer)
- Adaptive Strategy: Use automatic thresholds based on tensor size
- Targeted Strategy: Manually checkpoint only the largest tensors
- Segment Strategy: Checkpoint entire computation segments together
📊 Performance & Reliability
- Test Coverage: 404 passing tests, 0 failures
- Memory Efficiency: Up to 80% reduction with gradient checkpointing
- Numerical Stability: Robust implementations for large-scale computations
- Performance: SIMD and multi-threading optimizations throughout
🤝 Contributing & Support
- Documentation: docs.rs/scirs2-autograd
- Repository: github.com/cool-japan/scirs
- Issues: Report bugs and request features on GitHub
- Community: Join discussions in the SciRS2 community
🚀 Production Readiness
SciRS2 Autograd v0.1.0 is a stable release and is production-ready:
- ✅ Stable API: No breaking changes expected before v1.0
- ✅ Comprehensive Testing: All core functionality thoroughly tested
- ✅ Performance Optimized: SIMD, parallelization, and memory optimizations
- ✅ Documentation Complete: Full API documentation with examples
- ✅ Integration Ready: Seamless SciRS2 ecosystem compatibility
License
This project is dual-licensed. You can choose to use either license; see the LICENSE file for details.