rustorch 0.6.29

Production-ready PyTorch-compatible deep learning library in Rust with special mathematical functions (gamma, Bessel, error functions), statistical distributions, Fourier transforms (FFT/RFFT), matrix decomposition (SVD/QR/LU/eigenvalue), automatic differentiation, neural networks, computer vision transforms, complete GPU acceleration (CUDA/Metal/OpenCL), SIMD optimizations, parallel processing, WebAssembly browser support, comprehensive distributed learning support, and performance validation
Documentation
# RusTorch Features

## โœจ Comprehensive Feature List

### ๐Ÿ”ฅ Core Tensor Operations
- **ๅŒ…ๆ‹ฌ็š„ใƒ†ใƒณใ‚ฝใƒซๆผ”็ฎ—**: ๆ•ฐๅญฆๆผ”็ฎ—ใ€ใƒ–ใƒญใƒผใƒ‰ใ‚ญใƒฃใ‚นใƒ†ใ‚ฃใƒณใ‚ฐใ€ใ‚คใƒณใƒ‡ใƒƒใ‚ฏใ‚นๆ“ไฝœใ€็ตฑ่จˆๆฉŸ่ƒฝ
- **Mathematical Functions**: Trigonometric, exponential, power, and activation functions
- **Special Functions**: Gamma, Bessel, error functions with high precision and PyTorch compatibility
  - โš ๏ธ **Note**: Some special functions have precision limitations (1e-3 to 1e-6 accuracy) - contributions welcome
- **Statistical Operations**: Mean, variance, std, median, quantile, covariance, correlation
- **Broadcasting Support**: Automatic shape compatibility and dimension expansion
- **Flexible Indexing**: Select operations, slicing, and advanced tensor manipulation

### ๐Ÿค– Neural Network Architecture
- **Transformer Architecture**: Complete transformer implementation with multi-head attention
- **Embedding Systems**: Word embeddings, positional encoding, sinusoidal encoding
- **Multi-Head Attention**: Self-attention and cross-attention mechanisms
- **Neural Network Layers**: Linear, Conv1d/2d/3d, ConvTranspose, RNN/LSTM/GRU, BatchNorm, Dropout, and more
- **Activation Functions**: `ReLU`, `Sigmoid`, `Tanh`, `Softmax`, `GELU`, `Swish`, `Mish`, `LeakyReLU`, `ELU`, `SELU`
- **Loss Functions**: `MSELoss`, `CrossEntropyLoss`, `BCELoss`, `HuberLoss`
- **Normalization**: BatchNorm, LayerNorm, GroupNorm, RMSNorm

### ๐Ÿงฎ Mathematical Computing
- **Matrix Decomposition**: Complete SVD, QR, LU decomposition and eigenvalue solver with PyTorch compatibility
- **Statistical Distributions**: Complete probability distributions (Normal, Gamma, Beta, etc.)
- **Automatic Differentiation**: Tape-based computational graph for gradient computation
- **Optimization Algorithms**: `SGD`, `Adam` + Learning rate schedulers

### โšก Performance Optimizations
- **Cross-Platform SIMD**: AVX2, SSE2, NEON support with automatic backend selection
- **Platform-Specific Optimizations**: OS and architecture-aware memory management
- **Hardware-Aware Computing**: CPU topology detection and optimal tile size calculation
- **Dynamic Execution Engine**: JIT compilation and runtime optimization
- **Unified Parallel Operations**: Trait-based parallel tensor operations with intelligent scheduling
- **Multi-threaded Processing**: Rayon-based parallel batch operations and reductions
- **GPU Integration**: CUDA/Metal/OpenCL support with automatic device selection
- **Advanced Memory Management**: Zero-copy operations, SIMD-aligned allocation, and memory pools

### ๐Ÿ–ผ๏ธ Computer Vision
- **Advanced Transformation Pipelines**: Caching, conditional transforms, built-in datasets (MNIST, CIFAR-10/100)
- **Safe Operations**: Type-safe tensor operations with comprehensive error handling
- **Shared Base Traits**: Reusable convolution and pooling base implementations

### ๐ŸŒ Cross-Platform Support
- **Rust Safety**: Memory safety and thread safety guarantees
- **WebAssembly Support**: Browser-compatible WASM bindings for client-side ML
- **Model Format Support**: Safetensors, ONNX inference, PyTorch state dict compatibility
- **Production Ready**: 968 tests passing (100% success rate), unified error handling system

## ๐Ÿ”ง Technical Specifications

### Tensor Operations
- **Basic operations**: `+`, `-`, `*`, `/`, `matmul()`
- **Mathematical functions**: `sin()`, `cos()`, `exp()`, `log()`, `sqrt()`, `pow()`, `sigmoid()`, `tanh()`
- **Special functions**: `gamma()`, `lgamma()`, `digamma()`, `erf()`, `erfc()`, `bessel_j()`, `bessel_y()`, `bessel_i()`, `bessel_k()`
- **Statistical operations**: `mean()`, `var()`, `std()`, `median()`, `quantile()`, `cumsum()`, `cov()`, `corrcoef()`
- **Matrix decomposition**: `svd()`, `qr()`, `lu()`, `eig()`, `symeig()` with PyTorch compatibility
- **Broadcasting**: `broadcast_to()`, `broadcast_with()`, `unsqueeze()`, `squeeze()`, `repeat()`
- **Indexing**: `select()`, advanced slicing and tensor manipulation
- **Shape manipulation**: `transpose()`, `reshape()`, `permute()`
- **Parallel operations**: Trait-based parallel processing with automatic SIMD acceleration
- **GPU operations**: CUDA/Metal/OpenCL unified kernel execution with automatic device selection
- **Memory optimization**: Zero-copy views, SIMD-aligned allocation, memory pools

### Neural Network Layers
- **Linear**: Fully connected layers
- **Conv2d**: 2D convolution layers
- **RNN/LSTM/GRU**: Recurrent neural networks (multi-layer & bidirectional)
- **Transformer**: Complete transformer architecture with encoder/decoder
- **Multi-Head Attention**: Self-attention and cross-attention mechanisms
- **Embedding**: Word embeddings, positional encoding, sinusoidal encoding
- **Normalization**: BatchNorm, LayerNorm, GroupNorm, RMSNorm
- **Dropout**: Standard and Alpha dropout layers
- **Pooling**: MaxPool2d, AvgPool2d

### Optimizers and Learning Rate Schedulers
- **Optimizers**: SGD (with momentum, weight decay, Nesterov), Adam, AdamW, RMSprop, AdaGrad
- **Learning Rate Schedulers**:
  - **StepLR**: Decay LR by gamma every step_size epochs
  - **ExponentialLR**: Exponential decay every epoch
  - **CosineAnnealingLR**: Cosine annealing with restarts
  - **ReduceLROnPlateau**: Reduce LR when metric plateaus
  - **MultiStepLR**: Decay at specific milestones
  - **WarmupScheduler**: Gradual warmup from base to target LR
  - **OneCycleLR**: 1cycle policy with cosine/linear annealing
  - **PolynomialLR**: Polynomial decay function

## ๐Ÿ“Š Feature Matrix

| Category | Feature | Status | Performance |
|----------|---------|--------|-------------|
| **Tensor Operations** | Basic Math | โœ… Production | 34K-2.3M ops/sec |
| **Matrix Decomposition** | SVD/QR/LU/Eig | โœ… Production | <10ฮผs for 32x32 |
| **Neural Networks** | Linear/Conv/RNN | โœ… Production | 15-60 inferences/sec |
| **Optimizers** | SGD/Adam/AdamW/RMSprop | โœ… Production | Full feature set |
| **LR Schedulers** | 8 scheduler types | โœ… Production | Advanced policies |
| **GPU Acceleration** | CUDA/Metal/OpenCL | โœ… Production | Auto device selection |
| **SIMD Optimization** | AVX2/SSE4.1 | โœ… Production | Automatic vectorization |
| **WebAssembly** | Browser Support | โœ… Production | Full feature set |
| **Model Formats** | PyTorch/ONNX/Safetensors | โœ… Production | Import/Export |
| **Memory Management** | Zero-copy/Pools | โœ… Production | Optimized allocation |

## ๐ŸŽฏ Special Functions Status

### High Precision (Ready for Production)
- โœ… **Gamma Functions**: ฮ“(x), ln ฮ“(x), ฯˆ(x), B(x,y)
- โœ… **Error Functions**: erf(x), erfc(x)
- โœ… **Bessel I Functions**: Iโ‚€(x), Iโ‚(x), Iโ‚™(x)
- โœ… **Bessel K Functions**: Kโ‚€(x), Kโ‚(x), Kโ‚™(x)

### Improving Precision (Contributions Welcome)
- โš ๏ธ **Bessel Y Functions**: Yโ‚€(x) has precision issues for some ranges
- โš ๏ธ **Inverse Error Functions**: erfinv/erfcinv could achieve better precision
- โš ๏ธ **Bessel J Functions**: Numerical stability improvements needed

## ๐Ÿš€ Upcoming Features

- **Enhanced GPU Kernels**: More optimized CUDA/Metal implementations
- **Distributed Training**: Multi-GPU and multi-node support
- **Quantization**: INT8/FP16 precision support
- **Model Optimization**: Graph optimization and fusion
- **Mobile Support**: iOS/Android deployment
- **Cloud Integration**: AWS/GCP/Azure native support

For implementation details and API documentation, see [API Documentation](https://docs.rs/rustorch).