# CUDA-WASM Neural Integration with ruv-FANN




## 🚀 Overview
This integration provides seamless GPU acceleration for neural network operations by combining the power of **CUDA-WASM transpiler** with **ruv-FANN neural networks**. Experience **5x+ speedup** for neural network computations with automatic fallback to CPU when GPU is unavailable.
### Key Features
- ⚡ **5x+ Performance Improvement** - GPU-accelerated neural operations
- 🔄 **Automatic CUDA-to-WGSL Transpilation** - Run CUDA kernels in web browsers
- 🧠 **Neural Network Integration** - Seamless ruv-FANN compatibility
- 🌐 **Cross-Platform** - Web, Node.js, and native support
- 📊 **Performance Monitoring** - Real-time metrics and optimization
- 🔧 **Memory Management** - Efficient GPU-CPU data transfer
- 🎯 **Batch Processing** - Optimize bulk operations
- ⚙️ **Custom Kernels** - Write and execute custom CUDA code
- 🛡️ **Automatic Fallback** - CPU execution when GPU unavailable
- 📝 **TypeScript Support** - Full type definitions included
## 🏗️ Architecture
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ ruv-FANN │ │ Neural Bridge │ │ CUDA-WASM │
│ Neural Networks │◄──►│ Integration │◄──►│ Transpiler │
└─────────────────┘ └──────────────────┘ └─────────────────┘
▲ ▲ ▲
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ CPU Training │ │ Memory Manager │ │ WebGPU/CUDA │
│ & Inference │ │ & Performance │ │ Execution │
└─────────────────┘ └──────────────────┘ └─────────────────┘
```
## 🚀 Quick Start
### Installation
```bash
# Add to your Rust project
cargo add cuda-rust-wasm --features=neural-integration
# For web development
npm install cuda-wasm-neural
```
### Basic Usage
```rust
use cuda_rust_wasm::{NeuralBridge, BridgeConfig, NeuralOperation, ActivationFunction};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize neural integration
cuda_rust_wasm::init_neural_integration()?;
// Create GPU-accelerated neural bridge
let bridge = NeuralBridge::new()?;
// Check GPU availability
if bridge.is_gpu_available() {
println!("GPU acceleration enabled!");
if let Some(info) = bridge.get_device_info() {
println!("Device: {} ({})", info.name, info.vendor);
}
}
// Execute matrix multiplication on GPU
let a = vec![1.0, 2.0, 3.0, 4.0]; // 2x2 matrix
let b = vec![5.0, 6.0, 7.0, 8.0]; // 2x2 matrix
let mut input_data = a;
input_data.extend(b);
let operation = NeuralOperation::MatrixMultiply {
a_rows: 2,
a_cols: 2,
b_cols: 2,
};
let result = bridge.execute_neural_operation(operation, &input_data)?;
println!("Result: {:?}", result);
// Apply ReLU activation function
let input = vec![-2.0, -1.0, 0.0, 1.0, 2.0];
let activation_op = NeuralOperation::ActivationFunction {
function: ActivationFunction::ReLU,
size: input.len(),
};
let activated = bridge.execute_neural_operation(activation_op, &input)?;
println!("ReLU result: {:?}", activated);
Ok(())
}
```
### JavaScript/TypeScript Usage
```javascript
import { NeuralBridge, initialize } from 'cuda-wasm-neural';
// Initialize the system
await initialize();
// Create neural bridge
const bridge = new NeuralBridge();
// Check GPU availability
if (bridge.isGpuAvailable()) {
console.log('GPU acceleration available!');
console.log('Device:', bridge.getDeviceInfo());
}
// Matrix multiplication
const a = new Float32Array([1, 2, 3, 4]);
const b = new Float32Array([5, 6, 7, 8]);
const result = await bridge.matrixMultiply(a, b, 2, 2, 2);
console.log('Result:', result.data);
console.log('Execution time:', result.execution_time, 'ms');
console.log('GPU used:', result.gpu_used);
```
## 🧠 Neural Network Integration
### ruv-FANN Compatibility
```rust
use ruv_fann::{Network, NetworkBuilder};
use cuda_rust_wasm::{NeuralBridge, NeuralOperation};
// Create ruv-FANN network
let mut network: Network<f32> = NetworkBuilder::new()
.input_layer(4)
.hidden_layer(8)
.output_layer(2)
.build();
network.randomize_weights(-1.0, 1.0);
// CPU inference (ruv-FANN)
let input = vec![0.5, 0.8, 0.2, 0.9];
let cpu_output = network.run(&input);
// GPU-accelerated inference (CUDA-WASM)
let bridge = NeuralBridge::new()?;
let gpu_operation = NeuralOperation::ForwardPropagation {
layer_sizes: vec![4, 8, 2],
};
let gpu_output = bridge.execute_neural_operation(gpu_operation, &input)?;
println!("CPU output: {:?}", cpu_output);
println!("GPU output: {:?}", gpu_output);
```
### Supported Operations
| Matrix Multiply | Dense layer computation | ✅ | 8-15x |
| Vector Add | Element-wise addition | ✅ | 3-6x |
| Activation Functions | ReLU, Sigmoid, Tanh, GELU, Swish | ✅ | 5-12x |
| Forward Propagation | Full network inference | ✅ | 6-20x |
| Backward Propagation | Training gradients | ✅ | 4-10x |
| Convolution | 2D convolution layers | ✅ | 10-25x |
| Batch Processing | Multiple operations | ✅ | 2-8x |
| Custom Kernels | User-defined CUDA code | ✅ | Variable |
## ⚡ Performance Optimizations
### Automatic Optimization Features
- **Memory Pooling**: Reuse GPU buffers to reduce allocation overhead
- **Batch Processing**: Process multiple operations efficiently
- **Pipeline Optimization**: Overlap computation and data transfer
- **Kernel Fusion**: Combine operations to reduce GPU kernel launches
- **Precision Selection**: Choose optimal precision (FP16/FP32/FP64)
- **Workgroup Sizing**: Automatic GPU workgroup optimization
### Configuration Options
```rust
use cuda_rust_wasm::{BridgeConfig, GpuDevice, Precision};
let config = BridgeConfig {
enable_gpu: true,
gpu_device: GpuDevice::HighPerformance,
memory_pool_size: 1024, // 1GB memory pool
enable_monitoring: true,
auto_fallback: true,
batch_size: 64,
precision: Precision::Float32,
};
let bridge = NeuralBridge::with_config(config)?;
```
## 🔧 Custom CUDA Kernels
Write custom CUDA kernels that are automatically transpiled to WebGPU:
```rust
let polynomial_kernel = r#"
__global__ void polynomial_eval(float* x, float* y, float a, float b, float c, float d, int size) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < size) {
float val = x[idx];
y[idx] = a * val * val * val + b * val * val + c * val + d;
}
}
"#;
let input_data = vec![-2.0, -1.0, 0.0, 1.0, 2.0];
let operation = NeuralOperation::Custom {
kernel_source: polynomial_kernel.to_string(),
name: "polynomial_eval".to_string(),
};
let result = bridge.execute_neural_operation(operation, &input_data)?;
```
## 📊 Performance Monitoring
### Real-time Metrics
```rust
// Get performance statistics
let perf_stats = bridge.get_performance_stats();
println!("Total operations: {}", perf_stats.total_operations);
println!("Average execution time: {:.2}ms", perf_stats.average_execution_time * 1000.0);
println!("GPU utilization: {:.1}%", perf_stats.gpu_utilization * 100.0);
println!("Throughput: {:.1} ops/sec", perf_stats.throughput);
// Memory usage
let memory_stats = bridge.get_memory_stats();
println!("GPU memory: {:.1} MB", memory_stats.gpu_allocated as f64 / 1024.0 / 1024.0);
println!("CPU memory: {:.1} MB", memory_stats.cpu_allocated as f64 / 1024.0 / 1024.0);
```
### Performance Profiling
```javascript
import { PerformanceProfiler } from 'cuda-wasm-neural';
const profiler = new PerformanceProfiler(bridge);
profiler.startProfiling();
// Perform operations...
await bridge.matrixMultiply(a, b, 1024, 1024, 1024);
const results = profiler.stopProfiling();
console.log('GPU time:', results.gpu_time);
console.log('Memory transfer time:', results.memory_transfer_time);
console.log('Bottlenecks:', results.bottlenecks);
```
## 🎯 Batch Processing
Process multiple operations efficiently:
```rust
let batch_processor = bridge.create_batch_processor();
let operations = vec![
NeuralOperation::VectorAdd { size: 1000 },
NeuralOperation::ActivationFunction {
function: ActivationFunction::ReLU,
size: 1000
},
NeuralOperation::ActivationFunction {
function: ActivationFunction::Sigmoid,
size: 1000
},
];
let inputs = vec![
generate_vector_add_input(1000),
generate_random_input(1000),
generate_random_input(1000),
];
let results = batch_processor.process_batch(operations, inputs)?;
println!("Processed {} operations in batch", results.len());
```
## 🌐 Web Usage
### HTML Integration
```html
<!DOCTYPE html>
<html>
<head>
<script type="module">
import { NeuralBridge, initialize } from './cuda-wasm-neural.js';
async function main() {
await initialize();
const bridge = new NeuralBridge();
console.log('GPU available:', bridge.isGpuAvailable());
}
main();
</script>
</head>
<body>
<h1>GPU-Accelerated Neural Networks</h1>
</body>
</html>
```
### Web Worker Support
```javascript
// main.js
import { NeuralWorker } from 'cuda-wasm-neural';
const worker = new NeuralWorker();
const result = await worker.execute('matrix_multiply', data);
console.log('Result from worker:', result);
```
## 🔍 Debugging and Troubleshooting
### Common Issues
1. **GPU Not Available**
```javascript
if (!bridge.isGpuAvailable()) {
console.log('GPU not available, using CPU fallback');
}
```
2. **Memory Issues**
```rust
let memory_stats = bridge.get_memory_stats();
if memory_stats.gpu_allocated > threshold {
}
```
3. **Performance Degradation**
```rust
if let Some(degradation) = bridge.detect_degradation() {
println!("Performance issue: {}", degradation.suggested_action);
}
```
### Debug Features
```rust
// Enable detailed logging
let config = BridgeConfig {
enable_monitoring: true,
// ... other options
};
// Use debug build for detailed error messages
#[cfg(debug_assertions)]
let bridge = NeuralBridge::with_config(config)?;
```
## 📈 Benchmarks
### Performance Results
| Matrix Multiply | 512x512 | 45.2ms | 6.1ms | 7.4x |
| Matrix Multiply | 1024x1024 | 362ms | 24.5ms | 14.8x |
| Vector Add | 1M elements | 12.3ms | 2.1ms | 5.9x |
| ReLU Activation | 1M elements | 8.7ms | 1.2ms | 7.3x |
| Forward Prop | 784→1000→10 | 15.6ms | 2.8ms | 5.6x |
### Running Benchmarks
```rust
use cuda_rust_wasm::neural_integration::benchmarks;
// Run comprehensive benchmarks
let mut benchmark_suite = benchmarks::BenchmarkSuite::new()?;
benchmark_suite.run_comprehensive_benchmarks()?;
// Export results to CSV
benchmark_suite.export_csv("benchmark_results.csv")?;
```
## 🛠️ Development
### Building from Source
```bash
# Clone the repository
git clone https://github.com/ruvnet/ruv-FANN.git
cd ruv-FANN/cuda-wasm
# Build with neural integration features
cargo build --features neural-integration
# Run tests
cargo test --features neural-integration
# Build for WASM
wasm-pack build --target web --features wasm,neural-integration
```
### Running Examples
```bash
# Run comprehensive integration example
cargo run --example cuda_wasm_neural_integration --features neural-integration
# Run benchmarks
cargo run --example neural_benchmarks --features neural-integration
# Run web example
cd web-example && npm start
```
## 📚 API Reference
### Core Types
- `NeuralBridge` - Main interface for neural operations
- `BridgeConfig` - Configuration for optimization
- `NeuralOperation<T>` - Operation types (matrix multiply, activation, etc.)
- `BatchProcessor` - Efficient bulk operation processing
- `PerformanceMonitor` - Real-time performance tracking
### Memory Management
- `MemoryManager` - GPU/CPU memory allocation
- `BufferPool` - Efficient buffer reuse
- `TransferCache` - Cache frequently used data
### Error Handling
- `NeuralIntegrationError` - Base error type
- `GpuInitializationError` - GPU setup issues
- `TranspilationError` - CUDA→WGSL conversion errors
- `MemoryError` - Memory allocation failures
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request
### Areas for Contribution
- New neural network operations
- Performance optimizations
- Additional GPU backends
- Improved error handling
- Documentation improvements
- Web framework integrations
## 📄 License
This project is licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT License ([LICENSE-MIT](LICENSE-MIT))
at your option.
## 🙏 Acknowledgments
- **ruv-FANN** - Fast neural network library for Rust
- **WebGPU Working Group** - Web GPU standards
- **CUDA Toolkit** - NVIDIA GPU computing platform
- **wgpu-rs** - Rust WebGPU implementation
- **wasm-bindgen** - Rust ↔ JavaScript bindings
## 📞 Support
- 📖 [Documentation](https://docs.rs/cuda-rust-wasm)
- 🐛 [Issue Tracker](https://github.com/ruvnet/ruv-FANN/issues)
- 💬 [Discussions](https://github.com/ruvnet/ruv-FANN/discussions)
- 📧 [Email Support](mailto:support@ruv-fann.com)
---
**Experience the power of GPU-accelerated neural networks with seamless CUDA-WASM integration!** 🚀