# torsh-backends
Unified backend implementation for ToRSh with PyTorch-compatible API, leveraging SciRS2's GPU acceleration.
## Overview
This crate provides a unified backend system that integrates with SciRS2's compute backends:
- **CPU Backend**: Optimized CPU operations with SIMD and parallelism
- **CUDA Backend**: NVIDIA GPU acceleration via scirs2-core's CUDA support
- **Metal Backend**: Apple GPU acceleration via scirs2-core's Metal/MPS support
- **ROCm Backend**: AMD GPU acceleration (via scirs2-core when available)
- **WebGPU Backend**: Cross-platform GPU support (via scirs2-core when available)
> **Note**: All backend implementations are unified in this single crate using feature flags, eliminating the need for separate `torsh-backend-*` crates.
## Architecture
The backend system leverages SciRS2's GPU infrastructure:
```rust
// NOTE: type and method names in this snippet are illustrative;
// consult the crate docs for the exact API.
use torsh_backends::prelude::*;

// Unified backend with runtime selection
let backend = Backend::auto()?;                  // Auto-detect best backend
let backend = Backend::new(BackendType::Cuda)?;  // Explicit CUDA
let backend = Backend::new(BackendType::Metal)?; // Explicit Metal
```
## Feature Flags
```toml
[dependencies]
torsh-backends = { version = "0.1.0", features = ["cuda", "metal"] }

# Available features:
# - "cpu" (default): CPU backend with SIMD optimizations
# - "cuda": NVIDIA GPU backend via scirs2-core
# - "metal": Apple GPU backend via scirs2-core
# - "rocm": AMD GPU backend via scirs2-core
# - "webgpu": WebGPU backend via scirs2-core
```
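Feature flags like these can also drive compile-time dispatch in application code. The following self-contained sketch shows that pattern; the `BackendType` enum and `detect_backend` function are illustrative, not part of the torsh-backends API:

```rust
// Sketch: compile-time backend preference driven by Cargo feature flags.
// `BackendType` and `detect_backend` are illustrative names, not the crate's API.
#[derive(Debug, PartialEq)]
enum BackendType {
    Cpu,
    Cuda,
    Metal,
}

#[allow(unreachable_code)]
fn detect_backend() -> BackendType {
    // Prefer GPU backends when their features are compiled in.
    #[cfg(feature = "cuda")]
    return BackendType::Cuda;
    #[cfg(feature = "metal")]
    return BackendType::Metal;
    // CPU is always available as the fallback.
    BackendType::Cpu
}

fn main() {
    // Without "cuda"/"metal" features enabled, this prints "Cpu".
    println!("{:?}", detect_backend());
}
```

Compiling with `--features cuda` would make `detect_backend` return `Cuda` instead; the fallback branch keeps the function total when no GPU feature is enabled.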
## Usage

### Unified Backend API
```rust
// NOTE: builder/method names and argument values are illustrative.
use torsh_backends::prelude::*;

// Automatic backend selection based on availability
let backend = Backend::auto()?;

// Query available backends
for backend_type in Backend::available() {
    println!("{backend_type:?}");
}

// Create backend with specific configuration
let backend = Backend::builder()
    .backend_type(BackendType::Cuda)
    .device_id(0)
    .memory_pool_size(4 * 1024 * 1024 * 1024) // 4GB
    .enable_tensor_cores(true)
    .build()?;

// All backends use the same API
let a = backend.randn(&[1024, 1024])?;
let b = backend.randn(&[1024, 1024])?;
let c = backend.matmul(&a, &b)?;
```
### CPU Backend
```rust
// CPU backend leverages scirs2-core's optimized operations
// (builder method names and argument values are illustrative)
let cpu_backend = Backend::cpu()
    .num_threads(8)
    .enable_simd(true)
    .build()?;

// Uses OpenBLAS/MKL/Accelerate via scirs2
let result = cpu_backend.gemm(&a, &b)?;
```
### CUDA Backend
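A minimal sketch of configuring the CUDA backend, mirroring the CPU builder shown in this README (builder and method names are assumptions, not the crate's confirmed API):

```rust
// CUDA backend via scirs2-core's CUDA support
// (builder method names are illustrative)
let cuda_backend = Backend::cuda()
    .device_id(0)
    .enable_tensor_cores(true)
    .build()?;

// cuDNN-backed kernels and streams come from scirs2's CUDA support
let c = cuda_backend.matmul(&a, &b)?;
```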
### Metal Backend
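A minimal sketch of configuring the Metal backend along the same lines (builder and method names are assumptions, not the crate's confirmed API):

```rust
// Metal backend via scirs2-core's Metal/MPS support
// (builder method names are illustrative)
let metal_backend = Backend::metal()
    .device_id(0)
    .build()?;

// MPS-backed kernels; unified memory avoids explicit host-device copies
let c = metal_backend.matmul(&a, &b)?;
```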
### Unified Operations

All backends support the same set of tensor operations through scirs2's unified kernels, so code written against one backend runs unchanged on any other.
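A hedged sketch of that shared operation surface; the specific method names here (`randn`, `relu`, `sum`, `matmul`) are assumptions based on the other examples in this README:

```rust
// The same calls work regardless of which backend was constructed
// (method names are illustrative)
let x = backend.randn(&[256, 256])?;
let y = backend.relu(&x)?;       // elementwise op
let s = backend.sum(&y, 0)?;     // reduction along axis 0
let z = backend.matmul(&x, &y)?; // linear algebra
```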
### Device Management
```rust
// Unified device abstraction (method names are illustrative)
let devices = Backend::list_devices()?;
for device in devices {
    println!("{}: {}", device.id(), device.name());
}

// Multi-device support
let backend_gpu0 = Backend::new(BackendType::Cuda)?.device(0)?;
let backend_gpu1 = Backend::new(BackendType::Cuda)?.device(1)?;

// Device synchronization
backend.synchronize()?;
```
### Memory Management
```rust
// Unified memory pool leveraging scirs2's allocators
// (method names and arguments are illustrative)
let backend = Backend::new(BackendType::Cuda)?
    .memory_pool(pool_config)?;

// Zero-copy host-device transfers (when supported)
let pinned = backend.alloc_pinned(num_bytes)?;
backend.copy_host_to_device_async(&pinned, &device_buf).await?;
```
### Performance Features
```rust
// Auto-tuning (via scirs2's auto-tuning infrastructure)
// (method names and arguments are illustrative)
let backend = Backend::new(BackendType::Cuda)?
    .enable_autotuning(true)
    .autotuning_cache_file("autotune.cache")?;

// Mixed precision training
let backend = backend.enable_mixed_precision()?;

// Graph optimization (when using CUDA)
let graph = backend.capture_graph(|g| g.matmul(&a, &b))?;
let result = backend.launch_graph(&graph)?;
```
## Integration with SciRS2

This crate fully leverages SciRS2's backend infrastructure.
### Backend Implementation Status
| Backend | SciRS2 Integration | Features |
|---|---|---|
| CPU | ✅ scirs2-core (default) | OpenBLAS/MKL/Accelerate, SIMD, Rayon parallelism |
| CUDA | ✅ scirs2-core with `cuda` | Optimized kernels, cuDNN, Tensor Cores, Streams |
| Metal | ✅ scirs2-core with `metal` | Metal Performance Shaders, Unified memory |
| ROCm | 🚧 scirs2-core with `rocm` | Available once scirs2 adds support |
| WebGPU | 🚧 scirs2-core with `wgpu` | Available once scirs2 adds support |
### Leveraged SciRS2 Features
- **GPU Kernels**: All GPU operations use scirs2's optimized kernels
- **Auto-tuning**: Kernel selection via scirs2's auto-tuning
- **Memory Management**: Efficient pooling from scirs2
- **Async Execution**: Built on scirs2's async GPU model
- **BLAS/LAPACK**: CPU ops via scirs2's math libraries
## Migration from Separate Backend Crates
The previous separate backend crates (torsh-backend-cpu, torsh-backend-cuda, torsh-backend-metal) are now deprecated. Use feature flags instead:
```toml
# Old (deprecated)
torsh-backend-cuda = "0.1.0"

# New (unified)
torsh-backends = { version = "0.1.0", features = ["cuda"] }
```
## License
Licensed under the Apache License, Version 2.0. See LICENSE for details.