# amari-gpu
GPU acceleration for Amari mathematical computations using WebGPU.
## Overview

amari-gpu is an integration crate that provides GPU-accelerated implementations of mathematical operations from the Amari domain crates. It follows a progressive enhancement pattern. Operations fall back to CPU computation automatically when no GPU is available or when the workload is too small to benefit. They switch to GPU acceleration for large batch operations.
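The fallback rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the amari-gpu implementation:

- `GpuDevice`, `Dispatcher`, `ExecPath` and the `try_gpu` initializer are hypothetical names.
- GPU initialization is modeled as a closure that may fail.

```rust
/// Hypothetical stand-in for an initialized GPU device (not an amari-gpu type).
struct GpuDevice;

#[derive(Debug, PartialEq)]
enum ExecPath {
    Cpu,
    Gpu,
}

/// Chooses an execution path: CPU when no GPU is available or the batch
/// is small, GPU otherwise.
struct Dispatcher {
    gpu: Option<GpuDevice>,
    min_gpu_batch: usize,
}

impl Dispatcher {
    /// `try_gpu` models fallible GPU initialization (hypothetical);
    /// an error leaves `gpu` as `None`, so every call takes the CPU path.
    fn new(try_gpu: impl FnOnce() -> Result<GpuDevice, String>, min_gpu_batch: usize) -> Self {
        Dispatcher {
            gpu: try_gpu().ok(),
            min_gpu_batch,
        }
    }

    /// GPU only if a device exists AND the batch is large enough.
    fn path_for(&self, batch_len: usize) -> ExecPath {
        match self.gpu {
            Some(_) if batch_len >= self.min_gpu_batch => ExecPath::Gpu,
            _ => ExecPath::Cpu,
        }
    }
}
```

Modeling the device as an `Option` keeps both failure cases (no adapter, small batch) in one `match`. A real implementation would also need to handle errors raised after initialization.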
## Architecture

As an integration crate, amari-gpu consumes APIs from domain crates and exposes them to GPU platforms:

- Domain crates (provide APIs):
  - amari-core → amari-measure → amari-calculus
  - amari-info-geom, amari-relativistic, amari-network
- Integration crates (consume APIs):
  - amari-gpu → depends on domain crates
  - amari-wasm → depends on domain crates
Dependency Rule: Integration crates depend on domain crates, never the reverse.
## Current Integrations (v0.11.1)

### Implemented GPU Acceleration

| Domain Crate | Module | Operations | Status |
|---|---|---|---|
| amari-core | core | Geometric algebra operations (G2, G3, G4), multivector products | ✅ Implemented |
| amari-info-geom | info_geom | Fisher metric, divergence computations, statistical manifolds | ✅ Implemented |
| amari-relativistic | relativistic | Minkowski space operations, Lorentz transformations | ✅ Implemented |
| amari-network | network | Graph operations, spectral methods | ✅ Implemented |
| amari-measure | measure | Measure theory computations, sigma-algebras | ✅ Implemented (feature: measure) |
| amari-calculus | calculus | Field evaluation, gradients, divergence, curl | ✅ Implemented (feature: calculus) |
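Many of the operations in the table are independent per element, which is what makes them good GPU candidates. As an illustration, here is a CPU reference for a batched geometric product in G2, with basis 1, e1, e2, e12 and e1² = e2² = 1. The `Mv2` layout and the function names are hypothetical and are not the amari-core API.

```rust
/// A G2 multivector stored as [scalar, e1, e2, e12] (hypothetical layout).
type Mv2 = [f64; 4];

/// Geometric product in G2 with e1*e1 = e2*e2 = 1, hence e12*e12 = -1
/// and e2*e1 = -e12.
fn geometric_product(a: &Mv2, b: &Mv2) -> Mv2 {
    [
        // scalar part
        a[0] * b[0] + a[1] * b[1] + a[2] * b[2] - a[3] * b[3],
        // e1 part
        a[0] * b[1] + a[1] * b[0] - a[2] * b[3] + a[3] * b[2],
        // e2 part
        a[0] * b[2] + a[2] * b[0] + a[1] * b[3] - a[3] * b[1],
        // e12 (bivector) part
        a[0] * b[3] + a[3] * b[0] + a[1] * b[2] - a[2] * b[1],
    ]
}

/// Element-wise batched product. Each pair is independent, so on a GPU
/// each element could map to one shader invocation.
fn batch_geometric_product(a: &[Mv2], b: &[Mv2]) -> Vec<Mv2> {
    assert_eq!(a.len(), b.len(), "batches must have equal length");
    a.iter().zip(b).map(|(x, y)| geometric_product(x, y)).collect()
}
```

For two vectors u and v, the product reduces to u·v + u∧v. For example, (e1 + 2e2)(3e1 + 4e2) = 11 - 2e12.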
### Placeholder Modules (Future Work)

| Domain Crate | Module | Status |
|---|---|---|
| amari-tropical | tropical | ⏸️ Placeholder shaders only |
| amari-dual | dual | ⏸️ Placeholder shaders only |
| amari-fusion | fusion | ⏸️ Placeholder shaders only |
| amari-automata | automata | ⏸️ Placeholder shaders only |
| amari-enumerative | enumerative | ⏸️ Placeholder shaders only |
## Features

```toml
[features]
default = []
std = ["amari-core/std", "amari-relativistic/std", "amari-info-geom/std"]
webgpu = ["wgpu/webgpu"]
high-precision = ["amari-core/high-precision", "amari-relativistic/high-precision"]
measure = ["dep:amari-measure"]
calculus = ["dep:amari-calculus"]
```
## Usage

### Basic Setup
use GpuContext;
async
### Calculus GPU Acceleration
use GpuCalculus;
use ScalarField;
use Multivector;
async
### Adaptive CPU/GPU Dispatch

The library selects an execution path based on batch size. For the calculus module in v0.11.1, the GPU path still runs the CPU implementation (see Implementation Status).

```rust
// Small batch: automatically uses CPU (< 1000 points for scalar fields)
let small_points = vec![/* … */];
let values = gpu_calculus.batch_eval_scalar_field(/* … */).await?;
// ↑ Executed on CPU (GPU transfer overhead exceeds the benefit)

// Large batch: automatically uses GPU (≥ 1000 points)
let large_points = generate_point_grid(/* … */); // 10,000 points
let values = gpu_calculus.batch_eval_scalar_field(/* … */).await?;
// ↑ Routed to the GPU path (parallel processing advantage)
```
### Batch Size Thresholds

| Operation | Runs on CPU | Runs on GPU |
|---|---|---|
| Scalar field evaluation | < 1000 points | ≥ 1000 points |
| Vector field evaluation | < 500 points | ≥ 500 points |
| Gradient computation | < 500 points | ≥ 500 points |
| Divergence/Curl | < 500 points | ≥ 500 points |
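The thresholds in the table can be expressed as a simple per-operation lookup. `FieldOp`, `gpu_threshold` and `use_gpu` are hypothetical names chosen for this sketch. The values are copied from the table. Whether amari-gpu compares with ≥ at exactly these boundaries is an assumption based on the table's wording.

```rust
/// Field operations from the threshold table (hypothetical enum, not an amari-gpu type).
#[derive(Clone, Copy, Debug)]
enum FieldOp {
    ScalarEval,
    VectorEval,
    Gradient,
    DivergenceCurl,
}

/// Smallest batch size routed to the GPU path, per the table.
fn gpu_threshold(op: FieldOp) -> usize {
    match op {
        FieldOp::ScalarEval => 1000,
        FieldOp::VectorEval | FieldOp::Gradient | FieldOp::DivergenceCurl => 500,
    }
}

/// Batches below the threshold stay on the CPU; batches at or above it go to the GPU.
fn use_gpu(op: FieldOp, n_points: usize) -> bool {
    n_points >= gpu_threshold(op)
}
```

A per-operation table, rather than one global cutoff, lets each operation use its own crossover point. One plausible reason for the lower vector-valued thresholds is that these operations do more work per point, so they amortize transfer overhead at smaller batch sizes.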
## Implementation Status

### Calculus Module (v0.11.1)
CPU Implementations (✅ Complete):
- Central finite differences for numerical derivatives
- Field evaluation at multiple points
- Gradient, divergence, and curl computation
- Step size: h = 1e-6 for numerical stability
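Here is a minimal sketch of the central-difference scheme listed above. It assumes fields are plain Rust closures over 3D points; the crate's own ScalarField and vector-field types are not used. Each partial derivative is approximated as (f(p + h·e_i) - f(p - h·e_i)) / (2h) with h = 1e-6. Gradient, divergence and curl are then assembled from these partials. All function names here are hypothetical.

```rust
/// Step size quoted in this README.
const H: f64 = 1e-6;

type Vec3 = [f64; 3];

/// Central difference of scalar function `f` along axis `i` at point `p`.
fn partial<F: Fn(Vec3) -> f64>(f: &F, p: Vec3, i: usize) -> f64 {
    let mut plus = p;
    let mut minus = p;
    plus[i] += H;
    minus[i] -= H;
    (f(plus) - f(minus)) / (2.0 * H)
}

/// Gradient of a scalar field: (∂f/∂x, ∂f/∂y, ∂f/∂z).
fn gradient<F: Fn(Vec3) -> f64>(f: &F, p: Vec3) -> Vec3 {
    [partial(f, p, 0), partial(f, p, 1), partial(f, p, 2)]
}

/// Divergence of a vector field: sum over i of ∂v_i/∂x_i.
fn divergence<F: Fn(Vec3) -> Vec3>(v: &F, p: Vec3) -> f64 {
    (0..3).map(|i| partial(&|q: Vec3| v(q)[i], p, i)).sum()
}

/// Curl of a vector field.
fn curl<F: Fn(Vec3) -> Vec3>(v: &F, p: Vec3) -> Vec3 {
    // d(j, i) = ∂v_j/∂x_i via central differences on component j.
    let d = |j: usize, i: usize| partial(&|q: Vec3| v(q)[j], p, i);
    [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]
}

/// Field evaluation at multiple points: one independent gradient per point.
fn batch_gradient<F: Fn(Vec3) -> f64>(f: &F, points: &[Vec3]) -> Vec<Vec3> {
    points.iter().map(|&p| gradient(f, p)).collect()
}
```

Each point's result depends only on that point, so `batch_gradient` is the loop a compute shader would parallelize. Note that each gradient costs six field evaluations.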
GPU Implementations (⏸️ Future Work):
- WGSL compute shaders for parallel field evaluation
- Parallel finite difference computation
- Optimized memory layout for GPU transfer
Current Behavior:
- Infrastructure and pipelines are in place
- All operations currently use CPU implementations
- Shaders can be added incrementally without API changes
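Before shaders can run, points must be packed into a GPU buffer. The sketch below shows one plausible layout, assuming a WGSL storage buffer declared as `array<vec3<f32>>`:

- `vec3<f32>` has 16-byte alignment in WGSL, so each point is padded to four floats.
- Values are narrowed to f32 because core WGSL has no 64-bit float type.

The workgroup size of 64 and all function names are hypothetical. This is not the layout amari-gpu uses.

```rust
/// Hypothetical compute-shader workgroup size.
const WORKGROUP_SIZE: u32 = 64;

/// Pack 3D points into a flat f32 buffer with a 16-byte stride
/// (x, y, z, padding), matching the WGSL array stride of vec3<f32>.
fn pack_points_vec4(points: &[[f64; 3]]) -> Vec<f32> {
    let mut out = Vec::with_capacity(points.len() * 4);
    for p in points {
        // Narrow f64 -> f32 and append one padding float per point.
        out.extend_from_slice(&[p[0] as f32, p[1] as f32, p[2] as f32, 0.0]);
    }
    out
}

/// Buffer size in bytes (wgpu expresses buffer sizes as u64).
fn buffer_size_bytes(data: &[f32]) -> u64 {
    std::mem::size_of_val(data) as u64
}

/// Workgroups needed so every point gets one invocation (rounds up).
fn workgroup_count(n_points: u32) -> u32 {
    n_points.div_ceil(WORKGROUP_SIZE)
}
```

Padding wastes a quarter of the buffer. The alternative is a struct-of-arrays layout, with separate x, y and z arrays of f32, which avoids the padding at the cost of three bindings or manual indexing in the shader.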
## Examples

See the examples/ directory for complete examples, including:

- a geometric algebra example
- an information geometry example
- a calculus example (requires the calculus feature)
## Development

### Running Tests

The test suite can be run:

- in full (all tests)
- with specific features enabled
- including GPU tests (these require GPU access)

### Building Documentation
## Future Work

### Short-term (v0.12.x)
- Implement WGSL shaders for calculus operations
- Add GPU benchmarks comparing CPU vs GPU performance
- Optimize memory transfer patterns
- Add more comprehensive examples
### Medium-term (v0.13.x - v0.14.x)
- Implement tropical algebra GPU operations
- Add dual number GPU acceleration
- Implement fusion algebra operations
- Add automata GPU acceleration
### Long-term (v1.0.0+)
- WebGPU backend for browser deployment
- Multi-GPU support for distributed computation
- Kernel fusion optimization
- Custom WGSL shader compilation pipeline
## Performance Considerations

- GPU Initialization: ~100-200 ms startup cost for context creation
- Data Transfer: Significant overhead for small batches (< 500 elements)
- Optimal Use Cases: Large batch operations (> 1000 elements)
- Memory: GPU buffers are allocated dynamically and sized to each batch
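The thresholds above are consistent with a simple linear cost model. The GPU path pays a fixed per-call overhead (upload, dispatch, readback) but has a lower per-element cost. It therefore wins only past a break-even batch size of fixed / (cpu_per_elem - gpu_per_elem).

The sketch below assumes such a linear model with made-up numbers. It is not a measurement of amari-gpu. The one-time 100-200 ms context creation is left out because it is paid once and amortized across calls. `CostModel` and its fields are hypothetical.

```rust
/// Linear per-call cost model; all values are hypothetical, in microseconds.
struct CostModel {
    /// Fixed overhead per GPU call (buffer upload, dispatch, readback latency).
    gpu_fixed_us: f64,
    /// Marginal CPU cost per element.
    cpu_per_elem_us: f64,
    /// Marginal GPU cost per element.
    gpu_per_elem_us: f64,
}

impl CostModel {
    fn cpu_time(&self, n: usize) -> f64 {
        n as f64 * self.cpu_per_elem_us
    }

    fn gpu_time(&self, n: usize) -> f64 {
        self.gpu_fixed_us + n as f64 * self.gpu_per_elem_us
    }

    /// Smallest batch size at which the GPU path is no slower than the CPU,
    /// or None if the GPU is never cheaper per element.
    fn break_even(&self) -> Option<usize> {
        let saving_per_elem = self.cpu_per_elem_us - self.gpu_per_elem_us;
        if saving_per_elem <= 0.0 {
            return None;
        }
        Some((self.gpu_fixed_us / saving_per_elem).ceil() as usize)
    }
}
```

With a 500 µs fixed overhead, 1.0 µs per element on the CPU and 0.5 µs per element on the GPU, this model breaks even at 1000 elements. Real values would have to come from benchmarks.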
## Platform Support
| Platform | Backend | Status |
|---|---|---|
| Linux | Vulkan | ✅ Tested |
| macOS | Metal | ✅ Supported (not regularly tested) |
| Windows | DirectX 12 / Vulkan | ✅ Supported (not regularly tested) |
| WebAssembly | WebGPU | ⏸️ Requires webgpu feature |
## Dependencies

- wgpu (v0.19): WebGPU implementation
- bytemuck: Zero-cost GPU buffer conversions
- nalgebra: Linear algebra operations
- tokio: Async runtime for GPU operations
- futures, pollster: Async utilities
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
## Contributing
Contributions are welcome! Areas of particular interest:
- WGSL shader implementations for calculus operations
- Performance benchmarks and optimization
- Platform-specific testing and bug reports
- Documentation improvements and examples
## References
- WebGPU Specification
- wgpu Documentation
- Geometric Algebra GPU Acceleration (example reference)