RunMat Accelerate: GPU Acceleration Abstraction Layer
Goals:
- Provide a backend-agnostic API surface that maps RunMat operations to GPU kernels.
- Support multiple backends via Cargo features (CUDA, ROCm, Metal, Vulkan, OpenCL, wgpu).
- Allow zero-copy interop with runmat-builtins::Matrix where possible.
- Defer actual kernel authoring to backend crates/modules; this crate defines traits and wiring.
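
To make the "traits and wiring" goal concrete, here is a minimal sketch of what a backend-agnostic surface could look like. Every name in it (AccelBackend, BufferId, HostBackend, add_on) is hypothetical and chosen for illustration; none is taken from the actual crate. The host backend merely stands in for a real GPU backend, and it copies data on upload rather than demonstrating zero-copy interop.

```rust
use std::collections::HashMap;

/// Opaque handle to a buffer owned by a backend (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BufferId(u64);

/// Hypothetical backend trait: callers see only these operations,
/// while each backend decides how they map to kernels.
pub trait AccelBackend {
    /// Short backend name, e.g. for logging or backend selection.
    fn name(&self) -> &'static str;
    /// Copy host data into a backend-owned buffer.
    fn upload(&mut self, data: &[f64]) -> BufferId;
    /// Elementwise addition of two buffers of equal length.
    fn add(&mut self, a: BufferId, b: BufferId) -> Result<BufferId, String>;
    /// Copy a backend-owned buffer back to host memory.
    fn download(&self, id: BufferId) -> Result<Vec<f64>, String>;
}

/// Reference backend that keeps its "device" buffers in host memory.
/// A real backend would hold GPU allocations here instead.
#[derive(Default)]
pub struct HostBackend {
    next_id: u64,
    buffers: HashMap<BufferId, Vec<f64>>,
}

impl HostBackend {
    /// Store a buffer and hand back a fresh handle for it.
    fn store(&mut self, data: Vec<f64>) -> BufferId {
        let id = BufferId(self.next_id);
        self.next_id += 1;
        self.buffers.insert(id, data);
        id
    }
}

impl AccelBackend for HostBackend {
    fn name(&self) -> &'static str {
        "host"
    }

    fn upload(&mut self, data: &[f64]) -> BufferId {
        self.store(data.to_vec())
    }

    fn add(&mut self, a: BufferId, b: BufferId) -> Result<BufferId, String> {
        let lhs = self.buffers.get(&a).ok_or("unknown buffer a")?;
        let rhs = self.buffers.get(&b).ok_or("unknown buffer b")?;
        if lhs.len() != rhs.len() {
            return Err("length mismatch".to_string());
        }
        let out: Vec<f64> = lhs.iter().zip(rhs).map(|(x, y)| x + y).collect();
        Ok(self.store(out))
    }

    fn download(&self, id: BufferId) -> Result<Vec<f64>, String> {
        self.buffers
            .get(&id)
            .cloned()
            .ok_or_else(|| "unknown buffer".to_string())
    }
}

/// Backend-agnostic wiring: upload both inputs, add them, download the result.
pub fn add_on<B: AccelBackend>(
    backend: &mut B,
    a: &[f64],
    b: &[f64],
) -> Result<Vec<f64>, String> {
    let da = backend.upload(a);
    let db = backend.upload(b);
    let dc = backend.add(da, db)?;
    backend.download(dc)
}
```

Under these assumptions, calling add_on(&mut HostBackend::default(), &[1.0, 2.0], &[3.0, 4.0]) goes through the trait only, so a CUDA or wgpu implementation could in principle be substituted, for example behind a Cargo feature, without changing the caller.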