runmat-accelerate 0.5.0

RunMat Accelerate: GPU Acceleration Abstraction Layer

Goals:

Provide a backend-agnostic API surface that maps RunMat operations to GPU kernels.
Support multiple backends via features (CUDA, ROCm, Metal, Vulkan, OpenCL, wgpu).
Allow zero-copy interop with runmat-builtins::Matrix where possible.
Defer actual kernel authoring to backend crates/modules; this crate defines traits and wiring.