RLX wgpu backend — cross-platform GPU execution via the wgpu
Rust crate (Metal on macOS, Vulkan on Linux, DX12 on Windows,
WebGPU in browsers).
Compared to rlx-metal: same overall shape (device singleton, buffer
arena, per-op compute pipelines, command-buffer-per-forward-pass)
but with WGSL kernels and the wgpu Rust API instead of MSL + the
metal crate. Pure Rust deps — no FFI / submodules to manage.
Layout:
- device: wgpu instance/adapter/device singleton (sync wrapper)
- buffer: typed GPU buffer + arena
- kernels: WGSL source strings + per-kernel pipeline cache
- backend: Backend trait impl + per-op dispatch
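The shape described above (a device singleton plus a per-kernel pipeline cache) can be sketched with the standard library alone. This is an illustrative sketch only: `Device`, `Pipeline`, `device()`, and `compile_count` are hypothetical stand-ins, not the crate's actual types, and "compiling" a pipeline is simulated rather than done through wgpu.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a compiled compute pipeline.
#[derive(Debug, Clone, PartialEq)]
pub struct Pipeline {
    pub kernel: &'static str,
    pub wgsl_len: usize,
}

/// Hypothetical stand-in for the device singleton and its pipeline cache.
pub struct Device {
    pipelines: Mutex<HashMap<&'static str, Pipeline>>,
    compiles: AtomicUsize,
}

static DEVICE: OnceLock<Device> = OnceLock::new();

/// Returns the process-wide device, creating it on first use.
pub fn device() -> &'static Device {
    DEVICE.get_or_init(|| Device {
        pipelines: Mutex::new(HashMap::new()),
        compiles: AtomicUsize::new(0),
    })
}

impl Device {
    /// Returns the cached pipeline for `kernel`, building it once if missing.
    pub fn pipeline(&self, kernel: &'static str, wgsl: &str) -> Pipeline {
        let mut cache = self.pipelines.lock().unwrap();
        if let Some(p) = cache.get(kernel) {
            return p.clone();
        }
        // Simulated compilation: a real backend would build a shader module here.
        self.compiles.fetch_add(1, Ordering::SeqCst);
        let p = Pipeline { kernel, wgsl_len: wgsl.len() };
        cache.insert(kernel, p.clone());
        p
    }

    /// Number of simulated pipeline compilations so far.
    pub fn compile_count(&self) -> usize {
        self.compiles.load(Ordering::SeqCst)
    }
}
```

Requesting the same kernel name twice returns the cached entry, so the simulated compilation runs once per kernel regardless of how many dispatches use it.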
Re-exports
pub use device::is_vulkan_available;
pub use device::select_vulkan_backend;
Modules
- backend
  WgpuExecutable: compiles an rlx-ir Graph into a sequence of kernel dispatches against a pre-allocated arena buffer.
- buffer
  Buffer arena for the wgpu backend. Mirrors the rlx-metal arena shape: pre-plan one big storage buffer at compile time, sub-allocate per-node offsets at known positions, treat I/O as write_buffer/read_buffer against those offsets.
- calibrate
  On-disk wgpu calibration for cost-model ranking.
- coop_f16_vk
- device
  wgpu device discovery + capabilities.
- fft_dispatch
- fft_host
- gdn_host
  Host-side Op::GatedDeltaNet for wgpu arenas (readback → CPU → writeback).
- gguf_host
  Host-side GGUF K-quant Op::DequantMatMul for wgpu arenas.
- im2col_host
- kernels
  WGSL kernel sources + per-kernel pipeline cache.
- llada2_gate_host
  Host-side Op::Custom("llada2.group_limited_gate") for wgpu arenas.
- log_mel_host
- training_bwd_host
  Host-side training backward ops for wgpu arenas (readback → CPU → writeback).
- umap_knn_host
  Host-side Op::Custom("umap.knn") for wgpu arenas (small n only).
- unfuse
  IR-level “unfusion” pass for the wgpu backend.
- welch_peaks_dispatch
- welch_peaks_host
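The arena shape described for the buffer module (one big buffer planned up front, with fixed per-node offsets) can be illustrated with a CPU-only sketch. Everything here is an assumption made for illustration: `Slot`, `ArenaPlan`, `plan_arena`, and `Arena` are hypothetical names, a `Vec<u8>` stands in for the GPU storage buffer, and the alignment is a parameter that a real backend would presumably take from the device limits.

```rust
/// Byte range of one node's data inside the single arena buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Slot {
    pub offset: usize,
    pub size: usize,
}

/// Hypothetical compile-time plan: one slot per node plus the total arena size.
#[derive(Debug)]
pub struct ArenaPlan {
    pub slots: Vec<Slot>,
    pub total: usize,
}

/// Rounds `n` up to the next multiple of `align` (`align` must be non-zero).
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Assigns each node a fixed, aligned offset by bump allocation.
pub fn plan_arena(node_sizes: &[usize], align: usize) -> ArenaPlan {
    let mut slots = Vec::with_capacity(node_sizes.len());
    let mut cursor = 0;
    for &size in node_sizes {
        let offset = align_up(cursor, align);
        slots.push(Slot { offset, size });
        cursor = offset + size;
    }
    ArenaPlan { slots, total: align_up(cursor, align) }
}

/// CPU stand-in for the arena: a single byte buffer addressed through slots.
pub struct Arena {
    bytes: Vec<u8>,
}

impl Arena {
    /// Allocates the whole arena once, sized by the plan.
    pub fn new(plan: &ArenaPlan) -> Self {
        Arena { bytes: vec![0; plan.total] }
    }

    /// Analogue of an upload to a planned offset.
    pub fn write(&mut self, slot: Slot, data: &[u8]) {
        assert!(data.len() <= slot.size, "data exceeds slot");
        self.bytes[slot.offset..slot.offset + data.len()].copy_from_slice(data);
    }

    /// Analogue of a readback from a planned offset.
    pub fn read(&self, slot: Slot) -> &[u8] {
        &self.bytes[slot.offset..slot.offset + slot.size]
    }
}
```

Because every offset is known before execution, a host-side op in this sketch would amount to a `read` of its input slots, a CPU computation, and a `write` to its output slot, which matches the readback → CPU → writeback pattern noted for the `*_host` modules.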
Functions
- is_available
  True if a wgpu adapter is reachable on this system. The function itself is always available at the crate level; the runtime registry registers the backend only when it returns true, so tests on unusual CI machines without a GPU don’t trip up.
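The gating behaviour described for is_available can be sketched as follows. This is a minimal illustration under stated assumptions: `Registry`, `register_if`, and `has` are hypothetical names, and the availability probe is passed in as a closure instead of calling the crate's real function.

```rust
/// Hypothetical runtime registry holding the names of usable backends.
#[derive(Default)]
pub struct Registry {
    backends: Vec<&'static str>,
}

impl Registry {
    /// Registers `name` only if its availability probe returns true.
    /// Returns whether the backend was registered.
    pub fn register_if(&mut self, name: &'static str, probe: impl Fn() -> bool) -> bool {
        let ok = probe();
        if ok {
            self.backends.push(name);
        }
        ok
    }

    /// True if a backend with this name has been registered.
    pub fn has(&self, name: &str) -> bool {
        self.backends.iter().any(|b| *b == name)
    }
}
```

With this shape, code that iterates over registered backends never sees one whose probe failed, so a machine without a usable adapter runs the remaining backends without special-casing.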