RLX native Vulkan compute backend.
A from-scratch Vulkan compute backend built directly on ash (raw
Vulkan) with hand-written GLSL compute kernels compiled to SPIR-V at
build time and embedded in the binary. Unlike rlx-wgpu (which can
reach Vulkan via the wgpu portability layer), this backend owns the
Vulkan instance/device/queue, its own arena VkBuffer, descriptor
sets, and compute pipelines — the dedicated Device::Vulkan path.
Layout mirrors the other native GPU backends (rlx-cuda / rlx-rocm):
- device: Vulkan instance/physical-device/device/queue singleton (dynamically loaded; gracefully unavailable with no driver)
- shaders: embedded SPIR-V blobs (built from shaders/*.comp)
- kernels: per-kernel compute-pipeline cache
- buffer: host-visible f32 arena + memory plan mapping
- backend: VulkanExecutable (compile a graph → schedule → run)
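To make the "compile a graph → schedule → run" flow over a single f32 arena concrete, here is a minimal CPU-only sketch. It is not the crate's implementation: the `Dispatch` enum, `run_schedule`, and `demo` are hypothetical names, the arena is modelled as a plain slice rather than a VkBuffer, and offsets are counted in f32 elements rather than bytes. The real backend would presumably record Vulkan compute dispatches where this sketch runs loops.

```rust
// Hypothetical sketch: none of these names come from the RLX crate.

/// One step of a flat schedule. Offsets and lengths are in f32 elements
/// (an assumption for readability; the arena itself is byte-addressed).
#[derive(Debug, Clone, Copy)]
pub enum Dispatch {
    /// out[i] = a[i] + b[i]
    Add { a: usize, b: usize, out: usize, len: usize },
    /// out[i] = max(x[i], 0.0)
    Relu { x: usize, out: usize, len: usize },
}

/// Run every dispatch in order against one contiguous arena.
pub fn run_schedule(arena: &mut [f32], schedule: &[Dispatch]) {
    for step in schedule {
        match *step {
            Dispatch::Add { a, b, out, len } => {
                for i in 0..len {
                    arena[out + i] = arena[a + i] + arena[b + i];
                }
            }
            Dispatch::Relu { x, out, len } => {
                for i in 0..len {
                    arena[out + i] = arena[x + i].max(0.0);
                }
            }
        }
    }
}

/// Build a four-slot arena, upload two inputs, run add then relu,
/// and read the final slot back.
pub fn demo() -> Vec<f32> {
    let len = 3;
    // Slots: [0..3) = a, [3..6) = b, [6..9) = a + b, [9..12) = relu(a + b)
    let mut arena = vec![0.0f32; 4 * len];
    arena[0..3].copy_from_slice(&[1.0, -5.0, 2.0]); // "upload" a
    arena[3..6].copy_from_slice(&[1.0, 2.0, -4.0]); // "upload" b
    let schedule = [
        Dispatch::Add { a: 0, b: 3, out: 6, len },
        Dispatch::Relu { x: 6, out: 9, len },
    ];
    run_schedule(&mut arena, &schedule);
    arena[9..12].to_vec() // "readback"
}
```

Because every tensor is just an offset into one buffer, the schedule carries no allocation logic of its own; the memory plan decides the offsets once, ahead of execution.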
Modules§
- backend
- VulkanExecutable: compile an IR graph into a flat schedule of compute dispatches over a single f32 arena buffer, then execute it.
- buffer
- The f32-uniform GPU arena. Like rlx-cuda / rlx-wgpu, every tensor is an f32 slot at a byte offset in one contiguous buffer. We allocate the arena as HOST_VISIBLE | HOST_COHERENT memory and keep it persistently mapped, so host upload/readback is a plain memcpy with no staging buffer or transfer command. (On discrete GPUs a DEVICE_LOCAL arena + staging would have higher bandwidth; that is a documented follow-up, with correctness coming first.)
- device
- Vulkan instance / physical-device / logical-device / compute-queue singleton, brought up through ash with the dynamically-loaded Vulkan loader. If no loader / driver is present (Entry::load() fails) or no device exposes a compute queue, vulkan_device returns None and the whole backend reports itself unavailable. The crate still compiles and links on hosts without Vulkan (macOS without MoltenVK, CI).
- host
- CPU host-fallback for ops that have no native SPIR-V kernel yet (the sequential / specialized families: RNN, Mamba2, GatedDeltaNet, ConvTranspose2d, FFT). Each fallback builds a one-op CPU graph, runs it through rlx-cpu's thunk executor (the same kernels the CPU backend uses, so results match the CPU reference bit for bit), and returns the f32 output.
- kernels
- Per-kernel compute-pipeline cache.
- shaders
- Embedded SPIR-V compute kernels, compiled from shaders/*.comp by build.rs (naga GLSL → SPIR-V). Each blob is the raw little-endian SPIR-V word stream for one @compute entry point named main.
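The shaders entry above describes each blob as a raw little-endian SPIR-V word stream. As an illustration of what that means for a consumer, the following sketch decodes such a byte blob into 32-bit words and checks the SPIR-V magic number (0x07230203, the first word of every SPIR-V module). The function name `spirv_words` is hypothetical, and whether the crate validates its blobs this way is not stated; ash also ships a helper for this purpose, `ash::util::read_spv`.

```rust
/// SPIR-V magic number: the first word of every valid module.
pub const SPIRV_MAGIC: u32 = 0x0723_0203;

/// Decode a little-endian byte blob into SPIR-V words.
///
/// Returns `None` if the length is not a multiple of four bytes or the
/// first word is not the SPIR-V magic number. Hypothetical helper, not
/// part of the RLX crate.
pub fn spirv_words(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    let words: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    if words.first() != Some(&SPIRV_MAGIC) {
        return None;
    }
    Some(words)
}
```

Decoding with `from_le_bytes` rather than reinterpreting the byte pointer avoids alignment assumptions about where the embedded blob lands in the binary.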
Functions§
- device_name
- Human-readable name of the selected Vulkan physical device, if any.
- is_available
- True if a Vulkan compute device is reachable on this system. The runtime registry only registers Device::Vulkan when this returns true, so hosts with no Vulkan driver (e.g. macOS without MoltenVK) fall through cleanly instead of panicking.
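The availability gating described above can be sketched with a lazily initialised singleton. This is only an illustration under stated assumptions: the function names match the ones documented here, but their signatures, the `DeviceInfo` type, the `probe` stub, and `registered_backends` are all hypothetical. The stub models a host with no Vulkan driver, where the real probe would try to load the Vulkan loader and look for a compute queue.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for whatever the device singleton holds.
#[derive(Debug)]
pub struct DeviceInfo {
    pub name: String,
}

/// Hypothetical probe. A real one would load the Vulkan loader and pick
/// a physical device with a compute queue; this stub always reports
/// "no driver" so the fall-through path is visible.
fn probe() -> Option<DeviceInfo> {
    None
}

/// Probed at most once per process; `None` is cached just like `Some`.
static DEVICE: OnceLock<Option<DeviceInfo>> = OnceLock::new();

/// Assumed signature: the documented function may differ.
pub fn vulkan_device() -> Option<&'static DeviceInfo> {
    DEVICE.get_or_init(probe).as_ref()
}

pub fn is_available() -> bool {
    vulkan_device().is_some()
}

pub fn device_name() -> Option<&'static str> {
    vulkan_device().map(|d| d.name.as_str())
}

/// Hypothetical registry: Vulkan is listed only when a device was found.
pub fn registered_backends() -> Vec<&'static str> {
    let mut backends = vec!["cpu"];
    if is_available() {
        backends.push("vulkan");
    }
    backends
}
```

Caching the `None` result means a missing driver costs one failed probe per process, and callers never need to handle a panic path.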