Crate rlx_vulkan

RLX native Vulkan compute backend.

A from-scratch Vulkan compute backend built directly on ash (raw Vulkan) with hand-written GLSL compute kernels compiled to SPIR-V at build time and embedded in the binary. Unlike rlx-wgpu (which can reach Vulkan via the wgpu portability layer), this backend owns the Vulkan instance/device/queue, its own arena VkBuffer, descriptor sets, and compute pipelines — the dedicated Device::Vulkan path.

Layout mirrors the other native GPU backends (rlx-cuda / rlx-rocm):

  • device — Vulkan instance/physical-device/device/queue singleton (dynamic-loaded; gracefully unavailable with no driver)
  • shaders — embedded SPIR-V blobs (built from shaders/*.comp)
  • kernels — per-kernel compute-pipeline cache
  • buffer — host-visible f32 arena + memory plan mapping
  • backend — VulkanExecutable: compile a graph → schedule → run

Modules§

backend
VulkanExecutable — compile an IR graph into a flat schedule of compute dispatches over a single f32 arena buffer, then execute it.
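The "flat schedule over one arena" idea can be sketched as follows. This is a minimal illustration, not the crate's actual code: the `Op` and `Dispatch` types, the `compile` function, and the kernel names `"add"` and `"relu"` are all hypothetical, and every node is assumed to produce the same number of f32 elements.

```rust
/// Hypothetical IR node; operands are indices of earlier nodes.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Input,
    Add(usize, usize),
    Relu(usize),
}

/// One compute dispatch: a kernel name plus arena byte offsets.
#[derive(Debug, Clone, PartialEq)]
pub struct Dispatch {
    pub kernel: &'static str,
    pub inputs: Vec<usize>,
    pub output: usize,
}

/// Assign each node a byte offset in the arena, then emit one dispatch per
/// non-input node. `nodes` must already be in topological order and each
/// node is assumed to produce `len` f32 values.
pub fn compile(nodes: &[Op], len: usize) -> (Vec<usize>, Vec<Dispatch>) {
    let bytes = len * std::mem::size_of::<f32>();
    let offsets: Vec<usize> = (0..nodes.len()).map(|i| i * bytes).collect();
    let mut schedule = Vec::new();
    for (i, op) in nodes.iter().enumerate() {
        match op {
            // Inputs are uploaded by the host; they need no dispatch.
            Op::Input => {}
            Op::Add(a, b) => schedule.push(Dispatch {
                kernel: "add",
                inputs: vec![offsets[*a], offsets[*b]],
                output: offsets[i],
            }),
            Op::Relu(a) => schedule.push(Dispatch {
                kernel: "relu",
                inputs: vec![offsets[*a]],
                output: offsets[i],
            }),
        }
    }
    (offsets, schedule)
}
```

Executing such a schedule would then amount to walking the list in order and recording one dispatch per entry. A real memory plan would likely reuse slots whose values are dead rather than give every node its own offset.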
buffer
The f32-uniform GPU arena. Like rlx-cuda / rlx-wgpu, every tensor is an f32 slot at a byte offset in one contiguous buffer. We allocate the arena as HOST_VISIBLE | HOST_COHERENT memory and keep it persistently mapped, so host upload/readback is a plain memcpy with no staging buffer or transfer command. (On discrete GPUs a DEVICE_LOCAL arena + staging would have higher bandwidth — a documented follow-up; correctness first.)
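To make the slot-at-a-byte-offset model concrete, here is a small host-only sketch. It assumes nothing about the crate's real types: `Arena` and `Slot` are hypothetical names, and a `Vec<f32>` stands in for the persistently mapped Vulkan memory, so upload and readback reduce to slice copies.

```rust
/// Hypothetical stand-in for the persistently mapped arena. A Vec<f32>
/// plays the role of the mapped HOST_VISIBLE | HOST_COHERENT pointer.
pub struct Arena {
    mapped: Vec<f32>,
}

/// One tensor's slot: a byte offset into the arena plus an element count.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Slot {
    pub byte_offset: usize,
    pub len: usize,
}

impl Arena {
    pub fn with_capacity_bytes(bytes: usize) -> Self {
        Arena { mapped: vec![0.0; bytes / std::mem::size_of::<f32>()] }
    }

    /// Convert a slot's byte offset into an element range.
    fn range(&self, slot: Slot) -> std::ops::Range<usize> {
        let start = slot.byte_offset / std::mem::size_of::<f32>();
        start..start + slot.len
    }

    /// Upload: a plain copy into the mapped region, no staging buffer.
    pub fn upload(&mut self, slot: Slot, data: &[f32]) {
        assert_eq!(data.len(), slot.len, "slot/data length mismatch");
        let r = self.range(slot);
        self.mapped[r].copy_from_slice(data);
    }

    /// Readback: a plain copy out of the mapped region.
    pub fn readback(&self, slot: Slot) -> Vec<f32> {
        self.mapped[self.range(slot)].to_vec()
    }
}
```

With real mapped memory the copies would go through the mapped pointer instead of a `Vec`, but the offset arithmetic would be the same.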
device
Vulkan instance / physical-device / logical-device / compute-queue singleton, brought up through ash with the dynamically-loaded Vulkan loader. If no loader / driver is present (Entry::load() fails) or no device exposes a compute queue, vulkan_device returns None and the whole backend reports itself unavailable — the crate still compiles and links on hosts without Vulkan (macOS without MoltenVK, CI).
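The "singleton that may be absent" shape can be sketched with `std::sync::OnceLock`. This is an assumption about the pattern, not the crate's implementation: `VulkanDevice` here is a stub struct, `try_bring_up` is a hypothetical stand-in for the ash bring-up, and the sketch hard-codes the no-driver case so it runs anywhere.

```rust
use std::sync::OnceLock;

/// Hypothetical handle. The real one would own the ash instance, device,
/// and compute queue.
#[derive(Debug)]
pub struct VulkanDevice {
    pub name: String,
}

/// Stand-in for the bring-up sequence. The two flags model the two failure
/// points named above: no loader/driver, or no compute-capable queue.
pub fn try_bring_up(loader_present: bool, has_compute_queue: bool) -> Option<VulkanDevice> {
    if !loader_present || !has_compute_queue {
        return None;
    }
    Some(VulkanDevice { name: "stub device".to_string() })
}

static DEVICE: OnceLock<Option<VulkanDevice>> = OnceLock::new();

/// Bring-up runs at most once; later calls reuse the cached result,
/// including a cached `None`.
pub fn vulkan_device() -> Option<&'static VulkanDevice> {
    // Assumption for this sketch: pretend no loader is installed.
    DEVICE.get_or_init(|| try_bring_up(false, false)).as_ref()
}
```

Caching the `None` result means a host without Vulkan pays for the failed probe only once.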
host
CPU host fallback for ops that have no native SPIR-V kernel yet (the sequential / specialized families: RNN, Mamba2, GatedDeltaNet, ConvTranspose2d, FFT). Each fallback builds a one-op CPU graph, runs it through rlx-cpu’s thunk executor (the same kernels the CPU backend uses, so results match the CPU reference bit for bit), and returns the f32 output.
kernels
Per-kernel compute-pipeline cache.
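A per-kernel cache of this kind is typically a map from kernel name to a lazily built pipeline. The sketch below shows that pattern under stated assumptions: `Pipeline` and `PipelineCache` are hypothetical types, the `id` field stands in for a real pipeline handle, and the `builds` counter exists only to show that each kernel is built once.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled compute pipeline handle.
#[derive(Debug, Clone, PartialEq)]
pub struct Pipeline {
    pub kernel: &'static str,
    pub id: u32,
}

#[derive(Default)]
pub struct PipelineCache {
    pipelines: HashMap<&'static str, Pipeline>,
    builds: u32,
}

impl PipelineCache {
    /// Return the cached pipeline for `kernel`, building it on first use.
    pub fn get_or_build(&mut self, kernel: &'static str) -> &Pipeline {
        let builds = &mut self.builds;
        self.pipelines.entry(kernel).or_insert_with(|| {
            // In a real backend this is where pipeline creation would happen.
            *builds += 1;
            Pipeline { kernel, id: *builds }
        })
    }

    /// Number of pipelines actually built so far.
    pub fn builds(&self) -> u32 {
        self.builds
    }
}
```

Requesting the same kernel twice returns the same entry, so repeated dispatches of one op do not pay pipeline creation again.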
shaders
Embedded SPIR-V compute kernels, compiled from shaders/*.comp by build.rs (naga GLSL → SPIR-V). Each blob is the raw little-endian SPIR-V word stream for one @compute entry point named main.
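Because each blob is a little-endian word stream, turning embedded bytes into the `u32` words Vulkan expects is a short conversion. The helper below is a sketch and not part of the crate: `spirv_words` is a hypothetical name, and the only external fact it relies on is the SPIR-V magic number `0x07230203` that opens every module.

```rust
/// SPIR-V magic number, the first word of every module.
pub const SPIRV_MAGIC: u32 = 0x0723_0203;

/// Decode a little-endian byte blob into SPIR-V words.
/// Returns None if the length is not a multiple of 4 or the magic is wrong.
pub fn spirv_words(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    let words: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    if words.first() != Some(&SPIRV_MAGIC) {
        return None;
    }
    Some(words)
}
```

Checking the magic word up front catches a truncated or byte-swapped blob before it reaches shader-module creation.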

Functions§

device_name
Human-readable name of the selected Vulkan physical device, if any.
is_available
True if a Vulkan compute device is reachable on this system. The runtime registry only registers Device::Vulkan when this returns true, so hosts with no Vulkan driver (e.g. macOS without MoltenVK) fall through cleanly instead of panicking.
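The registration gate described here might look roughly like the following. Everything in this sketch is illustrative: the `Device` enum is reduced to two variants (`Device::Cpu` is an assumed name), and `register_backends` and `pick_device` are hypothetical helpers that take the availability flag as a parameter instead of calling `is_available` directly.

```rust
/// Reduced, hypothetical device enum; only Device::Vulkan is named in the docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Vulkan,
}

/// Register the CPU backend unconditionally and Vulkan only when available.
pub fn register_backends(vulkan_available: bool) -> Vec<Device> {
    let mut registered = vec![Device::Cpu];
    if vulkan_available {
        registered.push(Device::Vulkan);
    }
    registered
}

/// Prefer Vulkan when it was registered, otherwise fall through to the CPU.
pub fn pick_device(registered: &[Device]) -> Device {
    if registered.contains(&Device::Vulkan) {
        Device::Vulkan
    } else {
        Device::Cpu
    }
}
```

In the real registry the flag would come from `rlx_vulkan::is_available()`, so a host without a driver never sees `Device::Vulkan` in the list at all.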