RLX Intel oneAPI backend — the dedicated Device::OneApi path for Intel
Arc / Data Center Max GPUs via the Level Zero runtime.
Layout mirrors the other native GPU backends (rlx-cuda / rlx-vulkan):
- level_zero: hand-rolled ze_* FFI, dynamically loaded (libze_loader); builds and links with no oneAPI runtime present (macOS / CI).
- device: driver/device/context/compute-queue singleton; gracefully unavailable when no Level Zero GPU is reachable.
- kernels: embedded OpenCL C → SPIR-V blobs (build.rs via ocloc) plus their ze_module/ze_kernel cache.
- arena: f32-uniform USM-shared buffer for the native dispatch path.
- host: rlx-cpu reference eval (whole-graph on the dev box; per-op fallback on hardware).
- backend: OneApiExecutable, which legalizes and then runs (native dispatch when a device and kernels exist, else the CPU-reference interpreter).
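The kernels entry pairs each embedded SPIR-V blob with a cached ze_module/ze_kernel, and the kernels module notes that each kernel's name is its .cl file stem. A minimal sketch of that lookup, assuming a lazily built, name-keyed cache (the crate may instead build every kernel up front); KernelHandle, KernelCache, and kernel_name are hypothetical, and the handle is a plain struct standing in for the real Level Zero handles so no driver is needed:

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical stand-in for a built module/kernel pair. Real code would hold
/// Level Zero ze_module / ze_kernel handles; this placeholder lets the sketch run anywhere.
#[derive(Debug, Clone, PartialEq)]
struct KernelHandle {
    name: String,
    blob_len: usize,
}

/// Hypothetical helper: derive the kernel entry-point name from a `.cl` path (its file stem).
fn kernel_name(cl_path: &str) -> Option<String> {
    Path::new(cl_path)
        .file_stem()
        .and_then(|s| s.to_str())
        .map(str::to_owned)
}

/// Hypothetical lazy cache: at most one handle per kernel name, built on first request.
struct KernelCache {
    blobs: HashMap<String, Vec<u8>>, // kernel name -> SPIR-V bytes
    built: HashMap<String, KernelHandle>,
    builds: usize, // how many times the (expensive) build step actually ran
}

impl KernelCache {
    fn new(sources: &[(&str, Vec<u8>)]) -> Self {
        let blobs = sources
            .iter()
            .filter_map(|(path, blob)| kernel_name(path).map(|n| (n, blob.clone())))
            .collect();
        KernelCache { blobs, built: HashMap::new(), builds: 0 }
    }

    /// Return the cached handle, building it once if a blob exists for `name`.
    /// `None` means no native kernel, so a caller would use the CPU reference instead.
    fn get(&mut self, name: &str) -> Option<&KernelHandle> {
        if !self.built.contains_key(name) {
            let blob = self.blobs.get(name)?;
            // Assumption: real code would create the module and kernel from `blob` here.
            let handle = KernelHandle { name: name.to_owned(), blob_len: blob.len() };
            self.builds += 1;
            self.built.insert(name.to_owned(), handle);
        }
        self.built.get(name)
    }
}
```

A miss (no blob for that name) returns None, which is the point where a per-op caller would fall back to the rlx-cpu reference.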
Status
The Level Zero bring-up, SPIR-V module/kernel wiring, USM arena, and per-op
dispatch are implemented but not yet validated on Intel hardware, because the
dev box has none. Until then every op is served by the bit-exact rlx-cpu
reference, so the backend is correct everywhere today; native execution on
Arc / Data Center Max is pending that bring-up. See README.md for the
validation plan and the rationale for the SPIR-V flavor (OpenCL-Kernel vs
Vulkan-Shader).
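The status note relies on rlx-cpu being a bit-exact reference. The actual validation plan lives in README.md and is not reproduced here; as a hypothetical illustration of what "bit-exact" demands of a native result, a comparison has to check raw f32 bit patterns rather than use ==. first_bit_mismatch is an illustrative helper, not part of the crate:

```rust
/// Hypothetical helper: index of the first element whose bit pattern differs
/// between a native result and the CPU reference, or `None` if bit-identical.
/// Comparing `to_bits()` treats 0.0 and -0.0 as different and accepts a NaN
/// only when the reference holds the exact same NaN bits.
fn first_bit_mismatch(native: &[f32], reference: &[f32]) -> Option<usize> {
    assert_eq!(native.len(), reference.len(), "output length mismatch");
    native
        .iter()
        .zip(reference)
        .position(|(a, b)| a.to_bits() != b.to_bits())
}
```

Using == instead would accept 0.0 against -0.0 and reject every NaN, so it cannot certify bit-for-bit agreement.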
Modules
- arena: The f32-uniform USM-shared GPU arena for the Level Zero dispatch path. Like rlx-vulkan’s host-visible arena, every tensor is an f32 slot at a byte offset in one contiguous buffer; here the buffer is a single zeMemAllocShared allocation, which is CPU-dereferenceable on Intel’s shared-memory GPUs, so host upload/readback and the CPU host fallback are plain pointer reads and writes with no staging. Only constructed when a live device is present (the dev-box path uses the value-map interpreter in backend.rs).
- backend: OneApiExecutable compiles an IR graph for the Intel oneAPI Level Zero backend and executes it.
- device: The process-wide Level Zero driver / device / context / compute-queue singleton, brought up through the dynamically loaded libze_loader. If no loader is present, zeInit fails, or no GPU device is exposed, oneapi_device returns None and the backend reports itself unavailable, mirroring rlx-cuda / rlx-rocm / rlx-vulkan on a host with no driver.
- host: Single-op CPU reference evaluation via rlx-cpu’s thunk executor (the same kernels the CPU backend uses, so results are bit-for-bit the reference).
- kernels: Embedded SPIR-V kernel blobs (compiled from kernels/*.cl by build.rs) and their Level Zero module/kernel cache. SPIRV_BLOBS is empty unless the crate was built on an Intel oneAPI host with ocloc and RLX_ONEAPI_BUILD_KERNELS=1, in which case kernels builds one ze_module + ze_kernel per blob (the kernel name equals the .cl function name, which equals the file stem).
- level_zero: Hand-rolled Level Zero (ze_*) FFI, dynamically loaded with libloading.
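The arena entry describes every tensor as an f32 slot at a byte offset in one contiguous, host-dereferenceable buffer. A minimal sketch of that layout, assuming a simple bump allocator with no extra alignment or slot reuse (neither is specified above); Arena and its methods are hypothetical, and a Vec<f32> stands in for the zeMemAllocShared allocation so the example runs without a GPU:

```rust
/// Size of one arena slot element, in bytes.
const F32_BYTES: usize = std::mem::size_of::<f32>();

/// Hypothetical f32-uniform arena. The Vec<f32> stands in for the single
/// shared allocation; because the host can address it directly, upload and
/// readback are plain slice copies with no staging buffer.
struct Arena {
    buf: Vec<f32>,
    next_byte: usize, // bump pointer, in bytes
}

impl Arena {
    fn with_capacity_bytes(bytes: usize) -> Self {
        Arena { buf: vec![0.0; bytes / F32_BYTES], next_byte: 0 }
    }

    /// Reserve `len` f32 slots and return their byte offset, or `None` if full.
    fn alloc(&mut self, len: usize) -> Option<usize> {
        let offset = self.next_byte;
        let end = offset.checked_add(len.checked_mul(F32_BYTES)?)?;
        if end > self.buf.len() * F32_BYTES {
            return None;
        }
        self.next_byte = end;
        Some(offset)
    }

    /// Host readback: borrow `len` f32 values starting at `byte_offset`.
    fn slot(&self, byte_offset: usize, len: usize) -> &[f32] {
        let start = byte_offset / F32_BYTES;
        &self.buf[start..start + len]
    }

    /// Host upload: write `data` directly into the buffer at `byte_offset`.
    fn upload(&mut self, byte_offset: usize, data: &[f32]) {
        let start = byte_offset / F32_BYTES;
        self.buf[start..start + data.len()].copy_from_slice(data);
    }
}
```

Offsets are kept in bytes while storage is f32, so every offset here is a multiple of 4 and converting back to an element index is a single division.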
Functions
- device_name: Human-readable name of the selected Intel device, if any.
- has_native_kernels: Whether native SPIR-V kernels were embedded for this build (Intel oneAPI build host with ocloc and RLX_ONEAPI_BUILD_KERNELS=1). When false, the native path serves every op through the CPU reference.
- is_available: True if a Level Zero GPU device is reachable on this system. The runtime registry only registers Device::OneApi when this returns true, so hosts with no oneAPI runtime (macOS, CI) fall through cleanly.
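Taken together, is_available and has_native_kernels determine how a graph is executed. A minimal sketch of that decision, with the two predicates passed in as plain booleans so it runs on any host; Route and route are hypothetical names, and the per-op CPU fallback in the last case reflects the host module's role on hardware:

```rust
/// Hypothetical classification of how the oneAPI backend would serve a graph.
#[derive(Debug, PartialEq)]
enum Route {
    NotRegistered,       // no Level Zero GPU: Device::OneApi is never registered
    CpuReferenceEveryOp, // device present but no embedded SPIR-V kernels
    NativeWithFallback,  // device + kernels: native dispatch, CPU reference per op as fallback
}

/// Hypothetical decision function; the arguments mirror `is_available()` and
/// `has_native_kernels()` but are plain values here.
fn route(is_available: bool, has_native_kernels: bool) -> Route {
    match (is_available, has_native_kernels) {
        (false, _) => Route::NotRegistered,
        (true, false) => Route::CpuReferenceEveryOp,
        (true, true) => Route::NativeWithFallback,
    }
}
```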