Crate rlx_oneapi


RLX Intel oneAPI backend — the dedicated Device::OneApi path for Intel Arc / Data Center Max GPUs via the Level Zero runtime.

Layout mirrors the other native GPU backends (rlx-cuda / rlx-vulkan):

level_zero — hand-rolled ze_* FFI, dynamic-loaded (libze_loader); builds + links with no oneAPI runtime present (macOS / CI).
device — driver/device/context/compute-queue singleton; gracefully unavailable when no Level Zero GPU is reachable.
kernels — embedded OpenCL-C→SPIR-V blobs (build.rs via ocloc) + their ze_module/ze_kernel cache.
arena — f32-uniform USM-shared buffer for the native dispatch path.
host — rlx-cpu reference eval (whole-graph on the dev box; per-op fallback on hardware).
backend — OneApiExecutable: legalize → run (native dispatch when a device + kernels exist, else the CPU-reference interpreter).
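
The run-time choice described for backend can be sketched as a small decision function. This is only an illustration under stated assumptions: `ExecPath` and `select_path` are hypothetical names, not items exported by this crate, and the real executable may also fall back to the CPU reference per op even when native dispatch is active.

```rust
/// Which path serves the graph. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecPath {
    /// Level Zero dispatch of embedded SPIR-V kernels.
    NativeDispatch,
    /// The rlx-cpu reference interpreter.
    CpuReference,
}

/// Native dispatch needs both a live device and embedded kernels;
/// every other combination is served by the CPU reference.
pub fn select_path(device_present: bool, kernels_embedded: bool) -> ExecPath {
    if device_present && kernels_embedded {
        ExecPath::NativeDispatch
    } else {
        ExecPath::CpuReference
    }
}
```

Under this sketch, a macOS or CI host (no device, no kernels) and an Intel host built without ocloc (device, no kernels) both land on `ExecPath::CpuReference`.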

§Status

The Level Zero bring-up, SPIR-V module/kernel wiring, USM arena, and per-op dispatch are implemented but not yet validated on Intel hardware, because the dev box has none. Until that validation is done, every op is served by the bit-exact rlx-cpu reference: results are correct on every host, and native execution on Arc / Data Center Max remains pending hardware bring-up. See README.md for the validation plan and the SPIR-V flavor (OpenCL-Kernel vs Vulkan-Shader) rationale.

Modules§

arena: The f32-uniform USM-shared GPU arena for the Level Zero dispatch path. Like rlx-vulkan’s host-visible arena, every tensor is an f32 slot at a byte offset in one contiguous buffer; here the buffer is a single zeMemAllocShared allocation, which is CPU-dereferenceable on Intel’s shared-memory GPUs, so host upload/readback and the CPU host-fallback are plain pointer writes with no staging. Only constructed when a live device is present (the dev-box path uses the value-map interpreter in backend.rs).
backend: OneApiExecutable — compile an IR graph for the Intel oneAPI Level Zero backend and execute it.
device: The process-wide Level Zero driver / device / context / compute-queue singleton, brought up through the dynamically-loaded libze_loader. If no loader is present, zeInit fails, or no GPU device is exposed, oneapi_device returns None and the backend reports itself unavailable — mirroring rlx-cuda / rlx-rocm / rlx-vulkan on a host with no driver.
host: Single-op CPU reference evaluation via rlx-cpu’s thunk executor (the same kernels the CPU backend uses, so results are bit-for-bit the reference).
kernels: Embedded SPIR-V kernel blobs (compiled from kernels/*.cl by build.rs) and their Level Zero module/kernel cache. SPIRV_BLOBS is empty unless the crate was built on an Intel oneAPI host with ocloc + RLX_ONEAPI_BUILD_KERNELS=1, in which case kernels builds one ze_module + ze_kernel per blob (the kernel name equals the .cl function name, which equals the file stem).
level_zero: Hand-rolled Level Zero (ze_*) FFI, dynamically loaded with libloading.
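
The arena layout described above (every tensor an f32 slot at a byte offset in one contiguous buffer) can be sketched without any GPU. In this sketch a `Vec<f32>` stands in for the single zeMemAllocShared allocation, and `SketchArena`, `Slot`, and their methods are hypothetical names rather than the crate's actual API.

```rust
use std::mem::size_of;

/// A tensor's location: byte offset into the buffer plus f32 element count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Slot {
    pub byte_offset: usize,
    pub len: usize,
}

/// Stand-in arena: one fixed-size buffer, handed out front to back.
pub struct SketchArena {
    buf: Vec<f32>, // assumption: plays the role of the shared USM allocation
    used: usize,   // number of f32 elements already handed out
}

impl SketchArena {
    pub fn with_capacity(elems: usize) -> Self {
        Self { buf: vec![0.0; elems], used: 0 }
    }

    /// Reserve `len` f32 elements; returns None when the buffer is full.
    pub fn alloc(&mut self, len: usize) -> Option<Slot> {
        let end = self.used.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let slot = Slot { byte_offset: self.used * size_of::<f32>(), len };
        self.used = end;
        Some(slot)
    }

    /// Host upload: a plain copy into the buffer, no staging step.
    pub fn write(&mut self, slot: Slot, data: &[f32]) {
        assert_eq!(data.len(), slot.len, "data must fill the slot exactly");
        let start = slot.byte_offset / size_of::<f32>();
        self.buf[start..start + slot.len].copy_from_slice(data);
    }

    /// Host readback: a plain borrow of the slot's elements.
    pub fn read(&self, slot: Slot) -> &[f32] {
        let start = slot.byte_offset / size_of::<f32>();
        &self.buf[start..start + slot.len]
    }
}
```

Because every slot holds f32 values, byte offsets are always multiples of four, which is why the offset-to-index conversion is a simple division.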

Functions§

device_name: Human-readable name of the selected Intel device, if any.
has_native_kernels: Whether native SPIR-V kernels were embedded for this build (Intel oneAPI build host with ocloc + RLX_ONEAPI_BUILD_KERNELS=1). When false, the native path serves every op through the CPU reference.
is_available: True if a Level Zero GPU device is reachable on this system. The runtime registry only registers Device::OneApi when this returns true, so hosts with no oneAPI runtime (macOS, CI) fall through cleanly.
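
One way to picture how these functions relate to the device singleton is a `OnceLock` that caches the outcome of a single bring-up attempt. The sketch below is an assumption about the shape, not the crate's implementation: `bring_up`, `BringUpError`, `DeviceInfo`, the `sketch_*` functions, and `registered_backends` are hypothetical, and the probe is hard-wired to the "no loader" case so the example runs on any host.

```rust
use std::sync::OnceLock;

/// Hypothetical description of the selected device.
#[derive(Debug)]
pub struct DeviceInfo {
    pub name: String,
}

/// The three failure cases named in the device module docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BringUpError {
    NoLoader,
    InitFailed,
    NoGpu,
}

/// Stand-in for the real bring-up; the inputs simulate what the host exposes.
pub fn bring_up(loader_present: bool, init_ok: bool, gpu_count: usize)
    -> Result<DeviceInfo, BringUpError>
{
    if !loader_present {
        return Err(BringUpError::NoLoader);
    }
    if !init_ok {
        return Err(BringUpError::InitFailed);
    }
    if gpu_count == 0 {
        return Err(BringUpError::NoGpu);
    }
    Ok(DeviceInfo { name: "placeholder device".to_string() })
}

/// Process-wide singleton: the probe runs once and its result is cached.
pub fn sketch_device() -> Option<&'static DeviceInfo> {
    static DEVICE: OnceLock<Option<DeviceInfo>> = OnceLock::new();
    // Assumption: simulate a host with no Level Zero loader installed.
    DEVICE.get_or_init(|| bring_up(false, false, 0).ok()).as_ref()
}

pub fn sketch_is_available() -> bool {
    sketch_device().is_some()
}

pub fn sketch_device_name() -> Option<&'static str> {
    sketch_device().map(|d| d.name.as_str())
}

/// A registry that only lists the oneAPI backend when a device is reachable.
pub fn registered_backends() -> Vec<&'static str> {
    let mut backends = vec!["cpu"];
    if sketch_is_available() {
        backends.push("oneapi");
    }
    backends
}
```

Collapsing every failure into `None` keeps callers simple: they only ever ask whether a device exists, which matches the "gracefully unavailable" behavior described for hosts with no runtime.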
