RLX CPU backend — executes optimized IR graphs on CPU.
Takes a fused + memory-planned IR graph and executes it using:
- BLAS (Accelerate/MKL/OpenBLAS) for matmul
- NEON/AVX SIMD kernels for element-wise ops
- Persistent Rayon thread pool for parallelism
- Arena allocator for zero per-call allocation
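As a rough illustration of the arena idea in the last bullet, here is a minimal sketch of a bump arena over one pre-allocated buffer. The names `Arena`, `with_capacity`, `alloc`, `get`, `get_mut`, and `reset` are hypothetical and are not taken from this crate; the real `arena` module may be organized quite differently (for example, driven by offsets from the memory planner).

```rust
use std::ops::Range;

/// Hypothetical bump arena: one buffer allocated up front, then sub-ranges
/// handed out by advancing an offset. Not the crate's actual API.
pub struct Arena {
    buf: Vec<f32>,
    used: usize,
}

impl Arena {
    /// Performs the single up-front allocation.
    pub fn with_capacity(len: usize) -> Self {
        Arena { buf: vec![0.0; len], used: 0 }
    }

    /// Reserves the next `len` elements and returns their range,
    /// or `None` if the arena does not have enough room left.
    pub fn alloc(&mut self, len: usize) -> Option<Range<usize>> {
        let end = self.used.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(start..end)
    }

    /// Read-only view of a previously reserved range.
    pub fn get(&self, r: Range<usize>) -> &[f32] {
        &self.buf[r]
    }

    /// Mutable view of a previously reserved range.
    pub fn get_mut(&mut self, r: Range<usize>) -> &mut [f32] {
        &mut self.buf[r]
    }

    /// Marks the whole buffer as reusable; no memory is freed or reallocated.
    pub fn reset(&mut self) {
        self.used = 0;
    }
}
```

In this sketch a caller would reserve ranges once, run a forward pass against them, and call `reset` before the next pass, so steady-state execution never touches the system allocator.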
Modules
- arena
- Arena allocator — ONE allocation, zero per-call overhead.
- asm_check
- FileCheck-style disassembly regression tests (plan #10).
- attention_bwd
- Scaled dot-product attention backward (recomputes scores + softmax).
- autotune
- Auto-tuner — finds the optimal RuntimeConfig for a model on current hardware.
- blas
- Direct BLAS FFI — zero abstraction overhead.
- calibrate
- Activation-scale calibration for post-training INT8 quantization.
- config
- Runtime configuration — compile-time platform defaults + runtime hardware detection.
- cost
- Cost model — estimates execution time for kernel dispatch decisions.
- dequant_cache
- Cache dequantized GGUF weight bytes for static params.
- dispatch
- Dispatch table — calibration-aware kernel selection (plan #2).
- executor
- Graph executor — runs a fused IR graph on CPU using the arena + kernels.
- gdn
- Gated-DeltaNet BLAS micro-kernels (Tier C.10).
- gguf_matmul
- Fused GGUF K-quant dequant + matmul without materializing full F32 weights (Tier C.11).
- intrinsics
- ISA-split intrinsics layer (plan #85).
- kernel_config
- Compile-time kernel-config tables (plan #14).
- kernels
- SIMD kernels for fused operations.
- llada2_gate
- lm_head
- Greedy tied-LM-head argmax without materializing full vocab logits.
- moe_residency
- Per-forward MoE expert residency mask (TIDE placement) for CPU dispatch.
- moe_topk_capture
- Capture MoE router [Op::TopK] outputs during CPU forward (TIDE refresh input).
- naive
- Naive reference implementations — for correctness testing and benchmarking.
- op_registry
- Per-backend (CPU) kernel registry for Op::Custom.
- pool
- Rayon-backed parallel for: par_for(total, grain, |off, cnt| …).
- splat
- CPU dispatch hooks for rlx_ir::Op::GaussianSplatRender; bodies registered from rlx-splat.
- thunk
- Thunks — pre-compiled kernel dispatch with zero per-call overhead.
- tile
- CPU TileIO impls (plans #23 + #27).
- training_bwd
- umap_knn
- Reference k-NN from a row-major [n, n] pairwise distance matrix.
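To make the last entry concrete, the following is a minimal sketch of a reference k-NN over a row-major [n, n] distance matrix. The function name `knn_from_distances`, its signature, and the choice to exclude each point from its own neighbor list are assumptions made for illustration; the actual `umap_knn` module may differ on all three.

```rust
/// Hypothetical reference k-NN (not the crate's API): for each row `i` of a
/// row-major [n, n] distance matrix, return the indices of the `k` nearest
/// other points, ordered from nearest to farthest.
pub fn knn_from_distances(dist: &[f32], n: usize, k: usize) -> Vec<Vec<usize>> {
    assert_eq!(dist.len(), n * n, "dist must hold n * n entries");
    assert!(k <= n.saturating_sub(1), "k must be at most n - 1 (self is excluded)");
    (0..n)
        .map(|i| {
            // Row i holds the distances from point i to every point j.
            let row = &dist[i * n..(i + 1) * n];
            // Assumption: a point is never its own neighbor.
            let mut idx: Vec<usize> = (0..n).filter(|&j| j != i).collect();
            // Stable sort by distance; total_cmp gives NaN a defined order.
            idx.sort_by(|&a, &b| row[a].total_cmp(&row[b]));
            idx.truncate(k);
            idx
        })
        .collect()
}
```

This costs O(n² log n) and is meant only as a correctness baseline of the kind a "reference" implementation usually serves; ties are broken by the lower index because the sort is stable.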