rlx-runtime 0.2.1

RLX runtime — feature-gated backends, session API, compile+execute pipeline

rlx-runtime

User-facing API for RLX — Session::new(Device).compile(graph) returns a CompiledGraph, which holds the executable, the arena, the weights, and the device handle.

What's here

  • Session — entry point; selects a backend via Device.
  • CompiledGraph (compiled.rs) — run / set_param / set_input. Zero allocation per call.
  • Backend trait + ExecutableGraph — every backend (CPU, Metal, MLX, CUDA, ROCm, wgpu, TPU) implements these. Every backend declares its supported OpKinds, and legalize_for_backend rejects unsupported graphs at compile time.
  • registry.rs / op_registry.rs — backend factory + per-op registration plumbing for downstream extension.
  • Device lives in rlx-driver::device; this crate just consumes it. Variants: Cpu, Metal, Mlx, Ane, Cuda, Rocm, Tpu, Gpu (wgpu), Vulkan, OpenGl, DirectX, WebGpu.
  • device_ext.rs — Device::is_available() lookup against the registry (keeps the runtime→driver dependency direction one-way).
  • weights.rs — WeightLoader trait + BytesWeightLoader. Promote to registry per plan #24 / #56.
  • arena.rs — device-side arena buffer.
  • CompileCache (compile_cache.rs) — graph-fingerprint → compiled-artifact cache.
  • subgraph.rs — run_if / run_while helpers; the IR has If/While ops, but executor wiring is pending (see the Op::If/While docstring).
  • PrecisionPolicy — re-export from rlx-opt. AMP / always-f16 / always-f32 / always-bf16.
  • trace.rs — runtime tracing (verbose env-gated).
  • cost.rs — heterogeneous cost model that picks Cpu vs. Metal vs. MLX per graph.
  • stream.rs — async command stream (Metal-side; CPU is sync).
  • paged_kv — paged KV cache + continuous batching primitives.
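
The compile-time rejection described in the Backend bullet can be pictured with a small standalone sketch. Everything here is hypothetical and only illustrates the idea: the OpKind variants, the ToyBackend trait, and check_supported are stand-ins, not the crate's real Backend trait or the real signature of legalize_for_backend.

```rust
use std::collections::HashSet;

/// Hypothetical op kinds; the real OpKind set is defined by RLX, not here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum OpKind {
    MatMul,
    Add,
    Softmax,
}

/// Hypothetical stand-in for a backend that declares what it can run.
pub trait ToyBackend {
    fn name(&self) -> &'static str;
    fn supported_ops(&self) -> HashSet<OpKind>;
}

pub struct ToyCpu;

impl ToyBackend for ToyCpu {
    fn name(&self) -> &'static str {
        "toy-cpu"
    }

    fn supported_ops(&self) -> HashSet<OpKind> {
        HashSet::from([OpKind::MatMul, OpKind::Add])
    }
}

/// Returns Ok(()) when every op in the graph is supported by the backend,
/// otherwise Err with the unsupported ops in graph order.
pub fn check_supported(ops: &[OpKind], backend: &dyn ToyBackend) -> Result<(), Vec<OpKind>> {
    let supported = backend.supported_ops();
    let rejected: Vec<OpKind> = ops
        .iter()
        .copied()
        .filter(|op| !supported.contains(op))
        .collect();
    if rejected.is_empty() {
        Ok(())
    } else {
        Err(rejected)
    }
}
```

Returning the rejected ops, rather than a bare failure, lets the caller report which op blocked compilation on the chosen backend.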

Re-exports: Tick, time_ns from rlx_ir::measure. Use these for any sub-ms timing in the user-facing layer.

Cargo features

feature          backend
cpu (default)    rlx-cpu
metal            rlx-metal (macOS)
mlx              rlx-mlx (macOS)
gpu              rlx-wgpu (cross-platform)
cuda             rlx-cuda
rocm             rlx-rocm
tpu              rlx-tpu
blas-accelerate  macOS Accelerate
blas-mkl         Intel MKL
blas-openblas    OpenBLAS
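
How a Cargo feature could decide which backends end up in a registry is sketched below. This is an assumption-laden illustration, not the contents of registry.rs: the two-variant Device enum, ToyRegistry, and its methods are hypothetical, and the CPU backend is registered unconditionally only to stand in for the default cpu feature.

```rust
use std::collections::HashMap;

/// Hypothetical two-variant device enum; the real Device has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Device {
    Cpu,
    Metal,
}

/// Hypothetical registry mapping a device to the crate that backs it.
pub struct ToyRegistry {
    factories: HashMap<Device, &'static str>,
}

impl ToyRegistry {
    /// Builds the registry from the features this crate was compiled with.
    pub fn from_enabled_features() -> Self {
        let mut factories = HashMap::new();
        // Stand-in for the default `cpu` feature: always present in this sketch.
        factories.insert(Device::Cpu, "rlx-cpu");
        // Present only when compiled with `--features metal`.
        if cfg!(feature = "metal") {
            factories.insert(Device::Metal, "rlx-metal");
        }
        ToyRegistry { factories }
    }

    /// True when a backend was registered for `device`.
    pub fn is_available(&self, device: Device) -> bool {
        self.factories.contains_key(&device)
    }

    /// Name of the backing crate, or None when the feature was not enabled.
    pub fn backend_crate(&self, device: Device) -> Option<&'static str> {
        self.factories.get(&device).copied()
    }
}
```

Checking is_available before constructing a session is one way to turn a missing feature into a handled case instead of a panic.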

Install

[dependencies]
rlx-runtime = { version = "0.1", features = ["cpu"] }

Heads-up. The mlx and rocm features pull in rlx-mlx and rlx-rocm, which aren't on crates.io for 0.1.0 (workspace-relative submodule / kernel-source paths). Enabling those features on a crates.io build of rlx-runtime will fail to resolve. Use a git source on the whole workspace instead:

rlx-runtime = { git = "https://github.com/MIT-RLX/rlx", features = ["mlx"] }

Most users want the rlx prelude crate; it re-exports rlx_runtime::Session and friends at the top level.

Quickstart

use rlx_ir::{DType, Graph, Shape};
use rlx_runtime::{Device, Session};

// Build a one-op graph: y = matmul(x, w).
let mut g = Graph::new("hello");
let x = g.input("x", Shape::new(&[1, 4], DType::F32));
let w = g.param("w", Shape::new(&[4, 2], DType::F32));
let y = g.matmul(x, w, Shape::new(&[1, 2], DType::F32));
g.set_outputs(vec![y]);

// Compile for the CPU backend (the default `cpu` feature).
let mut compiled = Session::new(Device::Cpu).compile(g);
// `w` is [4, 2], so set_param needs exactly 8 values.
compiled.set_param("w", &[1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]);
// Inputs are passed by name; `x` is [1, 4], so 4 values.
let out = compiled.run(&[("x", &[1.0, 2.0, 3.0, 4.0])]);
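
To sanity-check the quickstart numbers without RLX, the same product can be computed with a plain reference matmul. This assumes the flat slices are interpreted row-major, which this README does not state; matmul_row_major is a hypothetical helper, not part of the crate.

```rust
/// Reference matmul of a row-major [m, k] matrix with a row-major [k, n] matrix.
/// Assumption: row-major layout; the README does not specify the layout.
pub fn matmul_row_major(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "lhs length must equal m * k");
    assert_eq!(b.len(), k * n, "rhs length must equal k * n");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                // out[i][j] += a[i][p] * b[p][j]
                out[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    out
}
```

With x = [1, 2, 3, 4] and w holding the rows [1, 0], [0, 1], [1, 0], [0, 1], this reference yields [1 + 3, 2 + 4] = [4.0, 6.0], which is what the quickstart graph would be expected to produce under the row-major assumption.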

Build / test

cargo build -p rlx-runtime --features cpu                       # CPU only
cargo build -p rlx-runtime --features cpu,metal                 # +Metal
cargo test  -p rlx-runtime --release

Gotchas

  • Backend selection is feature-gated. Device::Metal can only be instantiated when the crate is built with --features metal; otherwise Session::new(Device::Metal) panics at registry lookup. The same applies to the cuda, rocm, mlx, and gpu (wgpu) features.
  • set_param accepts a &[f32] whose length equals the declared shape's element count. A length mismatch panics at runtime; it is not a compile-time error.
  • Compile cache key includes the graph fingerprint and the precision policy — bumping precision invalidates entries.
  • For long-running serving paths, prefer CompileCache over recompiling per request.
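
The cache-key behaviour noted above can be illustrated with a toy cache. This is a minimal sketch under stated assumptions, not the CompileCache API: Precision, CacheKey, ToyCompileCache, and get_or_compile are hypothetical names, the fingerprint is modelled as a u64, and the "artifact" is just a string.

```rust
use std::collections::HashMap;

/// Hypothetical precision policy, loosely mirroring the options listed earlier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Precision {
    F32,
    F16,
    Bf16,
    Amp,
}

/// Hypothetical key: graph fingerprint plus precision policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CacheKey {
    pub fingerprint: u64,
    pub precision: Precision,
}

/// Toy cache that counts how often a "compile" actually happens.
pub struct ToyCompileCache {
    entries: HashMap<CacheKey, String>,
    pub compiles: usize,
}

impl ToyCompileCache {
    pub fn new() -> Self {
        ToyCompileCache { entries: HashMap::new(), compiles: 0 }
    }

    /// Returns the cached artifact for `key`, compiling only on a miss.
    pub fn get_or_compile(&mut self, key: CacheKey) -> &str {
        if !self.entries.contains_key(&key) {
            self.compiles += 1;
            let artifact = format!("artifact-{:x}-{:?}", key.fingerprint, key.precision);
            self.entries.insert(key, artifact);
        }
        &self.entries[&key]
    }
}
```

In this sketch a changed precision policy simply produces a different key and therefore a miss; the old entry stays in the map. Whether the real cache evicts stale entries is not stated here.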

License

GPL-3.0-only.