RLX model loading — parse configs, load weights, build IR graphs.
This crate is a thin facade over per-model workspace members (rlx-qwen3,
rlx-sam, …). Depend on a specific model crate directly when you only need
one family.
Re-exports§
- pub use run::ConfigSource;
- pub use run::SamArch;
- pub use run::SamPredictionAny;
- pub use run::SamRunner;
- pub use run::SamRunnerBuilder;
Modules§
- arch_registry: Architecture registry (plan #82).
- bert
- bert_flow
- bonsai
- cohere
- config: Model configuration structs — parsed from HuggingFace config.json.
- dataprocessing: Reusable batch-prep utilities (plan #83).
- diamond
- dinov2
- embed
- flow_bridge: Bridge between rlx-models loaders/runtime and rlx-flow.
- flow_util: Shared helpers for tier-0 model flow migration.
- flux2
- gemma
- gguf_resolve: Pluggable GGUF tensor-name resolution per general.architecture.
- gguf_support: Shared GGUF helpers for LM runners (architecture checks, path resolution).
- granite
- llada2
- llama32
- lm: Shared causal-LM flow helpers — re-export tier-0 surface for model authors.
- mask_hyper_matmul_ir
- mask_prompt_ir
- mistral
- mlp_relu_ir
- neutts
- nomic
- nomic_flow
- ocr
- ocrs (Deprecated)
- omnicoder
- phi
- qwen3
- qwen35
- qwen35_synth: Synthetic configs/weights for tests and qwen35_inference bench. Synthetic Qwen3.5 weights for integration tests and criterion benches.
- run: High-level runner API — re-exported from per-model crates.
- sam
- sam2
- sam3
- tide
- twoway_transformer_ir
- vision
- vision_flow
- vision_ops_ir: Shared HIR builders for NCHW vision ops (Conv, ConvTranspose2d, LayerNorm2d, bias broadcast). Used by SAM / SAM2 / SAM3.
- vjepa2
- wav2vec2_bert
- weight_loader: Pluggable weight loader trait (plan #56).
- weight_map: Safetensors weight loading — standalone, no framework dependency.
- weight_registry: Extensible weight-format registry — register custom loaders for new extensions.
- weights: Model-agnostic weight I/O — paths, formats, drain policy only.
- whisper
Structs§
- BackboneModel: NeuTTS backbone — RLX Llama-3.2 runner over a llama-tagged GGUF.
- BertConfig: BERT model configuration.
- BertFlow
- BertTokenizer: Wrapper around HuggingFace tokenizer configured for BERT-style encoding.
- BlockDenoiseConfig: Block diffusion generation options (TIDE generate defaults).
- BlockDenoiseLoop: Driver for TIDE-style block masked diffusion (host-side token state).
- BuiltModel: Result of assembling a model flow.
- ChatMessage: One turn in a ChatML conversation.
- CompileProfile: Tier-1 compile configuration. Load from *.rlx.toml or use Rust presets.
- DetectionParams: Post-processing parameters for the text detection segmentation mask.
- DinoV2Built
- DinoV2Config: DINOv2 model configuration. vit_giant (SwiGLU MLP) is not yet supported — vit_small / vit_base / vit_large are.
- DinoV2Flow
- DinoV2PreprocessWeights: Preprocess weights extracted from the safetensors checkpoint.
- DinoV2Runner: Resolved DINOv2 runner.
- DinoV2RunnerBuilder: Builder for DinoV2Runner. Mirrors the qwen3 / sam shape.
- Flux2CfgCombineFlow: Tier-0 CFG combine: neg + scale * (pos - neg).
- Flux2CfgCombineGraph
- Flux2Checkpoint: Resolved on-disk layout for a FLUX.2 HF repo (diffusers-style tree).
- Flux2Config: FLUX.2 rectified-flow transformer (denoiser) configuration.
- Flux2Flow: Tier-0 FLUX.2 dual-stream flow builder.
- Flux2ForwardBuilt: Full forward build product (includes non-f32 typed param blobs).
- Flux2ForwardGraph
- Flux2ForwardInput: Inputs for one transformer forward (noise prediction).
- Flux2Output: Noise prediction from Flux2Runner::forward.
- Flux2PromptOutput
- Flux2Runner: FLUX.2 denoiser runner — native CPU or compiled HIR on any Device.
- Flux2RunnerBuilder: Builder for Flux2Runner.
- Flux2Session: One loaded FLUX.2 pipeline — cheap to clone via Arc.
- Flux2SessionCache: Process-wide cache of Flux2Runner instances (CLI --reuse-session / serve mode).
- Flux2SessionKey: Cache key for deduplicating loaded runners.
- Flux2TextEncoderBuilt
- Flux2TextEncoderFlow: Tier-0 FLUX.2 text encoder flow (Qwen3-shaped causal LM trunk).
- Flux2VaeConfig
- Flux2VaeDecoderFlow: Tier-0 FLUX.2 VAE decoder flow.
- Flux2VaeEncoderFlow: Tier-0 FLUX.2 VAE encoder flow.
- Flux2VaeGraph
- Flux2VaeWeights
- Flux2Weights
- GemmaConfig
- GemmaFlow: Fluent Gemma flow builder — reads config once, chain modifiers, then build.
- GemmaGenerator: Stateful Gemma generation handle.
- GenerateConfig: Generation options matching PyTorch LLaDA2MoeModelLM.generate.
- GenerationConfig: Generation hyper-parameters for the GGUF backbone.
- GgufDirGuide
- ImageSource: Input image for crate::OcrEngine::prepare_input.
- LLaDA2MoeConfig
- LLaDA2Runner
- LLaDA2RunnerBuilder
- LLaDA2Weights
- Llama32Config
- Llama32Flow: Fluent LLaMA-3.2 flow builder — reads config once, chain modifiers, then build.
- Llama32Generator: Stateful LLaMA-3.2 generation handle.
- Llama32Runner
- Llama32RunnerBuilder
- LlamaFamilyGgufResolver: HF model.layers.N.* ↔ GGUF blk.N.* (Llama, Qwen3, Qwen35, …).
- LoadWeightsOptions: Options for load_weights_resolved — prefer crate::weights::LoadOpts presets at call sites.
- LogMelExtractor
- LogMelFeatures
- MelSpectrogram
- ModelInfo: Metadata for an embedding model.
- NeuCodecDecoder: NeuCodec decoder: converts speech token IDs to a 24 kHz audio waveform.
- NeuCodecEncoder: NeuCodec encoder: converts a 16 kHz audio waveform to speech token IDs.
- NeuTTS: NeuTTS handle: GGUF backbone (optional) + NeuCodec decoder.
- NomicBertConfig: NomicBERT model configuration.
- NomicFlow
- NomicVisionBuilt
- NomicVisionConfig: NomicVision model configuration.
- NomicVisionFlow
- OcrConfig: Shared OCR settings.
- OcrEngine: End-to-end OCR pipeline (ocrs-compatible API).
- OcrEngineParams: Parameters for constructing an OcrEngine.
- OcrInput: Preprocessed greyscale input image [1, H, W].
- OcrOutput: Structured OCR output.
- OcrRunner: OCR session wrapping a fully loaded OcrEngine.
- OcrRunnerBuilder: Builder for OcrRunner (mirrors whisper / dinov2 runners).
- PredictiveOffloadInfo: Return value of TIDE enable_predictive_expert_offload (JSON-serializable keys).
- PredictiveOffloadParams: Arguments matching TIDE LLaDA2MoeModelLM.enable_predictive_expert_offload.
- Qwen3Config
- Qwen3Flow
- Qwen3Generator: Stateful Qwen3 generation handle.
- Qwen3PrefillOpts
- Qwen3Runner: Resolved Qwen3 runner — call Qwen3Runner::generate for streaming decode (F32 path), or Qwen3Runner::predict_logits for a single forward pass (works in both F32 and packed modes).
- Qwen3RunnerBuilder: Builder for Qwen3Runner. See the module docs for usage.
- Qwen3Speculator: Speculator adapter wrapping a Qwen3Generator. Each propose/verify call resets the wrapped generator’s internal state and re-seeds the KV cache from context — necessary because the Speculator trait gives no acceptance feedback, so the speculator cannot incrementally advance its cache to track the SpecDecoder’s chosen tokens.
- Qwen35Config: Qwen3.5 model config — fields covering both the per-layer Mamba + Attention block and the MTP head.
- Qwen35FullAttnLayer: Standard full-attention trunk layer (interspersed every full_attention_interval blocks). Per qwen35.cpp::load_block_trunk non-recurrent branch.
- Qwen35LinearLayer: Gated DeltaNet (“linear attention”) trunk layer. Mirrors qwen35.cpp::load_block_trunk for the is_recurrent(il) branch.
- Qwen35MoeFfn: MoE FFN tensors for one decoder layer (trunk or MTP).
- Qwen35MtpLayer: One MTP (NextN) layer. Per qwen35.cpp::load_block_mtp.
- Qwen35NativeGgufResolver: Qwen3.5 native blk.N.* names; also accept HF aliases via the Llama mapper.
- Qwen35PrefillOutput
- Qwen35Runner
- Qwen35RunnerBuilder
- Qwen35Weights: Top-level Qwen3.5 / Qwen3.6 weight bundle.
- RegisteredFormat: One registered on-disk format (built-in or custom).
- ResolveWeightsOptions: Options for resolve_weights_file_with_options.
- RlxBertModel: RLX-compiled BERT model ready for inference.
- RlxEmbed: High-level embedding model — auto-detects BERT / NomicBERT / NomicVision.
- RlxNomicModel: RLX-compiled NomicBERT with shape-bucketed compile cache.
- RlxVisionModel: RLX-compiled NomicVision encoder (patch preprocess host-side, trunk on RLX).
- RotatedRect: An oriented rectangle.
- Sam2: Full SAM 2 model — owns the compiled image encoder + every host-side weight bundle. The encoder result is recomputed per call (no encoder-caching here; layer above can wrap if needed).
- Sam3
- Sam2Config: Top-level SAM 2 configuration — Hiera + FPN + decoder + memory (encoder + attention) for the video path. Mirrors SAM2Base in the reference.
- Sam2DecoderConfig: Mask decoder configuration. Field names + defaults mirror sam2/modeling/sam/mask_decoder.py::MaskDecoder.__init__ and the published sam2_hiera_*.yaml model.sam_mask_decoder_extra_args.
- Sam2FpnConfig: FPN neck configuration. Mirrors FpnNeck in the reference.
- Sam2FpnLevel: A single FPN level output — BCHW features + matched sinusoidal positional encoding.
- Sam2FpnNeckWeights: Weights for the FPN neck — one 1×1 conv (weight + bias) per backbone level. Stored coarse → fine to match the checkpoint’s image_encoder.neck.convs.{i}.conv.{weight,bias} ordering.
- Sam2HieraConfig: Hiera image-encoder configuration — Tiny, Small, Base+ or Large.
- Sam2ImageEncoderBuilt
- Sam2ImageEncoderFlow
- Sam2ImagePrediction: One frame’s worth of mask-decoder output, as returned by both Sam2::predict_image and Sam2::predict_video_frame.
- Sam2MaskDecoderOutput: Output of mask_decoder_forward.
- Sam2MaskDecoderWeights
- Sam2MemoryAttentionWeights
- Sam2MemoryConfig: Memory-attention configuration (video path).
- Sam2MemoryEncoderConfig: Memory-encoder configuration. Mirrors sam2/modeling/memory_encoder.py::MemoryEncoder + its MaskDownSampler and Fuser. Defaults match every published sam2_hiera_*.yaml memory_encoder: block.
- Sam2MemoryEncoderOutput
- Sam2MemoryEncoderWeights
- Sam2PreprocessWeights: Weights extracted from the safetensors checkpoint that the host uses before the encoder graph runs.
- Sam2PromptEncoderOutput: Output of prompt_encoder_forward — fed straight into the mask decoder. All host-side Vec<f32>.
- Sam2PromptEncoderWeights: All weights consumed by prompt_encoder_forward. Loaded once from the safetensors file and then reused per prompt.
- Sam2TwoWayTransformerWeights
- Sam2VideoState: Per-track state for Sam2::predict_video_frame. Stores up to max_obj_ptrs_in_encoder past memory tokens + the rolling object-pointer queue.
- Sam3CompiledDecoder: Compile-once-per-layer decoder, runnable across many frames.
- Sam3Config
- Sam3DetectorConfig
- Sam3DetectorDecoderBuilt: Compile-once SAM3 detector decoder (six per-layer graphs + host glue).
- Sam3DetectorDecoderFlow
- Sam3DetectorEncoderFlow
- Sam3EncodedImage
- Sam3ImagePrediction
- Sam3PreprocessWeights
- Sam3TextConfig
- Sam3TrackerConfig
- Sam3VideoFramePrediction
- Sam3VideoState
- Sam3VitConfig
- SamConfig: Top-level SAM configuration — encoder + decoder + a few constants shared between them.
- SamEncoderBuilt
- SamEncoderConfig: Encoder configuration — ViT-B/L/H or TinyViT variants.
- SamEncoderFlow
- SamNeckWeights: Weights for the four neck layers, kept on the host because rlx-ir doesn’t have f32 forward Conv2d (and 3×3 padding=1 doesn’t reduce to matmul).
- SamPreprocessWeights: Weights extracted from the safetensors checkpoint that the host uses before the encoder graph runs.
- SampleOpts: Sampling configuration. Construct via SampleOpts::greedy / SampleOpts::temperature or build manually.
- TextChar: A single recognized character with its axis-aligned bounding box.
- TextLine: A line of text composed of words.
- TextWord: A word composed of one or more characters.
- TideOffloadStats: Cumulative counters aligned with TIDE LLaDA2MoeSparseMoeBlock.offload_stats.
- TideRunner: TIDE reference model runner (LLaDA2 MoE + block diffusion + predictive offload).
- TokenizedBatch: Output of batch tokenization: token IDs, attention masks, and token type IDs.
- VisionPreprocessWeights: Preprocessing weights extracted from safetensors for the caller to assemble the “hidden” input before graph execution.
- Vjepa2Config
- Vjepa2EncoderBuilt
- Vjepa2EncoderFlow
- Vjepa2EncoderOutput
- Vjepa2EncoderWeights
- Vjepa2Masks: Context / target patch indices for one batch element.
- Vjepa2ModelWeights
- Vjepa2Output: Encoder token output from Vjepa2Runner::encode_video.
- Vjepa2PatchEmbedWeights
- Vjepa2PoolOutput: Attentive pooler output (+ optional classifier logits).
- Vjepa2PoolerFlow
- Vjepa2PoolerWeights
- Vjepa2PredictOutput: Predictor output (projected target tokens).
- Vjepa2PredictorFlow
- Vjepa2PredictorWeights
- Vjepa2Runner: V-JEPA2 runner — encoder (+ optional predictor / pooler).
- Vjepa2RunnerBuilder
- Wav2Vec2BertConfig: Wav2Vec2-BERT model configuration (e.g. facebook/w2v-bert-2.0).
- Wav2Vec2BertFlow
- Wav2Vec2BertPreprocessConfig
- Wav2Vec2BertRunner: Wav2Vec2-BERT speech encoder runner.
- Wav2Vec2BertRunnerBuilder
- WeightFormatRegistration: Describes one on-disk weight format.
- WeightMap: Map of tensor name → (f32 data, shape).
- WeightMapSource: Adapt in-memory WeightMap to WeightSource.
- WhisperConfig: Whisper model dimensions (HF field names; see OpenAI model.py comments in candle).
- WhisperDecoderFlow
- WhisperEncoderFlow
- WhisperRunner
- WhisperRunnerBuilder
- WhisperWeightPrefix
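
The Flux2CfgCombineFlow entry above states the classifier-free guidance blend as neg + scale * (pos - neg). As a rough illustration of that arithmetic on host-side slices, a minimal sketch might look like the following. The name cfg_combine_sketch and its signature are hypothetical and are not taken from this crate; the crate's own cfg_combine may take different inputs.

```rust
/// Element-wise classifier-free guidance blend: neg + scale * (pos - neg).
/// Hypothetical helper for illustration only; not the crate's API.
fn cfg_combine_sketch(pos: &[f32], neg: &[f32], scale: f32) -> Vec<f32> {
    // Both predictions must cover the same elements.
    assert_eq!(pos.len(), neg.len(), "pos and neg must have the same length");
    pos.iter()
        .zip(neg)
        .map(|(&p, &n)| n + scale * (p - n))
        .collect()
}
```

With scale = 1.0 the blend reduces to the positive prediction, and larger values push the result further from the negative prediction.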
Enums§
- Arch: Detected embedding architecture from config.json.
- ChatRole: Conversation role for ChatMessage.
- DecodeMethod: Method used to decode CRNN sequence outputs.
- DimOrder: Pixel layout for image tensors.
- DinoV2Output: Forward output: classifier logits or token features.
- DinoV2Variant: Which DINOv2 backbone size.
- EmbedGgufKind: BERT vs NomicBERT discriminator from GGUF metadata.
- EmbeddingModel: Supported text embedding models.
- GemmaArch
- GgufModelFamily: LM families in this workspace that load .gguf weights.
- ImageEmbeddingModel: Supported image embedding models.
- LoadedWeights: Result of resolving and opening weights.
- MatWeight: Storage variant for matmul weight tensors. The big projections (qkv / gate / ffn / lm_head) dominate the load footprint; the Packed variant keeps GGUF K-quant bytes in-place so the graph can emit Op::DequantMatMul instead of a full F32 dequant.
- ModelArch: Model architecture type.
- Pooling: Pooling strategy for reducing token hidden states to one vector per sequence.
- Precision: Precision policy for the Qwen3 inference graph. Today only F32 is exact; the others toggle the corresponding env-vars on the Metal MPSGraph fast path (see qwen3_metal_perf notes).
- Qwen35LayerFfn: Per-layer feed-forward: dense SwiGLU or MoE (routed + gated shared expert).
- Qwen35TrunkLayer: One trunk-layer tensor bundle. Either a gated-DeltaNet “linear attention” block or a standard full-attention block.
- WeightDrainPolicy: How WeightMap::drain_loader / WeightMap::from_weight_loader handle leftovers.
- WeightFormat: What the model file is. Used by runners to pick the right loader.
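
The Pooling entry above describes reducing token hidden states to one vector per sequence. This index does not list the enum's variants, so the sketch below assumes masked mean pooling, one common strategy for text embeddings. The function name, the u8 mask type, and the flat [seq, dim] layout are all assumptions made for illustration.

```rust
/// Masked mean pooling over one sequence.
/// `hidden` is row-major [seq, dim]; `mask[t] != 0` marks a real (non-pad) token.
/// Hypothetical helper; the crate's Pooling enum may work differently.
fn mean_pool_sketch(hidden: &[f32], mask: &[u8], seq: usize, dim: usize) -> Vec<f32> {
    assert_eq!(hidden.len(), seq * dim, "hidden must be seq * dim long");
    assert_eq!(mask.len(), seq, "mask must have one entry per token");

    let mut out = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for t in 0..seq {
        if mask[t] == 0 {
            continue; // skip padding tokens
        }
        count += 1.0;
        for d in 0..dim {
            out[d] += hidden[t * dim + d];
        }
    }
    // Leave the zero vector untouched when every token is masked out.
    if count > 0.0 {
        for v in &mut out {
            *v /= count;
        }
    }
    out
}
```

Padding tokens are excluded from both the sum and the divisor, so a padded batch row pools to the same vector as the unpadded sequence.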
Constants§
- BLACK_VALUE: Normalized greyscale background value used by ocrs models (matches ocrs 0.12.x).
- DEFAULT_ALPHABET: Default character alphabet matching ocrs pretrained recognition models.
- DEFAULT_N_CTX: Default context window (must match Python’s max_context = 2048).
- DEFAULT_TEXT_ENCODER_LAYERS: Default hidden-state indices for FLUX.2 Klein (matches mflux).
- HF_DETECTION_RTEN: Production detection checkpoint (legacy RTen graph).
- HF_DETECTION_ST: Safetensors export of detection weights (short name).
- HF_RECOGNITION_RTEN: Production recognition checkpoint (legacy RTen CRNN + GRU graph).
- HF_RECOGNITION_ST: Safetensors export of recognition weights.
- STANDARD_DEVICE_NAMES: CLI / help string for --device.
- STOP_TOKEN: Stop sequence emitted by the model when generation is complete.
Traits§
- FlowBuildExt
- GgufTensorNameResolver: Resolve a builder-requested tensor name to the name stored in a GGUF file.
- LmRunner: Minimal per-family runner interface.
- ModelRunner: One CLI entry per model family. Each per-crate rlx-<family> binary calls its own run directly; the optional rlx-run multiplexer registers many ModelRunner implementations.
- WeightLoader: Common interface every weight format must satisfy. Mirrors the existing WeightMap API so the safetensors impl is a one-line adapter.
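
GgufTensorNameResolver is described above as mapping a builder-requested tensor name to the name stored in a GGUF file, and the LlamaFamilyGgufResolver entry mentions the HF model.layers.N.* to GGUF blk.N.* correspondence. The trait's real methods are not shown in this index, so the sketch below uses a hypothetical stand-in trait (NameResolverSketch) and rewrites only the layer prefix. It leaves the rest of the name untouched, which a production resolver would presumably also need to translate.

```rust
/// Hypothetical stand-in for a tensor-name resolver trait.
/// The real GgufTensorNameResolver signature is not shown in this index.
trait NameResolverSketch {
    /// Returns the rewritten name, or None when this resolver does not apply.
    fn resolve(&self, requested: &str) -> Option<String>;
}

/// Rewrites only the `model.layers.N.` prefix to `blk.N.`.
struct LayerPrefixResolver;

impl NameResolverSketch for LayerPrefixResolver {
    fn resolve(&self, requested: &str) -> Option<String> {
        // Names outside the per-layer namespace are not handled here.
        let rest = requested.strip_prefix("model.layers.")?;
        let (index, tail) = rest.split_once('.')?;
        // The layer index must be a plain non-negative integer.
        index.parse::<usize>().ok()?;
        Some(format!("blk.{index}.{tail}"))
    }
}
```

Returning None for names the resolver does not recognise lets a caller fall through to another resolver, which fits the "later registrations win" behaviour described for register_gguf_tensor_resolver, although the actual dispatch logic is not documented here.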
Functions§
- aggregate_offload_stats: Sum per-layer pool stats + optional CPU residency accounting from last forward.
- apply_compile_profile: Apply tier-1 profile options to runtime compile options.
- assemble_vision_hidden: Assemble encoder input [batch, seq, hidden] from NCHW pixels + preprocess weights.
- assert_gguf_family: Open the file and ensure general.architecture matches expected.
- build_bert_built
- build_bert_graph: Build a BERT encoder IR graph from config and weights.
- build_bert_graph_sized
- build_dinov2_built
- build_dinov2_graph_sized: Build the DINOv2 IR graph via native [ModelFlow].
- build_flux2_cfg_combine_hir: HIR graph: neg + guidance_scale * (pos - neg) in f32.
- build_flux2_forward_graph
- build_flux2_forward_hir: Build the full denoiser forward graph in HIR.
- build_flux2_minimal_graph: Lower minimal HIR to legacy Graph (MIR inner) for Session::compile.
- build_flux2_minimal_hir: Build a compile-minimal HIR module: x_embedder(hidden) → proj_out.
- build_flux2_text_encoder_hir
- build_gemma_decode_graph_sized
- build_gemma_graph_sized
- build_gemma_graph_sized_last_logits
- build_gemma_graph_sized_packed: Packed K-quant prefill — not yet implemented; use unpacked weights or flow build.
- build_graph: Build via flow and lower to MIR graph + params.
- build_llada2_forward_graph: Full-sequence forward with custom block-diffusion mask.
- build_llama32_decode_graph_sized
- build_llama32_graph_sized
- build_llama32_graph_sized_last_logits
- build_llama32_graph_sized_packed: Packed-weights prefill graph — K-quant matmuls stay in the arena via Op::DequantMatMul (mirrors rlx_qwen3::build_qwen3_graph_sized_packed).
- build_nomic_built
- build_nomic_diagnostic_graph: Diagnostic builder — same as build_nomic_graph_sized but exposes intermediate tensors at every transformer-stage boundary as outputs. Returns (graph, params, checkpoint_names) where outputs[i] holds the tensor at checkpoint_names[i]. Used by examples/tests to bisect numerical issues (NaN/Inf) without instrumenting the executor.
- build_nomic_graph_sized: Build a NomicBERT encoder IR graph.
- build_nomic_vision_built
- build_prompt: Build the GGUF prompt string from phonemized text + reference codec tokens.
- build_qwen3_graph_sized: Build a Qwen3 causal-LM IR graph.
- build_qwen3_prefill_built
- build_qwen35_decode_graph: Single-token decode graph at prefix length past_seq.
- build_qwen35_decode_hir_dynamic_ext: Decode HIR with symbolic past length (sym::PAST_SEQ) for dynamic compile cache.
- build_qwen35_graph_sized: Build the Qwen3.5 forward IR.
- build_qwen35_graph_sized_ext: Forward graph with optional runtime MRoPE inputs (rope_cos / rope_sin).
- build_qwen35_graph_sized_stub: Legacy redirect — qwen35 forward is implemented via build_qwen35_graph_sized. Kept so older call sites get a clear message instead of a missing-symbol error.
- build_qwen35_prefill_cache_graph: Prefill graph that seeds super::cache::Qwen35DecodeCache.
- build_qwen35_prefill_cache_graph_ext: Prefill-cache graph with optional runtime MRoPE inputs (multimodal).
- build_qwen35_prefill_cache_hir_dynamic_ext: Prefill-cache HIR with symbolic seq dim (sym::SEQ) for dynamic compile cache.
- build_sam2_image_encoder_built
- build_sam2_image_encoder_graph: Lowered graph wrapper for legacy callers (via super::flow::Sam2ImageEncoderFlow).
- build_sam3_detector_decoder_built
- build_sam3_detector_encoder_built
- build_sam3_detector_encoder_graph: Lower encoder HIR to legacy Graph (via super::flow::Sam3DetectorEncoderFlow).
- build_sam_encoder_built
- build_sam_encoder_graph: Lowered graph wrapper for legacy callers (via super::flow::SamEncoderFlow).
- build_vision_graph_sized: Build a NomicVision encoder IR graph via native [ModelFlow].
- build_vjepa2_encoder_graph_sized: Build the V-JEPA2 encoder IR graph from extracted weights (via super::flow::Vjepa2EncoderFlow).
- build_wav2vec2_bert_built
- build_wav2vec2_bert_graph_sized: Build a Wav2Vec2-BERT encoder IR graph for concrete batch × seq.
- build_whisper_decode_step_built
- build_whisper_decoder_built
- build_whisper_decoder_graph_sized
- build_whisper_decoder_prefill_built
- build_whisper_encoder_built
- build_whisper_encoder_graph_sized
- built_from_graph
- built_from_hir
- built_from_hir_with_profile
- cfg_combine: Native CFG blend in float32.
- compile_built: Compile a BuiltModel on the given device using its embedded profile.
- compile_built_cpu: Compile a BuiltModel on CPU with default options (embedding quick-check tests).
- compile_flux2_cfg_combine
- compile_flux2_forward
- compile_flux2_forward_via_flow: Compile denoiser via tier-0 Flux2Flow wrapper (same numerics as super::hir_builder::compile_flux2_forward).
- compile_flux2_minimal: Compile minimal HIR on CPU (HIR → MIR → LIR).
- compile_flux2_text_encoder_hir
- compile_graph_encoder: Bidirectional encoder defaults (BERT, DINOv2, Wav2Vec2, vision towers).
- compile_graph_encoder_with_params: CompileProfile::encoder + params.
- compile_graph_legacy: Unprofiled compile (parity probes / bisect tests).
- compile_graph_llama32_decode: Llama 3.2 decode graphs.
- compile_graph_llama32_prefill: Llama 3.2 prefill graphs.
- compile_graph_profile: Lower a graph with a tier-1 profile and attach params (tests / examples).
- compile_graph_qwen3_decode: Qwen3 single-token decode graphs.
- compile_graph_qwen3_prefill: Qwen3 prefill / full-sequence graphs.
- compile_graph_qwen3_prefill_with_params: CompileProfile::qwen3_prefill + params.
- compile_graph_qwen35_decode: Qwen3.5 decode-step graphs.
- compile_graph_qwen35_decode_with_params: CompileProfile::qwen35_decode + params.
- compile_graph_qwen35_prefill: Qwen3.5 prefill-cache / predict graphs.
- compile_graph_qwen35_prefill_with_params: CompileProfile::qwen35_prefill + params.
- compile_graph_sam: Compile a SAM/SAM2/SAM3 vision subgraph with tier-1 encoder profile options.
- compile_graph_sam_with_params: CompileProfile::sam_encoder + params.
- compile_graph_with_profile: Compile a vision subgraph with explicit tier-1 profile options.
- compile_model: Compile an embedding graph for the given batch/seq on device.
- conv3d_patch_embed: 3-D conv patch embedding: input [C, T, H, W] → tokens [seq, embed_dim].
- debug_resolve_name
- decode_step_feeds: Build host feeds for a single decode step from cache.
- default_mel_frames: Default 30 s chunk mel width.
- default_memory_budget_bytes: VRAM / unified-memory budget hint for MoE offload sizing.
- detect_arch: Detect architecture from config.json fields.
- dispatch
- dispatch_help
- download_flux2_repo
- embed_with_rlx: Embed texts with a compiled BERT model: tokenize, forward, pool, L2-normalize.
- encode_chat_auto: Resolve tokenizer next to weights and encode a chat conversation.
- encode_flux2_prompt: End-to-end: tokenize (optional) + text encoder → embeddings + text ids.
- encode_prompt_embeds_default_layers: Encode with default Klein layer indices (9, 18, 27).
- encode_prompt_padded: Encode and pad/truncate to fixed seq_len (pad token id 0).
- encode_video_native: Encode a pre-normalized video tensor [C, T, H, W].
- extract_encoder_weights
- extract_flux2_vae_weights
- extract_flux2_weights
- extract_ids: Extract all speech token IDs from a generated string.
- extract_model_weights
- extract_patch_embed_weights
- extract_pooler_weights
- extract_predictor_weights
- extract_text_encoder_weights
- flux2_decode_packed_latents: Full post-denoise decode: packed transformer latents → 8-bit RGB planar [batch, 3, H, W].
- flux2_prefers_compiled_hir: True when the denoiser / VAE should use compiled HIR (non-CPU backends).
- flux2_prefers_compiled_te: Text encoder HIR on CUDA compiles a full Qwen3 trunk and can take hours + fill VRAM while the denoiser is still resident. Native CPU encode once, then drop TE.
- flux2_rgb_to_u8: Planar RGB [-1,1] → interleaved u8 HWC for PNG.
- flux2_transformer_forward: Run the FLUX.2 transformer and return noise prediction [batch, img_seq, patch_size² * out_channels].
- format_chatml: Format messages as a ChatML prompt ending with an open assistant turn.
- format_for_extension: Extension → format id (last registration wins).
- forward_decoder_ir_on: IR-compiled detector decoder on the requested device (6 layer graphs).
- gemma_cfg_from_gguf
- gemma_encode_prompt: Encode text to token ids using a HuggingFace tokenizer file.
- gemma_encode_prompt_auto: Encode with an optional explicit path; falls back to GGUF embedded vocab via encode_prompt_from_gguf when no tokenizer.json is found.
- gemma_resolve_tokenizer_path: Resolve a tokenizer path: explicit --tokenizer, sibling of the GGUF weights, or tokenizer.json in the weights directory.
- gguf_architecture_str: general.architecture string from GGUF metadata, if present.
- gguf_dir_guide: Numbered .gguf listing + resolve hints for a directory (CLI / errors).
- gguf_f32_bytes_estimate: Rough F32 dequant footprint (every tensor × 4 bytes).
- gguf_family_for_arch: Map a GGUF architecture tag to the runner family that should load it.
- gguf_runner_hint: Suggested runner / crate for a GGUF architecture tag (for CLI and errors).
- graph_from_built: Build a flow and return (Graph, params) — preferred compile entry point.
- graph_from_hir: Lower an existing HIR module through BuiltModel (utility for HIR-first builders).
- host_temb: Host-side temb for compiled forward (timestep × 1000, optional guidance × 1000).
- into_compile_parts: Split built flow for compile — no Graph/HIR imports needed at call site.
- is_standard_device: True when device is in STANDARD_DEVICES.
- list_mtp_keys
- list_registered_formats: All registered formats (built-ins first, then custom registrations).
- llama32_cfg_from_gguf
- llama32_encode_prompt: Encode text to token ids using a HuggingFace tokenizer file.
- llama32_encode_prompt_auto: Encode with an optional explicit path; falls back to GGUF embedded vocab via encode_prompt_from_gguf when no tokenizer.json is found.
- llama32_resolve_tokenizer_path: Resolve a tokenizer path: explicit --tokenizer, sibling of the GGUF weights, or tokenizer.json in the weights directory.
- load_and_apply_flux2_lora: Load LoRA from safetensors and merge into base.
- load_compile_profile: Load a tier-1 profile from disk; fall back to default when missing or invalid.
- load_flux2_vae_weights
- load_flux2_weights
- load_from_path: Dispatch on the file extension via crate::weight_registry.
- load_rgb_planar: Load an image, resize to (width, height), return planar NCHW f32 in [-1, 1].
- load_text_encoder_weights
- load_wav_mono_f32
- load_weight_map_resolved: Convenience: resolve + drain to F32 WeightMap.
- load_weights_resolved: Resolve a file or directory, enforce GGUF arch policy, open via registry, optionally drain.
- messages_from_prompt: Convenience: system (optional) + user prompt → ChatML messages.
- models_map: Get the global model registry.
- mrope_prefill_feeds: Flattened [seq * head_half] cos/sin for runtime MRoPE graph inputs.
- mrope_row_for_sections: Build one MRoPE cos/sin row from explicit per-section positions.
- mrope_slice_at_pos: Slice MRoPE cos/sin at absolute text position pos (shape [half] each).
- mtp_draft_vocab_size: MTP LM head output width: full vocab, or trimmed for FastMTP draft speed.
- normalize_video_hwc: Normalize RGB u8 frames to NCTHW f32 in [0,1] then ImageNet stats. frames is [num_frames, crop, crop, 3] HWC u8 row-major.
- open_gguf_loader: GGUF loader with optional MTP-head visibility (LM families).
- open_loader
- open_loader_resolved: Resolve a file or weights directory, then open the right loader.
- open_loader_with_format
- open_map: Resolve + drain to F32 WeightMap.
- open_map_with: Resolve + drain with options.
- open_weights: Resolve + open (live WeightLoader).
- open_with: Resolve + open with options.
- pack_input_ids: Pack per-row prompts into [batch, max_seq] row-major F32 ids (zero-pad).
- parse_lora_scale: Parse --lora-scale style input; rejects NaN/inf.
- parse_messages_json: Parse a JSON array of { "role": "...", "content": "..." }.
- pcm_to_mel
- pool_native: Pool encoder tokens [batch, seq, hidden] → [batch, hidden] embedding.
- predict_native: Run the predictor on encoder outputs [batch, seq, enc_dim] flat.
- prepare_latent_ids: FLUX.2 latent position ids [batch, h*w, 4] with (t=0, h, w, l=0).
- prepare_text_ids: Build FLUX.2-style text position ids [batch, seq, 4] flattened as [seq*4].
- prepare_weight_map: Full load-time adaptation pipeline.
- profile_near_weights: Load profile_file next to weights (parent directory); fall back to default.
- recurrent_output_count: Number of extra graph outputs after logits (and optional MTP).
- refresh_experts: Whether to call moe_infer_with_expert_refresh on this forward.
- register_gguf_tensor_resolver: Register a custom resolver (call before first GGUF load). Later registrations win among resolvers that match the same architecture.
- register_runner
- register_weight_format: Register a custom weight format (call before the first load). Later entries override built-ins when the same extension is registered twice.
- registered_runners
- resolve_model_dir: Resolve detection + recognition weight paths under dir (safetensors for native RLX).
- resolve_text_encoder_dir: Resolve text_encoder/ next to a transformer weights file or model root.
- resolve_tokenizer_path: Resolve tokenizer path: explicit, tokenizer/tokenizer.json, or sibling tokenizer.json.
- resolve_transformer_config: Resolve transformer config.json from explicit override or sibling search.
- resolve_vae_dir: Resolve vae/ next to a transformer weights file or model root.
- resolve_weights_file: Resolve --weights to a single file: pass-through for files, or pick one .gguf / model.safetensors inside a directory.
- resolve_weights_file_with_options: Resolve with optional GGUF file selection inside a directory.
- run_registered
- sam2_apply_fpn_neck: Run the FPN neck. stage_outputs[i] is the encoder’s stage-i output flattened from BHWC [1, h, w, dim] to [h·w·dim]. stage_dims[i] = dim, stage_hw[i] = (h, w) — pulled straight from the graph’s stage-output shapes (or computed from cfg.embed_dim_at_stage(s) / cfg.grid_size_at_stage(s)).
- sam2_apply_fpn_neck_host: Host-only lateral convs (legacy entry point).
- sam2_assemble_patch_tokens: Run Hiera’s patch embedding (Conv2d k=7 s=4 p=3) on the host, then add the stage-0 position embedding. Output is [grid, grid, E] BHWC (the layout Hiera operates on internally), flattened.
- sam2_mask_decoder_forward: Run the SAM 2 mask decoder.
- sam2_memory_attention_forward: Memory attention forward.
- sam2_memory_encoder_forward: Run the SAM 2 memory encoder.
- sam2_preprocess_image: Square-resize an RGB u8 image to 1024×1024 (bilinear, no aspect-ratio preservation), /255, then ImageNet-normalise. Returns a contiguous [3, 1024, 1024] NCHW f32 buffer.
- sam2_prompt_encoder_forward: Run the SAM 2 prompt encoder. Mirrors sam2.modeling.sam.prompt_encoder.PromptEncoder.forward.
- sam2_two_way_transformer_forward: Top-level two-way transformer forward.
- sam3_assemble_patch_tokens
- sam3_preprocess_image: Resize an RGB u8 image to fit in SAM3’s square canvas, normalize, and pad.
- sam_apply_neck_host: Run the encoder neck on the host. body_out is the encoder body’s output reshaped to [hw·hw, embed_dim] (BHWC flattened). Returns [out_chans, hw, hw] NCHW image embeddings.
- sam_assemble_patch_tokens: Run the patch embedding (Conv2d k=16 s=16 no padding) on the host and add the absolute positional embedding. Output is [1, hw, hw, E] BHWC (SAM’s internal convention) flattened to a contiguous f32 buffer for the encoder graph.
- sam_preprocess_image: Resize an RGB u8 image to fit within SAM_IMG_SIZE on the long side (aspect-ratio preserved), normalize with SAM’s pixel stats, and zero-pad to a square [3, 1024, 1024] NCHW f32 tensor.
- sample_token: Sample one token id from a [vocab] logits slice. Returns the chosen index. Stateless w.r.t. prior calls — the RNG is seeded per-call from opts.seed so repeated calls with the same seed and logits yield the same token.
- seed_cache_from_outputs: Parse prefill-cache graph outputs into logits/hidden + Qwen35DecodeCache. When trunk_is_hidden, the first output is [batch × hidden_size] not logits.
- supports_multimodal_mrope: True when the checkpoint declares a non-zero 4th MRoPE section (vision).
- text_section_pos: Text-modality default: [p, p, p, 0] per llama.cpp token batches.
- tiny_text_encoder_config: Qwen3Config sized for super::weights::synthetic_text_encoder_weights tests.
- validate_device: Validate that device is in the workspace standard backend set (CPU, Metal, MLX, CUDA, ROCm, WGPU, Vulkan). Build with all-backends on rlx-qwen35 to link every native runtime backend into the rlx-qwen35 binary.
- validate_llada2_device: Supported execution devices (standard RLX backends).
- validate_sam_device: SAM v1 also documents tpu on [rlx_sam::Sam::from_safetensors_on].
- validate_standard_device: Fail fast on exotic runtime devices (TPU, ANE, OpenGL, …).
- zero_recurrent_inputs: Zero-initialized recurrent inputs for a prefill-cache seed graph.
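
The pack_input_ids entry above describes packing per-row prompts into a zero-padded, row-major [batch, max_seq] buffer of F32 ids. A minimal sketch of that layout, assuming u32 token ids as input and returning the padded width alongside the buffer, could look like this. The function name and signature are hypothetical; the crate's real pack_input_ids may accept different types.

```rust
/// Packs variable-length rows of token ids into a zero-padded,
/// row-major [batch, max_seq] f32 buffer. Returns (buffer, max_seq).
/// Hypothetical helper for illustration only; not the crate's API.
fn pack_input_ids_sketch(rows: &[Vec<u32>]) -> (Vec<f32>, usize) {
    // The padded width is the longest row (0 for an empty batch).
    let max_seq = rows.iter().map(Vec::len).max().unwrap_or(0);
    let mut packed = vec![0.0f32; rows.len() * max_seq];
    for (r, row) in rows.iter().enumerate() {
        for (c, &id) in row.iter().enumerate() {
            // f32 represents integers exactly up to 2^24, which covers typical vocab sizes.
            packed[r * max_seq + c] = id as f32;
        }
    }
    (packed, max_seq)
}
```

Because the pad value is 0, a caller that needs to tell padding apart from a real token id 0 would have to carry the original row lengths or an attention mask separately.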
Type Aliases§
- Flux2GraphParams: Param tensors keyed by name for rlx_runtime::CompiledGraph::set_param.
- LoadOpts: Alias for LoadWeightsOptions.
- PassThroughGgufResolver: Alias for PrefixStripGgufResolver (older name).
- ResolveOpts: Alias for ResolveWeightsOptions.
- WhisperKvCache: Incremental self-attention cache.