Skip to main content

Crate rlx_models

Crate rlx_models 

Source
Expand description

RLX model loading — parse configs, load weights, build IR graphs.

This crate is a thin facade over per-model workspace members (rlx-qwen3, rlx-sam, …). Depend on a specific model crate directly when you only need one family.

Re-exports§

pub use run::ConfigSource;
pub use run::SamArch;
pub use run::SamPredictionAny;
pub use run::SamRunner;
pub use run::SamRunnerBuilder;

Modules§

arch_registry
Architecture registry (plan #82).
bert
bert_flow
bonsai
cohere
config
Model configuration structs — parsed from HuggingFace config.json.
dataprocessing
Reusable batch-prep utilities (plan #83).
diamond
dinov2
embed
flow_bridge
Bridge between rlx-models loaders/runtime and rlx-flow.
flow_util
Shared helpers for tier-0 model flow migration.
flux2
gemma
gguf_resolve
Pluggable GGUF tensor-name resolution per general.architecture.
gguf_support
Shared GGUF helpers for LM runners (architecture checks, path resolution).
granite
llada2
llama32
lm
Shared causal-LM flow helpers — re-export tier-0 surface for model authors.
mask_hyper_matmul_ir
mask_prompt_ir
mistral
mlp_relu_ir
neutts
nomic
nomic_flow
ocr
ocrsDeprecated
omnicoder
phi
qwen3
qwen35
qwen35_synth
Synthetic configs/weights for tests and qwen35_inference bench. Synthetic Qwen3.5 weights for integration tests and criterion benches.
run
High-level runner API — re-exported from per-model crates.
sam
sam2
sam3
tide
twoway_transformer_ir
vision
vision_flow
vision_ops_ir
Shared HIR builders for NCHW vision ops (Conv, ConvTranspose2d, LayerNorm2d, bias broadcast). Used by SAM / SAM2 / SAM3.
vjepa2
wav2vec2_bert
weight_loader
Pluggable weight loader trait (plan #56).
weight_map
Safetensors weight loading — standalone, no framework dependency.
weight_registry
Extensible weight-format registry — register custom loaders for new extensions.
weights
Model-agnostic weight I/O — paths, formats, drain policy only.
whisper

Structs§

BackboneModel
NeuTTS backbone — RLX Llama-3.2 runner over a llama-tagged GGUF.
BertConfig
BERT model configuration.
BertFlow
BertTokenizer
Wrapper around HuggingFace tokenizer configured for BERT-style encoding.
BlockDenoiseConfig
Block diffusion generation options (TIDE generate defaults).
BlockDenoiseLoop
Driver for TIDE-style block masked diffusion (host-side token state).
BuiltModel
Result of assembling a model flow.
ChatMessage
One turn in a ChatML conversation.
CompileProfile
Tier-1 compile configuration. Load from *.rlx.toml or use Rust presets.
DetectionParams
Post-processing parameters for the text detection segmentation mask.
DinoV2Built
DinoV2Config
DINOv2 model configuration. vit_giant (SwiGLU MLP) is not yet supported — vit_small / vit_base / vit_large are.
DinoV2Flow
DinoV2PreprocessWeights
Preprocess weights extracted from the safetensors checkpoint.
DinoV2Runner
Resolved DINOv2 runner.
DinoV2RunnerBuilder
Builder for DinoV2Runner. Mirrors the qwen3 / sam shape.
Flux2CfgCombineFlow
Tier-0 CFG combine: neg + scale * (pos - neg).
Flux2CfgCombineGraph
Flux2Checkpoint
Resolved on-disk layout for a FLUX.2 HF repo (diffusers-style tree).
Flux2Config
FLUX.2 rectified-flow transformer (denoiser) configuration.
Flux2Flow
Tier-0 FLUX.2 dual-stream flow builder.
Flux2ForwardBuilt
Full forward build product (includes non-f32 typed param blobs).
Flux2ForwardGraph
Flux2ForwardInput
Inputs for one transformer forward (noise prediction).
Flux2Output
Noise prediction from Flux2Runner::forward.
Flux2PromptOutput
Flux2Runner
FLUX.2 denoiser runner — native CPU or compiled HIR on any Device.
Flux2RunnerBuilder
Builder for Flux2Runner.
Flux2Session
One loaded FLUX.2 pipeline — cheap to clone via Arc.
Flux2SessionCache
Process-wide cache of Flux2Runner instances (CLI --reuse-session / serve mode).
Flux2SessionKey
Cache key for deduplicating loaded runners.
Flux2TextEncoderBuilt
Flux2TextEncoderFlow
Tier-0 FLUX.2 text encoder flow (Qwen3-shaped causal LM trunk).
Flux2VaeConfig
Flux2VaeDecoderFlow
Tier-0 FLUX.2 VAE decoder flow.
Flux2VaeEncoderFlow
Tier-0 FLUX.2 VAE encoder flow.
Flux2VaeGraph
Flux2VaeWeights
Flux2Weights
GemmaConfig
GemmaFlow
Fluent Gemma flow builder — reads config once, chain modifiers, then build.
GemmaGenerator
Stateful Gemma generation handle.
GenerateConfig
Generation options matching PyTorch LLaDA2MoeModelLM.generate.
GenerationConfig
Generation hyper-parameters for the GGUF backbone.
GgufDirGuide
ImageSource
Input image for crate::OcrEngine::prepare_input.
LLaDA2MoeConfig
LLaDA2Runner
LLaDA2RunnerBuilder
LLaDA2Weights
Llama32Config
Llama32Flow
Fluent LLaMA-3.2 flow builder — reads config once, chain modifiers, then build.
Llama32Generator
Stateful LLaMA-3.2 generation handle.
Llama32Runner
Llama32RunnerBuilder
LlamaFamilyGgufResolver
HF model.layers.N.* ↔ GGUF blk.N.* (Llama, Qwen3, Qwen35, …).
LoadWeightsOptions
Options for load_weights_resolved — prefer crate::weights::LoadOpts presets at call sites.
LogMelExtractor
LogMelFeatures
MelSpectrogram
ModelInfo
Metadata for an embedding model.
NeuCodecDecoder
NeuCodec decoder: converts speech token IDs to a 24 kHz audio waveform.
NeuCodecEncoder
NeuCodec encoder: converts a 16 kHz audio waveform to speech token IDs.
NeuTTS
NeuTTS handle: GGUF backbone (optional) + NeuCodec decoder.
NomicBertConfig
NomicBERT model configuration.
NomicFlow
NomicVisionBuilt
NomicVisionConfig
NomicVision model configuration.
NomicVisionFlow
OcrConfig
Shared OCR settings.
OcrEngine
End-to-end OCR pipeline (ocrs-compatible API).
OcrEngineParams
Parameters for constructing an OcrEngine.
OcrInput
Preprocessed greyscale input image [1, H, W].
OcrOutput
Structured OCR output.
OcrRunner
OCR session wrapping a fully loaded OcrEngine.
OcrRunnerBuilder
Builder for OcrRunner (mirrors whisper / dinov2 runners).
PredictiveOffloadInfo
Return value of TIDE enable_predictive_expert_offload (JSON-serializable keys).
PredictiveOffloadParams
Arguments matching TIDE LLaDA2MoeModelLM.enable_predictive_expert_offload.
Qwen3Config
Qwen3Flow
Qwen3Generator
Stateful Qwen3 generation handle.
Qwen3PrefillOpts
Qwen3Runner
Resolved Qwen3 runner — call Qwen3Runner::generate for streaming decode (F32 path), or Qwen3Runner::predict_logits for a single forward pass (works in both F32 and packed modes).
Qwen3RunnerBuilder
Builder for Qwen3Runner. See the module docs for usage.
Qwen3Speculator
Speculator adapter wrapping a Qwen3Generator. Each propose/verify call resets the wrapped generator’s internal state and re-seeds the KV cache from context — necessary because the Speculator trait gives no acceptance feedback, so the speculator cannot incrementally advance its cache to track the SpecDecoder’s chosen tokens.
Qwen35Config
Qwen3.5 model config — fields covering both the per-layer Mamba+ Attention block and the MTP head.
Qwen35FullAttnLayer
Standard full-attention trunk layer (interspersed every full_attention_interval blocks). Per qwen35.cpp::load_block_trunk non-recurrent branch.
Qwen35LinearLayer
Gated DeltaNet (“linear attention”) trunk layer. Mirrors qwen35.cpp::load_block_trunk for the is_recurrent(il) branch.
Qwen35MoeFfn
MoE FFN tensors for one decoder layer (trunk or MTP).
Qwen35MtpLayer
One MTP (NextN) layer. Per qwen35.cpp::load_block_mtp.
Qwen35NativeGgufResolver
Qwen3.5 native blk.N.* names; also accept HF aliases via the Llama mapper.
Qwen35PrefillOutput
Qwen35Runner
Qwen35RunnerBuilder
Qwen35Weights
Top-level Qwen3.5 / Qwen3.6 weight bundle.
RegisteredFormat
One registered on-disk format (built-in or custom).
ResolveWeightsOptions
Options for resolve_weights_file_with_options.
RlxBertModel
RLX-compiled BERT model ready for inference.
RlxEmbed
High-level embedding model — auto-detects BERT / NomicBERT / NomicVision.
RlxNomicModel
RLX-compiled NomicBERT with shape-bucketed compile cache.
RlxVisionModel
RLX-compiled NomicVision encoder (patch preprocess host-side, trunk on RLX).
RotatedRect
An oriented rectangle.
Sam2
Full SAM 2 model — owns the compiled image encoder + every host-side weight bundle. The encoder result is recomputed per call (no encoder-caching here; layer above can wrap if needed).
Sam3
Sam2Config
Top-level SAM 2 configuration — Hiera + FPN + decoder + memory (encoder + attention) for the video path. Mirrors SAM2Base in the reference.
Sam2DecoderConfig
Mask decoder configuration. Field names + defaults mirror sam2/modeling/sam/mask_decoder.py::MaskDecoder.__init__ and the published sam2_hiera_*.yaml model.sam_mask_decoder_extra_args.
Sam2FpnConfig
FPN neck configuration. Mirrors FpnNeck in the reference.
Sam2FpnLevel
A single FPN level output — BCHW features + matched sinusoidal positional encoding.
Sam2FpnNeckWeights
Weights for the FPN neck — one 1×1 conv (weight + bias) per backbone level. Stored coarse → fine to match the checkpoint’s image_encoder.neck.convs.{i}.conv.{weight,bias} ordering.
Sam2HieraConfig
Hiera image-encoder configuration — Tiny, Small, Base+ or Large.
Sam2ImageEncoderBuilt
Sam2ImageEncoderFlow
Sam2ImagePrediction
One frame’s worth of mask-decoder output, as returned by both Sam2::predict_image and Sam2::predict_video_frame.
Sam2MaskDecoderOutput
Output of mask_decoder_forward.
Sam2MaskDecoderWeights
Sam2MemoryAttentionWeights
Sam2MemoryConfig
Memory-attention configuration (video path).
Sam2MemoryEncoderConfig
Memory-encoder configuration. Mirrors sam2/modeling/memory_encoder.py::MemoryEncoder + its MaskDownSampler and Fuser. Defaults match every published sam2_hiera_*.yaml memory_encoder: block.
Sam2MemoryEncoderOutput
Sam2MemoryEncoderWeights
Sam2PreprocessWeights
Weights extracted from the safetensors checkpoint that the host uses before the encoder graph runs.
Sam2PromptEncoderOutput
Output of prompt_encoder_forward — fed straight into the mask decoder. All host-side Vec<f32>.
Sam2PromptEncoderWeights
All weights consumed by prompt_encoder_forward. Loaded once from the safetensors file and then reused per prompt.
Sam2TwoWayTransformerWeights
Sam2VideoState
Per-track state for Sam2::predict_video_frame. Stores up to max_obj_ptrs_in_encoder past memory tokens + the rolling object-pointer queue.
Sam3CompiledDecoder
Compile-once-per-layer decoder, runnable across many frames.
Sam3Config
Sam3DetectorConfig
Sam3DetectorDecoderBuilt
Compile-once SAM3 detector decoder (six per-layer graphs + host glue).
Sam3DetectorDecoderFlow
Sam3DetectorEncoderFlow
Sam3EncodedImage
Sam3ImagePrediction
Sam3PreprocessWeights
Sam3TextConfig
Sam3TrackerConfig
Sam3VideoFramePrediction
Sam3VideoState
Sam3VitConfig
SamConfig
Top-level SAM configuration — encoder + decoder + a few constants shared between them.
SamEncoderBuilt
SamEncoderConfig
Encoder configuration — ViT-B/L/H or TinyViT variants.
SamEncoderFlow
SamNeckWeights
Weights for the four neck layers, kept on the host because rlx-ir doesn’t have f32 forward Conv2d (and 3×3 padding=1 doesn’t reduce to matmul).
SamPreprocessWeights
Weights extracted from the safetensors checkpoint that the host uses before the encoder graph runs.
SampleOpts
Sampling configuration. Construct via SampleOpts::greedy / SampleOpts::temperature or build manually.
TextChar
A single recognized character with its axis-aligned bounding box.
TextLine
A line of text composed of words.
TextWord
A word composed of one or more characters.
TideOffloadStats
Cumulative counters aligned with TIDE LLaDA2MoeSparseMoeBlock.offload_stats.
TideRunner
TIDE reference model runner (LLaDA2 MoE + block diffusion + predictive offload).
TokenizedBatch
Output of batch tokenization: token IDs, attention masks, and token type IDs.
VisionPreprocessWeights
Preprocessing weights extracted from safetensors for the caller to assemble the “hidden” input before graph execution.
Vjepa2Config
Vjepa2EncoderBuilt
Vjepa2EncoderFlow
Vjepa2EncoderOutput
Vjepa2EncoderWeights
Vjepa2Masks
Context / target patch indices for one batch element.
Vjepa2ModelWeights
Vjepa2Output
Encoder token output from Vjepa2Runner::encode_video.
Vjepa2PatchEmbedWeights
Vjepa2PoolOutput
Attentive pooler output (+ optional classifier logits).
Vjepa2PoolerFlow
Vjepa2PoolerWeights
Vjepa2PredictOutput
Predictor output (projected target tokens).
Vjepa2PredictorFlow
Vjepa2PredictorWeights
Vjepa2Runner
V-JEPA2 runner — encoder (+ optional predictor / pooler).
Vjepa2RunnerBuilder
Wav2Vec2BertConfig
Wav2Vec2-BERT model configuration (e.g. facebook/w2v-bert-2.0).
Wav2Vec2BertFlow
Wav2Vec2BertPreprocessConfig
Wav2Vec2BertRunner
Wav2Vec2-BERT speech encoder runner.
Wav2Vec2BertRunnerBuilder
WeightFormatRegistration
Describes one on-disk weight format.
WeightMap
Map of tensor name → (f32 data, shape).
WeightMapSource
Adapt in-memory WeightMap to WeightSource.
WhisperConfig
Whisper model dimensions (HF field names; see OpenAI model.py comments in candle).
WhisperDecoderFlow
WhisperEncoderFlow
WhisperRunner
WhisperRunnerBuilder
WhisperWeightPrefix

Enums§

Arch
Detected embedding architecture from config.json.
ChatRole
Conversation role for ChatMessage.
DecodeMethod
Method used to decode CRNN sequence outputs.
DimOrder
Pixel layout for image tensors.
DinoV2Output
Forward output: classifier logits or token features.
DinoV2Variant
Which DINOv2 backbone size.
EmbedGgufKind
BERT vs NomicBERT discriminator from GGUF metadata.
EmbeddingModel
Supported text embedding models.
GemmaArch
GgufModelFamily
LM families in this workspace that load .gguf weights.
ImageEmbeddingModel
Supported image embedding models.
LoadedWeights
Result of resolving and opening weights.
MatWeight
Storage variant for matmul weight tensors. The big projections (qkv / gate / ffn / lm_head) dominate the load footprint; the Packed variant keeps GGUF K-quant bytes in-place so the graph can emit Op::DequantMatMul instead of a full F32 dequant.
ModelArch
Model architecture type.
Pooling
Pooling strategy for reducing token hidden states to one vector per sequence.
Precision
Precision policy for the Qwen3 inference graph. Today only F32 is exact; the others toggle the corresponding env-vars on the Metal MPSGraph fast path (see qwen3_metal_perf notes).
Qwen35LayerFfn
Per-layer feed-forward: dense SwiGLU or MoE (routed + gated shared expert).
Qwen35TrunkLayer
One trunk-layer tensor bundle. Either a gated-DeltaNet “linear attention” block or a standard full-attention block.
WeightDrainPolicy
How WeightMap::drain_loader / WeightMap::from_weight_loader handle leftovers.
WeightFormat
What the model file is. Used by runners to pick the right loader.

Constants§

BLACK_VALUE
Normalized greyscale background value used by ocrs models (matches ocrs 0.12.x).
DEFAULT_ALPHABET
Default character alphabet matching ocrs pretrained recognition models.
DEFAULT_N_CTX
Default context window (must match Python’s max_context = 2048).
DEFAULT_TEXT_ENCODER_LAYERS
Default hidden-state indices for FLUX.2 Klein (matches mflux).
HF_DETECTION_RTEN
Production detection checkpoint (legacy RTen graph).
HF_DETECTION_ST
Safetensors export of detection weights (short name).
HF_RECOGNITION_RTEN
Production recognition checkpoint (legacy RTen CRNN + GRU graph).
HF_RECOGNITION_ST
Safetensors export of recognition weights.
STANDARD_DEVICE_NAMES
CLI / help string for --device.
STOP_TOKEN
Stop sequence emitted by the model when generation is complete.

Traits§

FlowBuildExt
GgufTensorNameResolver
Resolve a builder-requested tensor name to the name stored in a GGUF file.
LmRunner
Minimal per-family runner interface.
ModelRunner
One CLI entry per model family. Each per-crate rlx-<family> binary calls its own run directly; the optional rlx-run multiplexer registers many ModelRunner implementations.
WeightLoader
Common interface every weight format must satisfy. Mirrors the existing WeightMap API so the safetensors impl is a one-line adapter.

Functions§

aggregate_offload_stats
Sum per-layer pool stats + optional CPU residency accounting from last forward.
apply_compile_profile
Apply tier-1 profile options to runtime compile options.
assemble_vision_hidden
Assemble encoder input [batch, seq, hidden] from NCHW pixels + preprocess weights.
assert_gguf_family
Open the file and ensure general.architecture matches expected.
build_bert_built
build_bert_graph
Build a BERT encoder IR graph from config and weights.
build_bert_graph_sized
build_dinov2_built
build_dinov2_graph_sized
Build the DINOv2 IR graph via native [ModelFlow].
build_flux2_cfg_combine_hir
HIR graph: neg + guidance_scale * (pos - neg) in f32.
build_flux2_forward_graph
build_flux2_forward_hir
Build the full denoiser forward graph in HIR.
build_flux2_minimal_graph
Lower minimal HIR to legacy Graph (MIR inner) for Session::compile.
build_flux2_minimal_hir
Build a compile-minimal HIR module: x_embedder(hidden)proj_out.
build_flux2_text_encoder_hir
build_gemma_decode_graph_sized
build_gemma_graph_sized
build_gemma_graph_sized_last_logits
build_gemma_graph_sized_packed
Packed K-quant prefill — not yet implemented; use unpacked weights or flow build.
build_graph
Build via flow and lower to MIR graph + params.
build_llada2_forward_graph
Full-sequence forward with custom block-diffusion mask.
build_llama32_decode_graph_sized
build_llama32_graph_sized
build_llama32_graph_sized_last_logits
build_llama32_graph_sized_packed
Packed-weights prefill graph — K-quant matmuls stay in the arena via Op::DequantMatMul (mirrors rlx_qwen3::build_qwen3_graph_sized_packed).
build_nomic_built
build_nomic_diagnostic_graph
Diagnostic builder — same as build_nomic_graph_sized but exposes intermediate tensors at every transformer-stage boundary as outputs. Returns (graph, params, checkpoint_names) where outputs[i] holds the tensor at checkpoint_names[i]. Used by examples/tests to bisect numerical issues (NaN/Inf) without instrumenting the executor.
build_nomic_graph_sized
Build a NomicBERT encoder IR graph.
build_nomic_vision_built
build_prompt
Build the GGUF prompt string from phonemized text + reference codec tokens.
build_qwen3_graph_sized
Build a Qwen3 causal-LM IR graph.
build_qwen3_prefill_built
build_qwen35_decode_graph
Single-token decode graph at prefix length past_seq.
build_qwen35_decode_hir_dynamic_ext
Decode HIR with symbolic past length (sym::PAST_SEQ) for dynamic compile cache.
build_qwen35_graph_sized
Build the Qwen3.5 forward IR.
build_qwen35_graph_sized_ext
Forward graph with optional runtime MRoPE inputs (rope_cos/rope_sin).
build_qwen35_graph_sized_stub
Legacy redirect — qwen35 forward is implemented via build_qwen35_graph_sized. Kept so older call sites get a clear message instead of a missing-symbol error.
build_qwen35_prefill_cache_graph
Prefill graph that seeds super::cache::Qwen35DecodeCache.
build_qwen35_prefill_cache_graph_ext
Prefill-cache graph with optional runtime MRoPE inputs (multimodal).
build_qwen35_prefill_cache_hir_dynamic_ext
Prefill-cache HIR with symbolic seq dim (sym::SEQ) for dynamic compile cache.
build_sam2_image_encoder_built
build_sam2_image_encoder_graph
Lowered graph wrapper for legacy callers (via super::flow::Sam2ImageEncoderFlow).
build_sam3_detector_decoder_built
build_sam3_detector_encoder_built
build_sam3_detector_encoder_graph
Lower encoder HIR to legacy Graph (via super::flow::Sam3DetectorEncoderFlow).
build_sam_encoder_built
build_sam_encoder_graph
Lowered graph wrapper for legacy callers (via super::flow::SamEncoderFlow).
build_vision_graph_sized
Build a NomicVision encoder IR graph via native [ModelFlow].
build_vjepa2_encoder_graph_sized
Build the V-JEPA2 encoder IR graph from extracted weights (via super::flow::Vjepa2EncoderFlow).
build_wav2vec2_bert_built
build_wav2vec2_bert_graph_sized
Build a Wav2Vec2-BERT encoder IR graph for concrete batch × seq.
build_whisper_decode_step_built
build_whisper_decoder_built
build_whisper_decoder_graph_sized
build_whisper_decoder_prefill_built
build_whisper_encoder_built
build_whisper_encoder_graph_sized
built_from_graph
built_from_hir
built_from_hir_with_profile
cfg_combine
Native CFG blend in float32.
compile_built
Compile a BuiltModel on the given device using its embedded profile.
compile_built_cpu
Compile a BuiltModel on CPU with default options (embedding quick-check tests).
compile_flux2_cfg_combine
compile_flux2_forward
compile_flux2_forward_via_flow
Compile denoiser via tier-0 Flux2Flow wrapper (same numerics as super::hir_builder::compile_flux2_forward).
compile_flux2_minimal
Compile minimal HIR on CPU (HIR → MIR → LIR).
compile_flux2_text_encoder_hir
compile_graph_encoder
Bidirectional encoder defaults (BERT, DINOv2, Wav2Vec2, vision towers).
compile_graph_encoder_with_params
CompileProfile::encoder + params.
compile_graph_legacy
Unprofiled compile (parity probes / bisect tests).
compile_graph_llama32_decode
Llama 3.2 decode graphs.
compile_graph_llama32_prefill
Llama 3.2 prefill graphs.
compile_graph_profile
Lower a graph with a tier-1 profile and attach params (tests / examples).
compile_graph_qwen3_decode
Qwen3 single-token decode graphs.
compile_graph_qwen3_prefill
Qwen3 prefill / full-sequence graphs.
compile_graph_qwen3_prefill_with_params
CompileProfile::qwen3_prefill + params.
compile_graph_qwen35_decode
Qwen3.5 decode-step graphs.
compile_graph_qwen35_decode_with_params
CompileProfile::qwen35_decode + params.
compile_graph_qwen35_prefill
Qwen3.5 prefill-cache / predict graphs.
compile_graph_qwen35_prefill_with_params
CompileProfile::qwen35_prefill + params.
compile_graph_sam
Compile a SAM/SAM2/SAM3 vision subgraph with tier-1 encoder profile options.
compile_graph_sam_with_params
CompileProfile::sam_encoder + params.
compile_graph_with_profile
Compile a vision subgraph with explicit tier-1 profile options.
compile_model
Compile an embedding graph for the given batch/seq on device.
conv3d_patch_embed
3-D conv patch embedding: input [C, T, H, W] → tokens [seq, embed_dim].
debug_resolve_name
decode_step_feeds
Build host feeds for a single decode step from cache.
default_mel_frames
Default 30 s chunk mel width.
default_memory_budget_bytes
VRAM / unified-memory budget hint for MoE offload sizing.
detect_arch
Detect architecture from config.json fields.
dispatch
dispatch_help
download_flux2_repo
embed_with_rlx
Embed texts with a compiled BERT model: tokenize, forward, pool, L2-normalize.
encode_chat_auto
Resolve tokenizer next to weights and encode a chat conversation.
encode_flux2_prompt
End-to-end: tokenize (optional) + text encoder → embeddings + text ids.
encode_prompt_embeds_default_layers
Encode with default Klein layer indices (9, 18, 27).
encode_prompt_padded
Encode and pad/truncate to fixed seq_len (pad token id 0).
encode_video_native
Encode a pre-normalized video tensor [C, T, H, W].
extract_encoder_weights
extract_flux2_vae_weights
extract_flux2_weights
extract_ids
Extract all speech token IDs from a generated string.
extract_model_weights
extract_patch_embed_weights
extract_pooler_weights
extract_predictor_weights
extract_text_encoder_weights
flux2_decode_packed_latents
Full post-denoise decode: packed transformer latents → 8-bit RGB planar [batch, 3, H, W].
flux2_prefers_compiled_hir
True when the denoiser / VAE should use compiled HIR (non-CPU backends).
flux2_prefers_compiled_te
Text encoder HIR on CUDA compiles a full Qwen3 trunk and can take hours + fill VRAM while the denoiser is still resident. Native CPU encode once, then drop TE.
flux2_rgb_to_u8
Planar RGB [-1,1] → interleaved u8 HWC for PNG.
flux2_transformer_forward
Run the FLUX.2 transformer and return noise prediction [batch, img_seq, patch_size² * out_channels].
format_chatml
Format messages as a ChatML prompt ending with an open assistant turn.
format_for_extension
Extension → format id (last registration wins).
forward_decoder_ir_on
IR-compiled detector decoder on the requested device (6 layer graphs).
gemma_cfg_from_gguf
gemma_encode_prompt
Encode text to token ids using a HuggingFace tokenizer file.
gemma_encode_prompt_auto
Encode with an optional explicit path; falls back to GGUF embedded vocab via encode_prompt_from_gguf when no tokenizer.json is found.
gemma_resolve_tokenizer_path
Resolve a tokenizer path: explicit --tokenizer, sibling of the GGUF weights, or tokenizer.json in the weights directory.
gguf_architecture_str
general.architecture string from GGUF metadata, if present.
gguf_dir_guide
Numbered .gguf listing + resolve hints for a directory (CLI / errors).
gguf_f32_bytes_estimate
Rough F32 dequant footprint (every tensor × 4 bytes).
gguf_family_for_arch
Map a GGUF architecture tag to the runner family that should load it.
gguf_runner_hint
Suggested runner / crate for a GGUF architecture tag (for CLI and errors).
graph_from_built
Build a flow and return (Graph, params) — preferred compile entry point.
graph_from_hir
Lower an existing HIR module through BuiltModel (utility for HIR-first builders).
host_temb
Host-side temb for compiled forward (timestep × 1000, optional guidance × 1000).
into_compile_parts
Split built flow for compile — no Graph/HIR imports needed at call site.
is_standard_device
True when device is in STANDARD_DEVICES.
list_mtp_keys
list_registered_formats
All registered formats (built-ins first, then custom registrations).
llama32_cfg_from_gguf
llama32_encode_prompt
Encode text to token ids using a HuggingFace tokenizer file.
llama32_encode_prompt_auto
Encode with an optional explicit path; falls back to GGUF embedded vocab via encode_prompt_from_gguf when no tokenizer.json is found.
llama32_resolve_tokenizer_path
Resolve a tokenizer path: explicit --tokenizer, sibling of the GGUF weights, or tokenizer.json in the weights directory.
load_and_apply_flux2_lora
Load LoRA from safetensors and merge into base.
load_compile_profile
Load a tier-1 profile from disk; fall back to default when missing or invalid.
load_flux2_vae_weights
load_flux2_weights
load_from_path
Dispatch on the file extension via crate::weight_registry.
load_rgb_planar
Load an image, resize to (width, height), return planar NCHW f32 in [-1, 1].
load_text_encoder_weights
load_wav_mono_f32
load_weight_map_resolved
Convenience: resolve + drain to F32 WeightMap.
load_weights_resolved
Resolve a file or directory, enforce GGUF arch policy, open via registry, optionally drain.
messages_from_prompt
Convenience: system (optional) + user prompt → ChatML messages.
models_map
Get the global model registry.
mrope_prefill_feeds
Flattened [seq * head_half] cos/sin for runtime MRoPE graph inputs.
mrope_row_for_sections
Build one MRoPE cos/sin row from explicit per-section positions.
mrope_slice_at_pos
Slice MRoPE cos/sin at absolute text position pos (shape [half] each).
mtp_draft_vocab_size
MTP LM head output width: full vocab, or trimmed for FastMTP draft speed.
normalize_video_hwc
Normalize RGB u8 frames to NCTHW f32 in [0,1] then ImageNet stats. frames is [num_frames, crop, crop, 3] HWC u8 row-major.
open_gguf_loader
GGUF loader with optional MTP-head visibility (LM families).
open_loader
open_loader_resolved
Resolve a file or weights directory, then open the right loader.
open_loader_with_format
open_map
Resolve + drain to F32 WeightMap.
open_map_with
Resolve + drain with options.
open_weights
Resolve + open (live WeightLoader).
open_with
Resolve + open with options.
pack_input_ids
Pack per-row prompts into [batch, max_seq] row-major F32 ids (zero-pad).
parse_lora_scale
Parse --lora-scale style input; rejects NaN/inf.
parse_messages_json
Parse a JSON array of { "role": "...", "content": "..." }.
pcm_to_mel
pool_native
Pool encoder tokens [batch, seq, hidden][batch, hidden] embedding.
predict_native
Run the predictor on encoder outputs [batch, seq, enc_dim] flat.
prepare_latent_ids
FLUX.2 latent position ids [batch, h*w, 4] with (t=0, h, w, l=0).
prepare_text_ids
Build FLUX.2-style text position ids [batch, seq, 4] flattened as [seq*4].
prepare_weight_map
Full load-time adaptation pipeline.
profile_near_weights
Load profile_file next to weights (parent directory); fall back to default.
recurrent_output_count
Number of extra graph outputs after logits (and optional MTP).
refresh_experts
Whether to call moe_infer_with_expert_refresh on this forward.
register_gguf_tensor_resolver
Register a custom resolver (call before first GGUF load). Later registrations win among resolvers that match the same architecture.
register_runner
register_weight_format
Register a custom weight format (call before the first load). Later entries override built-ins when the same extension is registered twice.
registered_runners
resolve_model_dir
Resolve detection + recognition weight paths under dir (safetensors for native RLX).
resolve_text_encoder_dir
Resolve text_encoder/ next to a transformer weights file or model root.
resolve_tokenizer_path
Resolve tokenizer path: explicit, tokenizer/tokenizer.json, or sibling tokenizer.json.
resolve_transformer_config
Resolve transformer config.json from explicit override or sibling search.
resolve_vae_dir
Resolve vae/ next to a transformer weights file or model root.
resolve_weights_file
Resolve --weights to a single file: pass-through for files, or pick one .gguf / model.safetensors inside a directory.
resolve_weights_file_with_options
Resolve with optional GGUF file selection inside a directory.
run_registered
sam2_apply_fpn_neck
Run the FPN neck. stage_outputs[i] is the encoder’s stage-i output flattened from BHWC [1, h, w, dim] to [h·w·dim]. stage_dims[i] = dim, stage_hw[i] = (h, w) — pulled straight from the graph’s stage-output shapes (or computed from cfg.embed_dim_at_stage(s) / cfg.grid_size_at_stage(s)).
sam2_apply_fpn_neck_host
Host-only lateral convs (legacy entry point).
sam2_assemble_patch_tokens
Run Hiera’s patch embedding (Conv2d k=7 s=4 p=3) on the host, then add the stage-0 position embedding. Output is [grid, grid, E] BHWC (the layout Hiera operates on internally), flattened.
sam2_mask_decoder_forward
Run the SAM 2 mask decoder.
sam2_memory_attention_forward
Memory attention forward.
sam2_memory_encoder_forward
Run the SAM 2 memory encoder.
sam2_preprocess_image
Square-resize an RGB u8 image to 1024×1024 (bilinear, no aspect- ratio preservation), /255, then ImageNet-normalise. Returns a contiguous [3, 1024, 1024] NCHW f32 buffer.
sam2_prompt_encoder_forward
Run the SAM 2 prompt encoder. Mirrors sam2.modeling.sam.prompt_encoder.PromptEncoder.forward.
sam2_two_way_transformer_forward
Top-level two-way transformer forward.
sam3_assemble_patch_tokens
sam3_preprocess_image
Resize an RGB u8 image to fit in SAM3’s square canvas, normalize, and pad.
sam_apply_neck_host
Run the encoder neck on the host. body_out is the encoder body’s output reshaped to [hw·hw, embed_dim] (BHWC flattened). Returns [out_chans, hw, hw] NCHW image embeddings.
sam_assemble_patch_tokens
Run the patch embedding (Conv2d k=16 s=16 no padding) on the host and add the absolute positional embedding. Output is [1, hw, hw, E] BHWC (SAM’s internal convention) flattened to a contiguous f32 buffer for the encoder graph.
sam_preprocess_image
Resize an RGB u8 image to fit within SAM_IMG_SIZE on the long side (aspect-ratio preserved), normalize with SAM’s pixel stats, and zero-pad to a square [3, 1024, 1024] NCHW f32 tensor.
sample_token
Sample one token id from a [vocab] logits slice. Returns the chosen index. Stateless w.r.t. prior calls — the RNG is seeded per-call from opts.seed so repeated calls with the same seed and logits yield the same token.
seed_cache_from_outputs
Parse prefill-cache graph outputs into logits/hidden + Qwen35DecodeCache. When trunk_is_hidden, the first output is [batch × hidden_size] not logits.
supports_multimodal_mrope
True when the checkpoint declares a non-zero 4th MRoPE section (vision).
text_section_pos
Text-modality default: [p, p, p, 0] per llama.cpp token batches.
tiny_text_encoder_config
Qwen3Config sized for super::weights::synthetic_text_encoder_weights tests.
validate_device
Validate that device is in the workspace standard backend set (CPU, Metal, MLX, CUDA, ROCm, WGPU, Vulkan). Build with all-backends on rlx-qwen35 to link every native runtime backend into the rlx-qwen35 binary.
validate_llada2_device
Supported execution devices (standard RLX backends).
validate_sam_device
SAM v1 also documents tpu on [rlx_sam::Sam::from_safetensors_on].
validate_standard_device
Fail fast on exotic runtime devices (TPU, ANE, OpenGL, …).
zero_recurrent_inputs
Zero-initialized recurrent inputs for a prefill-cache seed graph.

Type Aliases§

Flux2GraphParams
Param tensors keyed by name for rlx_runtime::CompiledGraph::set_param.
LoadOpts
Alias for LoadWeightsOptions.
PassThroughGgufResolver
Alias for PrefixStripGgufResolver (older name).
ResolveOpts
Alias for ResolveWeightsOptions.
WhisperKvCache
Incremental self-attention cache.