Crate rlx_models

RLX model loading — parse configs, load weights, build IR graphs.

This crate is a thin facade over per-model workspace members (rlx-qwen3, rlx-sam, …). Depend on a specific model crate directly when you only need one family.

Re-exports

pub use run::ConfigSource;
pub use run::SamArch;
pub use run::SamPredictionAny;
pub use run::SamRunner;
pub use run::SamRunnerBuilder;

Modules

arch_registry: Architecture registry (plan #82).
bert
bert_flow
bonsai
cohere
config: Model configuration structs — parsed from HuggingFace config.json.
dataprocessing: Reusable batch-prep utilities (plan #83).
diamond
dinov2
embed
flow_bridge: Bridge between rlx-models loaders/runtime and rlx-flow.
flow_util: Shared helpers for tier-0 model flow migration.
flux2
gemma
gguf_resolve: Pluggable GGUF tensor-name resolution per general.architecture.
gguf_support: Shared GGUF helpers for LM runners (architecture checks, path resolution).
granite
llada2
llama32
lm: Shared causal-LM flow helpers — re-export tier-0 surface for model authors.
mask_hyper_matmul_ir
mask_prompt_ir
mistral
mlp_relu_ir
neutts
nomic
nomic_flow
ocr
ocrs: Deprecated.
omnicoder
phi
qwen3
qwen35
qwen35_synth: Synthetic Qwen3.5 configs and weights for integration tests and Criterion benches (including the qwen35_inference bench).
run: High-level runner API — re-exported from per-model crates.
sam
sam2
sam3
tide
twoway_transformer_ir
vision
vision_flow
vision_ops_ir: Shared HIR builders for NCHW vision ops (Conv, ConvTranspose2d, LayerNorm2d, bias broadcast). Used by SAM / SAM2 / SAM3.
vjepa2
wav2vec2_bert
weight_loader: Pluggable weight loader trait (plan #56).
weight_map: Safetensors weight loading — standalone, no framework dependency.
weight_registry: Extensible weight-format registry — register custom loaders for new extensions.
weights: Model-agnostic weight I/O — paths, formats, drain policy only.
whisper
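
The weight_registry entry above describes registering custom loaders per file extension, and the function docs state that the last registration wins. The following is a minimal sketch of that lookup rule only. `FormatEntry` and `FormatRegistry` are hypothetical names invented for illustration; they are not the crate's `WeightFormatRegistration` or its actual registry API, whose signatures are not shown in this index.

```rust
/// Hypothetical registry entry (illustration only, not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct FormatEntry {
    id: String,
    extension: String,
}

/// Toy registry that keeps entries in registration order.
#[derive(Default)]
struct FormatRegistry {
    entries: Vec<FormatEntry>,
}

impl FormatRegistry {
    /// Append a new (format id, extension) pair; earlier entries are kept.
    fn register(&mut self, id: &str, extension: &str) {
        self.entries.push(FormatEntry {
            id: id.to_string(),
            extension: extension.to_string(),
        });
    }

    /// Scan newest-first so that a later registration for the same
    /// extension shadows an earlier one ("last registration wins").
    fn format_for_extension(&self, extension: &str) -> Option<&str> {
        self.entries
            .iter()
            .rev()
            .find(|e| e.extension == extension)
            .map(|e| e.id.as_str())
    }
}
```

Keeping shadowed entries instead of overwriting them means a listing of all registrations can still show built-ins first, then custom ones.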

Structs

BackboneModel: NeuTTS backbone — RLX Llama-3.2 runner over a llama-tagged GGUF.
BertConfig: BERT model configuration.
BertFlow
BertTokenizer: Wrapper around HuggingFace tokenizer configured for BERT-style encoding.
BlockDenoiseConfig: Block diffusion generation options (TIDE generate defaults).
BlockDenoiseLoop: Driver for TIDE-style block masked diffusion (host-side token state).
BuiltModel: Result of assembling a model flow.
ChatMessage: One turn in a ChatML conversation.
CompileProfile: Tier-1 compile configuration. Load from *.rlx.toml or use Rust presets.
DetectionParams: Post-processing parameters for the text detection segmentation mask.
DinoV2Built
DinoV2Config: DINOv2 model configuration. vit_giant (SwiGLU MLP) is not yet supported — vit_small / vit_base / vit_large are.
DinoV2Flow
DinoV2PreprocessWeights: Preprocess weights extracted from the safetensors checkpoint.
DinoV2Runner: Resolved DINOv2 runner.
DinoV2RunnerBuilder: Builder for DinoV2Runner. Mirrors the qwen3 / sam shape.
Flux2CfgCombineFlow: Tier-0 CFG combine: neg + scale * (pos - neg).
Flux2CfgCombineGraph
Flux2Checkpoint: Resolved on-disk layout for a FLUX.2 HF repo (diffusers-style tree).
Flux2Config: FLUX.2 rectified-flow transformer (denoiser) configuration.
Flux2Flow: Tier-0 FLUX.2 dual-stream flow builder.
Flux2ForwardBuilt: Full forward build product (includes non-f32 typed param blobs).
Flux2ForwardGraph
Flux2ForwardInput: Inputs for one transformer forward (noise prediction).
Flux2Output: Noise prediction from Flux2Runner::forward.
Flux2PromptOutput
Flux2Runner: FLUX.2 denoiser runner — native CPU or compiled HIR on any Device.
Flux2RunnerBuilder: Builder for Flux2Runner.
Flux2Session: One loaded FLUX.2 pipeline — cheap to clone via Arc.
Flux2SessionCache: Process-wide cache of Flux2Runner instances (CLI --reuse-session / serve mode).
Flux2SessionKey: Cache key for deduplicating loaded runners.
Flux2TextEncoderBuilt
Flux2TextEncoderFlow: Tier-0 FLUX.2 text encoder flow (Qwen3-shaped causal LM trunk).
Flux2VaeConfig
Flux2VaeDecoderFlow: Tier-0 FLUX.2 VAE decoder flow.
Flux2VaeEncoderFlow: Tier-0 FLUX.2 VAE encoder flow.
Flux2VaeGraph
Flux2VaeWeights
Flux2Weights
GemmaConfig
GemmaFlow: Fluent Gemma flow builder — reads config once, chain modifiers, then build.
GemmaGenerator: Stateful Gemma generation handle.
GenerateConfig: Generation options matching PyTorch LLaDA2MoeModelLM.generate.
GenerationConfig: Generation hyper-parameters for the GGUF backbone.
GgufDirGuide
ImageSource: Input image for crate::OcrEngine::prepare_input.
LLaDA2MoeConfig
LLaDA2Runner
LLaDA2RunnerBuilder
LLaDA2Weights
Llama32Config
Llama32Flow: Fluent LLaMA-3.2 flow builder — reads config once, chain modifiers, then build.
Llama32Generator: Stateful LLaMA-3.2 generation handle.
Llama32Runner
Llama32RunnerBuilder
LlamaFamilyGgufResolver: HF model.layers.N.* ↔ GGUF blk.N.* (Llama, Qwen3, Qwen35, …).
LoadWeightsOptions: Options for load_weights_resolved — prefer crate::weights::LoadOpts presets at call sites.
LogMelExtractor
LogMelFeatures
MelSpectrogram
ModelInfo: Metadata for an embedding model.
NeuCodecDecoder: NeuCodec decoder: converts speech token IDs to a 24 kHz audio waveform.
NeuCodecEncoder: NeuCodec encoder: converts a 16 kHz audio waveform to speech token IDs.
NeuTTS: NeuTTS handle: GGUF backbone (optional) + NeuCodec decoder.
NomicBertConfig: NomicBERT model configuration.
NomicFlow
NomicVisionBuilt
NomicVisionConfig: NomicVision model configuration.
NomicVisionFlow
OcrConfig: Shared OCR settings.
OcrEngine: End-to-end OCR pipeline (ocrs-compatible API).
OcrEngineParams: Parameters for constructing an OcrEngine.
OcrInput: Preprocessed greyscale input image [1, H, W].
OcrOutput: Structured OCR output.
OcrRunner: OCR session wrapping a fully loaded OcrEngine.
OcrRunnerBuilder: Builder for OcrRunner (mirrors whisper / dinov2 runners).
PredictiveOffloadInfo: Return value of TIDE enable_predictive_expert_offload (JSON-serializable keys).
PredictiveOffloadParams: Arguments matching TIDE LLaDA2MoeModelLM.enable_predictive_expert_offload.
Qwen3Config
Qwen3Flow
Qwen3Generator: Stateful Qwen3 generation handle.
Qwen3PrefillOpts
Qwen3Runner: Resolved Qwen3 runner — call Qwen3Runner::generate for streaming decode (F32 path), or Qwen3Runner::predict_logits for a single forward pass (works in both F32 and packed modes).
Qwen3RunnerBuilder: Builder for Qwen3Runner. See the module docs for usage.
Qwen3Speculator: Speculator adapter wrapping a Qwen3Generator. Each propose/verify call resets the wrapped generator’s internal state and re-seeds the KV cache from context — necessary because the Speculator trait gives no acceptance feedback, so the speculator cannot incrementally advance its cache to track the SpecDecoder’s chosen tokens.
Qwen35Config: Qwen3.5 model config — fields covering both the per-layer Mamba + Attention block and the MTP head.
Qwen35FullAttnLayer: Standard full-attention trunk layer (interspersed every full_attention_interval blocks). Per qwen35.cpp::load_block_trunk non-recurrent branch.
Qwen35LinearLayer: Gated DeltaNet (“linear attention”) trunk layer. Mirrors qwen35.cpp::load_block_trunk for the is_recurrent(il) branch.
Qwen35MoeFfn: MoE FFN tensors for one decoder layer (trunk or MTP).
Qwen35MtpLayer: One MTP (NextN) layer. Per qwen35.cpp::load_block_mtp.
Qwen35NativeGgufResolver: Qwen3.5 native blk.N.* names; also accept HF aliases via the Llama mapper.
Qwen35PrefillOutput
Qwen35Runner
Qwen35RunnerBuilder
Qwen35Weights: Top-level Qwen3.5 / Qwen3.6 weight bundle.
RegisteredFormat: One registered on-disk format (built-in or custom).
ResolveWeightsOptions: Options for resolve_weights_file_with_options.
RlxBertModel: RLX-compiled BERT model ready for inference.
RlxEmbed: High-level embedding model — auto-detects BERT / NomicBERT / NomicVision.
RlxNomicModel: RLX-compiled NomicBERT with shape-bucketed compile cache.
RlxVisionModel: RLX-compiled NomicVision encoder (patch preprocess host-side, trunk on RLX).
RotatedRect: An oriented rectangle.
Sam2: Full SAM 2 model; owns the compiled image encoder and every host-side weight bundle. The encoder result is recomputed on every call (there is no encoder caching here; a layer above can add caching if needed).
Sam3
Sam2Config: Top-level SAM 2 configuration — Hiera + FPN + decoder + memory (encoder + attention) for the video path. Mirrors SAM2Base in the reference.
Sam2DecoderConfig: Mask decoder configuration. Field names + defaults mirror sam2/modeling/sam/mask_decoder.py::MaskDecoder.__init__ and the published sam2_hiera_*.yaml model.sam_mask_decoder_extra_args.
Sam2FpnConfig: FPN neck configuration. Mirrors FpnNeck in the reference.
Sam2FpnLevel: A single FPN level output — BCHW features + matched sinusoidal positional encoding.
Sam2FpnNeckWeights: Weights for the FPN neck — one 1×1 conv (weight + bias) per backbone level. Stored coarse → fine to match the checkpoint’s image_encoder.neck.convs.{i}.conv.{weight,bias} ordering.
Sam2HieraConfig: Hiera image-encoder configuration — Tiny, Small, Base+ or Large.
Sam2ImageEncoderBuilt
Sam2ImageEncoderFlow
Sam2ImagePrediction: One frame’s worth of mask-decoder output, as returned by both Sam2::predict_image and Sam2::predict_video_frame.
Sam2MaskDecoderOutput: Output of mask_decoder_forward.
Sam2MaskDecoderWeights
Sam2MemoryAttentionWeights
Sam2MemoryConfig: Memory-attention configuration (video path).
Sam2MemoryEncoderConfig: Memory-encoder configuration. Mirrors sam2/modeling/memory_encoder.py::MemoryEncoder + its MaskDownSampler and Fuser. Defaults match every published sam2_hiera_*.yaml memory_encoder: block.
Sam2MemoryEncoderOutput
Sam2MemoryEncoderWeights
Sam2PreprocessWeights: Weights extracted from the safetensors checkpoint that the host uses before the encoder graph runs.
Sam2PromptEncoderOutput: Output of prompt_encoder_forward — fed straight into the mask decoder. All host-side Vec<f32>.
Sam2PromptEncoderWeights: All weights consumed by prompt_encoder_forward. Loaded once from the safetensors file and then reused per prompt.
Sam2TwoWayTransformerWeights
Sam2VideoState: Per-track state for Sam2::predict_video_frame. Stores up to max_obj_ptrs_in_encoder past memory tokens + the rolling object-pointer queue.
Sam3CompiledDecoder: Compile-once-per-layer decoder, runnable across many frames.
Sam3Config
Sam3DetectorConfig
Sam3DetectorDecoderBuilt: Compile-once SAM3 detector decoder (six per-layer graphs + host glue).
Sam3DetectorDecoderFlow
Sam3DetectorEncoderFlow
Sam3EncodedImage
Sam3ImagePrediction
Sam3PreprocessWeights
Sam3TextConfig
Sam3TrackerConfig
Sam3VideoFramePrediction
Sam3VideoState
Sam3VitConfig
SamConfig: Top-level SAM configuration — encoder + decoder + a few constants shared between them.
SamEncoderBuilt
SamEncoderConfig: Encoder configuration — ViT-B/L/H or TinyViT variants.
SamEncoderFlow
SamNeckWeights: Weights for the four neck layers, kept on the host because rlx-ir doesn’t have f32 forward Conv2d (and 3×3 padding=1 doesn’t reduce to matmul).
SamPreprocessWeights: Weights extracted from the safetensors checkpoint that the host uses before the encoder graph runs.
SampleOpts: Sampling configuration. Construct via SampleOpts::greedy / SampleOpts::temperature or build manually.
TextChar: A single recognized character with its axis-aligned bounding box.
TextLine: A line of text composed of words.
TextWord: A word composed of one or more characters.
TideOffloadStats: Cumulative counters aligned with TIDE LLaDA2MoeSparseMoeBlock.offload_stats.
TideRunner: TIDE reference model runner (LLaDA2 MoE + block diffusion + predictive offload).
TokenizedBatch: Output of batch tokenization: token IDs, attention masks, and token type IDs.
VisionPreprocessWeights: Preprocessing weights extracted from safetensors for the caller to assemble the “hidden” input before graph execution.
Vjepa2Config
Vjepa2EncoderBuilt
Vjepa2EncoderFlow
Vjepa2EncoderOutput
Vjepa2EncoderWeights
Vjepa2Masks: Context / target patch indices for one batch element.
Vjepa2ModelWeights
Vjepa2Output: Encoder token output from Vjepa2Runner::encode_video.
Vjepa2PatchEmbedWeights
Vjepa2PoolOutput: Attentive pooler output (+ optional classifier logits).
Vjepa2PoolerFlow
Vjepa2PoolerWeights
Vjepa2PredictOutput: Predictor output (projected target tokens).
Vjepa2PredictorFlow
Vjepa2PredictorWeights
Vjepa2Runner: V-JEPA2 runner — encoder (+ optional predictor / pooler).
Vjepa2RunnerBuilder
Wav2Vec2BertConfig: Wav2Vec2-BERT model configuration (e.g. facebook/w2v-bert-2.0).
Wav2Vec2BertFlow
Wav2Vec2BertPreprocessConfig
Wav2Vec2BertRunner: Wav2Vec2-BERT speech encoder runner.
Wav2Vec2BertRunnerBuilder
WeightFormatRegistration: Describes one on-disk weight format.
WeightMap: Map of tensor name → (f32 data, shape).
WeightMapSource: Adapt in-memory WeightMap to WeightSource.
WhisperConfig: Whisper model dimensions (HF field names; see OpenAI model.py comments in candle).
WhisperDecoderFlow
WhisperEncoderFlow
WhisperRunner
WhisperRunnerBuilder
WhisperWeightPrefix
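
The Flux2CfgCombineFlow entry above gives the classifier-free guidance blend as neg + scale * (pos - neg). As a rough illustration of that arithmetic on flat f32 buffers, here is a standalone sketch. The function name `cfg_combine_sketch` and its signature are assumptions made for this example; the crate's own `cfg_combine` signature is not shown in this index.

```rust
/// Element-wise classifier-free guidance blend:
///   out[i] = neg[i] + scale * (pos[i] - neg[i])
/// `pos` is the conditional prediction, `neg` the unconditional one.
/// Hypothetical helper, not the crate's API.
fn cfg_combine_sketch(pos: &[f32], neg: &[f32], scale: f32) -> Vec<f32> {
    assert_eq!(pos.len(), neg.len(), "pos and neg must have the same length");
    pos.iter()
        .zip(neg)
        .map(|(&p, &n)| n + scale * (p - n))
        .collect()
}
```

With scale = 0 the result equals neg, with scale = 1 it equals pos, and larger scales extrapolate past pos along the same direction.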

Enums

Arch: Detected embedding architecture from config.json.
ChatRole: Conversation role for ChatMessage.
DecodeMethod: Method used to decode CRNN sequence outputs.
DimOrder: Pixel layout for image tensors.
DinoV2Output: Forward output: classifier logits or token features.
DinoV2Variant: Which DINOv2 backbone size.
EmbedGgufKind: BERT vs NomicBERT discriminator from GGUF metadata.
EmbeddingModel: Supported text embedding models.
GemmaArch
GgufModelFamily: LM families in this workspace that load .gguf weights.
ImageEmbeddingModel: Supported image embedding models.
LoadedWeights: Result of resolving and opening weights.
MatWeight: Storage variant for matmul weight tensors. The big projections (qkv / gate / ffn / lm_head) dominate the load footprint; the Packed variant keeps GGUF K-quant bytes in-place so the graph can emit Op::DequantMatMul instead of a full F32 dequant.
ModelArch: Model architecture type.
Pooling: Pooling strategy for reducing token hidden states to one vector per sequence.
Precision: Precision policy for the Qwen3 inference graph. Today only F32 is exact; the others toggle the corresponding env-vars on the Metal MPSGraph fast path (see qwen3_metal_perf notes).
Qwen35LayerFfn: Per-layer feed-forward: dense SwiGLU or MoE (routed + gated shared expert).
Qwen35TrunkLayer: One trunk-layer tensor bundle. Either a gated-DeltaNet “linear attention” block or a standard full-attention block.
WeightDrainPolicy: How WeightMap::drain_loader / WeightMap::from_weight_loader handle leftovers.
WeightFormat: What the model file is. Used by runners to pick the right loader.
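
The Pooling entry above describes reducing token hidden states to one vector per sequence. The variants of `Pooling` are not listed in this index, so the sketch below assumes two common strategies (first-token and masked mean) purely for illustration. `PoolingSketch` and `pool_sketch` are hypothetical names.

```rust
/// Hypothetical pooling modes (assumed; the crate's `Pooling` variants
/// are not listed in this index).
enum PoolingSketch {
    /// Take the hidden state of the first token.
    Cls,
    /// Average hidden states over tokens whose mask entry is non-zero.
    Mean,
}

/// Pool one sequence stored row-major as [seq, dim] into a [dim] vector.
fn pool_sketch(
    hidden: &[f32],
    mask: &[u8],
    seq: usize,
    dim: usize,
    mode: PoolingSketch,
) -> Vec<f32> {
    assert!(seq > 0, "need at least one token");
    assert_eq!(hidden.len(), seq * dim);
    assert_eq!(mask.len(), seq);
    match mode {
        PoolingSketch::Cls => hidden[..dim].to_vec(),
        PoolingSketch::Mean => {
            let mut out = vec![0.0f32; dim];
            let mut count = 0.0f32;
            for t in 0..seq {
                if mask[t] != 0 {
                    count += 1.0;
                    for d in 0..dim {
                        out[d] += hidden[t * dim + d];
                    }
                }
            }
            // Leave the zero vector untouched if every token is masked out.
            if count > 0.0 {
                for v in &mut out {
                    *v /= count;
                }
            }
            out
        }
    }
}
```

Using the attention mask in the mean keeps padding tokens from diluting the embedding when sequences in a batch have different lengths.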

Constants

BLACK_VALUE: Normalized greyscale background value used by ocrs models (matches ocrs 0.12.x).
DEFAULT_ALPHABET: Default character alphabet matching ocrs pretrained recognition models.
DEFAULT_N_CTX: Default context window (must match Python’s max_context = 2048).
DEFAULT_TEXT_ENCODER_LAYERS: Default hidden-state indices for FLUX.2 Klein (matches mflux).
HF_DETECTION_RTEN: Production detection checkpoint (legacy RTen graph).
HF_DETECTION_ST: Safetensors export of detection weights (short name).
HF_RECOGNITION_RTEN: Production recognition checkpoint (legacy RTen CRNN + GRU graph).
HF_RECOGNITION_ST: Safetensors export of recognition weights.
STANDARD_DEVICE_NAMES: CLI / help string for --device.
STOP_TOKEN: Stop sequence emitted by the model when generation is complete.

Traits

FlowBuildExt
GgufTensorNameResolver: Resolve a builder-requested tensor name to the name stored in a GGUF file.
LmRunner: Minimal per-family runner interface.
ModelRunner: One CLI entry per model family. Each per-crate rlx-<family> binary calls its own run directly; the optional rlx-run multiplexer registers many ModelRunner implementations.
WeightLoader: Common interface every weight format must satisfy. Mirrors the existing WeightMap API so the safetensors impl is a one-line adapter.
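
GgufTensorNameResolver is described above as resolving a builder-requested tensor name to the name stored in a GGUF file, and the LlamaFamilyGgufResolver struct maps HF model.layers.N.* to GGUF blk.N.*. The sketch below shows only that layer-prefix rewrite. The trait `NameResolverSketch`, its `resolve` method, and `LayerPrefixResolver` are hypothetical; the real trait signature is not shown in this index, and a real resolver would presumably also translate the per-tensor suffix, which this example leaves unchanged.

```rust
/// Hypothetical resolver interface (illustration only).
trait NameResolverSketch {
    /// Map a requested (HF-style) name to a GGUF-style name, if it matches.
    fn resolve(&self, requested: &str) -> Option<String>;
}

/// Rewrites only the layer prefix: `model.layers.N.<tail>` -> `blk.N.<tail>`.
struct LayerPrefixResolver;

impl NameResolverSketch for LayerPrefixResolver {
    fn resolve(&self, requested: &str) -> Option<String> {
        let rest = requested.strip_prefix("model.layers.")?;
        let (index, tail) = rest.split_once('.')?;
        // Reject names whose layer index is not a plain non-negative integer.
        index.parse::<usize>().ok()?;
        Some(format!("blk.{}.{}", index, tail))
    }
}
```

Returning `None` for names that do not match lets a caller fall through to other resolvers, which fits the documented rule that later registrations win among resolvers matching the same architecture.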

Functions

aggregate_offload_stats: Sum per-layer pool stats + optional CPU residency accounting from last forward.
apply_compile_profile: Apply tier-1 profile options to runtime compile options.
assemble_vision_hidden: Assemble encoder input [batch, seq, hidden] from NCHW pixels + preprocess weights.
assert_gguf_family: Open the file and ensure general.architecture matches expected.
build_bert_built
build_bert_graph: Build a BERT encoder IR graph from config and weights.
build_bert_graph_sized
build_dinov2_built
build_dinov2_graph_sized: Build the DINOv2 IR graph via native ModelFlow.
build_flux2_cfg_combine_hir: HIR graph: neg + guidance_scale * (pos - neg) in f32.
build_flux2_forward_graph
build_flux2_forward_hir: Build the full denoiser forward graph in HIR.
build_flux2_minimal_graph: Lower minimal HIR to legacy Graph (MIR inner) for Session::compile.
build_flux2_minimal_hir: Build a compile-minimal HIR module: x_embedder(hidden) → proj_out.
build_flux2_text_encoder_hir
build_gemma_decode_graph_sized
build_gemma_graph_sized
build_gemma_graph_sized_last_logits
build_gemma_graph_sized_packed: Packed K-quant prefill — not yet implemented; use unpacked weights or flow build.
build_graph: Build via flow and lower to MIR graph + params.
build_llada2_forward_graph: Full-sequence forward with custom block-diffusion mask.
build_llama32_decode_graph_sized
build_llama32_graph_sized
build_llama32_graph_sized_last_logits
build_llama32_graph_sized_packed: Packed-weights prefill graph — K-quant matmuls stay in the arena via Op::DequantMatMul (mirrors rlx_qwen3::build_qwen3_graph_sized_packed).
build_nomic_built
build_nomic_diagnostic_graph: Diagnostic builder — same as build_nomic_graph_sized but exposes intermediate tensors at every transformer-stage boundary as outputs. Returns (graph, params, checkpoint_names) where outputs[i] holds the tensor at checkpoint_names[i]. Used by examples/tests to bisect numerical issues (NaN/Inf) without instrumenting the executor.
build_nomic_graph_sized: Build a NomicBERT encoder IR graph.
build_nomic_vision_built
build_prompt: Build the GGUF prompt string from phonemized text + reference codec tokens.
build_qwen3_graph_sized: Build a Qwen3 causal-LM IR graph.
build_qwen3_prefill_built
build_qwen35_decode_graph: Single-token decode graph at prefix length past_seq.
build_qwen35_decode_hir_dynamic_ext: Decode HIR with symbolic past length (sym::PAST_SEQ) for dynamic compile cache.
build_qwen35_graph_sized: Build the Qwen3.5 forward IR.
build_qwen35_graph_sized_ext: Forward graph with optional runtime MRoPE inputs (rope_cos/rope_sin).
build_qwen35_graph_sized_stub: Legacy redirect — qwen35 forward is implemented via build_qwen35_graph_sized. Kept so older call sites get a clear message instead of a missing-symbol error.
build_qwen35_prefill_cache_graph: Prefill graph that seeds super::cache::Qwen35DecodeCache.
build_qwen35_prefill_cache_graph_ext: Prefill-cache graph with optional runtime MRoPE inputs (multimodal).
build_qwen35_prefill_cache_hir_dynamic_ext: Prefill-cache HIR with symbolic seq dim (sym::SEQ) for dynamic compile cache.
build_sam2_image_encoder_built
build_sam2_image_encoder_graph: Lowered graph wrapper for legacy callers (via super::flow::Sam2ImageEncoderFlow).
build_sam3_detector_decoder_built
build_sam3_detector_encoder_built
build_sam3_detector_encoder_graph: Lower encoder HIR to legacy Graph (via super::flow::Sam3DetectorEncoderFlow).
build_sam_encoder_built
build_sam_encoder_graph: Lowered graph wrapper for legacy callers (via super::flow::SamEncoderFlow).
build_vision_graph_sized: Build a NomicVision encoder IR graph via native ModelFlow.
build_vjepa2_encoder_graph_sized: Build the V-JEPA2 encoder IR graph from extracted weights (via super::flow::Vjepa2EncoderFlow).
build_wav2vec2_bert_built
build_wav2vec2_bert_graph_sized: Build a Wav2Vec2-BERT encoder IR graph for concrete batch × seq.
build_whisper_decode_step_built
build_whisper_decoder_built
build_whisper_decoder_graph_sized
build_whisper_decoder_prefill_built
build_whisper_encoder_built
build_whisper_encoder_graph_sized
built_from_graph
built_from_hir
built_from_hir_with_profile
cfg_combine: Native CFG blend in float32.
compile_built: Compile a BuiltModel on the given device using its embedded profile.
compile_built_cpu: Compile a BuiltModel on CPU with default options (embedding quick-check tests).
compile_flux2_cfg_combine
compile_flux2_forward
compile_flux2_forward_via_flow: Compile denoiser via tier-0 Flux2Flow wrapper (same numerics as super::hir_builder::compile_flux2_forward).
compile_flux2_minimal: Compile minimal HIR on CPU (HIR → MIR → LIR).
compile_flux2_text_encoder_hir
compile_graph_encoder: Bidirectional encoder defaults (BERT, DINOv2, Wav2Vec2, vision towers).
compile_graph_encoder_with_params: CompileProfile::encoder + params.
compile_graph_legacy: Unprofiled compile (parity probes / bisect tests).
compile_graph_llama32_decode: Llama 3.2 decode graphs.
compile_graph_llama32_prefill: Llama 3.2 prefill graphs.
compile_graph_profile: Lower a graph with a tier-1 profile and attach params (tests / examples).
compile_graph_qwen3_decode: Qwen3 single-token decode graphs.
compile_graph_qwen3_prefill: Qwen3 prefill / full-sequence graphs.
compile_graph_qwen3_prefill_with_params: CompileProfile::qwen3_prefill + params.
compile_graph_qwen35_decode: Qwen3.5 decode-step graphs.
compile_graph_qwen35_decode_with_params: CompileProfile::qwen35_decode + params.
compile_graph_qwen35_prefill: Qwen3.5 prefill-cache / predict graphs.
compile_graph_qwen35_prefill_with_params: CompileProfile::qwen35_prefill + params.
compile_graph_sam: Compile a SAM/SAM2/SAM3 vision subgraph with tier-1 encoder profile options.
compile_graph_sam_with_params: CompileProfile::sam_encoder + params.
compile_graph_with_profile: Compile a vision subgraph with explicit tier-1 profile options.
compile_model: Compile an embedding graph for the given batch/seq on device.
conv3d_patch_embed: 3-D conv patch embedding: input [C, T, H, W] → tokens [seq, embed_dim].
debug_resolve_name
decode_step_feeds: Build host feeds for a single decode step from cache.
default_mel_frames: Default 30 s chunk mel width.
default_memory_budget_bytes: VRAM / unified-memory budget hint for MoE offload sizing.
detect_arch: Detect architecture from config.json fields.
dispatch
dispatch_help
download_flux2_repo
embed_with_rlx: Embed texts with a compiled BERT model: tokenize, forward, pool, L2-normalize.
encode_chat_auto: Resolve tokenizer next to weights and encode a chat conversation.
encode_flux2_prompt: End-to-end: tokenize (optional) + text encoder → embeddings + text ids.
encode_prompt_embeds_default_layers: Encode with default Klein layer indices (9, 18, 27).
encode_prompt_padded: Encode and pad/truncate to fixed seq_len (pad token id 0).
encode_video_native: Encode a pre-normalized video tensor [C, T, H, W].
extract_encoder_weights
extract_flux2_vae_weights
extract_flux2_weights
extract_ids: Extract all speech token IDs from a generated string.
extract_model_weights
extract_patch_embed_weights
extract_pooler_weights
extract_predictor_weights
extract_text_encoder_weights
flux2_decode_packed_latents: Full post-denoise decode: packed transformer latents → 8-bit RGB planar [batch, 3, H, W].
flux2_prefers_compiled_hir: True when the denoiser / VAE should use compiled HIR (non-CPU backends).
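flux2_prefers_compiled_te: Whether the text encoder should use compiled HIR. On CUDA, the text encoder HIR compiles a full Qwen3 trunk, which can take hours and fill VRAM while the denoiser is still resident; the alternative is to encode once natively on CPU, then drop the text encoder.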
flux2_rgb_to_u8: Planar RGB [-1,1] → interleaved u8 HWC for PNG.
flux2_transformer_forward: Run the FLUX.2 transformer and return noise prediction [batch, img_seq, patch_size² * out_channels].
format_chatml: Format messages as a ChatML prompt ending with an open assistant turn.
format_for_extension: Extension → format id (last registration wins).
forward_decoder_ir_on: IR-compiled detector decoder on the requested device (6 layer graphs).
gemma_cfg_from_gguf
gemma_encode_prompt: Encode text to token ids using a HuggingFace tokenizer file.
gemma_encode_prompt_auto: Encode with an optional explicit path; falls back to GGUF embedded vocab via encode_prompt_from_gguf when no tokenizer.json is found.
gemma_resolve_tokenizer_path: Resolve a tokenizer path: explicit --tokenizer, sibling of the GGUF weights, or tokenizer.json in the weights directory.
gguf_architecture_str: general.architecture string from GGUF metadata, if present.
gguf_dir_guide: Numbered .gguf listing + resolve hints for a directory (CLI / errors).
gguf_f32_bytes_estimate: Rough F32 dequant footprint (every tensor × 4 bytes).
gguf_family_for_arch: Map a GGUF architecture tag to the runner family that should load it.
gguf_runner_hint: Suggested runner / crate for a GGUF architecture tag (for CLI and errors).
graph_from_built: Build a flow and return (Graph, params) — preferred compile entry point.
graph_from_hir: Lower an existing HIR module through BuiltModel (utility for HIR-first builders).
host_temb: Host-side temb for compiled forward (timestep × 1000, optional guidance × 1000).
into_compile_parts: Split built flow for compile — no Graph/HIR imports needed at call site.
is_standard_device: True when device is in STANDARD_DEVICES.
list_mtp_keys
list_registered_formats: All registered formats (built-ins first, then custom registrations).
llama32_cfg_from_gguf
llama32_encode_prompt: Encode text to token ids using a HuggingFace tokenizer file.
llama32_encode_prompt_auto: Encode with an optional explicit path; falls back to GGUF embedded vocab via encode_prompt_from_gguf when no tokenizer.json is found.
llama32_resolve_tokenizer_path: Resolve a tokenizer path: explicit --tokenizer, sibling of the GGUF weights, or tokenizer.json in the weights directory.
load_and_apply_flux2_lora: Load LoRA from safetensors and merge into base.
load_compile_profile: Load a tier-1 profile from disk; fall back to default when missing or invalid.
load_flux2_vae_weights
load_flux2_weights
load_from_path: Dispatch on the file extension via crate::weight_registry.
load_rgb_planar: Load an image, resize to (width, height), return planar NCHW f32 in [-1, 1].
load_text_encoder_weights
load_wav_mono_f32
load_weight_map_resolved: Convenience: resolve + drain to F32 WeightMap.
load_weights_resolved: Resolve a file or directory, enforce GGUF arch policy, open via registry, optionally drain.
messages_from_prompt: Convenience: system (optional) + user prompt → ChatML messages.
models_map: Get the global model registry.
mrope_prefill_feeds: Flattened [seq * head_half] cos/sin for runtime MRoPE graph inputs.
mrope_row_for_sections: Build one MRoPE cos/sin row from explicit per-section positions.
mrope_slice_at_pos: Slice MRoPE cos/sin at absolute text position pos (shape [half] each).
mtp_draft_vocab_size: MTP LM head output width: full vocab, or trimmed for FastMTP draft speed.
normalize_video_hwc: Normalize RGB u8 frames to NCTHW f32 in [0,1] then ImageNet stats. frames is [num_frames, crop, crop, 3] HWC u8 row-major.
open_gguf_loader: GGUF loader with optional MTP-head visibility (LM families).
open_loader
open_loader_resolved: Resolve a file or weights directory, then open the right loader.
open_loader_with_format
open_map: Resolve + drain to F32 WeightMap.
open_map_with: Resolve + drain with options.
open_weights: Resolve + open (live WeightLoader).
open_with: Resolve + open with options.
pack_input_ids: Pack per-row prompts into [batch, max_seq] row-major F32 ids (zero-pad).
parse_lora_scale: Parse --lora-scale style input; rejects NaN/inf.
parse_messages_json: Parse a JSON array of { "role": "...", "content": "..." }.
pcm_to_mel
pool_native: Pool encoder tokens [batch, seq, hidden] → [batch, hidden] embedding.
predict_native: Run the predictor on encoder outputs [batch, seq, enc_dim] flat.
prepare_latent_ids: FLUX.2 latent position ids [batch, h*w, 4] with (t=0, h, w, l=0).
prepare_text_ids: Build FLUX.2-style text position ids [batch, seq, 4] flattened as [seq*4].
prepare_weight_map: Full load-time adaptation pipeline.
profile_near_weights: Load profile_file next to weights (parent directory); fall back to default.
recurrent_output_count: Number of extra graph outputs after logits (and optional MTP).
refresh_experts: Whether to call moe_infer_with_expert_refresh on this forward.
register_gguf_tensor_resolver: Register a custom resolver (call before first GGUF load). Later registrations win among resolvers that match the same architecture.
register_runner
register_weight_format: Register a custom weight format (call before the first load). Later entries override built-ins when the same extension is registered twice.
registered_runners
resolve_model_dir: Resolve detection + recognition weight paths under dir (safetensors for native RLX).
resolve_text_encoder_dir: Resolve text_encoder/ next to a transformer weights file or model root.
resolve_tokenizer_path: Resolve tokenizer path: explicit, tokenizer/tokenizer.json, or sibling tokenizer.json.
resolve_transformer_config: Resolve transformer config.json from explicit override or sibling search.
resolve_vae_dir: Resolve vae/ next to a transformer weights file or model root.
resolve_weights_file: Resolve --weights to a single file: pass-through for files, or pick one .gguf / model.safetensors inside a directory.
resolve_weights_file_with_options: Resolve with optional GGUF file selection inside a directory.
run_registered
sam2_apply_fpn_neck: Run the FPN neck. stage_outputs[i] is the encoder’s stage-i output flattened from BHWC [1, h, w, dim] to [h·w·dim]. stage_dims[i] = dim, stage_hw[i] = (h, w) — pulled straight from the graph’s stage-output shapes (or computed from cfg.embed_dim_at_stage(s) / cfg.grid_size_at_stage(s)).
sam2_apply_fpn_neck_host: Host-only lateral convs (legacy entry point).
sam2_assemble_patch_tokens: Run Hiera’s patch embedding (Conv2d k=7 s=4 p=3) on the host, then add the stage-0 position embedding. Output is [grid, grid, E] BHWC (the layout Hiera operates on internally), flattened.
sam2_mask_decoder_forward: Run the SAM 2 mask decoder.
sam2_memory_attention_forward: Memory attention forward.
sam2_memory_encoder_forward: Run the SAM 2 memory encoder.
sam2_preprocess_image: Square-resize an RGB u8 image to 1024×1024 (bilinear, no aspect-ratio preservation), /255, then ImageNet-normalise. Returns a contiguous [3, 1024, 1024] NCHW f32 buffer.
sam2_prompt_encoder_forward: Run the SAM 2 prompt encoder. Mirrors sam2.modeling.sam.prompt_encoder.PromptEncoder.forward.
sam2_two_way_transformer_forward: Top-level two-way transformer forward.
sam3_assemble_patch_tokens
sam3_preprocess_image: Resize an RGB u8 image to fit in SAM3’s square canvas, normalize, and pad.
sam_apply_neck_host: Run the encoder neck on the host. body_out is the encoder body’s output reshaped to [hw·hw, embed_dim] (BHWC flattened). Returns [out_chans, hw, hw] NCHW image embeddings.
sam_assemble_patch_tokens: Run the patch embedding (Conv2d k=16 s=16 no padding) on the host and add the absolute positional embedding. Output is [1, hw, hw, E] BHWC (SAM’s internal convention) flattened to a contiguous f32 buffer for the encoder graph.
sam_preprocess_image: Resize an RGB u8 image to fit within SAM_IMG_SIZE on the long side (aspect-ratio preserved), normalize with SAM’s pixel stats, and zero-pad to a square [3, 1024, 1024] NCHW f32 tensor.
sample_token: Sample one token id from a [vocab] logits slice. Returns the chosen index. Stateless w.r.t. prior calls — the RNG is seeded per-call from opts.seed so repeated calls with the same seed and logits yield the same token.
seed_cache_from_outputs: Parse prefill-cache graph outputs into logits/hidden + Qwen35DecodeCache. When trunk_is_hidden, the first output is [batch × hidden_size] not logits.
supports_multimodal_mrope: True when the checkpoint declares a non-zero 4th MRoPE section (vision).
text_section_pos: Text-modality default: [p, p, p, 0] per llama.cpp token batches.
tiny_text_encoder_config: Qwen3Config sized for super::weights::synthetic_text_encoder_weights tests.
validate_device: Validate that device is in the workspace standard backend set (CPU, Metal, MLX, CUDA, ROCm, WGPU, Vulkan). Build with all-backends on rlx-qwen35 to link every native runtime backend into the rlx-qwen35 binary.
validate_llada2_device: Supported execution devices (standard RLX backends).
validate_sam_device: SAM v1 also documents tpu on rlx_sam::Sam::from_safetensors_on.
validate_standard_device: Fail fast on exotic runtime devices (TPU, ANE, OpenGL, …).
zero_recurrent_inputs: Zero-initialized recurrent inputs for a prefill-cache seed graph.
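
pack_input_ids is described above as packing per-row prompts into a [batch, max_seq] row-major F32 buffer with zero padding. A minimal sketch of that layout follows. The name `pack_input_ids_sketch`, the `u32` id type, and the returned tuple are assumptions for this example; the crate's actual signature is not shown in this index.

```rust
/// Pack variable-length rows of token ids into a row-major
/// [batch, max_seq] f32 buffer, padding short rows with 0.0.
/// Returns (buffer, batch, max_seq). Hypothetical helper.
fn pack_input_ids_sketch(rows: &[Vec<u32>]) -> (Vec<f32>, usize, usize) {
    let batch = rows.len();
    let max_seq = rows.iter().map(Vec::len).max().unwrap_or(0);
    let mut packed = vec![0.0f32; batch * max_seq];
    for (b, row) in rows.iter().enumerate() {
        for (t, &id) in row.iter().enumerate() {
            // f32 represents integers exactly only up to 2^24,
            // so ids are assumed to stay below that bound.
            packed[b * max_seq + t] = id as f32;
        }
    }
    (packed, batch, max_seq)
}
```

Element (b, t) lives at index b * max_seq + t, so each prompt occupies one contiguous row and padding sits at the end of the row.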

Type Aliases

Flux2GraphParams: Param tensors keyed by name for rlx_runtime::CompiledGraph::set_param.
LoadOpts: Alias for LoadWeightsOptions.
PassThroughGgufResolver: Alias for PrefixStripGgufResolver (older name).
ResolveOpts: Alias for ResolveWeightsOptions.
WhisperKvCache: Incremental self-attention cache.
