Struct GgufLoader

pub struct GgufLoader { /* private fields */ }

Implementations

pub fn mtp_layer_threshold(&self) -> Option<u32>

Index of the first blk.N that the GGUF metadata reports as an MTP head, derived from {arch}.block_count - {arch}.nextn_predict_layers. Returns None when the nextn_predict_layers key is absent, which means either the file has no MTP heads or MTP is encoded under a different naming scheme; fall back to is_mtp_weight in that case.
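The derivation above can be sketched as a small standalone function. This is an illustration only, not the crate's code: the name mtp_threshold and its two Option parameters are hypothetical stand-ins for metadata lookups, and treating an underflowing subtraction as None is an assumption.

```rust
/// Hypothetical helper mirroring the documented derivation:
/// threshold = block_count - nextn_predict_layers.
/// Each argument models a metadata lookup that may find no key.
fn mtp_threshold(block_count: Option<u32>, nextn_predict_layers: Option<u32>) -> Option<u32> {
    // A missing key on either side yields None.
    let blocks = block_count?;
    let nextn = nextn_predict_layers?;
    // Assumption: malformed metadata (nextn > blocks) is reported as None
    // rather than panicking on underflow.
    blocks.checked_sub(nextn)
}
```

With 48 blocks and one MTP layer, the sketch yields Some(47), so blk.47 would be the first MTP block under this reading.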

pub fn file(&self) -> &GgufFile

Borrow the underlying parsed GgufFile so callers (e.g. arch builders that read general.architecture-specific keys) don’t have to parse the 800+ tensor headers a second time.

pub fn tensor_bytes_borrowed(&self, key: &str) -> Option<&[u8]>

Borrow the raw on-disk byte slice for a tensor without marking it taken. Returns None if the key doesn’t resolve or the byte range is invalid. Used by the qwen35 packed-upload path to stream K-quant bytes from mmap straight into the compiled arena, skipping a per-tensor Vec<u8> allocation (≈ 16 GB on Qwen3.6-27B Q4_K_M).
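The borrow-without-copy behaviour can be illustrated with a toy stand-in. MappedFile and its fields are hypothetical; a Vec<u8> plays the role of the memory-mapped file, and the real loader's internals are not shown in this page.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical stand-in for a memory-mapped GGUF file:
/// `bytes` models the mapping, `ranges` maps tensor names to byte ranges.
struct MappedFile {
    bytes: Vec<u8>,
    ranges: HashMap<String, Range<usize>>,
}

impl MappedFile {
    /// Returns a borrowed slice, or None if the key is unknown
    /// or the recorded range falls outside the backing bytes.
    fn tensor_bytes_borrowed(&self, key: &str) -> Option<&[u8]> {
        let range = self.ranges.get(key)?.clone();
        // `slice::get` performs the bounds check and returns None on failure.
        self.bytes.get(range)
    }
}
```

Because the method takes &self and returns a slice tied to that borrow, no bytes are copied and no bookkeeping about "taken" tensors is touched.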

pub fn take_packed_metadata( &mut self, key: &str, ) -> Result<Option<(QuantScheme, Vec<usize>)>, Error>

Variant of Self::take_packed that returns only the (scheme, shape) metadata without copying bytes. The caller uploads the bytes separately via Self::tensor_bytes_borrowed after the graph is compiled, which eliminates the per-tensor Vec<u8> allocation. Marks the tensor taken on success; returns Ok(None) for non-K-quant dtypes so the caller can fall back to the dequant path.

pub fn is_mtp_tensor(&self, name: &str) -> bool

True if name is an MTP weight under this file’s naming scheme. Combines the substring heuristic (is_mtp_weight) with the model-aware blk.N where N >= threshold check.
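A rough sketch of how the two checks might be combined follows. Several details are assumptions: the page does not say whether the checks are joined with OR or AND (OR is used here), and the substring "nextn" is a placeholder, since the actual patterns used by is_mtp_weight are not documented in this section.

```rust
/// Extract N from a tensor name of the form `blk.N.<rest>`.
fn block_index(name: &str) -> Option<u32> {
    let rest = name.strip_prefix("blk.")?;
    let (index, _) = rest.split_once('.')?;
    index.parse().ok()
}

/// Placeholder for the substring heuristic. The "nextn" pattern is assumed,
/// not taken from the crate.
fn looks_like_mtp(name: &str) -> bool {
    name.contains("nextn")
}

/// Assumed combination: a tensor counts as MTP if either check fires.
fn is_mtp_tensor(name: &str, threshold: Option<u32>) -> bool {
    let by_index = match (block_index(name), threshold) {
        (Some(n), Some(t)) => n >= t,
        // No block index or no threshold: the model-aware check cannot fire.
        _ => false,
    };
    looks_like_mtp(name) || by_index
}
```

With a threshold of 47, blk.47.attn_q.weight is classified as MTP by index alone, while blk.46.attn_q.weight is not.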

pub fn include_mtp(&mut self, include: bool) -> &mut GgufLoader

Toggle MTP-weight visibility. With include = true, MTP heads show up in remaining_keys() (and count toward len()) — drain-style consumers like Qwen3Generator::from_loader will then pull them into the weights cache. Default off so non-MTP models behave exactly as before. Call this before any take() / drain so the inclusion choice is consistent across the load.
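The visibility toggle can be modelled with a toy loader. ToyLoader is hypothetical and the "nextn" substring test is an assumed placeholder for MTP detection; the sketch only shows how a flag set before draining changes what remaining_keys() and len() report, and why returning &mut Self allows chaining.

```rust
/// Hypothetical loader that tracks tensor names and a visibility flag.
struct ToyLoader {
    names: Vec<String>,
    include_mtp: bool,
}

impl ToyLoader {
    fn new(names: &[&str]) -> Self {
        ToyLoader {
            names: names.iter().map(|n| n.to_string()).collect(),
            include_mtp: false, // default off, as the documentation describes
        }
    }

    /// Builder-style setter: returns `&mut Self` so calls can be chained.
    fn include_mtp(&mut self, include: bool) -> &mut Self {
        self.include_mtp = include;
        self
    }

    /// MTP names are hidden unless the flag is set ("nextn" is an assumed marker).
    fn remaining_keys(&self) -> Vec<String> {
        self.names
            .iter()
            .filter(|n| self.include_mtp || !n.contains("nextn"))
            .cloned()
            .collect()
    }

    fn len(&self) -> usize {
        self.remaining_keys().len()
    }
}
```

Flipping the flag midway through a drain would change the key set under the consumer, which is the reason the documentation asks for the call to happen before any take().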

pub fn take_packed( &mut self, key: &str, ) -> Result<Option<(Vec<u8>, QuantScheme, Vec<usize>)>, Error>

Take a tensor’s packed bytes (no dequant), plus its rlx_ir::quant::QuantScheme and safetensors-style shape. Returns Ok(None) when the tensor is stored uncompressed (F32/F16/BF16); the caller should fall back to take() for those.

Used by the qwen3 builder’s packed-weights mode: the LM head + per-layer matmul weights stay in the arena as raw K-quant bytes, and the graph emits Op::DequantMatMul { scheme } instead of Op::MatMul for them. Cuts the load-time memory footprint by ~7-9× on Q4_K_M / Q6_K models — the unblocker for ≥14 B Qwen3 / Llama GGUFs on commodity Macs.
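The packed-then-fallback calling pattern might look like the sketch below. Everything here is a stand-in: PackedSource, FakeLoader, Weight, the two-variant QuantScheme, and the String error type are hypothetical and exist only so the example compiles on its own. Only the control flow (try take_packed, fall back to take on Ok(None)) comes from the documentation.

```rust
use std::collections::HashMap;

/// Stand-in for the real quantization scheme type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QuantScheme { Q4K, Q6K }

/// What a builder might keep per weight: raw packed bytes or dense f32 data.
#[derive(Debug, PartialEq)]
enum Weight {
    Packed { bytes: Vec<u8>, scheme: QuantScheme, shape: Vec<usize> },
    Dense { data: Vec<f32>, shape: Vec<usize> },
}

/// Hypothetical trait with the two documented entry points.
trait PackedSource {
    fn take_packed(&mut self, key: &str)
        -> Result<Option<(Vec<u8>, QuantScheme, Vec<usize>)>, String>;
    fn take(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), String>;
}

/// Try the packed path first; on Ok(None) fall back to the dequantized path.
fn load_weight<L: PackedSource>(loader: &mut L, key: &str) -> Result<Weight, String> {
    match loader.take_packed(key)? {
        Some((bytes, scheme, shape)) => Ok(Weight::Packed { bytes, scheme, shape }),
        None => {
            let (data, shape) = loader.take(key)?;
            Ok(Weight::Dense { data, shape })
        }
    }
}

/// In-memory fake used to exercise the pattern.
struct FakeLoader {
    packed: HashMap<String, (Vec<u8>, QuantScheme, Vec<usize>)>,
    dense: HashMap<String, (Vec<f32>, Vec<usize>)>,
}

impl PackedSource for FakeLoader {
    fn take_packed(&mut self, key: &str)
        -> Result<Option<(Vec<u8>, QuantScheme, Vec<usize>)>, String> {
        if let Some(entry) = self.packed.remove(key) {
            return Ok(Some(entry));
        }
        // Known but uncompressed tensors report Ok(None); unknown names are errors.
        if self.dense.contains_key(key) { Ok(None) } else { Err(format!("unknown tensor: {key}")) }
    }

    fn take(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), String> {
        self.dense.remove(key).ok_or_else(|| format!("unknown tensor: {key}"))
    }
}
```

Keeping the fallback inside one helper means the graph builder can decide between a dequantizing matmul and a plain matmul by matching on the returned variant.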

pub fn take_mtp(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), Error>

Take a single MTP weight by name. Bypasses the include_mtp filter so callers can grab specific heads without flipping the global visibility. Returns an error if the name isn’t a recognized MTP weight (use take for non-MTP keys).

impl GgufLoader

pub fn mtp_keys(&self) -> Vec<String>

Tensor names that look like MTP heads under this file’s scheme, using the same test as is_mtp_tensor (the substring heuristic combined with the model-aware blk.N where N >= threshold check). These names are returned regardless of the filtering applied by remaining_keys, so consumers wanting to wire MTP can find them explicitly.

Trait Implementations

impl WeightLoader for GgufLoader

fn take_transposed( &mut self, key: &str, ) -> Result<(Vec<f32>, Vec<usize>), Error>

BREAKING CHANGE in 0.2.0: prior to 0.2.0 this method was a no-op for GGUF (it returned the bytes unchanged with the GGUF shape label), which silently produced garbage logits when the builder expected [in, out] row-major. From 0.2.0 onwards, take normalizes GGUF’s reverse-shape convention, so this method matches the safetensors variant byte-for-byte. Downstream code that explicitly worked around the old buggy behavior (by manually re-transposing the result) must drop that workaround.
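For readers unsure what the removed workaround looked like, here is a generic row-major 2-D transpose. It is a textbook routine, not the crate's implementation, and the name transpose_row_major is hypothetical. The page does not specify how the loader performs the normalization internally.

```rust
/// Transpose a row-major matrix of shape [rows, cols] into [cols, rows].
/// Generic illustration only; not taken from the crate.
fn transpose_row_major(data: &[f32], shape: [usize; 2]) -> (Vec<f32>, [usize; 2]) {
    let [rows, cols] = shape;
    assert_eq!(data.len(), rows * cols, "shape does not match data length");
    let mut out = vec![0.0f32; data.len()];
    for r in 0..rows {
        for c in 0..cols {
            // Element (r, c) of the input becomes element (c, r) of the output.
            out[c * rows + r] = data[r * cols + c];
        }
    }
    (out, [cols, rows])
}
```

Applying such a transpose twice returns the original layout, which is why code that still re-transposes after upgrading to 0.2.0 ends up with the wrong orientation again.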

fn format_id(&self) -> &'static str

Format id (safetensors, gguf, or a custom registration).

fn arch_hint(&self) -> Option<&str>

Architecture name from the underlying file (general.architecture for GGUF, None for safetensors). Drain-style consumers use this to pick an arch-specific reverse name mapping when the canonical HF name depends on the model family (e.g. Gemma 2’s 4 norms per layer don’t share the Llama 2-norm reverse alias).

fn take_packed( &mut self, key: &str, ) -> Result<Option<(Vec<u8>, QuantScheme, Vec<usize>)>, Error>

Take packed K-quant bytes when supported; default returns None.

fn tensor_bytes_borrowed(&self, key: &str) -> Option<&[u8]>

Borrow packed bytes without marking taken (GGUF mmap path).

fn len(&self) -> usize

Number of distinct weights in the file.

fn take(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), Error>

Take the named tensor as (f32_data, shape). Removes from the loader so callers can detect “weights I never used.”

fn remaining_keys(&self) -> Vec<String>

Names that haven’t been taken yet — useful for “did the model use every weight?” hygiene checks.
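The "did the model use every weight?" check could be sketched as follows. MapLoader and check_all_consumed are hypothetical names, and a BTreeMap stands in for the real tensor index so that the leftover list comes out in a stable order.

```rust
use std::collections::BTreeMap;

/// Hypothetical loader: taking a tensor removes it from the map.
struct MapLoader {
    tensors: BTreeMap<String, (Vec<f32>, Vec<usize>)>,
}

impl MapLoader {
    fn take(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), String> {
        self.tensors
            .remove(key)
            .ok_or_else(|| format!("missing tensor: {key}"))
    }

    fn remaining_keys(&self) -> Vec<String> {
        self.tensors.keys().cloned().collect()
    }
}

/// Fails with the list of leftover names if any weight was never taken.
fn check_all_consumed(loader: &MapLoader) -> Result<(), String> {
    let leftover = loader.remaining_keys();
    if leftover.is_empty() {
        Ok(())
    } else {
        Err(format!("unused weights: {}", leftover.join(", ")))
    }
}
```

Running such a check at the end of model construction surfaces misspelled or unmapped tensor names early instead of leaving weights silently unused.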

fn is_empty(&self) -> bool

Auto Trait Implementations

impl Freeze for GgufLoader

impl RefUnwindSafe for GgufLoader

impl Send for GgufLoader

impl Sync for GgufLoader

impl Unpin for GgufLoader

impl UnsafeUnpin for GgufLoader

impl UnwindSafe for GgufLoader

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
