pub struct GgufLoader { /* private fields */ }
Implementations
impl GgufLoader
pub fn from_file(path: &str) -> Result<GgufLoader, Error>
pub fn architecture(&self) -> &str
pub fn mtp_layer_threshold(&self) -> Option<u32>
Index of the first blk.N that the GGUF metadata reports as an MTP
head, computed as {arch}.block_count - {arch}.nextn_predict_layers.
Returns None when the nextn_predict_layers key is absent, which means
either the file has no MTP heads or MTP is encoded under a different
naming scheme; fall back to is_mtp_weight in that case.
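The derivation above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `HashMap<String, u32>` is a hypothetical stand-in for the parsed GGUF metadata, and the use of `checked_sub` for malformed files is an assumption.

```rust
use std::collections::HashMap;

/// Sketch of the documented derivation. `meta` is a hypothetical stand-in
/// for the integer-valued GGUF metadata keys of one file.
fn mtp_layer_threshold(meta: &HashMap<String, u32>, arch: &str) -> Option<u32> {
    // An absent nextn_predict_layers key yields None, as documented.
    let nextn = *meta.get(&format!("{arch}.nextn_predict_layers"))?;
    let blocks = *meta.get(&format!("{arch}.block_count"))?;
    // Assumption: a file claiming more MTP layers than blocks is treated
    // as having no usable threshold instead of underflowing.
    blocks.checked_sub(nextn)
}
```

With 65 blocks and one next-token-prediction layer, the sketch reports 64 as the first MTP block index.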
pub fn file(&self) -> &GgufFile
Borrow the underlying parsed GgufFile so callers (e.g. arch
builders that read keys specific to general.architecture)
don’t have to re-parse 800+ tensor headers.
pub fn tensor_bytes_borrowed(&self, key: &str) -> Option<&[u8]>
Borrow the raw on-disk byte slice for a tensor without
marking it taken. Returns None if the key doesn’t resolve
or the byte range is invalid. Used by the qwen35 packed-upload
path to stream K-quant bytes from the mmap straight into
the compiled arena, skipping a per-tensor Vec<u8>
allocation (≈ 16 GB on Qwen3.6-27B Q4_K_M).
pub fn take_packed_metadata(&mut self, key: &str) -> Result<Option<(QuantScheme, Vec<usize>)>, Error>
Variant of Self::take_packed that returns only the
(scheme, shape) metadata without copying bytes. The caller
uploads the bytes separately via Self::tensor_bytes_borrowed
after the graph is compiled, which eliminates the per-tensor
Vec<u8> allocation. Marks the tensor taken on success;
returns Ok(None) for non-K-quant dtypes so the caller can
fall back to the dequant path.
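The two-phase flow (metadata first, borrowed bytes later) might look roughly like the sketch below. Everything here is a mock: `PackedMock`, `Entry`, `Scheme`, and `upload_packed` are hypothetical names, the error type is a plain `String`, and the arena is a `Vec<u8>`. In the real flow the graph is compiled between the two calls; the sketch collapses that step.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for QuantScheme.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scheme { Q4K, Q6K }

/// One mock tensor: raw bytes, an optional K-quant scheme, and a shape.
struct Entry { bytes: Vec<u8>, scheme: Option<Scheme>, shape: Vec<usize> }

/// Mock loader that mirrors only the two documented methods.
struct PackedMock { tensors: HashMap<String, Entry>, taken: HashSet<String> }

impl PackedMock {
    fn take_packed_metadata(&mut self, key: &str) -> Result<Option<(Scheme, Vec<usize>)>, String> {
        let entry = self.tensors.get(key).ok_or_else(|| format!("missing tensor: {key}"))?;
        // Non-K-quant tensors are left untaken so the caller can fall back.
        let Some(scheme) = entry.scheme else { return Ok(None) };
        self.taken.insert(key.to_string());
        Ok(Some((scheme, entry.shape.clone())))
    }

    fn tensor_bytes_borrowed(&self, key: &str) -> Option<&[u8]> {
        self.tensors.get(key).map(|e| e.bytes.as_slice())
    }
}

/// Copy the borrowed bytes straight into the arena; returns the offset used.
fn upload_packed(
    loader: &mut PackedMock,
    key: &str,
    arena: &mut Vec<u8>,
) -> Result<Option<(Scheme, Vec<usize>, usize)>, String> {
    let Some((scheme, shape)) = loader.take_packed_metadata(key)? else { return Ok(None) };
    let bytes = loader.tensor_bytes_borrowed(key).ok_or_else(|| format!("no bytes for {key}"))?;
    let offset = arena.len();
    arena.extend_from_slice(bytes); // single copy, no intermediate Vec<u8>
    Ok(Some((scheme, shape, offset)))
}
```

The point of the split is that the only copy of the tensor bytes is the one into the arena; an `Ok(None)` result tells the caller to use the dequantizing path instead.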
pub fn is_mtp_tensor(&self, name: &str) -> bool
True if name is an MTP weight under this file’s naming
scheme. Combines the substring heuristic (is_mtp_weight)
with the model-aware blk.N where N >= threshold check.
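A rough sketch of how the two checks could be combined is shown below. The substrings used in `is_mtp_weight` are placeholders (the real heuristic's patterns are not listed in these docs), and joining the two checks with a logical OR is an assumption about what "combines" means here.

```rust
/// Extract N from a tensor name of the form `blk.N.<rest>`.
fn block_index(name: &str) -> Option<u32> {
    let rest = name.strip_prefix("blk.")?;
    let (idx, _) = rest.split_once('.')?;
    idx.parse().ok()
}

/// Placeholder substring heuristic; the patterns are hypothetical.
fn is_mtp_weight(name: &str) -> bool {
    name.contains("nextn") || name.contains("mtp")
}

/// Assumed combination: either the name matches the heuristic, or its
/// block index is at or past the threshold reported by the metadata.
fn is_mtp_tensor(name: &str, threshold: Option<u32>) -> bool {
    let by_index = match (block_index(name), threshold) {
        (Some(n), Some(t)) => n >= t,
        _ => false,
    };
    is_mtp_weight(name) || by_index
}
```

When the threshold is None, only the substring heuristic can match, which mirrors the fallback described for mtp_layer_threshold.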
pub fn include_mtp(&mut self, include: bool) -> &mut GgufLoader
Toggle MTP-weight visibility. With include = true, MTP
heads show up in remaining_keys() (and count toward len()),
so drain-style consumers like
Qwen3Generator::from_loader will pull them into the
weights cache. Off by default so non-MTP models behave exactly
as before. Call this before any take() or drain so the
inclusion choice is consistent across the load.
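The visibility toggle can be pictured with a small mock. `VisibilityMock` and the `is_mtp` placeholder below are hypothetical and only model the documented effect on remaining_keys() and len(); they are not the loader's internals.

```rust
/// Placeholder MTP check; the substring is hypothetical.
fn is_mtp(name: &str) -> bool {
    name.contains("nextn")
}

/// Mock loader that tracks only key names and the visibility flag.
struct VisibilityMock { keys: Vec<String>, include_mtp: bool }

impl VisibilityMock {
    /// Returns &mut Self so the call can be chained, like the real method.
    fn include_mtp(&mut self, include: bool) -> &mut Self {
        self.include_mtp = include;
        self
    }

    /// MTP keys are hidden unless the flag is set.
    fn remaining_keys(&self) -> Vec<String> {
        self.keys
            .iter()
            .filter(|k| self.include_mtp || !is_mtp(k.as_str()))
            .cloned()
            .collect()
    }

    fn len(&self) -> usize {
        self.remaining_keys().len()
    }
}
```

Because a drain-style consumer iterates over remaining_keys(), flipping the flag midway through a load would change which keys it sees, which is why the flag should be set first.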
pub fn take_packed(&mut self, key: &str) -> Result<Option<(Vec<u8>, QuantScheme, Vec<usize>)>, Error>
Take a tensor’s packed bytes (no dequant), plus its
rlx_ir::quant::QuantScheme and safetensors-style shape.
Returns Ok(None) when the tensor is stored unquantized
(F32/F16/BF16); the caller should fall back to take() for
those.
Used by the qwen3 builder’s packed-weights mode: the LM
head and per-layer matmul weights stay in the arena as raw
K-quant bytes, and the graph emits
Op::DequantMatMul { scheme } instead of Op::MatMul for
them. This cuts the load-time memory footprint by ~7-9× on
Q4_K_M / Q6_K models, which is what makes ≥14 B Qwen3 / Llama
GGUFs loadable on commodity Macs.
pub fn take_mtp(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), Error>
Take a single MTP weight by name. Bypasses the include_mtp
filter so callers can grab specific heads without flipping
the global visibility. Returns an error if the name isn’t a
recognized MTP weight (use take for non-MTP keys).
impl GgufLoader
pub fn mtp_keys(&self) -> Vec<String>
Tensor names that look like MTP heads under this file’s
scheme (combines the substring heuristic with the
model-aware blk.N where N >= threshold check — see
is_mtp_tensor).
The list is not subject to the remaining_keys visibility filter, so
consumers that want to wire MTP can find these tensors explicitly.
Trait Implementations
impl WeightLoader for GgufLoader
fn take_transposed(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), Error>
BREAKING CHANGE in 0.2.0: prior to 0.2.0 this method was
a no-op for GGUF (returned the bytes unchanged with the GGUF
shape label) which silently produced garbage logits when the
builder expected [in, out] row-major. From 0.2.0 onwards
take normalizes GGUF’s reverse-shape convention so this
method matches the safetensors variant byte-for-byte.
Downstream code that explicitly worked around the old buggy
behavior (manually re-transposing the result) must drop that
workaround.
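For readers unsure what a transposed take implies for a 2-D weight, the sketch below transposes a row-major matrix and reports the swapped shape. It is a generic illustration only: `transpose_2d` is a hypothetical helper, and it does not reproduce the crate's GGUF shape normalization.

```rust
/// Transpose a row-major `[rows, cols]` matrix into row-major `[cols, rows]`.
/// Returns the data together with the new shape, in the same
/// `(Vec<f32>, Vec<usize>)` form the loader methods use.
fn transpose_2d(data: &[f32], rows: usize, cols: usize) -> (Vec<f32>, Vec<usize>) {
    assert_eq!(data.len(), rows * cols, "shape does not match data length");
    let mut out = vec![0.0f32; data.len()];
    for r in 0..rows {
        for c in 0..cols {
            // Element (r, c) of the input becomes element (c, r) of the output.
            out[c * rows + r] = data[r * cols + c];
        }
    }
    (out, vec![cols, rows])
}
```

Applying such a transpose twice returns the original buffer, which is why a leftover manual workaround on top of the fixed method would silently undo the correction.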
fn arch_hint(&self) -> Option<&str>
Architecture hint for the loaded file (general.architecture
for GGUF, None for safetensors). Drain-style consumers use this
to pick an arch-specific reverse name mapping when the canonical
HF name depends on the model family (e.g. Gemma 2’s 4 norms per
layer don’t share the Llama 2-norm reverse alias).
fn take_packed(&mut self, key: &str) -> Result<Option<(Vec<u8>, QuantScheme, Vec<usize>)>, Error>
Packed-bytes counterpart of take; see the inherent take_packed above for the cases that yield Ok(None).
fn tensor_bytes_borrowed(&self, key: &str) -> Option<&[u8]>
fn take(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), Error>
Take a tensor by key, returning (f32_data, shape). Removes it from the
loader so callers can detect “weights I never used.”
fn remaining_keys(&self) -> Vec<String>
fn is_empty(&self) -> bool
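The "weights I never used" check that take, remaining_keys, and is_empty enable can be sketched with a mock. `DrainMock` and `load_known` are hypothetical, and the error type is a plain `String` instead of the crate's Error.

```rust
use std::collections::BTreeMap;

/// Mock loader holding already-decoded tensors keyed by name.
struct DrainMock { tensors: BTreeMap<String, (Vec<f32>, Vec<usize>)> }

impl DrainMock {
    /// Removing the entry is what makes a second take of the same key fail.
    fn take(&mut self, key: &str) -> Result<(Vec<f32>, Vec<usize>), String> {
        self.tensors
            .remove(key)
            .ok_or_else(|| format!("missing or already taken: {key}"))
    }

    fn remaining_keys(&self) -> Vec<String> {
        self.tensors.keys().cloned().collect()
    }

    fn is_empty(&self) -> bool {
        self.tensors.is_empty()
    }
}

/// Take every key the model knows about, then report what was left over.
fn load_known(loader: &mut DrainMock, wanted: &[&str]) -> Result<Vec<String>, String> {
    for key in wanted {
        loader.take(key)?;
    }
    Ok(loader.remaining_keys())
}
```

A non-empty result from `load_known` is the signal that the file contains tensors the builder never consumed, for example because of a name-mapping gap.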
Auto Trait Implementations
impl Freeze for GgufLoader
impl RefUnwindSafe for GgufLoader
impl Send for GgufLoader
impl Sync for GgufLoader
impl Unpin for GgufLoader
impl UnsafeUnpin for GgufLoader
impl UnwindSafe for GgufLoader
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.