pub struct NativeSafetensorsLoader<B: Backend + BackendQuantMarlin> { /* private fields */ }
Native safetensors loader. Generic over Backend so every tensor is
materialised directly into backend-native buffers.
Implementations
impl<B: Backend + BackendQuantMarlin> NativeSafetensorsLoader<B>
pub fn open(model_dir: impl AsRef<Path>) -> Result<Self>
Discover shards under model_dir and build the name → shard index.
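A minimal sketch of what shard discovery and a name → shard index could look like, using only the standard library. The helpers `discover_shards` and `build_index` are hypothetical and are not part of this crate. The sketch assumes the tensor names of each shard have already been read from its header, and it does not parse safetensors files itself.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::{fs, io};

/// Hypothetical helper: collect every `*.safetensors` file directly under
/// `model_dir`, sorted so that shard indices are deterministic.
fn discover_shards(model_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut shards = Vec::new();
    for entry in fs::read_dir(model_dir)? {
        let path = entry?.path();
        if path.extension().and_then(|ext| ext.to_str()) == Some("safetensors") {
            shards.push(path);
        }
    }
    shards.sort();
    Ok(shards)
}

/// Hypothetical helper: map each tensor name to the index of the shard that
/// holds it. `names_per_shard[i]` lists the tensor names found in shard `i`
/// (assumed to have been extracted from the shard header elsewhere).
fn build_index(names_per_shard: &[Vec<String>]) -> HashMap<String, usize> {
    let mut index = HashMap::new();
    for (shard_idx, names) in names_per_shard.iter().enumerate() {
        for name in names {
            index.insert(name.clone(), shard_idx);
        }
    }
    index
}
```

With an index of this shape, a lookup by tensor name resolves to a single shard without rescanning the directory.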
pub fn read_gptq_raw(&self, name: &str) -> Result<(Vec<i32>, Vec<f32>, Vec<i32>, Option<Vec<i32>>, usize, usize)>
Read the four raw GPTQ tensors for a named projection without
triggering a Backend repack. Used by MoE batch loading: callers
stack many experts host-side then issue a single B::load_gptq,
avoiding the 12 288× per-expert Marlin repack overhead.
Returns (qweight, scales, qzeros, g_idx, k, n).
g_idx is None when desc_act=false (no act-order perm needed).
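The returned tuple can be sanity-checked before stacking. The sketch below assumes the conventional GPTQ packing, where `bits`-wide values are packed into 32-bit words along K for qweight and along N for qzeros, with one scale row per quantization group. Whether this loader uses exactly that layout is an assumption, and `expected_gptq_lens` and `check_raw` are hypothetical helpers, not crate APIs.

```rust
/// Same tuple shape as the documented return value:
/// (qweight, scales, qzeros, g_idx, k, n).
type RawGptq = (Vec<i32>, Vec<f32>, Vec<i32>, Option<Vec<i32>>, usize, usize);

/// Expected flat element counts under the assumed conventional GPTQ layout.
struct GptqLens {
    qweight: usize,
    scales: usize,
    qzeros: usize,
    g_idx: usize,
}

/// Hypothetical helper. Assumes `bits` divides 32 and `group_size` divides `k`.
fn expected_gptq_lens(k: usize, n: usize, bits: usize, group_size: usize) -> GptqLens {
    let pack = 32 / bits; // values per 32-bit word
    let groups = k / group_size; // quantization groups along K
    GptqLens {
        qweight: (k / pack) * n,
        scales: groups * n,
        qzeros: groups * (n / pack),
        g_idx: k,
    }
}

/// Hypothetical helper: verify that a raw tuple matches the assumed layout.
fn check_raw(raw: &RawGptq, bits: usize, group_size: usize) -> Result<(), String> {
    let (qweight, scales, qzeros, g_idx, k, n) = raw;
    let want = expected_gptq_lens(*k, *n, bits, group_size);
    if qweight.len() != want.qweight {
        return Err(format!("qweight: got {}, want {}", qweight.len(), want.qweight));
    }
    if scales.len() != want.scales {
        return Err(format!("scales: got {}, want {}", scales.len(), want.scales));
    }
    if qzeros.len() != want.qzeros {
        return Err(format!("qzeros: got {}, want {}", qzeros.len(), want.qzeros));
    }
    // g_idx is only present when act-order is enabled.
    if let Some(g) = g_idx {
        if g.len() != want.g_idx {
            return Err(format!("g_idx: got {}, want {}", g.len(), want.g_idx));
        }
    }
    Ok(())
}
```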
pub fn quant_config_ref(&self) -> Option<&QuantConfig>
pub fn load_stacked_gptq_experts(&self, expert_prefix_fmt: &str, num_experts: usize, proj_names: &[&str]) -> Result<(Arc<dyn MarlinExpertStack<B>>, usize, usize)>
Load a STACKED GPTQ tile that concatenates num_experts experts’
raw GPTQ tensors along the N (column) axis and runs ONE backend
repack — instead of num_experts × proj_names.len() repacks.
Layout: within each row r, the columns are emitted in expert-major order:
expert_0[proj_0|proj_1|...] | expert_1[...] | ... | expert_{num_experts-1}[...].
A caller can therefore index expert e at column offset
e * n_per_expert, where n_per_expert = Σ n(proj) over the
proj_names for one expert.
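The expert-major layout can be illustrated with a small host-side sketch. It concatenates row-major blocks along the column axis and computes the per-expert column offset. `stack_along_n` and `expert_col_offset` are hypothetical helpers for illustration only. They operate on plain `i32` matrices and ignore GPTQ bit-packing, which the real loader would have to account for.

```rust
/// Hypothetical helper: concatenate row-major blocks along the column axis.
/// Each block is `(data, cols)` with `data.len() == rows * cols`, and blocks
/// are listed in expert-major order (expert 0's projections first, and so on).
/// Returns the stacked row-major matrix and its total column count.
fn stack_along_n(blocks: &[(Vec<i32>, usize)], rows: usize) -> (Vec<i32>, usize) {
    let total_cols: usize = blocks.iter().map(|(_, cols)| *cols).sum();
    let mut out = Vec::with_capacity(rows * total_cols);
    for r in 0..rows {
        for (data, cols) in blocks {
            let cols = *cols;
            // Append row `r` of this block to row `r` of the output.
            out.extend_from_slice(&data[r * cols..(r + 1) * cols]);
        }
    }
    (out, total_cols)
}

/// Column offset of expert `e` in the stacked matrix.
fn expert_col_offset(e: usize, n_per_expert: usize) -> usize {
    e * n_per_expert
}
```

For two experts with projections of width 1 and 2, `n_per_expert` is 3, so expert 1 starts at column 3 of every row.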
expert_prefix_fmt should be a template &str that contains an
"{e}" placeholder (replaced by the expert index) and ends just
before the proj name, e.g. "model.layers.5.mlp.experts.{e}.".
The full tensor name probed is {expert_prefix}{proj}.
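A sketch of how the placeholder expansion described above might work, assuming a plain string substitution of "{e}". The helper `expert_tensor_names` and the projection names used in the usage note are hypothetical, and the real loader may append further suffixes for the individual GPTQ tensors.

```rust
/// Hypothetical helper: expand `expert_prefix_fmt` for every expert and
/// append each projection name, in expert-major order.
fn expert_tensor_names(
    expert_prefix_fmt: &str,
    num_experts: usize,
    proj_names: &[&str],
) -> Vec<String> {
    let mut names = Vec::with_capacity(num_experts * proj_names.len());
    for e in 0..num_experts {
        // Replace the "{e}" placeholder with the expert index.
        let prefix = expert_prefix_fmt.replace("{e}", &e.to_string());
        for proj in proj_names {
            names.push(format!("{prefix}{proj}"));
        }
    }
    names
}
```

For example, the prefix "model.layers.5.mlp.experts.{e}." with two experts and the illustrative projections "gate_proj" and "up_proj" expands to four names, starting with "model.layers.5.mlp.experts.0.gate_proj".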
Returns (store, n_per_expert, k) where n_per_expert is the
per-expert column width and k = in_features (shared by all).
Trait Implementations
impl<B: Backend + BackendQuantMarlin> WeightLoader<B> for NativeSafetensorsLoader<B>
fn load_tensor(&self, name: &str) -> Result<B::Buffer>
Load a tensor by its full name (e.g. "model.embed_tokens.weight").
fn load_linear(&self, name: &str) -> Result<Box<dyn Linear<B>>>
Load a linear layer as a boxed Linear<B>. The concrete implementation
(DenseLinear / GptqLinear / AwqLinear / GgufLinear) depends on the
loader’s file format and quant config.
fn has_tensor(&self, name: &str) -> bool
fn quant_config(&self) -> Option<&QuantConfig>
Quantization config, if the source provides one (e.g. from
quantize_config.json or a GGUF header).
None means the source is dense.
Auto Trait Implementations
impl<B> Freeze for NativeSafetensorsLoader<B>
impl<B> RefUnwindSafe for NativeSafetensorsLoader<B> where
B: RefUnwindSafe,
impl<B> Send for NativeSafetensorsLoader<B>
impl<B> Sync for NativeSafetensorsLoader<B>
impl<B> Unpin for NativeSafetensorsLoader<B> where
B: Unpin,
impl<B> UnsafeUnpin for NativeSafetensorsLoader<B>
impl<B> UnwindSafe for NativeSafetensorsLoader<B> where
B: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> ErasedDestructor for T where
T: 'static,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.