pub struct QuantLinear<B: Backend> { /* private fields */ }
Linear projection backed by a GGUF k-quant weight that is kept quantised in backend memory.
forward calls into Backend::gemm_quant, which runs the matmul against either
a transient fp16 buffer that the weight is dequantised into (Metal) or
pre-dequantised fp32 weights (CPU). See B::QuantStore per
backend for the storage format details.
Future k-quant flavours (Q5_K, Q6_K, Q8_0) plug in via the
GgufQuantType discriminator passed to the constructor — no new
QuantLinear type required.
Implementations
impl<B: Backend> QuantLinear<B>
pub fn from_gguf_bytes(
    kind: GgufQuantType,
    bytes: &[u8],
    out_features: usize,
    in_features: usize,
) -> Result<Self>
Build from raw GGUF block bytes.
kind: which k-quant flavour the bytes encode (Q4_K, Q5_K, …).
bytes: the on-disk payload, sized by the kind’s block layout.
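As a rough illustration of what "sized by the kind's block layout" means, the sketch below computes the payload length a caller might expect before passing bytes to the constructor. QuantKind and expected_payload_bytes are hypothetical names invented for this example, not part of this crate. The block sizes are the standard ggml layouts (256-element super-blocks for the k-quants, 32-element blocks for Q8_0), and the sketch assumes each weight row of in_features elements is quantised independently. Whether from_gguf_bytes performs this exact check is not stated here.

```rust
/// Hypothetical stand-in for the crate's `GgufQuantType`; the real enum's
/// variant names are not shown on this page.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QuantKind {
    Q4K,
    Q5K,
    Q6K,
    Q8_0,
}

impl QuantKind {
    /// (elements per block, bytes per block) in the ggml on-disk layout.
    pub fn block_layout(self) -> (usize, usize) {
        match self {
            QuantKind::Q4K => (256, 144),
            QuantKind::Q5K => (256, 176),
            QuantKind::Q6K => (256, 210),
            QuantKind::Q8_0 => (32, 34),
        }
    }
}

/// Expected byte length of an `out_features x in_features` weight.
/// Returns `None` if a row does not split into whole blocks or if the
/// size computation overflows `usize`.
pub fn expected_payload_bytes(
    kind: QuantKind,
    out_features: usize,
    in_features: usize,
) -> Option<usize> {
    let (block_elems, block_bytes) = kind.block_layout();
    if in_features % block_elems != 0 {
        return None;
    }
    let blocks_per_row = in_features / block_elems;
    out_features
        .checked_mul(blocks_per_row)?
        .checked_mul(block_bytes)
}
```

A caller could compare bytes.len() against this value and fail early with a clear error instead of handing a truncated payload to the backend.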
pub fn from_gguf_fused(
    parts: &[(GgufQuantType, &[u8], usize)],
    in_features: usize,
) -> Result<Self>
Build a fused projection from multiple (kind, bytes, rows)
parts that share in_features. Each part stays in its own
QuantStore (no byte-concat); forward dispatches one matvec per
part. Used for Qwen3 qkv_proj when q and k are Q4_K and v is Q6_K:
with mixed kinds, the homogeneous fused-Q4 fast path would have to
fall back to eager fp32, at a cost of 100 MB per layer.
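The per-part dispatch can be pictured with the minimal sketch below. It is not the crate's implementation: Part and FusedSketch are hypothetical types, dense f32 rows stand in for each backend QuantStore, and the sketch assumes the per-part outputs are concatenated in the order the parts were supplied.

```rust
/// Hypothetical stand-in for one fused part. Dense f32 rows replace the
/// backend `QuantStore` so the sketch runs without the crate.
pub struct Part {
    pub rows: usize,
    /// Row-major weights, `rows * in_features` elements.
    pub weight: Vec<f32>,
}

/// Hypothetical fused projection: several parts sharing `in_features`.
pub struct FusedSketch {
    pub parts: Vec<Part>,
    pub in_features: usize,
}

impl FusedSketch {
    /// Total output width is the sum of the parts' row counts.
    pub fn out_features(&self) -> usize {
        self.parts.iter().map(|p| p.rows).sum()
    }

    /// One matvec per part; outputs are appended in part order
    /// (an assumption of this sketch).
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert!(self.in_features > 0);
        assert_eq!(x.len(), self.in_features);
        let mut y = Vec::with_capacity(self.out_features());
        for part in &self.parts {
            assert_eq!(part.weight.len(), part.rows * self.in_features);
            for row in part.weight.chunks_exact(self.in_features) {
                // Dot product of one weight row with the input vector.
                y.push(row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>());
            }
        }
        y
    }
}
```

Because each part keeps its own storage, parts of different quant kinds never need to be converted to a common format; only the input vector and the output layout are shared.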
pub fn from_store(
    store: B::QuantStore,
    out_features: usize,
    in_features: usize,
) -> Self
For tests / advanced callers that have already constructed a
B::QuantStore (e.g. through the Backend’s own ingestion path).
pub fn store(&self) -> &B::QuantStore
Trait Implementations
impl<B: Backend> Linear<B> for QuantLinear<B>
Auto Trait Implementations
impl<B> Freeze for QuantLinear<B>
impl<B> RefUnwindSafe for QuantLinear<B>
impl<B> Send for QuantLinear<B>
impl<B> Sync for QuantLinear<B>
impl<B> Unpin for QuantLinear<B>
impl<B> UnsafeUnpin for QuantLinear<B>
impl<B> UnwindSafe for QuantLinear<B>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.