pub struct GgufMatMul { /* private fields */ }
Implementations§
impl GgufMatMul
Trait Implementations§
impl Debug for GgufMatMul
impl QuantMethod for GgufMatMul
fn gather_forward_raw(&self, x: &Tensor, indices: &Tensor) -> Result<Tensor>
Compute the gather matmul of self and x, where self holds the weights.
If x has shape (n_tokens, 1, cols) and the weights in self have shape (n_experts, rows, cols),
then indices has shape (n_tokens, n_experts_per_tok).
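The shape contract above can be illustrated with a minimal sketch over flat row-major `f32` buffers. This is not the crate's implementation: `gather_matmul` is a hypothetical helper, it ignores quantization entirely, and the output layout `(n_tokens, n_experts_per_tok, rows)` is an assumption made for the illustration.

```rust
/// Toy gather matmul over flat row-major buffers (illustration only).
///
/// x:       (n_tokens, 1, cols)
/// w:       (n_experts, rows, cols)
/// indices: (n_tokens, n_experts_per_tok)
/// returns: (n_tokens, n_experts_per_tok, rows)  <- assumed output layout
fn gather_matmul(
    x: &[f32],
    w: &[f32],
    indices: &[usize],
    n_tokens: usize,
    n_experts_per_tok: usize,
    rows: usize,
    cols: usize,
) -> Vec<f32> {
    let mut out = vec![0.0f32; n_tokens * n_experts_per_tok * rows];
    for t in 0..n_tokens {
        // The single activation row belonging to token `t`.
        let x_row = &x[t * cols..(t + 1) * cols];
        for k in 0..n_experts_per_tok {
            // Expert selected for slot `k` of token `t`.
            let e = indices[t * n_experts_per_tok + k];
            for r in 0..rows {
                let start = (e * rows + r) * cols;
                let w_row = &w[start..start + cols];
                // Dot product of the token row with one weight row of expert `e`.
                let dot: f32 = x_row.iter().zip(w_row).map(|(a, b)| a * b).sum();
                out[(t * n_experts_per_tok + k) * rows + r] = dot;
            }
        }
    }
    out
}
```

Each token is multiplied only by the weight matrices of the experts listed in its row of `indices`, which is what distinguishes this from a dense batched matmul over all experts.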
fn new(method: QuantMethodConfig) -> Result<Self>
where
    Self: Sized,
fn dequantize_w(&self) -> Result<Tensor>
fn forward_raw(&self, a: &Tensor) -> Result<Tensor>
Raw matmul without dtype casting. Implementors override this.
Callers should use forward instead.
fn quantized_act_type(&self) -> Option<DType>
If a quantized method, return the activation dtype.
fn has_bias(&self) -> bool
fn add_delta_w(&self, delta: &Tensor) -> Result<Arc<dyn QuantMethod>>
Add a delta weight from LoRA to the weights. This should be prescaled with alpha.
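As a rough illustration of what "prescaled with alpha" can mean, the sketch below builds a LoRA delta and adds it to a dense weight. It assumes the common LoRA convention `delta = (alpha / rank) * B * A`; the helpers `lora_delta` and `add_delta` are hypothetical, operate on plain `f32` buffers, and say nothing about how `GgufMatMul` handles its quantized storage.

```rust
/// Build a prescaled LoRA delta, row-major.
/// b: (rows, rank), a: (rank, cols) -> delta: (rows, cols)
fn lora_delta(
    b: &[f32],
    a: &[f32],
    rows: usize,
    rank: usize,
    cols: usize,
    alpha: f32,
) -> Vec<f32> {
    // Assumed convention: scale by alpha / rank before merging.
    let scale = alpha / rank as f32;
    let mut delta = vec![0.0f32; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            let mut acc = 0.0f32;
            for k in 0..rank {
                acc += b[i * rank + k] * a[k * cols + j];
            }
            delta[i * cols + j] = scale * acc;
        }
    }
    delta
}

/// Merge step: returns W + delta and leaves both inputs untouched.
fn add_delta(w: &[f32], delta: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), delta.len(), "delta must match the weight shape");
    w.iter().zip(delta).map(|(w, d)| w + d).collect()
}
```

Because the scaling is applied when the delta is built, the merge step itself is a plain element-wise addition.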
fn dtype_and_device(&self) -> (DType, Device)
Weight dtype and device
fn apply_isq(
    self: Arc<Self>,
    dtype: Option<IsqType>,
    device: Device,
    n_quantized: &AtomicUsize,
    imatrix_weight: Option<Vec<f32>>,
    guard: QuantizeOntoGuard,
) -> Result<Arc<dyn QuantMethod>>
If the quant is backed by a qmatmul.
fn forward(&self, a: &Tensor) -> Result<Tensor>
Compute matmul of self and a. self should contain the weights.
Automatically casts to the required quantization activation type and back.
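The relationship between forward, forward_raw, and quantized_act_type can be sketched as a provided trait method wrapping a required one. Everything below is a toy stand-in: `DType`, `Tensor`, `ToyQuantMethod`, and `Doubler` are hypothetical types defined only for this example, the real methods return `Result`, and the real cast changes the storage rather than just a tag.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    F16,
    F32,
}

/// Toy tensor: the dtype is only a tag here; values are always stored as f32.
#[derive(Clone, Debug, PartialEq)]
struct Tensor {
    dtype: DType,
    data: Vec<f32>,
}

impl Tensor {
    fn to_dtype(&self, dtype: DType) -> Tensor {
        Tensor { dtype, data: self.data.clone() }
    }
}

trait ToyQuantMethod {
    /// Raw matmul without dtype casting; implementors provide this.
    fn forward_raw(&self, a: &Tensor) -> Tensor;

    /// Activation dtype required by a quantized method, if any.
    fn quantized_act_type(&self) -> Option<DType>;

    /// Casts to the activation dtype, runs the raw op, then casts back.
    fn forward(&self, a: &Tensor) -> Tensor {
        match self.quantized_act_type() {
            Some(act) => {
                let original = a.dtype;
                self.forward_raw(&a.to_dtype(act)).to_dtype(original)
            }
            None => self.forward_raw(a),
        }
    }
}

/// Stand-in layer that only accepts F32 activations.
struct Doubler;

impl ToyQuantMethod for Doubler {
    fn forward_raw(&self, a: &Tensor) -> Tensor {
        assert_eq!(a.dtype, DType::F32, "raw path expects the activation dtype");
        Tensor { dtype: a.dtype, data: a.data.iter().map(|v| v * 2.0).collect() }
    }

    fn quantized_act_type(&self) -> Option<DType> {
        Some(DType::F32)
    }
}
```

Calling `Doubler.forward` with an `F16`-tagged input returns an `F16`-tagged output, whereas calling `forward_raw` directly with that input would trip the assertion. That is the practical reason callers are pointed at forward.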
fn gather_forward(&self, a: &Tensor, indices: &Tensor) -> Result<Tensor>
Compute gather matmul of self and a. self should contain the weights.
Automatically casts to the required quantization activation type and back.
fn afq_inner(&self) -> Option<AfqInner<'_>>
If this is an AFQ layer, return its (w_q, scales, biases, bits, group_size).
Used by Metal fused QKV / gate-up paths.
fn unquant_weight_bias(&self) -> Option<(Tensor, Option<Tensor>)>
fn begin_track_stats(&mut self) -> Result<()>
Begin tracking stats into an ImatrixLayerStats
fn end_track_stats(&self) -> Result<Tensor>
End tracking stats into an ImatrixLayerStats. Returns the computed imatrix.
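The page does not say how the imatrix is computed. As a loose sketch only, the accumulator below assumes a llama.cpp-style importance matrix, i.e. the mean squared activation per input column. `ToyImatrixStats` and its methods are hypothetical and are not the crate's ImatrixLayerStats.

```rust
/// Hypothetical accumulator; not the crate's ImatrixLayerStats.
struct ToyImatrixStats {
    /// Running sum of squared activations per input column.
    sum_sq: Vec<f32>,
    /// Number of activation rows seen so far.
    n_rows: usize,
}

impl ToyImatrixStats {
    /// Roughly the "begin tracking" step: start from empty statistics.
    fn new(cols: usize) -> Self {
        assert!(cols > 0, "need at least one column");
        Self { sum_sq: vec![0.0; cols], n_rows: 0 }
    }

    /// Record a batch of activations, row-major with `cols` columns.
    fn process(&mut self, acts: &[f32]) {
        let cols = self.sum_sq.len();
        assert_eq!(acts.len() % cols, 0, "batch must contain whole rows");
        for row in acts.chunks(cols) {
            for (acc, v) in self.sum_sq.iter_mut().zip(row) {
                *acc += v * v;
            }
            self.n_rows += 1;
        }
    }

    /// Roughly the "end tracking" step: mean squared activation per column
    /// (assumed definition of the imatrix).
    fn compute(&self) -> Vec<f32> {
        let n = self.n_rows.max(1) as f32;
        self.sum_sq.iter().map(|s| s / n).collect()
    }
}
```

Under this assumption, columns that see consistently large activations receive larger weights, and a vector of this kind is what a parameter such as `imatrix_weight: Option<Vec<f32>>` in apply_isq would carry.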
fn is_distributed(&self) -> Option<DistributedKind>
fn dummy_info(&self) -> Option<&DummyLayerInfo>
impl QuantizedSerde for GgufMatMul
fn isq_serde_supported(&self) -> bool
fn name(&self) -> &'static str
fn serialize(&self) -> Result<Cow<'_, [u8]>>
fn serialize_with_bias(&self, bias: Option<Tensor>) -> Result<Cow<'_, [u8]>>
Not meant to be called externally.
fn deserialize(
    data: Cow<'_, [u8]>,
    device: &Device,
    _comm: &Arc<Comm>,
    guard: QuantizeOntoGuard,
) -> Result<Arc<dyn QuantMethod>>
fn deserialize_ext_bias(
    data: Cow<'_, [u8]>,
    device: &Device,
    guard: QuantizeOntoGuard,
) -> Result<(Arc<dyn QuantMethod>, Option<Tensor>)>
Auto Trait Implementations§
impl Freeze for GgufMatMul
impl !RefUnwindSafe for GgufMatMul
impl Send for GgufMatMul
impl Sync for GgufMatMul
impl Unpin for GgufMatMul
impl UnsafeUnpin for GgufMatMul
impl !UnwindSafe for GgufMatMul
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.