pub struct PerTensorFP8Linear { /* private fields */ }
Per-tensor FP8 Linear layer with static activation scaling.
This is used for models that have per-tensor FP8 quantization (weight_block_size = null) with static activation scales. Each linear layer has:
- <layer>.weight (FP8 E4M3)
- <layer>.weight_scale_inv (F32 scalar) - dequantization scale for weights
- <layer>.activation_scale (F32 scalar) - quantization scale for activations
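The per-tensor scheme can be sketched in plain Rust. This is a hypothetical illustration of the arithmetic, not the actual candle/mistral.rs implementation: `quantize_activations`, `dequantize_weights`, and the `E4M3_MAX` clamp are assumptions made for the sketch; the real layer operates on FP8 tensors, not `f32` slices.

```rust
// Illustrative sketch of per-tensor FP8 scaling (assumed names, not the
// real mistral.rs code). One scalar scale applies to the whole tensor.
const E4M3_MAX: f32 = 448.0; // largest finite FP8 E4M3 value

// Static activation quantization: divide by the stored scale and clamp
// into the representable E4M3 range.
fn quantize_activations(x: &[f32], activation_scale: f32) -> Vec<f32> {
    x.iter()
        .map(|&v| (v / activation_scale).clamp(-E4M3_MAX, E4M3_MAX))
        .collect()
}

// Weight dequantization: multiply every element by the single
// weight_scale_inv scalar.
fn dequantize_weights(w_fp8: &[f32], weight_scale_inv: f32) -> Vec<f32> {
    w_fp8.iter().map(|&v| v * weight_scale_inv).collect()
}

fn main() {
    let xq = quantize_activations(&[0.5, -1.0, 2.0], 0.25);
    assert_eq!(xq, vec![2.0, -4.0, 8.0]);

    let wd = dequantize_weights(&[1.0, 2.0], 0.5);
    assert_eq!(wd, vec![0.5, 1.0]);
}
```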
Trait Implementations
impl Debug for PerTensorFP8Linear
impl QuantMethod for PerTensorFP8Linear
fn new(method: QuantMethodConfig) -> Result<Self>
where
    Self: Sized,
fn dequantize_w(&self) -> Result<Tensor>
fn forward(&self, x: &Tensor) -> Result<Tensor>
Compute matmul of self and a. self should contain the weights.
fn quantized_act_type(&self) -> Option<DType>
If a quantized method, return the activation dtype.
fn add_delta_w(&self, _delta: &Tensor) -> Result<Arc<dyn QuantMethod>>
Add a delta weight from LoRA to the weights. This should be prescaled with alpha.
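Since the delta arrives prescaled with alpha, the merge itself reduces to an elementwise add. A minimal sketch of that contract, with assumed names and plain `f32` slices standing in for tensors:

```rust
// Hedged sketch of the add_delta_w contract: the caller has already
// multiplied the LoRA delta (B @ A) by alpha, so the layer only adds it.
fn add_delta_w(weights: &mut [f32], delta_prescaled: &[f32]) {
    for (w, d) in weights.iter_mut().zip(delta_prescaled) {
        *w += d;
    }
}

fn main() {
    let mut w = vec![1.0f32, 2.0];
    add_delta_w(&mut w, &[0.5, -0.5]);
    assert_eq!(w, vec![1.5, 1.5]);
}
```

For a quantized layer this implies a dequantize/add/requantize round trip, which is why the trait returns a fresh `Arc<dyn QuantMethod>` rather than mutating in place.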
fn dtype_and_device(&self) -> (DType, Device)
Weight dtype and device
fn apply_isq(
    self: Arc<Self>,
    dtype: Option<IsqType>,
    device: Device,
    n_quantized: &AtomicUsize,
    imatrix_weight: Option<Vec<f32>>,
    guard: QuantizeOntoGuard,
) -> Result<Arc<dyn QuantMethod>>
If the quant is backed by a qmatmul.
fn forward_autocast(&self, a: &Tensor) -> Result<Tensor>
Compute matmul of self and a. self should contain the weights. Automatically cast to required quantization activation type and back.
fn gather_forward_autocast(
    &self,
    a: &Tensor,
    indices: &Tensor,
) -> Result<Tensor>
Compute matmul of self and a. self should contain the weights. Automatically cast to required quantization activation type and back.
fn unquant_weight_bias(&self) -> Option<(Tensor, Option<Tensor>)>
fn begin_track_stats(&mut self) -> Result<()>
Begin tracking stats into an ImatrixLayerStats
fn end_track_stats(&self) -> Result<Tensor>
End tracking stats into an ImatrixLayerStats. Returns the computed imatrix.
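An importance matrix is typically built by accumulating per-column activation statistics between `begin_track_stats` and `end_track_stats`. The sketch below assumes the common llama.cpp-style scheme (mean of squared activations per input column); the `ImatrixLayerStats` shape here is illustrative and not the actual mistral.rs type.

```rust
// Assumed scheme: the imatrix is the running mean of squared activations
// seen by each input column. Names and layout are illustrative only.
struct ImatrixLayerStats {
    sums: Vec<f32>, // per-column sum of squared activations
    count: usize,   // number of activation rows observed
}

impl ImatrixLayerStats {
    fn new(in_dim: usize) -> Self {
        Self { sums: vec![0.0; in_dim], count: 0 }
    }

    // Called for every activation row while tracking is active.
    fn track(&mut self, x: &[f32]) {
        for (s, &v) in self.sums.iter_mut().zip(x) {
            *s += v * v;
        }
        self.count += 1;
    }

    // The analogue of end_track_stats: return the computed imatrix.
    fn finish(&self) -> Vec<f32> {
        self.sums.iter().map(|s| s / self.count as f32).collect()
    }
}

fn main() {
    let mut stats = ImatrixLayerStats::new(2);
    stats.track(&[1.0, 2.0]);
    stats.track(&[3.0, 0.0]);
    assert_eq!(stats.finish(), vec![5.0, 2.0]);
}
```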
fn is_distributed(&self) -> Option<DistributedKind>
impl QuantizedSerde for PerTensorFP8Linear
fn isq_serde_supported(&self) -> bool
fn name(&self) -> &'static str
fn serialize(&self) -> Result<Cow<'_, [u8]>>
fn serialize_with_bias(&self, bias: Option<Tensor>) -> Result<Cow<'_, [u8]>>
NOT meant for external calling
fn deserialize(
_data: Cow<'_, [u8]>,
_device: &Device,
_comm: &Arc<Comm>,
_guard: QuantizeOntoGuard,
) -> Result<Arc<dyn QuantMethod>>
where
    Self: Sized,
fn deserialize_ext_bias(
_data: Cow<'_, [u8]>,
_device: &Device,
_guard: QuantizeOntoGuard,
) -> Result<(Arc<dyn QuantMethod>, Option<Tensor>)>
where
    Self: Sized,
Auto Trait Implementations
impl Freeze for PerTensorFP8Linear
impl !RefUnwindSafe for PerTensorFP8Linear
impl Send for PerTensorFP8Linear
impl Sync for PerTensorFP8Linear
impl Unpin for PerTensorFP8Linear
impl UnsafeUnpin for PerTensorFP8Linear
impl !UnwindSafe for PerTensorFP8Linear
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.