pub struct QuantizedLinear { /* private fields */ }
A weight matrix quantized to i8, with per-channel or per-tensor
scale/zero_point.
Forward pass: dequantize weights on the fly, then f64 matmul.
§Layout
weight_q has shape (out_features, in_features) — the same convention
as LinearExpert::weights (see moe/expert.rs).
Implementations§
impl QuantizedLinear
pub fn from_fp(
    weight: &Array2<f64>,
    params: &QuantizationParams,
) -> Result<Self, QuantizationError>
Quantize an existing f64 weight matrix using the provided params.
Only the Int8 quantization type is supported. Use
crate::quantization::calibrate_linear to produce params.
§Errors
QuantizationError::InvalidParams if qtype != Int8.
QuantizationError::ShapeMismatch if the weight is not 2-D or the scale/zero_point vectors are the wrong length for PerChannel.
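The quantization step can be sketched as follows. This is a minimal illustration, not the crate's implementation: it assumes the affine mapping q = round(w / scale) + zero_point clamped to the i8 range (the inverse of the dequantization formula documented for forward), uses nested Vecs in place of Array2, and assumes zero_point is an i32. The helper names quantize_value and quantize_per_channel and the String error type are hypothetical.

```rust
/// Hypothetical helper (not part of the crate): affine-quantize one value.
/// Assumes q = round(w / scale) + zero_point, clamped to the i8 range.
fn quantize_value(w: f64, scale: f64, zero_point: i32) -> i8 {
    let q = ((w / scale).round() as i64).saturating_add(zero_point as i64);
    q.clamp(i8::MIN as i64, i8::MAX as i64) as i8
}

/// Hypothetical helper: quantize a row-major (out_features, in_features)
/// matrix per channel, so row c uses scale[c] and zero_point[c].
/// Returns an error string if the parameter vectors have the wrong length.
fn quantize_per_channel(
    weight: &[Vec<f64>],
    scale: &[f64],
    zero_point: &[i32],
) -> Result<Vec<Vec<i8>>, String> {
    let rows = weight.len();
    if scale.len() != rows || zero_point.len() != rows {
        return Err(format!(
            "expected {} scale/zero_point entries, got {}/{}",
            rows,
            scale.len(),
            zero_point.len()
        ));
    }
    Ok(weight
        .iter()
        .enumerate()
        .map(|(c, row)| {
            row.iter()
                .map(|&w| quantize_value(w, scale[c], zero_point[c]))
                .collect()
        })
        .collect())
}
```

Values outside the representable range saturate at -128 or 127 in this sketch, and the length check mirrors the ShapeMismatch condition described above.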
pub fn with_bias(self, bias: Array1<f64>) -> Result<Self, QuantizationError>
Attach a bias vector of length out_features.
§Errors
QuantizationError::ShapeMismatch if bias.len() != out_features.
pub fn forward(&self, x: &Array2<f64>) -> Array2<f64>
Dequantize and run matmul.
Input x must have shape [batch, in_features].
Output has shape [batch, out_features].
fp = (q - zero_point[c]) * scale[c] per element, where c is the
output channel (row) index when granularity == PerChannel, or 0
for PerTensor.
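As a rough sketch of the computation described above, the following standalone function dequantizes each weight inside the dot product and produces a [batch, out_features] result. It is an illustration under stated assumptions, not the crate's code: nested Vecs stand in for Array2, zero_point is assumed to be i32, the per_channel flag stands in for the granularity enum, and any bias term is left out. The name forward_sketch is hypothetical.

```rust
/// Hypothetical sketch of the forward pass (bias omitted):
/// y[b][o] = sum_i x[b][i] * (q[o][i] - zero_point[c]) * scale[c],
/// where c = o for per-channel parameters and c = 0 for per-tensor.
fn forward_sketch(
    x: &[Vec<f64>],        // shape [batch, in_features]
    weight_q: &[Vec<i8>],  // shape (out_features, in_features)
    scale: &[f64],
    zero_point: &[i32],
    per_channel: bool,
) -> Vec<Vec<f64>> {
    x.iter()
        .map(|x_row| {
            weight_q
                .iter()
                .enumerate()
                .map(|(o, w_row)| {
                    // Pick the parameter index for this output row.
                    let c = if per_channel { o } else { 0 };
                    let dot: f64 = x_row
                        .iter()
                        .zip(w_row)
                        .map(|(&xi, &q)| {
                            // Dequantize on the fly, then multiply.
                            xi * (q as i32 - zero_point[c]) as f64 * scale[c]
                        })
                        .sum();
                    dot
                })
                .collect()
        })
        .collect()
}
```

Because the weights are stored as (out_features, in_features), each output element is the dot product of an input row with a weight row, which is equivalent to multiplying x by the transposed dequantized weight matrix.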
pub fn dequantize(&self) -> Array2<f64>
Dequantize the stored i8 weights back to f64.
For PerTensor: all elements use scale[0] / zero_point[0].
For PerChannel: each row c uses scale[c] / zero_point[c].
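The two cases can be illustrated with a small standalone sketch. It assumes nested Vecs in place of Array2 and an i32 zero_point; the Granularity enum and dequantize_sketch function are hypothetical stand-ins, not the crate's types.

```rust
/// Hypothetical stand-in for the crate's granularity enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Granularity {
    PerTensor,
    PerChannel,
}

/// Hypothetical sketch: map stored i8 weights back to f64 using
/// fp = (q - zero_point[c]) * scale[c].
fn dequantize_sketch(
    weight_q: &[Vec<i8>], // shape (out_features, in_features)
    scale: &[f64],
    zero_point: &[i32],
    granularity: Granularity,
) -> Vec<Vec<f64>> {
    weight_q
        .iter()
        .enumerate()
        .map(|(row, qs)| {
            // PerTensor always reads index 0; PerChannel reads the row index.
            let c = match granularity {
                Granularity::PerTensor => 0,
                Granularity::PerChannel => row,
            };
            qs.iter()
                .map(|&q| (q as i32 - zero_point[c]) as f64 * scale[c])
                .collect()
        })
        .collect()
}
```

With per-channel parameters each output row gets its own scale, which generally tracks rows of very different magnitude better than a single shared scale.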
pub fn out_features(&self) -> usize
Return the output feature dimension.
pub fn in_features(&self) -> usize
Return the input feature dimension.
pub fn granularity(&self) -> QuantizationGranularity
Return the quantization granularity.
Auto Trait Implementations§
impl Freeze for QuantizedLinear
impl RefUnwindSafe for QuantizedLinear
impl Send for QuantizedLinear
impl Sync for QuantizedLinear
impl Unpin for QuantizedLinear
impl UnsafeUnpin for QuantizedLinear
impl UnwindSafe for QuantizedLinear
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.