pub struct QuantizedLinear {
pub in_features: usize,
pub out_features: usize,
pub quant_type: QuantType,
/* private fields */
}
A linear layer with quantized weights for fast inference.
Stores weights as QuantizedTensor (Q8/Q4/F16) and dequantizes on-the-fly
during the matrix multiply. Bias remains in f32.
§Usage
use axonml_quant::inference::QuantizedLinear;
use axonml_quant::QuantType;

let weights = vec![0.1_f32; 512 * 128]; // flattened weight matrix (out_features × in_features values)
let bias = vec![0.0_f32; 128];
let qlinear = QuantizedLinear::from_linear_params(&weights, Some(&bias), 512, 128, QuantType::Q8_0);

let input_data = vec![1.0_f32; 512]; // shape [batch = 1, in_features]
let output = qlinear.forward_f32(&input_data, 1); // shape [1, out_features]
Fields§
§in_features: usize
Input feature dimension.
§out_features: usize
Output feature dimension.
§quant_type: QuantType
Quantization type used.
Implementations§
impl QuantizedLinear
pub fn from_linear_params(
    weight_data: &[f32],
    bias_data: Option<&[f32]>,
    in_features: usize,
    out_features: usize,
    quant_type: QuantType,
) -> Self
Create a QuantizedLinear from raw weight and optional bias data (e.g. the parameters of an axonml_nn::Linear layer), quantizing the weights with the given quant_type.
pub fn forward_f32(&self, input: &[f32], batch_size: usize) -> Vec<f32>
Forward pass: f32 input → f32 output.
Input shape: [batch, in_features]
Output shape: [batch, out_features]
Performs quantized matrix multiplication: each output element is computed by iterating over the weight row’s quantized blocks and accumulating dot products with the corresponding activation slices.
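The block-wise accumulation described above can be sketched in standalone Rust. This is an illustration only, assuming a hypothetical Q8_0-style layout (32 i8 values per block plus one f32 scale); the crate's actual block format, scale type, and SIMD path may differ:

```rust
// Hypothetical Q8_0-style block: 32 quantized values sharing one scale.
const BLOCK: usize = 32;

struct Q8Block {
    scale: f32,
    vals: [i8; BLOCK],
}

// Quantize one weight row into blocks (absmax scaling to the i8 range).
fn quantize_row(row: &[f32]) -> Vec<Q8Block> {
    row.chunks(BLOCK)
        .map(|chunk| {
            let max = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
            let mut vals = [0i8; BLOCK];
            for (i, v) in chunk.iter().enumerate() {
                vals[i] = (v / scale).round() as i8;
            }
            Q8Block { scale, vals }
        })
        .collect()
}

// One output element: iterate the row's blocks, dot each block with the
// matching activation slice, and apply the block scale to the partial sum.
fn dot(blocks: &[Q8Block], input: &[f32]) -> f32 {
    blocks
        .iter()
        .zip(input.chunks(BLOCK))
        .map(|(b, xs)| {
            let sum: f32 = b.vals.iter().zip(xs).map(|(&q, &x)| q as f32 * x).sum();
            sum * b.scale
        })
        .sum()
}

fn main() {
    let row = vec![0.5f32; 64];
    let input = vec![1.0f32; 64];
    let q = quantize_row(&row);
    println!("{}", dot(&q, &input)); // ≈ 32.0 (64 × 0.5 × 1.0)
}
```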
pub fn forward_var(&self, input: &Variable) -> Variable
Forward pass with Variable input/output (for integration with autograd).
Note: Quantized inference is forward-only (no gradient tracking).
The output Variable has requires_grad = false.
pub fn weight_bytes(&self) -> usize
Memory usage in bytes (weights only, excludes bias).
pub fn compression_ratio(&self) -> f32
Compression ratio vs f32 weights.
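A back-of-envelope sketch of what this ratio works out to, assuming GGML-style block sizes (Q8_0: 32 i8 values plus an f16 scale = 34 bytes per block; Q4_0: 16 bytes of packed nibbles plus an f16 scale = 18 bytes). These layouts are an assumption here; check the crate's QuantType documentation for the real per-block sizes:

```rust
// Ratio of f32 storage to quantized storage for one block of 32 weights.
fn ratio(f32_bytes: f32, quant_bytes: f32) -> f32 {
    f32_bytes / quant_bytes
}

fn main() {
    // 32 f32 weights = 128 bytes.
    println!("Q8_0 ≈ {:.2}x", ratio(128.0, 34.0)); // ≈ 3.76x
    println!("Q4_0 ≈ {:.2}x", ratio(128.0, 18.0)); // ≈ 7.11x
}
```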
pub fn dequantize_weights(&self) -> Tensor<f32>
Dequantize weights back to f32 (for debugging/validation).
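The kind of validation this enables can be sketched as a standalone round trip: quantize a weight to i8 with a per-block scale, dequantize it, and check the error stays within half a quantization step. This is an illustration of the error bound, not the crate's actual code:

```rust
fn main() {
    // One block's absmax scale, assuming the block's largest |w| is 0.8.
    let scale = 0.8f32 / 127.0;
    let w = 0.3f32;
    let q = (w / scale).round() as i8; // quantize
    let w2 = q as f32 * scale;         // dequantize
    // Round-to-nearest error is at most half a step.
    assert!((w - w2).abs() <= scale / 2.0 + 1e-6);
    println!("{w} -> {q} -> {w2}");
}
```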
Trait Implementations§
impl Clone for QuantizedLinear
fn clone(&self) -> QuantizedLinear
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for QuantizedLinear
impl RefUnwindSafe for QuantizedLinear
impl Send for QuantizedLinear
impl Sync for QuantizedLinear
impl Unpin for QuantizedLinear
impl UnsafeUnpin for QuantizedLinear
impl UnwindSafe for QuantizedLinear
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.