pub struct QuantizedModel {
    pub quantized_params: Vec<QuantizedTensor>,
    pub quant_type: QuantType,
    pub total_params: usize,
    pub total_bytes: usize,
    pub original_bytes: usize,
}
A fully quantized model for inference.
Wraps a collection of quantized parameters and provides a fast forward pass by dequantizing weights on the fly during computation.
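As a rough illustration of what "dequantizing on the fly" can mean, here is a minimal sketch of block-wise symmetric 8-bit quantization with a dot product that scales each block as it is consumed. Everything in it is an assumption made for illustration: the names `Q8Block`, `quantize_q8`, and `dot_q8`, the block size of 32, and the f32 scale are hypothetical and are not taken from axonml_quant, whose actual `Q8_0` layout may differ.

```rust
/// Number of values per block (an assumption for this sketch).
const BLOCK: usize = 32;

/// Hypothetical stand-in for one quantized block: 32 signed 8-bit values
/// sharing a single f32 scale. Not the axonml_quant layout.
#[derive(Debug, Clone)]
pub struct Q8Block {
    pub scale: f32,
    pub values: [i8; BLOCK],
}

/// Quantize a slice into blocks; a short final block is zero-padded.
pub fn quantize_q8(data: &[f32]) -> Vec<Q8Block> {
    data.chunks(BLOCK)
        .map(|chunk| {
            // Symmetric quantization: the largest magnitude maps to 127.
            let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let mut values = [0i8; BLOCK];
            for (q, v) in values.iter_mut().zip(chunk) {
                *q = (v / scale).round().clamp(-127.0, 127.0) as i8;
            }
            Q8Block { scale, values }
        })
        .collect()
}

/// Dot product of quantized weights with f32 inputs. Each weight is
/// converted to f32 as it is used, so no full f32 copy is materialized.
pub fn dot_q8(weights: &[Q8Block], input: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (block, xs) in weights.iter().zip(input.chunks(BLOCK)) {
        let mut block_acc = 0.0f32;
        for (q, x) in block.values.iter().zip(xs) {
            block_acc += f32::from(*q) * x;
        }
        // Apply the shared scale once per block.
        acc += block.scale * block_acc;
    }
    acc
}
```

The result is only approximately equal to the f32 dot product; the error depends on how well a single scale fits each block.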
§Usage
use axonml_quant::inference::QuantizedModel;
use axonml_quant::QuantType;
let qmodel = QuantizedModel::from_module(&model, QuantType::Q8_0);
println!("{}", qmodel.summary());
qmodel.load_into_module(&model); // dequantize weights back into the module for inference
Fields§
quantized_params: Vec<QuantizedTensor>
Quantized weight tensors (in parameter order).
quant_type: QuantType
Quantization type used.
total_params: usize
Total original parameter count.
total_bytes: usize
Total quantized size in bytes.
original_bytes: usize
Original f32 size in bytes.
Implementations§
impl QuantizedModel
pub fn from_module<M: Module>(module: &M, quant_type: QuantType) -> Self
Quantize a Module’s parameters.
pub fn load_into_module<M: Module>(&self, module: &M)
Load quantized weights back into a Module for inference.
Dequantizes all parameters and updates the module’s parameters in place. This is the simplest integration path: the model runs at full f32 speed but loads from a compressed checkpoint.
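The dequantize-then-overwrite idea can be sketched with stand-in types. The sketch below assumes a hypothetical `QTensor` with one scale per tensor and represents the module's parameters as plain `Vec<f32>` buffers; neither is the axonml_quant representation. The real method takes `&M`, which suggests parameters are updated through shared handles rather than the `&mut` slice used here.

```rust
/// Hypothetical per-tensor quantized weights: one scale for the whole tensor.
pub struct QTensor {
    pub scale: f32,
    pub values: Vec<i8>,
}

impl QTensor {
    /// Expand the 8-bit values back to f32.
    pub fn dequantize(&self) -> Vec<f32> {
        self.values
            .iter()
            .map(|&q| f32::from(q) * self.scale)
            .collect()
    }
}

/// Overwrite each f32 parameter buffer with the dequantized tensor at the
/// same index, relying on both sides being in the same parameter order.
/// Returns an error at the first count or length mismatch; buffers before
/// that point have already been overwritten.
pub fn load_into_params(quantized: &[QTensor], params: &mut [Vec<f32>]) -> Result<(), String> {
    if quantized.len() != params.len() {
        return Err(format!(
            "expected {} tensors, got {}",
            params.len(),
            quantized.len()
        ));
    }
    for (i, (q, p)) in quantized.iter().zip(params.iter_mut()).enumerate() {
        if q.values.len() != p.len() {
            return Err(format!("tensor {}: length mismatch", i));
        }
        p.copy_from_slice(&q.dequantize());
    }
    Ok(())
}
```

After such a load, the buffers hold ordinary f32 values, so later computation needs no knowledge of the quantized format.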
pub fn compression_ratio(&self) -> f32
Compression ratio.
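The page does not say how the ratio is computed. One plausible reading, given the `original_bytes` and `total_bytes` fields, is original size divided by quantized size. The sketch below assumes that definition. The helper `blockwise_bytes` and its storage layout (blocks of 8-bit values plus a fixed-size scale per block) are hypothetical and only make the arithmetic concrete.

```rust
/// Assumed definition: original f32 size divided by quantized size.
/// Returning 0.0 for an empty model is a convention chosen for this sketch.
pub fn compression_ratio(original_bytes: usize, total_bytes: usize) -> f32 {
    if total_bytes == 0 {
        return 0.0;
    }
    original_bytes as f32 / total_bytes as f32
}

/// Bytes needed for `n` parameters stored as blocks of `block` one-byte
/// values plus `scale_bytes` bytes of scale per block (illustrative layout).
pub fn blockwise_bytes(n: usize, block: usize, scale_bytes: usize) -> usize {
    let blocks = (n + block - 1) / block; // round up to whole blocks
    blocks * (block + scale_bytes)
}
```

Under these assumptions, 1024 parameters in blocks of 32 with a 2-byte scale occupy 1088 bytes against 4096 bytes of f32, a ratio of about 3.76.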
Auto Trait Implementations§
impl Freeze for QuantizedModel
impl RefUnwindSafe for QuantizedModel
impl Send for QuantizedModel
impl Sync for QuantizedModel
impl Unpin for QuantizedModel
impl UnsafeUnpin for QuantizedModel
impl UnwindSafe for QuantizedModel
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.