pub struct MinMaxQuantizer { /* private fields */ }
Calibrates quantization parameters using tensor min/max statistics.
Implementations
impl MinMaxQuantizer
pub fn new(
    bits: u32,
    scheme: QuantScheme,
    granularity: QuantGranularity,
) -> Self
pub fn int8_symmetric() -> Self
Standard INT8 symmetric per-tensor quantizer.
pub fn int4_per_group(group_size: usize) -> Self
Standard INT4 symmetric per-group quantizer with group_size elements per group (e.g. group = 128, as in GGML).
pub fn calibrate(&self, tensor: &[f32]) -> QuantResult<QuantParams>
Calibrate parameters from a flat tensor.
For PerChannel, the tensor is assumed to be in row-major layout with
n_channels rows of length tensor.len() / n_channels.
Errors
QuantError::EmptyInput if tensor is empty.
QuantError::GroupSizeMismatch if the PerGroup size does not divide the tensor length.
QuantError::DimensionMismatch if the PerChannel axis is inconsistent.
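The following is a minimal, self-contained sketch of what min/max calibration typically computes for a single (per-tensor) parameter set. It is not this crate's implementation: `SketchError`, `symmetric_scale`, and `asymmetric_params` are hypothetical names, and the code assumes a signed code range for the symmetric scheme, an unsigned range for the asymmetric one, and a fallback scale of 1.0 for constant tensors.

```rust
/// Hypothetical stand-in for the crate's error type (only the variant used here).
#[derive(Debug, PartialEq)]
enum SketchError {
    EmptyInput,
}

/// Symmetric calibration: map the largest magnitude onto the largest
/// positive code, so the zero point is always 0.
/// Assumes 2 <= bits <= 31.
fn symmetric_scale(tensor: &[f32], bits: u32) -> Result<f32, SketchError> {
    if tensor.is_empty() {
        return Err(SketchError::EmptyInput);
    }
    let abs_max = tensor.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let q_max = ((1i32 << (bits - 1)) - 1) as f32; // 127 for 8 bits
    // An all-zero tensor would give a zero scale; fall back to 1.0 (assumption).
    Ok(if abs_max == 0.0 { 1.0 } else { abs_max / q_max })
}

/// Asymmetric calibration: map [min, max] onto [0, 2^bits - 1].
/// Assumes 1 <= bits <= 31.
fn asymmetric_params(tensor: &[f32], bits: u32) -> Result<(f32, i32), SketchError> {
    if tensor.is_empty() {
        return Err(SketchError::EmptyInput);
    }
    // Folding from 0.0 widens the range to include zero, so that 0.0
    // maps exactly onto an integer code.
    let min = tensor.iter().fold(0.0f32, |m, &x| m.min(x));
    let max = tensor.iter().fold(0.0f32, |m, &x| m.max(x));
    let levels = ((1u32 << bits) - 1) as f32; // 255 for 8 bits
    let scale = if max == min { 1.0 } else { (max - min) / levels };
    let zero_point = (-min / scale).round() as i32;
    Ok((scale, zero_point))
}
```

For example, `symmetric_scale(&[-1.27, 0.5, 1.0], 8)` yields a scale of roughly 0.01, since the largest magnitude 1.27 is mapped onto code 127.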
pub fn calibrate_2d(
    &self,
    tensor: &[f32],
    rows: usize,
    cols: usize,
) -> QuantResult<QuantParams>
Calibrate from a 2-D tensor (rows = channels).
Returns one (scale, zero-point) pair per row.
Errors
QuantError::EmptyInput if rows == 0.
QuantError::DimensionMismatch if cols == 0.
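Per-row calibration can be pictured as running the per-tensor computation once for each row of a row-major buffer. The sketch below is an illustration under assumptions, not the crate's code: `per_row_scales` is a hypothetical helper, it handles only the symmetric case (zero point 0), and it returns `None` where the real API would report a `QuantError`.

```rust
/// Per-row symmetric scales for a row-major `rows x cols` tensor.
/// Returns `None` on a shape problem. Assumes 2 <= bits <= 31.
fn per_row_scales(tensor: &[f32], rows: usize, cols: usize, bits: u32) -> Option<Vec<f32>> {
    if rows == 0 || cols == 0 || tensor.len() != rows * cols {
        return None;
    }
    let q_max = ((1i32 << (bits - 1)) - 1) as f32;
    Some(
        tensor
            .chunks_exact(cols) // each chunk is one row, i.e. one channel
            .map(|row| {
                let abs_max = row.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
                // Fall back to 1.0 for an all-zero row (assumption).
                if abs_max == 0.0 { 1.0 } else { abs_max / q_max }
            })
            .collect(),
    )
}
```

With `rows = 2`, `cols = 2`, and data `[1.27, -0.5, 0.1, -2.54]`, the two rows get independent scales of about 0.01 and 0.02, so a large-valued row does not degrade the resolution of a small-valued one.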
pub fn quantize(
    &self,
    tensor: &[f32],
    params: &QuantParams,
) -> QuantResult<Vec<i32>>
Quantize a flat tensor given pre-computed params (PerTensor mode).
Returns Vec<i32> of integer codes.
Errors
QuantError::InvalidScale if params.scales[0] <= 0.
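As a rough illustration of the per-tensor mapping, the usual affine rule is q = clamp(round(x / scale) + zero_point, q_min, q_max). The sketch below assumes a signed code range and uses a hypothetical free function, `quantize_sketch`, which takes the scale and zero point directly rather than a `QuantParams` value; a non-positive scale yields `None` where the real method returns `QuantError::InvalidScale`.

```rust
/// Quantize each value with a single scale and zero point.
/// Assumes a signed code range of [-2^(bits-1), 2^(bits-1) - 1] and 2 <= bits <= 31.
fn quantize_sketch(tensor: &[f32], scale: f32, zero_point: i32, bits: u32) -> Option<Vec<i32>> {
    // `!(scale > 0.0)` also rejects NaN.
    if !(scale > 0.0) {
        return None;
    }
    let q_min = -(1i32 << (bits - 1));
    let q_max = (1i32 << (bits - 1)) - 1;
    Some(
        tensor
            .iter()
            .map(|&x| {
                // `as i32` saturates on overflow; saturating_add keeps the sum in range.
                let q = ((x / scale).round() as i32).saturating_add(zero_point);
                q.clamp(q_min, q_max)
            })
            .collect(),
    )
}
```

Values outside the calibrated range saturate rather than wrap: with a scale of 0.01 and 8 bits, an input of 10.0 maps to code 127.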
pub fn quantize_grouped(
    &self,
    tensor: &[f32],
    params: &QuantParams,
    group_size: usize,
) -> QuantResult<Vec<i32>>
Quantize using per-group params.
Errors
QuantError::GroupSizeMismatch if the tensor size is not divisible by group_size.
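Per-group quantization applies one scale to each contiguous run of `group_size` elements. A minimal sketch, assuming symmetric codes (zero point 0) and one scale per group passed as a plain slice; `quantize_grouped_sketch` is a hypothetical name, and the layout of the real `QuantParams` is not shown in this page.

```rust
/// Quantize `tensor` in contiguous groups, one scale per group.
/// Returns `None` if the length is not a multiple of `group_size` or the
/// number of scales does not match the number of groups.
/// Assumes every scale is positive and 2 <= bits <= 31.
fn quantize_grouped_sketch(
    tensor: &[f32],
    scales: &[f32],
    group_size: usize,
    bits: u32,
) -> Option<Vec<i32>> {
    if group_size == 0
        || tensor.len() % group_size != 0
        || scales.len() != tensor.len() / group_size
    {
        return None;
    }
    let q_max = (1i32 << (bits - 1)) - 1; // 7 for 4 bits
    let q_min = -q_max - 1; // -8 for 4 bits
    let mut codes = Vec::with_capacity(tensor.len());
    for (group, &scale) in tensor.chunks_exact(group_size).zip(scales) {
        for &x in group {
            codes.push(((x / scale).round() as i32).clamp(q_min, q_max));
        }
    }
    Some(codes)
}
```

The divisibility check mirrors the `GroupSizeMismatch` condition: a 3-element tensor with `group_size = 2` is rejected instead of silently leaving a partial group.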
pub fn dequantize(&self, codes: &[i32], params: &QuantParams) -> Vec<f32>
Dequantize integer codes back to f32.
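Dequantization inverts the affine mapping: x ≈ (q - zero_point) * scale. The sketch below uses a hypothetical `dequantize_sketch` that takes the scale and zero point directly; how the real method reads them from `QuantParams` is not documented here.

```rust
/// Map integer codes back to approximate f32 values.
fn dequantize_sketch(codes: &[i32], scale: f32, zero_point: i32) -> Vec<f32> {
    codes
        .iter()
        .map(|&q| (q - zero_point) as f32 * scale)
        .collect()
}
```

The result is an approximation of the original tensor. For inputs inside the calibrated range, the round-trip error per element is at most about half a scale step; inputs that were clamped during quantization lose more.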
pub fn dequantize_grouped(
    &self,
    codes: &[i32],
    params: &QuantParams,
    group_size: usize,
) -> Vec<f32>
Dequantize per-group codes.
Trait Implementations
impl Clone for MinMaxQuantizer
fn clone(&self) -> MinMaxQuantizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl Freeze for MinMaxQuantizer
impl RefUnwindSafe for MinMaxQuantizer
impl Send for MinMaxQuantizer
impl Sync for MinMaxQuantizer
impl Unpin for MinMaxQuantizer
impl UnsafeUnpin for MinMaxQuantizer
impl UnwindSafe for MinMaxQuantizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.