pub struct QuantizationParams {
pub dtype: QuantizedDType,
pub scheme: QuantizationScheme,
pub scale: Vec<f32>,
pub zero_point: Vec<i32>,
pub block_size: Option<usize>,
pub min_val: Option<f32>,
pub max_val: Option<f32>,
}
Quantization parameters
Contains all the parameters needed to quantize and dequantize tensors, including scale factors, zero points, and metadata about the quantization scheme being used.
Fields
dtype: QuantizedDType
Quantization data type. Specifies the target quantized data type (e.g., Int8, UInt8, Int4).
scheme: QuantizationScheme
Quantization scheme. Defines how the quantization mapping is performed (linear, symmetric, etc.).
scale: Vec<f32>
Scale factor(s). Maps quantized values back to the floating-point range. For per-channel quantization, contains one scale per channel. Formula: float_val = scale * (quantized_val - zero_point).
zero_point: Vec<i32>
Zero point(s). The quantized value that corresponds to floating-point zero. For per-channel quantization, contains one zero point per channel. For symmetric quantization, this is always 0.
block_size: Option<usize>
Block size for block-wise quantization. When using block-wise quantization, specifies the size of each block that gets its own quantization parameters. None for other schemes.
min_val: Option<f32>
Minimum value observed during calibration. Used for parameter calculation and validation. Set during calibration or when computing parameters from statistics.
max_val: Option<f32>
Maximum value observed during calibration. Used for parameter calculation and validation. Set during calibration or when computing parameters from statistics.
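The scale/zero_point mapping documented above can be sketched as a standalone pair of helpers. These are hypothetical names, not part of torsh_backend's API, and an INT8 target range is assumed:

```rust
// Hypothetical sketch of the affine mapping float_val = scale * (q - zero_point),
// assuming an INT8 target range of [-128, 127]. Not torsh_backend's implementation.
fn quantize(val: f32, scale: f32, zero_point: i32) -> i32 {
    // Round to the nearest step, shift by the zero point, clamp to range.
    ((val / scale).round() as i32 + zero_point).clamp(-128, 127)
}

fn dequantize(q: i32, scale: f32, zero_point: i32) -> f32 {
    scale * (q - zero_point) as f32
}

fn main() {
    let (scale, zp) = (0.05_f32, 0);
    let q = quantize(1.0, scale, zp);
    println!("quantized: {q}, dequantized: {}", dequantize(q, scale, zp));
}
```

Note that the round trip is lossy: dequantize(quantize(x)) recovers x only up to half a quantization step.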
Implementations
impl QuantizationParams

pub fn int8_symmetric() -> Self
Create parameters for INT8 symmetric quantization
INT8 symmetric quantization is commonly used for weights in neural networks due to its simplicity and good hardware support. The zero point is always 0, and the range is symmetric around zero.
Examples
use torsh_backend::quantization::QuantizationParams;
let params = QuantizationParams::int8_symmetric();
assert_eq!(params.zero_point[0], 0);

pub fn new(scale: f32, zero_point: i32) -> Self
Create basic quantization parameters with custom scale and zero point
This is a general-purpose constructor for creating quantization parameters with custom scale and zero point values. Useful for benchmarking and testing with specific parameter configurations.
Arguments
scale - Scale factor for the quantization
zero_point - Zero point for the quantization
Examples
use torsh_backend::quantization::QuantizationParams;
let params = QuantizationParams::new(255.0, 128);
assert_eq!(params.scale[0], 255.0);
assert_eq!(params.zero_point[0], 128);

pub fn uint8_asymmetric() -> Self
Create parameters for UINT8 asymmetric quantization
UINT8 asymmetric quantization is commonly used for activations, especially after ReLU layers where values are non-negative. The zero point is typically set to 128 for balanced range utilization.
Examples
use torsh_backend::quantization::QuantizationParams;
let params = QuantizationParams::uint8_asymmetric();
assert_eq!(params.zero_point[0], 128);

pub fn int4_symmetric() -> Self
Create parameters for INT4 symmetric quantization
INT4 quantization provides extreme compression at the cost of accuracy. Symmetric INT4 is often used for weights in models where 4-bit precision is sufficient.
Examples
use torsh_backend::quantization::QuantizationParams;
let params = QuantizationParams::int4_symmetric();
assert_eq!(params.dtype.bits(), 4);

pub fn channel_wise(num_channels: usize, dtype: QuantizedDType) -> Self
Create parameters for channel-wise quantization
Channel-wise quantization applies different quantization parameters to each channel, providing better accuracy for models with varying channel sensitivities at the cost of increased parameter storage.
Arguments
num_channels - Number of channels in the tensor
dtype - Quantization data type to use
Examples
use torsh_backend::quantization::{QuantizationParams, QuantizedDType};
let params = QuantizationParams::channel_wise(64, QuantizedDType::Int8);
assert_eq!(params.scale.len(), 64);
assert_eq!(params.zero_point.len(), 64);

pub fn block_wise(block_size: usize, dtype: QuantizedDType) -> Self
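Applying per-channel parameters can be sketched with a standalone helper (a hypothetical illustration of the documented behaviour, not a torsh_backend API): each channel indexes its own scale/zero_point pair.

```rust
// Hypothetical sketch of per-channel dequantization: channel `c` uses
// its own entry from the scale and zero_point vectors.
fn dequantize_channel(q: &[i32], scale: &[f32], zero_point: &[i32], c: usize) -> Vec<f32> {
    q.iter()
        .map(|&v| scale[c] * (v - zero_point[c]) as f32)
        .collect()
}

fn main() {
    // Two channels with different parameters; dequantize channel 1.
    let (scale, zero_point) = (vec![0.5_f32, 0.25], vec![0, 10]);
    let out = dequantize_channel(&[10, 20], &scale, &zero_point, 1);
    println!("{out:?}");
}
```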
Create parameters for block-wise quantization
Block-wise quantization divides the tensor into blocks and applies different quantization parameters to each block. This can provide better accuracy than tensor-wise quantization while being more memory-efficient than channel-wise quantization.
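The partitioning described above can be sketched as follows; the helper name is hypothetical, and the last block is assumed to be allowed to be partial:

```rust
// Hypothetical sketch: block-wise quantization splits the flattened tensor
// into fixed-size blocks, each carrying its own scale/zero_point pair.
fn num_blocks(num_elements: usize, block_size: usize) -> usize {
    // Round up so a trailing partial block still gets parameters.
    num_elements.div_ceil(block_size)
}

fn main() {
    // A 1000-element tensor with 128-element blocks needs 8 parameter sets.
    println!("{}", num_blocks(1000, 128));
}
```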
Arguments
block_size - Size of each quantization block
dtype - Quantization data type to use
Examples
use torsh_backend::quantization::{QuantizationParams, QuantizedDType};
let params = QuantizationParams::block_wise(128, QuantizedDType::Int8);
assert_eq!(params.block_size, Some(128));

pub fn from_statistics(&mut self, min_val: f32, max_val: f32) -> BackendResult<()>
Calculate quantization parameters from input statistics
Computes the optimal scale and zero point parameters based on the observed minimum and maximum values in the data. The calculation depends on the quantization scheme being used.
Arguments
min_val - Minimum value observed in the data
max_val - Maximum value observed in the data
Returns
Returns Ok(()) if parameters were calculated successfully, or an error if the statistics are invalid.
Examples
use torsh_backend::quantization::QuantizationParams;
let mut params = QuantizationParams::int8_symmetric();
params.from_statistics(-2.0, 2.0).unwrap();
// Scale will be calculated to map [-2.0, 2.0] to [-128, 127]

pub fn validate(&self) -> BackendResult<()>
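For the symmetric INT8 case, the calculation can be sketched as a standalone function. This is an assumption about the formula (map the larger magnitude of min/max to the quantized extreme, keep zero_point at 0); torsh_backend's actual implementation may differ in detail, e.g. in how it treats the asymmetric -128 endpoint:

```rust
// Hypothetical sketch of symmetric INT8 scale computation from observed
// statistics; zero_point stays 0 for symmetric schemes.
fn symmetric_scale_int8(min_val: f32, max_val: f32) -> f32 {
    // The largest observed magnitude maps to the quantized extreme 127.
    let abs_max = min_val.abs().max(max_val.abs());
    abs_max / 127.0
}

fn main() {
    let scale = symmetric_scale_int8(-2.0, 2.0);
    println!("scale = {scale}");
}
```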
Validate that the parameters are consistent and usable
Checks that all parameter vectors have consistent lengths, scale factors are positive, and zero points are within valid ranges.
pub fn num_parameter_sets(&self) -> usize
Get the effective number of quantization parameter sets
Returns the number of independent parameter sets (scale/zero_point pairs) that this configuration represents. For tensor-wise quantization this is 1, for channel-wise it’s the number of channels.
pub fn is_per_channel(&self) -> bool
Check if this configuration uses per-channel parameters
pub fn quantization_error_bound(&self) -> f32
Get the quantization error bound for this configuration
Returns the maximum possible quantization error (in the original floating-point scale) for this quantization configuration.
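One plausible formulation, not confirmed by the source: linear quantization rounds each value to the nearest step, so the worst-case error is half the largest step size. As a standalone sketch:

```rust
// Sketch under the half-step assumption: error <= max(scale) / 2.
// Hypothetical helper, not torsh_backend's actual formula.
fn error_bound(scale: &[f32]) -> f32 {
    scale.iter().cloned().fold(0.0_f32, f32::max) / 2.0
}

fn main() {
    // With per-channel scales 0.1 and 0.05, the bound follows the coarser channel.
    println!("{}", error_bound(&[0.1, 0.05]));
}
```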
pub fn compression_ratio(&self) -> f32
Calculate the compression ratio achieved by this quantization
Returns the ratio of original size to quantized size. Assumes the original data was 32-bit floating point.
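Under the documented assumption of 32-bit float originals, the ratio reduces to 32 divided by the quantized bit width (a sketch that ignores the storage of the quantization parameters themselves):

```rust
// Sketch of the documented ratio: original f32 bits over quantized bits.
// Parameter storage overhead is ignored in this simplification.
fn compression_ratio(quantized_bits: u32) -> f32 {
    32.0 / quantized_bits as f32
}

fn main() {
    println!("{}", compression_ratio(8)); // INT8
    println!("{}", compression_ratio(4)); // INT4
}
```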
Trait Implementations
impl Clone for QuantizationParams

fn clone(&self) -> QuantizationParams

fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.

impl Debug for QuantizationParams
Auto Trait Implementations
impl Freeze for QuantizationParams
impl RefUnwindSafe for QuantizationParams
impl Send for QuantizationParams
impl Sync for QuantizationParams
impl Unpin for QuantizationParams
impl UnsafeUnpin for QuantizationParams
impl UnwindSafe for QuantizationParams
Blanket Implementations

impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where
    T: Clone,

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.