Module quantization

Core quantization logic for INT8 and INT4.

Provides tensor-level quantization (per-tensor and per-channel), INT4 bit-packing, and the high-level Quantizer that combines a QuantConfig with optional calibration statistics.
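
To illustrate the difference between per-tensor and per-channel quantization, here is a minimal sketch that computes symmetric INT8 scales both ways. The function names `per_tensor_scale` and `per_channel_scales` are hypothetical and not part of this module, and the symmetric max-abs scheme is a simplifying assumption; the module itself documents affine parameters with a zero-point.

```rust
/// Hypothetical helper (not this crate's API): one symmetric INT8 scale
/// for a whole slice, chosen so the largest magnitude maps to 127.
pub fn per_tensor_scale(data: &[f32]) -> f32 {
    let max_abs = data.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    // Fall back to 1.0 for an empty or all-zero slice to avoid dividing by zero later.
    if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 }
}

/// Hypothetical helper: one scale per channel, assuming the data is laid out
/// channel by channel with `channel_len` contiguous values per channel.
/// Panics if `channel_len` is 0 (a property of `slice::chunks`).
pub fn per_channel_scales(data: &[f32], channel_len: usize) -> Vec<f32> {
    data.chunks(channel_len).map(per_tensor_scale).collect()
}
```

Per-channel scales keep a channel with small values from being flattened by a large outlier elsewhere in the tensor, at the cost of storing one scale per channel.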

Structs

Int4Range
Marker for INT4 quantization (-8 … 7).
Int8Range
Marker for INT8 quantization (-128 … 127).
QuantConfig
Configuration for a quantization pass.
QuantParamsGeneric
Affine quantization parameters (scale and zero-point), generic over bit-width.
QuantizedTensorGeneric
Generic quantized tensor, parameterized by bit-width marker.
Quantizer
High-level quantizer that combines configuration with optional calibration.
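
The affine scheme behind the scale and zero-point parameters can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `AffineParams`, its field names, and the three free functions are hypothetical, and the real `QuantParamsGeneric` may derive or store its parameters differently.

```rust
/// Hypothetical stand-in for INT8 affine parameters; field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AffineParams {
    pub scale: f32,
    pub zero_point: i32,
}

/// Derive asymmetric INT8 parameters from an observed value range.
pub fn params_from_range(min: f32, max: f32) -> AffineParams {
    // Widen the range to include 0.0 so that zero is exactly representable.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let (qmin, qmax) = (-128.0_f32, 127.0_f32);
    let span = max - min;
    // Guard against a degenerate range where min == max == 0.
    let scale = if span > 0.0 { span / (qmax - qmin) } else { 1.0 };
    let zero_point = (qmin - min / scale).round().clamp(qmin, qmax) as i32;
    AffineParams { scale, zero_point }
}

/// q = clamp(round(x / scale) + zero_point, -128, 127)
pub fn quantize(x: f32, p: AffineParams) -> i8 {
    let q = ((x / p.scale).round() as i32).saturating_add(p.zero_point);
    q.clamp(-128, 127) as i8
}

/// x ≈ (q - zero_point) * scale
pub fn dequantize(q: i8, p: AffineParams) -> f32 {
    (q as i32 - p.zero_point) as f32 * p.scale
}
```

Values outside the calibrated range saturate at the clamp bounds, which is why calibration statistics matter for choosing `min` and `max`.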

Enums

QuantizedTensorType
Type-erased wrapper over QuantizedTensor (INT8) and QuantizedTensorInt4 (INT4).
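
A type-erased wrapper of this kind is commonly written as an enum with one variant per bit-width. The sketch below shows the pattern only; `AnyQuantized`, `TensorI8`, `TensorI4`, and all fields and methods are hypothetical names, and the real `QuantizedTensorType` may expose a different surface.

```rust
/// Hypothetical INT8 tensor: one value per byte.
pub struct TensorI8 {
    pub data: Vec<i8>,
    pub shape: Vec<usize>,
}

/// Hypothetical INT4 tensor: two values packed per byte.
pub struct TensorI4 {
    pub packed: Vec<u8>,
    pub shape: Vec<usize>,
}

/// Hypothetical type-erased wrapper, so callers can hold either
/// bit-width in one collection without generics.
pub enum AnyQuantized {
    Int8(TensorI8),
    Int4(TensorI4),
}

impl AnyQuantized {
    /// Bit-width of the stored values.
    pub fn bits(&self) -> u32 {
        match self {
            AnyQuantized::Int8(_) => 8,
            AnyQuantized::Int4(_) => 4,
        }
    }

    /// Logical element count, taken from the shape.
    pub fn num_elements(&self) -> usize {
        let shape = match self {
            AnyQuantized::Int8(t) => &t.shape,
            AnyQuantized::Int4(t) => &t.shape,
        };
        shape.iter().product()
    }

    /// Bytes actually occupied by the value buffer.
    pub fn storage_bytes(&self) -> usize {
        match self {
            AnyQuantized::Int8(t) => t.data.len(),
            AnyQuantized::Int4(t) => t.packed.len(),
        }
    }
}
```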

Traits

QuantRange
Marker trait that supplies the clamp constants for a quantization bit-width.
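
One way a marker trait can supply clamp constants to a generic parameter type is through associated constants and `PhantomData`. The sketch below is an assumption about the general pattern, not the crate's definitions: `BitRange`, `I8Marker`, `I4Marker`, `ParamsSketch`, and the constant names `QMIN`/`QMAX` are all hypothetical.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait: each implementor fixes the clamp bounds at compile time.
pub trait BitRange {
    const QMIN: i32;
    const QMAX: i32;
}

pub struct I8Marker;
pub struct I4Marker;

impl BitRange for I8Marker {
    const QMIN: i32 = -128;
    const QMAX: i32 = 127;
}

impl BitRange for I4Marker {
    const QMIN: i32 = -8;
    const QMAX: i32 = 7;
}

/// Hypothetical generic parameters; the marker type carries no runtime data.
pub struct ParamsSketch<R: BitRange> {
    pub scale: f32,
    pub zero_point: i32,
    _range: PhantomData<R>,
}

impl<R: BitRange> ParamsSketch<R> {
    pub fn new(scale: f32, zero_point: i32) -> Self {
        Self { scale, zero_point, _range: PhantomData }
    }

    /// Quantize one value, clamping to the bounds supplied by the marker.
    pub fn quantize(&self, x: f32) -> i8 {
        let q = ((x / self.scale).round() as i32).saturating_add(self.zero_point);
        q.clamp(R::QMIN, R::QMAX) as i8
    }
}

/// Aliases give each bit-width a short concrete name.
pub type ParamsSketchInt8 = ParamsSketch<I8Marker>;
pub type ParamsSketchInt4 = ParamsSketch<I4Marker>;
```

With this layout the quantization code is written once, and the compiler monomorphizes it per bit-width with the clamp bounds as constants.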

Functions

pack_int4
Pack a slice of INT4 values (two per byte, high nibble first).
unpack_int4
Unpack INT4 values from packed bytes, returning exactly num_values i8s.
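
The packing layout described above (two values per byte, high nibble first) can be sketched as below. The names `pack_nibbles` and `unpack_nibbles` are hypothetical so they are not confused with the real functions, whose signatures and error handling are not shown on this page. Padding an odd-length input with a zero low nibble is an assumption.

```rust
/// Hypothetical sketch: pack INT4 values (-8..=7) two per byte, high nibble first.
/// An odd-length input leaves the final low nibble as 0 (an assumption).
pub fn pack_nibbles(values: &[i8]) -> Vec<u8> {
    values
        .chunks(2)
        .map(|pair| {
            // Keep only the low 4 bits of each two's-complement value.
            let hi = (pair[0] as u8) & 0x0F;
            let lo = pair.get(1).map_or(0, |&v| (v as u8) & 0x0F);
            (hi << 4) | lo
        })
        .collect()
}

/// Hypothetical sketch: unpack up to `num_values` INT4 values from packed bytes.
/// If `packed` holds fewer values than requested, this sketch returns fewer.
pub fn unpack_nibbles(packed: &[u8], num_values: usize) -> Vec<i8> {
    // Sign-extend a 4-bit value: move it to the top of the byte,
    // then shift back arithmetically so the sign bit is replicated.
    fn sign_extend(nibble: u8) -> i8 {
        ((nibble << 4) as i8) >> 4
    }

    packed
        .iter()
        .flat_map(|&byte| [sign_extend(byte >> 4), sign_extend(byte & 0x0F)])
        .take(num_values)
        .collect()
}
```

Passing the original element count to the unpack step is what lets an odd-length tensor round-trip without a spurious trailing value.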

Type Aliases

QuantParams
INT8 affine quantization parameters — clamp(-128, 127).
QuantParamsInt4
INT4 affine quantization parameters — clamp(-8, 7).
QuantizedTensor
An INT8 quantized tensor with optional per-channel parameters.
QuantizedTensorInt4
An INT4 quantized tensor with optional per-channel parameters and bit packing.