Post-Training Quantization (PTQ) for transformer weight matrices.
This module provides a CPU-first implementation of INT8 quantization for
linear layers, following the same “Paradigm B — numerical layers” design
used by the moe module.
§Architecture
- QuantizedLinear: A weight matrix stored as Array2<i8> with per-channel or per-tensor scale/zero_point. The forward pass dequantizes on the fly, then performs an f64 matmul (an integer-matmul kernel is a future follow-up).
- calibrate_linear: Wraps tensorlogic-scirs-backend’s calibrate_quantization to produce QuantizationParams from a weight matrix, including per-channel calibration.
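The dequantize-then-matmul forward pass described above can be sketched in plain Rust. This is an illustration only, not this crate's implementation: it uses nested Vec instead of Array2, and RowParams, calibrate_row, quantize, dequantize and forward are hypothetical names. It assumes affine INT8 quantization with one scale/zero_point per output row, calibrated from the row's min/max.

```rust
/// Hypothetical per-row affine INT8 parameters (not the crate's QuantizationParams).
#[derive(Debug, Clone, Copy)]
struct RowParams {
    scale: f64,
    zero_point: i32,
}

/// Calibrate one weight row so that [min, max] maps onto [-128, 127].
fn calibrate_row(row: &[f64]) -> RowParams {
    // Start both folds at 0.0 so the range always contains zero.
    let min = row.iter().copied().fold(0.0_f64, f64::min);
    let max = row.iter().copied().fold(0.0_f64, f64::max);
    let range = max - min;
    // An all-zero row gets a dummy scale of 1.0 to avoid dividing by zero.
    let scale = if range > 0.0 { range / 255.0 } else { 1.0 };
    let zero_point = ((-128.0 - min / scale).round() as i32).clamp(-128, 127);
    RowParams { scale, zero_point }
}

/// Real value -> INT8 code: q = round(v / scale) + zero_point, saturated.
fn quantize(v: f64, p: RowParams) -> i8 {
    let q = (v / p.scale).round() as i32 + p.zero_point;
    q.clamp(-128, 127) as i8
}

/// INT8 code -> approximate real value: (q - zero_point) * scale.
fn dequantize(q: i8, p: RowParams) -> f64 {
    (q as i32 - p.zero_point) as f64 * p.scale
}

/// out[b][o] = sum_i x[b][i] * dequantize(w_q[o][i]), i.e. x · Wᵀ in f64.
/// `w_q` has shape (out_features, in_features); `x` has shape (batch, in_features).
fn forward(x: &[Vec<f64>], w_q: &[Vec<i8>], params: &[RowParams]) -> Vec<Vec<f64>> {
    x.iter()
        .map(|xb| {
            w_q.iter()
                .zip(params)
                .map(|(row, &p)| {
                    row.iter()
                        .zip(xb)
                        .map(|(&q, &xi)| dequantize(q, p) * xi)
                        .sum::<f64>()
                })
                .collect()
        })
        .collect()
}
```

With a (4, 8) weight matrix and a (2, 8) input, forward returns a (2, 4) result whose entries approximate the unquantized product, with an error that grows with each row's scale.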
§Example
use ndarray::Array2;
use tensorlogic_trustformers::quantization::{calibrate_linear, QuantizedLinear};
use tensorlogic_scirs_backend::quantization::{QuantizationGranularity, QuantizationType};
let weight = Array2::from_shape_fn((4, 8), |(i, j)| (i * 8 + j) as f64);
let params = calibrate_linear(&weight, QuantizationType::Int8,
QuantizationGranularity::PerChannel);
let qlinear = QuantizedLinear::from_fp(&weight, &params).expect("quantize");
let x = Array2::ones((2, 8));
let out = qlinear.forward(&x);
assert_eq!(out.shape(), &[2, 4]);
Re-exports§
pub use calibration::calibrate_linear;
pub use linear::QuantizationError;
pub use linear::QuantizedLinear;
Modules§
- calibration
  Calibration helpers for crate::quantization::QuantizedLinear weight matrices.
- linear
  QuantizedLinear: an INT8 weight matrix with per-channel or per-tensor scale/zero_point.
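To illustrate why the choice between per-channel and per-tensor scale/zero_point matters, here is a minimal sketch that quantizes the same row with two different calibration ranges. It does not use this crate's API: round_trip, range and max_abs_error are hypothetical helpers, and min/max calibration is an assumption.

```rust
/// Affine INT8 round trip (quantize, then dequantize) of `v`
/// for a calibration range [min, max] that contains 0.
fn round_trip(v: f64, min: f64, max: f64) -> f64 {
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let zero_point = (-128.0 - min / scale).round();
    let q = ((v / scale).round() + zero_point).clamp(-128.0, 127.0);
    (q - zero_point) * scale
}

/// (min, max) over the values, widened to include 0.0.
fn range(values: impl Iterator<Item = f64>) -> (f64, f64) {
    values.fold((0.0_f64, 0.0_f64), |(lo, hi), v| (lo.min(v), hi.max(v)))
}

/// Largest absolute reconstruction error over `row` for a given range.
fn max_abs_error(row: &[f64], (min, max): (f64, f64)) -> f64 {
    row.iter()
        .map(|&v| (v - round_trip(v, min, max)).abs())
        .fold(0.0, f64::max)
}
```

For a matrix with rows [0.01, 0.02, -0.03] and [100.0, -50.0, 75.0], calling max_abs_error on the first row with the whole-matrix range (per-tensor) collapses all three small weights onto the zero code, while the row's own range (per-channel) reconstructs them almost exactly. The cost of per-channel granularity is one scale/zero_point pair per row instead of one per matrix.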