Neural network quantization toolkit for ONNX models.
quantize-rs converts FP32 ONNX model weights to INT8 or INT4,
reducing model size by 4–8x with minimal accuracy loss. It supports
per-tensor and per-channel quantization, calibration-based range
optimization, and writes ONNX-Runtime-compatible QDQ models.
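To illustrate the core idea, here is a minimal sketch of per-tensor symmetric INT8 quantization in plain Rust. This is not the crate's actual implementation and the function names are hypothetical; it only shows the scale-and-round scheme that maps FP32 values onto the signed 8-bit range.

```rust
// Illustrative sketch of per-tensor symmetric INT8 quantization
// (hypothetical names; not quantize-rs's actual API).

/// Quantize FP32 weights with a single scale: q = round(x / scale),
/// where scale = max(|x|) / 127 so the largest magnitude maps to ±127.
fn quantize_per_tensor(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Dequantize back to FP32: x ≈ q * scale.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 0.75];
    let (q, scale) = quantize_per_tensor(&w);
    println!("quantized: {:?}, scale: {}", q, scale);
    println!("dequantized: {:?}", dequantize(&q, scale));
}
```

Per-channel quantization applies the same scheme with one scale per output channel instead of one scale per tensor, which preserves accuracy when channel magnitudes differ widely.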
§Modules
- quantization – core quantization logic (INT8/INT4, per-channel, packing)
- onnx_utils – ONNX model loading, weight extraction, QDQ save, validation
- calibration – (feature calibration) calibration datasets, activation-based inference, range methods
- config – YAML/TOML configuration file support
- errors – typed error enum (QuantizeError) for all public API functions
§Feature flags
- calibration (default) – enables activation-based calibration (adds tract-onnx, ndarray)
- python – enables PyO3 bindings (quantize_rs Python module)
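A Cargo.toml fragment showing how these flags might be selected (the version is a placeholder, not a real release number):

```toml
[dependencies]
# "calibration" is on by default; disable it for a lighter build,
# or add "python" for the PyO3 bindings.
quantize-rs = { version = "*", default-features = false, features = ["python"] }
```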
Re-exports§
pub use errors::QuantizeError;
pub use onnx_utils::ModelInfo;
pub use onnx_utils::OnnxModel;
pub use onnx_utils::WeightTensor;
pub use onnx_utils::QuantizedWeightInfo;
pub use onnx_utils::ConnectivityReport;
pub use onnx_utils::graph_builder::QdqWeightInput;
pub use quantization::Quantizer;
pub use quantization::QuantConfig;
pub use quantization::QuantParams;
pub use quantization::pack_int4;
pub use quantization::unpack_int4;
pub use config::Config;
pub use calibration::CalibrationDataset;
pub use calibration::stats::ActivationStats;
pub use calibration::inference::ActivationEstimator;
Modules§
- calibration
- Calibration datasets and activation-based range estimation.
- config
- YAML and TOML configuration file support.
- errors
- Typed error handling for the quantize-rs library.
- onnx_proto
- Prost-generated ONNX protobuf types.
- onnx_utils
- ONNX model utilities — loading, weight extraction, quantized save (QDQ), graph connectivity validation, and quantized-model introspection.
- quantization
- Core quantization logic for INT8 and INT4.
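The quantization module re-exports pack_int4 and unpack_int4. The sketch below shows one common nibble-packing layout (two signed 4-bit values per byte, low nibble first) as an illustration; the crate's actual byte layout may differ.

```rust
// Sketch of INT4 nibble packing: two signed 4-bit values per byte,
// low nibble = even index, high nibble = odd index. The layout used
// by quantize_rs::pack_int4 may differ; this only shows the idea.

/// Pack INT4 values (each in [-8, 7]) two per byte.
fn pack_int4(values: &[i8]) -> Vec<u8> {
    values
        .chunks(2)
        .map(|pair| {
            let lo = (pair[0] & 0x0F) as u8;
            let hi = (*pair.get(1).unwrap_or(&0) & 0x0F) as u8;
            (hi << 4) | lo
        })
        .collect()
}

/// Unpack back to signed INT4 values, sign-extending each nibble.
fn unpack_int4(packed: &[u8], len: usize) -> Vec<i8> {
    let mut out = Vec::with_capacity(len);
    for &byte in packed {
        for nib in [byte & 0x0F, byte >> 4] {
            if out.len() == len {
                break;
            }
            // Shift the nibble into the top 4 bits, then arithmetic-shift
            // back down to sign-extend (0xF -> -1, 0x8 -> -8).
            out.push(((nib << 4) as i8) >> 4);
        }
    }
    out
}

fn main() {
    let vals = [-8i8, 7, 3, -1, 5];
    let packed = pack_int4(&vals);
    assert_eq!(unpack_int4(&packed, vals.len()), vals);
    println!("{} values packed into {} bytes", vals.len(), packed.len());
}
```

Packing two INT4 values per byte is what yields the 8x size reduction over FP32 mentioned in the description (4 bytes down to half a byte per weight).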
Constants§
- VERSION
Library version string, read from Cargo.toml at compile time.
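A constant like this is commonly defined via Cargo's compile-time package metadata; the crate's actual definition may differ from this sketch.

```rust
// Common pattern for exposing the Cargo.toml version at compile time.
// option_env! is used here so the sketch compiles even outside Cargo;
// inside a Cargo build it yields the crate's version string.
pub const VERSION: &str = match option_env!("CARGO_PKG_VERSION") {
    Some(v) => v,
    None => "unknown",
};

fn main() {
    println!("quantize-rs version: {}", VERSION);
}
```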