Crate quantize_rs

Neural network quantization toolkit for ONNX models.

quantize-rs converts FP32 ONNX model weights to INT8 or INT4, shrinking weight storage by roughly 4x (INT8) or 8x (INT4) with minimal accuracy loss. It supports per-tensor and per-channel quantization and calibration-based range optimization, and it writes ONNX-Runtime-compatible QDQ models.
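To make the difference between per-tensor and per-channel quantization concrete, here is a minimal standalone sketch of symmetric INT8 quantization. It does not use the quantize-rs API; the function names `quantize_per_tensor`, `quantize_per_channel`, and `dequantize` are hypothetical, and the crate's own scheme (for example, whether it is symmetric or uses a zero point) may differ.

```rust
/// Symmetric per-tensor INT8 quantization: one scale for the whole tensor.
/// Returns the quantized values and the scale.
/// Illustrative only; this is not the crate's implementation.
fn quantize_per_tensor(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid a zero scale when every weight is zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Per-channel variant: the flat buffer is split into channels of
/// `channel_len` elements, and each channel gets its own scale.
/// `channel_len` must be non-zero.
fn quantize_per_channel(weights: &[f32], channel_len: usize) -> Vec<(Vec<i8>, f32)> {
    weights.chunks(channel_len).map(quantize_per_tensor).collect()
}

/// Map quantized values back to approximate FP32 values.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

With the input `[1.27, 0.0, 0.127, 0.0]`, a single per-tensor scale is set by the largest value, so `0.127` uses only a small part of the INT8 range. Calling `quantize_per_channel` with `channel_len = 2` gives the second channel its own scale, so the same value maps to 127. Each stored weight takes one byte instead of four, which is where the roughly 4x reduction comes from.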

§Modules

  • quantization – core quantization logic (INT8/INT4, per-channel, packing)
  • onnx_utils – ONNX model loading, weight extraction, QDQ save, validation
  • calibration – calibration datasets, activation-based inference, range methods (requires the calibration feature)
  • config – YAML/TOML configuration file support
  • errors – typed error enum (QuantizeError) for all public API functions
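As background for the calibration module, the sketch below shows one common way to turn observed activation values into quantization parameters: take the min/max range and derive a scale and zero point for unsigned 8-bit values. This is an assumption-laden illustration, not the crate's code: `AffineParams`, `params_from_range`, and `quantize` are hypothetical names, and the range methods that quantize-rs actually implements are not described here.

```rust
/// Affine quantization parameters for the unsigned 8-bit range [0, 255].
/// Hypothetical type for illustration; not part of quantize-rs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AffineParams {
    scale: f32,
    zero_point: u8,
}

/// Derive parameters from observed samples using a plain min/max range.
fn params_from_range(samples: &[f32]) -> AffineParams {
    // Start both folds at 0.0 so the range always contains zero,
    // which keeps 0.0 exactly representable after quantization.
    let min = samples.iter().fold(0.0f32, |m, &x| m.min(x));
    let max = samples.iter().fold(0.0f32, |m, &x| m.max(x));
    let range = max - min;
    if range == 0.0 {
        // Degenerate case: no samples, or all samples are zero.
        return AffineParams { scale: 1.0, zero_point: 0 };
    }
    let scale = range / 255.0;
    let zero_point = (-min / scale).round().clamp(0.0, 255.0) as u8;
    AffineParams { scale, zero_point }
}

/// Quantize one value: round(x / scale) + zero_point, saturated to [0, 255].
fn quantize(x: f32, p: AffineParams) -> u8 {
    ((x / p.scale).round() + p.zero_point as f32).clamp(0.0, 255.0) as u8
}
```

For samples spanning -1.0 to 4.1, this yields a scale of about 0.02 and a zero point of 50, and values outside the observed range saturate at 0 or 255. A min/max range is sensitive to outliers, which is one reason calibration tools often offer alternative range methods.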

§Feature flags

  • calibration (default) – enables activation-based calibration (adds tract-onnx, ndarray)
  • python – enables PyO3 bindings (quantize_rs Python module)

Re-exports§

pub use errors::QuantizeError;
pub use onnx_utils::ModelInfo;
pub use onnx_utils::OnnxModel;
pub use onnx_utils::WeightTensor;
pub use onnx_utils::QuantizedWeightInfo;
pub use onnx_utils::ConnectivityReport;
pub use onnx_utils::graph_builder::QdqWeightInput;
pub use quantization::Quantizer;
pub use quantization::QuantConfig;
pub use quantization::QuantParams;
pub use quantization::pack_int4;
pub use quantization::unpack_int4;
pub use config::Config;
pub use calibration::CalibrationDataset;
pub use calibration::stats::ActivationStats;
pub use calibration::inference::ActivationEstimator;
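
The re-exported `pack_int4` and `unpack_int4` deal with INT4 packing, where two 4-bit values share one byte. Their signatures are not shown on this page, so the sketch below uses hypothetical stand-ins, `pack_nibbles` and `unpack_nibbles`, to illustrate the general technique. The nibble order (first value in the low 4 bits) is an assumption and may not match the crate's layout.

```rust
/// Pack signed 4-bit values (each in -8..=7) two per byte, low nibble first.
/// Hypothetical helper; not the signature of quantize_rs::pack_int4.
fn pack_nibbles(values: &[i8]) -> Vec<u8> {
    values
        .chunks(2)
        .map(|pair| {
            let lo = (pair[0] as u8) & 0x0F;
            // For an odd-length input the final high nibble is left as zero.
            let hi = pair.get(1).map_or(0, |&v| (v as u8) & 0x0F);
            lo | (hi << 4)
        })
        .collect()
}

/// Unpack `len` signed 4-bit values from packed bytes.
/// `len` is needed because an odd count leaves one padding nibble.
fn unpack_nibbles(packed: &[u8], len: usize) -> Vec<i8> {
    packed
        .iter()
        .flat_map(|&b| [b & 0x0F, b >> 4])
        .take(len)
        // Sign-extend: move the nibble to the top of the byte,
        // then shift back with an arithmetic (sign-preserving) shift.
        .map(|n| ((n << 4) as i8) >> 4)
        .collect()
}
```

Packing halves the storage of INT8 again, which accounts for the roughly 8x reduction relative to FP32. Passing the original length to the unpack step avoids returning a spurious trailing zero for odd-length tensors.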

Modules§

calibration
Calibration datasets and activation-based range estimation.
config
YAML and TOML configuration file support.
errors
Typed error handling for the quantize-rs library.
onnx_proto
Prost-generated ONNX protobuf types.
onnx_utils
ONNX model utilities — loading, weight extraction, quantized save (QDQ), graph connectivity validation, and quantized-model introspection.
quantization
Core quantization logic for INT8 and INT4.

Constants§

VERSION
Library version string, read from Cargo.toml at compile time.