//! Neural network quantization toolkit for ONNX models.
//!
//! `quantize-rs` converts FP32 ONNX model weights to INT8 or INT4,
//! reducing model size by 4--8x with minimal accuracy loss. It supports
//! per-tensor and per-channel quantization, calibration-based range
//! optimization, and writes ONNX-Runtime-compatible QDQ models.
//!
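//! # Quantization scheme
//!
//! Weights are mapped to integers with the usual affine scheme:
//! `q = clamp(round(x / scale) + zero_point, q_min, q_max)`. The sketch
//! below shows a minimal per-tensor INT8 variant for illustration only; it
//! is not this crate's internal implementation. Per-channel quantization
//! applies the same formula with a separate `scale`/`zero_point` pair per
//! output channel.
//!
//! ```
//! /// Quantize a tensor to INT8 with one scale/zero-point pair
//! /// (illustrative sketch, not this crate's API).
//! fn quantize_per_tensor(weights: &[f32]) -> (Vec<i8>, f32, i32) {
//!     // Find the value range, widened to include 0.0 so that a real
//!     // zero maps exactly onto an integer code.
//!     let (mut min, mut max) = (0.0f32, 0.0f32);
//!     for &x in weights {
//!         min = min.min(x);
//!         max = max.max(x);
//!     }
//!     // 256 INT8 codes cover [min, max]; guard against a zero-width range.
//!     let scale = ((max - min) / 255.0).max(f32::EPSILON);
//!     let zero_point = (-128.0 - min / scale).round() as i32;
//!     let q = weights
//!         .iter()
//!         .map(|&x| ((x / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
//!         .collect();
//!     (q, scale, zero_point)
//! }
//!
//! let (q, scale, _zp) = quantize_per_tensor(&[-0.5, 0.0, 1.0]);
//! assert_eq!(q.len(), 3);
//! assert!(scale > 0.0);
//! ```
//!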
//! # Modules
//!
//! - [`quantization`] -- core quantization logic (INT8/INT4, per-channel, packing; see the packing sketch below)
//! - [`onnx_utils`] -- ONNX model loading, weight extraction, QDQ save, validation
//! - [`calibration`] -- (feature `calibration`) calibration datasets, activation collection via model inference, range-selection methods
//! - [`config`] -- YAML/TOML configuration file support
//! - [`errors`] -- typed error enum ([`QuantizeError`]) for all public API functions
//!
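//! # INT4 packing
//!
//! INT4 weights are stored two values per byte. The sketch below shows one
//! common low-nibble-first convention for illustration; the crate's actual
//! byte layout is defined in [`quantization`].
//!
//! ```
//! // Pack signed 4-bit values two per byte, low nibble first
//! // (illustrative convention, not necessarily this crate's layout).
//! fn pack_int4(vals: &[i8]) -> Vec<u8> {
//!     vals.chunks(2)
//!         .map(|pair| {
//!             let lo = (pair[0] & 0x0F) as u8;
//!             let hi = (*pair.get(1).unwrap_or(&0) & 0x0F) as u8;
//!             lo | (hi << 4)
//!         })
//!         .collect()
//! }
//!
//! assert_eq!(pack_int4(&[1, -1]), vec![0xF1]);
//! ```
//!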
//! # Feature flags
//!
//! - **`calibration`** *(default)* -- enables activation-based calibration (adds `tract-onnx`, `ndarray`)
//! - **`python`** -- enables PyO3 bindings (`quantize_rs` Python module)
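//!
//! # Example
//!
//! A minimal end-to-end sketch. The constructor and function names below
//! are illustrative placeholders, not the crate's actual signatures; see
//! the [`config`], [`onnx_utils`], and [`quantization`] module docs for
//! the real API.
//!
//! ```ignore
//! use quantize_rs::{Config, QuantizeError};
//!
//! fn main() -> Result<(), QuantizeError> {
//!     // Hypothetical helpers -- the real entry points live in the
//!     // `config`, `onnx_utils`, and `quantization` modules.
//!     let config = Config::from_file("quantize.yaml")?;
//!     let model = quantize_rs::onnx_utils::load_model("model.onnx")?;
//!     let quantized = quantize_rs::quantization::quantize(&model, &config)?;
//!     quantize_rs::onnx_utils::save_qdq_model(&quantized, "model.int8.onnx")?;
//!     Ok(())
//! }
//! ```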

pub mod config;
pub mod errors;
pub mod onnx_utils;
pub mod quantization;
#[cfg(feature = "calibration")]
pub mod calibration;

/// Raw prost-generated ONNX protobuf types. Use the higher-level wrappers
/// in [`onnx_utils`] instead -- this module is an implementation detail and
/// may change without notice.
pub mod onnx_proto;

// ---- Stable public re-exports (prefer these over reaching into submodules) ----
#[cfg(feature = "calibration")]
pub use calibration::ActivationEstimator;
pub use config::Config;
pub use errors::QuantizeError;
pub use onnx_utils::QdqWeightInput;

/// Library version string, read from `Cargo.toml` at compile time.
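///
/// ```
/// assert!(!quantize_rs::VERSION.is_empty());
/// ```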
pub const VERSION: &str = env!("CARGO_PKG_VERSION");