§Quantization Schemes
This module exposes a suite of post-training quantization (PTQ) strategies:
| Module | Scheme | Primary use |
|---|---|---|
| `minmax` | Min-Max calibration (INT4/INT8) | General PTQ |
| `nf4` | NormalFloat4 (QLoRA) | 4-bit weights |
| `fp8` | FP8 E4M3 / E5M2 (Hopper / Blackwell) | Training & inference |
| `gptq` | GPTQ Hessian-guided quantization | LLM weights |
| `smooth_quant` | SmoothQuant activation–weight migration | LLM activations |
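To make the min-max scheme in the table concrete, here is a minimal, self-contained sketch of per-tensor asymmetric INT8 min-max quantization. The free functions below are illustrative only; they are not this crate's `MinMaxQuantizer` API.

```rust
// Illustrative per-tensor min-max INT8 quantization (asymmetric).
// Not this crate's API — a sketch of the underlying arithmetic.

fn quant_params(xs: &[f32]) -> (f32, i32) {
    let (min, max) = xs
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    // Widen the range to include zero so 0.0 maps exactly to an integer code.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as i32 - 128;
    (scale, zero_point)
}

fn quantize(x: f32, scale: f32, zp: i32) -> i8 {
    ((x / scale).round() as i32 + zp).clamp(-128, 127) as i8
}

fn dequantize(q: i8, scale: f32, zp: i32) -> f32 {
    (q as i32 - zp) as f32 * scale
}

fn main() {
    let weights = [-1.5f32, -0.2, 0.0, 0.7, 2.3];
    let (scale, zp) = quant_params(&weights);
    for &w in &weights {
        let q = quantize(w, scale, zp);
        let back = dequantize(q, scale, zp);
        // Round-trip error is bounded by half a quantization step.
        assert!((w - back).abs() <= scale / 2.0 + 1e-6);
    }
}
```

The INT4 variant follows the same pattern with 15 steps instead of 255; per-channel granularity simply computes `quant_params` per row or column instead of over the whole tensor.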
Re-exports§
pub use fp8::Fp8Codec;
pub use fp8::Fp8Format;
pub use gptq::GptqConfig;
pub use gptq::GptqOutput;
pub use gptq::GptqQuantizer;
pub use minmax::MinMaxQuantizer;
pub use minmax::QuantGranularity;
pub use minmax::QuantParams;
pub use minmax::QuantScheme;
pub use nf4::NF4_LUT;
pub use nf4::Nf4Quantizer;
pub use smooth_quant::SmoothQuantConfig;
pub use smooth_quant::SmoothQuantMigrator;
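The `fp8` re-exports cover the two 8-bit float layouts named in the table. As a sketch of what the E4M3 format encodes (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and `S.1111.111` reserved for NaN), here is an illustrative decoder; it is not this crate's `Fp8Codec` implementation.

```rust
// Illustrative decoder for the FP8 E4M3 bit layout.
// Sketch of the format only — not this crate's `Fp8Codec`.
fn e4m3_to_f32(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 3) & 0x0F) as i32;
    let mant = (bits & 0x07) as f32;
    if exp == 0x0F && mant == 7.0 {
        return f32::NAN; // the single NaN encoding per sign; E4M3 has no infinities
    }
    if exp == 0 {
        // Subnormal: no implicit leading one, fixed exponent -6.
        sign * (mant / 8.0) * 2f32.powi(-6)
    } else {
        sign * (1.0 + mant / 8.0) * 2f32.powi(exp - 7)
    }
}

fn main() {
    assert_eq!(e4m3_to_f32(0b0_0111_000), 1.0);   // exponent 7 (unbiased 0), mantissa 0
    assert_eq!(e4m3_to_f32(0b1_0111_100), -1.5);  // mantissa 4 → 1.5, sign bit set
    assert_eq!(e4m3_to_f32(0b0_1111_110), 448.0); // largest finite E4M3 value
    assert!(e4m3_to_f32(0b0_1111_111).is_nan());
}
```

E5M2 trades one mantissa bit for an extra exponent bit (bias 15), giving a wider dynamic range at lower precision, which is why it is commonly used for gradients while E4M3 is used for weights and activations.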
Modules§
- fp8
- FP8 Floating-Point Quantization
- gptq
- GPTQ — Gradient-Free Post-Training Quantization
- minmax
- MinMax Quantizer
- nf4
- NF4 — NormalFloat4 Quantization
- smooth_quant
- SmoothQuant — Activation–Weight Quantization Migration
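Since the `nf4` module re-exports a 16-entry lookup table (`NF4_LUT`), a brief sketch of the mechanism may help: each 4-bit code indexes a fixed table of quantile levels in [-1, 1], and quantization picks the nearest level after normalizing a block by its absolute maximum. The table below approximates the standard QLoRA NF4 levels; the values and helper names are illustrative and may differ from this crate's `NF4_LUT` and `Nf4Quantizer`.

```rust
// Illustrative NF4 quantize/dequantize by nearest-level lookup.
// LUT values approximate the QLoRA NF4 levels; a sketch, not this crate's API.
const LUT: [f32; 16] = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
];

fn nf4_quantize(x: f32, absmax: f32) -> u8 {
    let xn = x / absmax; // normalize the block into [-1, 1]
    LUT.iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| {
            (xn - **a).abs().partial_cmp(&(xn - **b).abs()).unwrap()
        })
        .map(|(i, _)| i as u8)
        .unwrap()
}

fn nf4_dequantize(code: u8, absmax: f32) -> f32 {
    LUT[code as usize] * absmax
}

fn main() {
    let block = [0.9f32, -0.4, 0.05, -1.8];
    let absmax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    for &x in &block {
        let code = nf4_quantize(x, absmax);
        assert!(code < 16);
        // Coarse round-trip check: 4-bit codes stay within the block's range.
        assert!((x - nf4_dequantize(code, absmax)).abs() <= absmax);
    }
}
```

Storing one `absmax` per small block (commonly 64 values in QLoRA) is what makes the scheme block-wise rather than per-tensor.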