Module scheme


Quantization Schemes

This module exposes a suite of post-training quantization (PTQ) strategies:

| Module | Scheme | Primary use |
|---|---|---|
| `minmax` | Min-Max calibration (INT4/INT8) | General PTQ |
| `nf4` | NormalFloat4 (QLoRA) | 4-bit weights |
| `fp8` | FP8 E4M3 / E5M2 (Hopper / Blackwell) | Training & inference |
| `gptq` | GPTQ Hessian-guided quantization | LLM weights |
| `smooth_quant` | SmoothQuant activation–weight migration | LLM activations |
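
As a concrete illustration of the first row, symmetric per-tensor min-max quantization to INT8 can be sketched as follows. This is a standalone sketch: the function names are illustrative and not this module's API.

```rust
/// Symmetric per-tensor min-max quantization to INT8 (illustrative sketch).
fn quantize_symmetric_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    // The scale maps the largest absolute value onto the INT8 range [-127, 127].
    let absmax = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Dequantize by multiplying each code back by the shared scale.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let xs = [0.5f32, -1.0, 0.25, 0.9];
    let (q, scale) = quantize_symmetric_i8(&xs);
    let deq = dequantize_i8(&q, scale);
    for (a, b) in xs.iter().zip(deq.iter()) {
        // Round-trip error is bounded by one quantization step.
        assert!((a - b).abs() <= scale);
    }
    println!("q = {:?}, scale = {}", q, scale);
}
```

The same round-trip bound (error at most one step per element) is what min-max calibration trades against range coverage when choosing the clipping interval.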

Re-exports

pub use fp8::Fp8Codec;
pub use fp8::Fp8Format;
pub use gptq::GptqConfig;
pub use gptq::GptqOutput;
pub use gptq::GptqQuantizer;
pub use minmax::MinMaxQuantizer;
pub use minmax::QuantGranularity;
pub use minmax::QuantParams;
pub use minmax::QuantScheme;
pub use nf4::NF4_LUT;
pub use nf4::Nf4Quantizer;
pub use smooth_quant::SmoothQuantConfig;
pub use smooth_quant::SmoothQuantMigrator;
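
The migration that `SmoothQuantMigrator` performs can be sketched in a few lines. Per channel j, SmoothQuant computes s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), divides activations by s_j, and multiplies the matching weight column by s_j, leaving the product X * W unchanged while shrinking activation outliers. The sketch below shows only the scale computation; names are illustrative, not this crate's API.

```rust
/// SmoothQuant-style per-channel migration scales (illustrative sketch).
/// `act_absmax` and `w_absmax` are channel-wise absolute maxima gathered
/// during calibration; `alpha` balances difficulty between the two sides.
fn smooth_scales(act_absmax: &[f32], w_absmax: &[f32], alpha: f32) -> Vec<f32> {
    act_absmax
        .iter()
        .zip(w_absmax)
        .map(|(&a, &w)| (a.powf(alpha) / w.powf(1.0 - alpha)).max(1e-5))
        .collect()
}

fn main() {
    // Channel 0 has an activation outlier (absmax 8.0 vs weight 0.5).
    let act = [8.0f32, 0.5, 2.0];
    let wts = [0.5f32, 0.5, 2.0];
    // alpha = 0.5 splits the quantization difficulty evenly.
    let s = smooth_scales(&act, &wts, 0.5);
    println!("scales = {:?}", s);
}
```

With alpha = 0.5 the outlier channel gets a large scale (4.0 here), so its activations are divided down into a quantization-friendly range while the weights absorb the difference.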

Modules

fp8
FP8 Floating-Point Quantization
gptq
GPTQ — Gradient-Free Post-Training Quantization
minmax
MinMax Quantizer
nf4
NF4 — NormalFloat4 Quantization
smooth_quant
SmoothQuant — Activation–Weight Quantization Migration
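
The `nf4` module's codebook approach can be illustrated generically: each value is normalized by its block's absmax and snapped to the nearest entry of a fixed lookup table. The 4-entry toy LUT below stands in for the real 16-entry `NF4_LUT`; everything else here is an illustrative sketch, not this module's API.

```rust
/// Toy stand-in for a quantization codebook (the real NF4_LUT has 16
/// entries placed at quantiles of a normal distribution).
const TOY_LUT: [f32; 4] = [-1.0, -0.33, 0.33, 1.0];

/// Codebook quantization: normalize by the block absmax, then store the
/// index of the nearest lookup-table entry for each value.
fn quantize_codebook(xs: &[f32]) -> (Vec<u8>, f32) {
    let absmax = xs.iter().fold(f32::MIN_POSITIVE, |m, &x| m.max(x.abs()));
    let codes = xs
        .iter()
        .map(|&x| {
            let n = x / absmax; // normalize into [-1, 1]
            // Pick the index of the nearest codebook entry.
            TOY_LUT
                .iter()
                .enumerate()
                .min_by(|(_, a), (_, b)| {
                    (n - **a).abs().partial_cmp(&(n - **b).abs()).unwrap()
                })
                .map(|(i, _)| i as u8)
                .unwrap()
        })
        .collect();
    (codes, absmax)
}

fn main() {
    let (codes, absmax) = quantize_codebook(&[0.8, -0.8, 0.1, -0.2]);
    println!("codes = {:?}, absmax = {}", codes, absmax);
}
```

Dequantization is a table lookup (`TOY_LUT[code] * absmax`), which is why codebook schemes like NF4 pair a per-block scale with very small per-value indices.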