Quantization: QAT and PTQ
Provides quantization for QLoRA and Quantization-Aware Training:
- 4-bit block-wise quantization for QLoRA
- Fake quantization with STE for QAT
- PTQ calibration (min-max, percentile, moving average)
- GGUF-compatible Q4_0/Q8_0 formats
- Per-channel vs per-tensor quantization granularity
- Quantization error analysis and metrics
- Accuracy degradation benchmarks
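The fake-quantization item above can be illustrated with a minimal sketch. It assumes a symmetric signed grid and the clipped variant of the straight-through estimator. The names `fake_quant_forward` and `ste_grad` are hypothetical and are not this module's `fake_quantize` or `ste_backward`, whose signatures are not shown on this page.

```rust
/// Forward pass: quantize `x` to a signed `bits`-bit grid, then dequantize.
/// Hypothetical helper for illustration only.
fn fake_quant_forward(x: f32, scale: f32, bits: u32) -> f32 {
    let qmax = ((1i32 << (bits - 1)) - 1) as f32; // 127 for 8 bits
    let qmin = -qmax - 1.0; // -128 for 8 bits
    let q = (x / scale).round().clamp(qmin, qmax);
    q * scale
}

/// Backward pass (clipped STE, an assumption): treat rounding as identity,
/// so the gradient passes through unchanged inside the representable range
/// and is zeroed where the forward pass clamped.
fn ste_grad(x: f32, scale: f32, bits: u32, grad_out: f32) -> f32 {
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    let qmin = -qmax - 1.0;
    let (lo, hi) = (qmin * scale, qmax * scale);
    if x >= lo && x <= hi {
        grad_out
    } else {
        0.0
    }
}
```

The forward output stays in f32, so the rest of the network trains as usual while still seeing the rounding error it will meet after real quantization.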
Structs
- BenchmarkSuite - Suite of benchmark results
- CalibrationResult - Calibration result containing scale and zero_point
- Calibrator - PTQ calibrator for collecting statistics and computing quantization parameters
- DoubleQuantized4Bit - Double-quantized 4-bit representation
- FakeQuantConfig - Fake quantization configuration
- FakeQuantize - Fake quantization operation with Straight-Through Estimator (STE)
- Q4_0 - Q4_0 quantized tensor (GGUF format)
- Q8_0 - Q8_0 quantized tensor (GGUF format)
- QuantBenchmarkResult - Benchmark results for quantization accuracy
- QuantErrorStats - Error statistics for quantization analysis
- QuantParams - Quantization parameters for a tensor
- Quantized4Bit - 4-bit quantized representation with block-wise scale factors
- QuantizedTensor - Quantized tensor with per-channel or per-tensor quantization
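As a rough picture of what a Q8_0 tensor holds: in llama.cpp, Q8_0 groups 32 values per block, each block carrying one scale and 32 signed 8-bit codes. The sketch below follows that layout but stores the scale as f32 rather than f16 to stay dependency-free. `BlockQ8`, `quantize_block_q8`, and `dequantize_block_q8` are hypothetical names, not the fields or methods of this module's `Q8_0` struct.

```rust
const QK8_0: usize = 32; // values per Q8_0 block in llama.cpp

/// One block: a shared scale plus 32 int8 codes.
/// Assumption: f32 scale for simplicity (the on-disk format uses f16).
struct BlockQ8 {
    d: f32,
    qs: [i8; QK8_0],
}

fn quantize_block_q8(x: &[f32; QK8_0]) -> BlockQ8 {
    // Scale so the largest magnitude in the block maps to +/-127.
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let d = amax / 127.0;
    let inv = if d != 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; QK8_0];
    for (q, v) in qs.iter_mut().zip(x.iter()) {
        *q = (v * inv).round() as i8; // float-to-int `as` saturates
    }
    BlockQ8 { d, qs }
}

fn dequantize_block_q8(b: &BlockQ8) -> [f32; QK8_0] {
    let mut out = [0.0f32; QK8_0];
    for (o, q) in out.iter_mut().zip(b.qs.iter()) {
        *o = *q as f32 * b.d;
    }
    out
}
```

Q4_0 follows the same per-block idea with 4-bit codes packed two per byte.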
Enums
- CalibrationMethod - Calibration method for PTQ
- GGUFQuantType - Quantization type enum for GGUF export
- QuantGranularity - Quantization granularity options
- QuantMode - Quantization mode: symmetric or asymmetric
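To make the symmetric/asymmetric distinction concrete, here is a minimal sketch of min-max calibration in both modes. `Mode`, `Params`, `minmax_params`, `quantize`, and `dequantize` are hypothetical stand-ins. The variants and fields of this module's `QuantMode`, `CalibrationMethod`, and `QuantParams` are not listed on this page and may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Symmetric,
    Asymmetric,
}

#[derive(Debug, Clone, Copy)]
struct Params {
    scale: f32,
    zero_point: i32,
    qmin: i32,
    qmax: i32,
}

/// Min-max calibration: derive scale and zero point from the observed range.
/// The range is widened to include 0.0 so that zero is exactly representable.
fn minmax_params(values: &[f32], bits: u32, mode: Mode) -> Params {
    let min = values.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = values.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    match mode {
        Mode::Symmetric => {
            // Signed grid centred on zero; restricted to [-qmax, qmax].
            let qmax = (1i32 << (bits - 1)) - 1;
            let amax = max.max(-min);
            let scale = if amax > 0.0 { amax / qmax as f32 } else { 1.0 };
            Params { scale, zero_point: 0, qmin: -qmax, qmax }
        }
        Mode::Asymmetric => {
            // Unsigned grid; the zero point shifts it to cover [min, max].
            let qmax = (1i32 << bits) - 1;
            let range = max - min;
            let scale = if range > 0.0 { range / qmax as f32 } else { 1.0 };
            let zero_point = ((-min / scale).round() as i32).clamp(0, qmax);
            Params { scale, zero_point, qmin: 0, qmax }
        }
    }
}

fn quantize(x: f32, p: &Params) -> i32 {
    ((x / p.scale).round() as i32 + p.zero_point).clamp(p.qmin, p.qmax)
}

fn dequantize(q: i32, p: &Params) -> f32 {
    (q - p.zero_point) as f32 * p.scale
}
```

Symmetric mode suits roughly zero-centred weights. Asymmetric mode spends the grid more efficiently on skewed ranges such as post-ReLU activations.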
Constants
- BLOCK_SIZE - Block size for quantization (64 elements per block)
- DOUBLE_QUANT_BLOCK_SIZE - Block size for second-level scale quantization (256 scales per super-block)
- GGUF_BLOCK_SIZE - GGUF block size (standard for llama.cpp)
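The role of the 64-element block size can be sketched as follows: each block gets its own scale, so an outlier only degrades precision within its block. This sketch assumes a plain symmetric absmax scheme with codes in [-7, 7] packed two per byte. The module's actual 4-bit code book and storage layout are not described here, and `quantize_4bit_blocks` / `dequantize_4bit_blocks` are hypothetical names.

```rust
const BLOCK: usize = 64; // mirrors the documented BLOCK_SIZE

/// Returns (packed 4-bit codes, one f32 scale per block).
fn quantize_4bit_blocks(x: &[f32]) -> (Vec<u8>, Vec<f32>) {
    let mut scales = Vec::with_capacity((x.len() + BLOCK - 1) / BLOCK);
    let mut codes = Vec::with_capacity(x.len());
    for block in x.chunks(BLOCK) {
        // Per-block absmax scaling (assumed scheme).
        let amax = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        let scale = if amax > 0.0 { amax / 7.0 } else { 1.0 };
        scales.push(scale);
        for v in block {
            let q = (v / scale).round().clamp(-7.0, 7.0) as i8;
            codes.push((q + 8) as u8); // shift to an unsigned nibble, 1..=15
        }
    }
    // Pack two nibbles per byte, low nibble first.
    let mut packed = Vec::with_capacity((codes.len() + 1) / 2);
    for pair in codes.chunks(2) {
        let lo = pair[0];
        let hi = if pair.len() == 2 { pair[1] } else { 8 }; // pad with the code for 0
        packed.push(lo | (hi << 4));
    }
    (packed, scales)
}

fn dequantize_4bit_blocks(packed: &[u8], scales: &[f32], len: usize) -> Vec<f32> {
    (0..len)
        .map(|i| {
            let byte = packed[i / 2];
            let code = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            (code as i32 - 8) as f32 * scales[i / BLOCK]
        })
        .collect()
}
```

Double quantization applies the same idea one level up: the f32 scales are themselves quantized in groups (256 per super-block, per the constant above) to shrink the per-block overhead.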
Functions
- accuracy_retention - Calculate accuracy retention percentage
- analyze_error - Analyze quantization error for given values and parameters
- analyze_outlier_impact - Analyze impact of outliers on quantization error
- calibrate_min_max - Convenience function for min-max calibration
- calibrate_per_channel - Calibrate quantization parameters for per-channel quantization
- calibrate_per_group - Calibrate quantization parameters for per-group quantization
- calibrate_per_tensor - Calibrate quantization parameters for per-tensor quantization
- calibrate_percentile - Convenience function for percentile calibration
- compare_bit_width_degradation - Compare accuracy degradation across bit widths
- compare_bit_widths - Compare error between different bit widths
- compare_granularities - Compare per-channel vs per-tensor quantization error
- dequantize_4bit - Dequantize 4-bit values back to f32
- dequantize_4bit_double - Dequantize double-quantized 4-bit values back to f32
- dequantize_tensor - Dequantize a quantized tensor back to f32
- dequantize_with_params - Dequantize values using given parameters
- error_within_bounds - Check if error is within expected bounds
- fake_quantize - Convenience function for fake quantization forward pass
- generate_gaussian_weights - Generate Gaussian-like weight distribution (common in neural networks)
- generate_multi_channel_weights - Generate multi-channel weights (like conv/linear layer)
- generate_uniform_weights - Generate uniform weights in range
- generate_weights_with_outliers - Generate weights with outliers (to test robustness)
- quantization_mse - Compute quantization error (MSE)
- quantize_4bit - Quantize f32 values to 4-bit with block-wise scaling
- quantize_4bit_double - Quantize values to 4-bit with double quantization of scale factors
- quantize_tensor - Quantize tensor with specified granularity
- quantize_with_params - Quantize values using given parameters
- run_benchmark - Run benchmark on given values with specified configuration
- run_full_benchmark_suite - Run full benchmark suite on various weight patterns
- scale_sensitivity - Analyze sensitivity of error to scale perturbation
- ste_backward - Convenience function for STE backward pass
- theoretical_max_error - Calculate theoretical maximum error for given quantization parameters
- theoretical_sqnr - Calculate expected SQNR for uniform quantization
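The error metrics named above reduce to a few short formulas. The sketch below shows MSE, measured SQNR in dB, and the round-to-nearest error bound of half a quantization step. `mse`, `sqnr_db`, and `max_rounding_error` are hypothetical helpers written for illustration. They are not the signatures of `quantization_mse`, `theoretical_sqnr`, or `theoretical_max_error`.

```rust
/// Mean squared error between original and dequantized values.
fn mse(original: &[f32], reconstructed: &[f32]) -> f32 {
    assert_eq!(original.len(), reconstructed.len());
    if original.is_empty() {
        return 0.0;
    }
    let sum: f32 = original
        .iter()
        .zip(reconstructed)
        .map(|(a, b)| (a - b).powi(2))
        .sum();
    sum / original.len() as f32
}

/// Signal-to-quantization-noise ratio in dB: 10 * log10(signal / noise).
/// Returns +inf when the reconstruction is exact.
fn sqnr_db(original: &[f32], reconstructed: &[f32]) -> f32 {
    assert_eq!(original.len(), reconstructed.len());
    let signal: f32 = original.iter().map(|a| a * a).sum();
    let noise: f32 = original
        .iter()
        .zip(reconstructed)
        .map(|(a, b)| (a - b).powi(2))
        .sum();
    10.0 * (signal / noise).log10()
}

/// Round-to-nearest error bound for values inside the clipping range:
/// at most half a quantization step.
fn max_rounding_error(scale: f32) -> f32 {
    scale / 2.0
}
```

As a rule of thumb for uniform quantization, each extra bit adds about 6 dB of SQNR, which gives a baseline for comparing measured values across bit widths.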