Quantization module for efficient inference
This module provides:
- INT8 quantization
- INT4 quantization (experimental)
- Quantized matrix multiplication
- Post-training quantization (PTQ)
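As an illustration of what INT8 post-training quantization typically involves, here is a minimal sketch of symmetric per-tensor quantization. The function names `quantize_int8` and `dequantize_int8` are hypothetical and are not taken from this module; its actual API and scaling scheme may differ.

```rust
/// Hypothetical helper (not this module's API): symmetric per-tensor INT8
/// quantization. Returns the quantized values and the scale used.
fn quantize_int8(values: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude in the tensor determines the scale.
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Fall back to a scale of 1.0 for an all-zero tensor to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Hypothetical inverse: map INT8 values back to approximate f32 values.
fn dequantize_int8(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

Under this scheme, each dequantized value differs from the original by at most about half the scale, which is why tensors with a few large outliers tend to quantize poorly.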
Modules
- weight_quantization - Quantization utilities for model weights
Structs
- QuantizationConfig - Quantization configuration
- QuantizedMatMul - Quantized matrix multiplication
- QuantizedTensor - Quantized tensor (INT8)
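To show how a quantized matrix multiplication is commonly structured, the sketch below multiplies two row-major INT8 matrices, accumulates in `i32` to avoid overflow, and rescales the result to `f32`. The free function `quantized_matmul` and its parameters are assumptions made for illustration; they do not describe the actual interface of `QuantizedMatMul`.

```rust
/// Hypothetical sketch (not the QuantizedMatMul API): computes
/// C = (A_q * B_q) * scale_a * scale_b, where A_q is m x k and B_q is k x n,
/// both stored row-major as i8.
fn quantized_matmul(
    a: &[i8],
    b: &[i8],
    m: usize,
    k: usize,
    n: usize,
    scale_a: f32,
    scale_b: f32,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must hold m * k elements");
    assert_eq!(b.len(), k * n, "B must hold k * n elements");
    let mut out = vec![0.0_f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Accumulate in i32: a single i8 * i8 product can already exceed the i8 range.
            let mut acc: i32 = 0;
            for p in 0..k {
                acc += a[i * k + p] as i32 * b[p * n + j] as i32;
            }
            // One rescale per output element converts the integer sum back to f32.
            out[i * n + j] = acc as f32 * scale_a * scale_b;
        }
    }
    out
}
```

Keeping the inner loop in integer arithmetic and applying the combined scale once per output element is the usual reason quantized kernels are cheaper than their floating-point counterparts.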
Enums
- QuantDtype - Quantization data type