Module quantization

Quantization module for efficient inference

This module provides:

  • INT8 quantization
  • INT4 quantization (experimental)
  • Quantized matrix multiplication
  • Post-training quantization (PTQ)
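
As a rough illustration of what INT8 post-training quantization and a quantized matrix multiplication involve, here is a minimal standalone sketch. It assumes symmetric per-tensor scaling, which this page does not confirm as the scheme the module uses. The functions `quantize_int8`, `dequantize_int8`, and `matmul_int8` are hypothetical names chosen for this example and are not part of this module's API.

```rust
/// Hypothetical helper, not this module's API.
/// Symmetric per-tensor INT8 quantization: returns the quantized values and
/// a scale such that `x` is approximately `q as f32 * scale`.
fn quantize_int8(values: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude in the tensor determines the scale.
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Fall back to a scale of 1.0 for an all-zero tensor to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Hypothetical helper: maps INT8 values back to approximate f32 values.
fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

/// Hypothetical helper: row-major (m x k) times (k x n) product of two INT8
/// matrices. Products are accumulated in i32, then rescaled to f32 once per
/// output element using both input scales.
fn matmul_int8(
    a: &[i8],
    a_scale: f32,
    b: &[i8],
    b_scale: f32,
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut out = vec![0.0_f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc: i32 = 0;
            for p in 0..k {
                acc += a[i * k + p] as i32 * b[p * n + j] as i32;
            }
            out[i * n + j] = acc as f32 * a_scale * b_scale;
        }
    }
    out
}
```

In this sketch the round-trip error of each element is bounded by roughly half the scale, and accumulating in `i32` keeps the inner loop in integer arithmetic until the final rescale. Per-channel scales or asymmetric zero points would be alternative designs; which of these the module actually implements is not stated here.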

Modules

weight_quantization
Quantization utilities for model weights

Structs

QuantizationConfig
Quantization configuration
QuantizedMatMul
Quantized matrix multiplication
QuantizedTensor
Quantized tensor (INT8)

Enums

QuantDtype
Quantization data type