Quantization Support for Model Compression
This module provides quantization techniques to compress knowledge graph embeddings by reducing precision from float32 to int8/int4, significantly reducing model size and improving inference speed.
Structs§
- ModelQuantizer - Model quantizer
- QuantizationConfig - Quantization configuration
- QuantizationParams - Quantization parameters
- QuantizationStats - Quantization statistics
- QuantizedTensor - Quantized tensor representation
Enums§
- BitWidth - Quantization bit width
- QuantizationScheme - Quantization scheme
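
The float32-to-int8 compression this module describes can be illustrated with a minimal sketch of symmetric int8 quantization. The names below (`QuantParams`, `quantize_symmetric_i8`, `dequantize_i8`) are hypothetical and not part of this module's API; they just show the core idea of mapping floats into an integer range via a scale factor.

```rust
// Hypothetical sketch of symmetric int8 quantization; not this module's API.

/// For symmetric quantization, a single scale factor is the only parameter.
struct QuantParams {
    scale: f32,
}

/// Map f32 values into [-127, 127], using the max-abs value as the range.
fn quantize_symmetric_i8(values: &[f32]) -> (Vec<i8>, QuantParams) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, QuantParams { scale })
}

/// Recover approximate f32 values from the quantized representation.
fn dequantize_i8(q: &[i8], params: &QuantParams) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * params.scale).collect()
}

fn main() {
    let embedding = vec![0.5f32, -1.0, 0.25, 2.0];
    let (q, params) = quantize_symmetric_i8(&embedding);
    let recovered = dequantize_i8(&q, &params);
    for (orig, rec) in embedding.iter().zip(recovered.iter()) {
        // Round-trip error is bounded by half a quantization step.
        assert!((orig - rec).abs() <= params.scale);
    }
    println!("quantized: {:?}, scale: {}", q, params.scale);
}
```

Storing one `i8` per weight instead of one `f32` gives the 4x size reduction the description mentions; int4 halves that again by packing two values per byte, at the cost of a coarser grid.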