Model quantisation.
| Module | Contents |
|---|---|
| `int8` | INT8 post-training quantisation, FP16 export, calibration |
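FP16 export means re-encoding each FP32 weight as an IEEE 754 binary16 value. This page does not show how `int8` performs that conversion, so the following is only a minimal, std-only sketch. The helper names `f32_to_f16_bits` and `f16_bits_to_f32` are hypothetical. The encoder rounds to nearest with ties to even and, to keep things short, flushes values below the smallest normal FP16 number to zero.

```rust
/// Hypothetical helper: encode an f32 as IEEE 754 binary16 bits.
/// Rounds to nearest, ties to even; flushes FP16 subnormals to zero.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let mant = bits & 0x007f_ffff;

    if exp == 0xff {
        // Inf stays Inf; any NaN becomes a quiet NaN.
        let quiet = if mant != 0 { 0x0200 } else { 0 };
        return sign | 0x7c00 | quiet;
    }
    let e = exp - 127 + 15; // re-bias the exponent from 127 to 15
    if e <= 0 {
        return sign; // simplification: too small for a normal FP16 value
    }
    // Keep the top 10 mantissa bits and round on the 13 dropped bits.
    let mut h = ((e as u32) << 10) | (mant >> 13);
    let rem = mant & 0x1fff;
    if rem > 0x1000 || (rem == 0x1000 && (h & 1) == 1) {
        h += 1; // a carry into the exponent field is still the correct result
    }
    if h >= 0x7c00 {
        return sign | 0x7c00; // overflow saturates to Inf
    }
    sign | h as u16
}

/// Hypothetical helper: decode binary16 bits back to f32 (always exact).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h & 0x8000) as u32) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let mant = (h & 0x03ff) as u32;
    match exp {
        0 => {
            // Zero or subnormal: value = mant * 2^-24.
            let v = mant as f32 / 16_777_216.0;
            if sign != 0 { -v } else { v }
        }
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (mant << 13)), // Inf or NaN
        _ => f32::from_bits(sign | ((exp + 127 - 15) << 23) | (mant << 13)),
    }
}

fn main() {
    let weights = [0.1f32, -2.0, 65504.0, 1.0e6];
    for &w in &weights {
        let h = f32_to_f16_bits(w);
        println!("{w} -> 0x{h:04x} -> {}", f16_bits_to_f32(h));
    }
    // Each weight now takes 2 bytes instead of 4, hence the 2x saving.
    println!("{} bytes -> {} bytes", weights.len() * 4, weights.len() * 2);
}
```

65504 is the largest finite FP16 value, so 1.0e6 saturates to Inf. Weights of that magnitude are one reason FP16 export is usually checked against the FP32 model.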
## Supported schemes
| Scheme | Memory saving (vs FP32) | Typical quality loss |
|---|---|---|
| FP16 (half precision) | 2× | Negligible |
| INT8 symmetric per-tensor | 4× | Small (~0.5–1 % on retrieval tasks) |
| INT8 symmetric per-channel | 4× | Very small (~0.1–0.3 %) |
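The two INT8 rows differ only in how many scales are stored. The following is a hedged, std-only sketch, and its function names are hypothetical rather than the `int8` module's API. Symmetric INT8 uses scale = max|w| / 127 and q = clamp(round(w / scale), -127, 127), so zero maps exactly to 0. Per-tensor shares one scale across the whole weight matrix. Per-channel keeps one scale per output row, which lowers the error when rows differ in magnitude.

```rust
/// Symmetric scale: the largest |value| maps to +/-127.
fn symmetric_scale(values: &[f32]) -> f32 {
    let max_abs = values.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    // Guard against an all-zero tensor (any non-zero scale works).
    if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
}

fn quantize(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

/// Per-tensor: one scale for the whole matrix (rows = output channels).
fn quantize_per_tensor(rows: &[Vec<f32>]) -> (Vec<Vec<i8>>, f32) {
    let flat: Vec<f32> = rows.iter().flatten().copied().collect();
    let scale = symmetric_scale(&flat);
    (rows.iter().map(|r| quantize(r, scale)).collect(), scale)
}

/// Per-channel: one scale per output row.
fn quantize_per_channel(rows: &[Vec<f32>]) -> (Vec<Vec<i8>>, Vec<f32>) {
    let scales: Vec<f32> = rows.iter().map(|r| symmetric_scale(r)).collect();
    let q: Vec<Vec<i8>> = rows.iter().zip(&scales).map(|(r, &s)| quantize(r, s)).collect();
    (q, scales)
}

fn mse(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>() / a.len() as f32
}

fn main() {
    // Two output channels with very different magnitudes: the case where
    // per-channel scales help most.
    let w = vec![vec![0.01f32, -0.02, 0.015, -0.005], vec![1.5, -2.0, 0.75, 1.0]];

    let (qt, st) = quantize_per_tensor(&w);
    let (qc, sc) = quantize_per_channel(&w);

    let mut err_t = 0.0;
    let mut err_c = 0.0;
    for i in 0..w.len() {
        err_t += mse(&w[i], &dequantize(&qt[i], st));
        err_c += mse(&w[i], &dequantize(&qc[i], sc[i]));
    }
    println!("per-tensor MSE {err_t:.3e}, per-channel MSE {err_c:.3e}");

    // Storage: FP32 = 4 bytes/weight; INT8 = 1 byte/weight + 4 bytes per scale.
    let n: usize = w.iter().map(Vec::len).sum();
    println!("fp32 {} B, int8 per-tensor {} B, int8 per-channel {} B", n * 4, n + 4, n + 4 * sc.len());
}
```

With a single shared scale, the small first row is rounded to steps of about 0.016, which wipes out most of its information. Each extra per-channel scale costs only 4 bytes, so for realistic row lengths the saving stays close to 4x.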
## Workflow
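```rust
use sensorlm::quantization::int8::{quantize_model_weights, QuantizedModel};
use std::path::Path;

// Outline of the workflow (commented out because it depends on your model):
// `extract_linear_weights` is a placeholder for your own code that collects
// the linear-layer weights of a trained model, and `config_json` is the
// model configuration passed to `quantize_model_weights`.
// let weights = extract_linear_weights(&trained_model);
// let qm = quantize_model_weights(config_json, weights.into_iter());
// qm.save(Path::new("model_int8.json")).unwrap();
```

## Modules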
- `int8`: INT8 post-training quantisation (PTQ).
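PTQ needs value ranges for activations as well as weights, and calibration supplies them. This page does not say which calibration rule `int8` uses. The sketch below assumes the simplest one, abs-max over a few representative batches, and uses a hypothetical `AbsMaxCalibrator` type. It also shows why two scales are enough at inference time: the INT8 dot product accumulates in `i32` and is rescaled once by `w_scale * x_scale`.

```rust
/// Hypothetical abs-max calibrator: tracks the largest |activation| seen
/// across calibration batches and turns it into a symmetric scale.
struct AbsMaxCalibrator {
    max_abs: f32,
}

impl AbsMaxCalibrator {
    fn new() -> Self {
        Self { max_abs: 0.0 }
    }

    fn observe(&mut self, batch: &[f32]) {
        for &x in batch {
            self.max_abs = self.max_abs.max(x.abs());
        }
    }

    fn scale(&self) -> f32 {
        if self.max_abs == 0.0 { 1.0 } else { self.max_abs / 127.0 }
    }
}

fn quantize(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// INT8 dot product with i32 accumulation, rescaled back to f32 by the
/// product of the weight and activation scales.
fn int8_dot(w: &[i8], w_scale: f32, x: &[i8], x_scale: f32) -> f32 {
    let acc: i32 = w.iter().zip(x).map(|(&a, &b)| a as i32 * b as i32).sum();
    acc as f32 * w_scale * x_scale
}

fn main() {
    // Calibration: feed representative inputs and record the activation range.
    let calibration_batches: [Vec<f32>; 2] = [vec![0.2, -0.8, 0.5], vec![1.2, 0.1, -0.4]];
    let mut calib = AbsMaxCalibrator::new();
    for batch in &calibration_batches {
        calib.observe(batch);
    }
    let x_scale = calib.scale(); // 1.2 / 127

    // Weights are quantised offline with their own symmetric scale.
    let w = [0.3f32, -0.6, 0.9];
    let w_scale = 0.9 / 127.0;
    let wq = quantize(&w, w_scale);

    // At inference time the activation is quantised with the calibrated scale.
    let x = [1.0f32, 0.5, -0.25];
    let xq = quantize(&x, x_scale);

    let approx = int8_dot(&wq, w_scale, &xq, x_scale);
    let exact: f32 = w.iter().zip(&x).map(|(a, b)| a * b).sum();
    println!("int8 {approx:.4} vs fp32 {exact:.4}");
}
```

Abs-max is easy to reason about but sensitive to rare outliers, which inflate the scale and waste INT8 resolution. Percentile clipping is a common alternative.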