
Module quantization


Model quantisation.

| Module | Contents |
|---|---|
| int8 | INT8 post-training quantisation, FP16 export, calibration |
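Calibration for symmetric INT8 comes down to choosing one scale from sample data. This page does not say which calibration statistic the `int8` module uses. The sketch below assumes the common running max-abs approach: it observes activation batches, tracks the largest |x|, and maps that value to 127. `MaxAbsObserver` is a hypothetical, illustrative type and is not part of `sensorlm`. Percentile- or entropy-based calibration are alternatives that clip outliers.

```rust
/// Hypothetical running max-abs observer for symmetric INT8 calibration.
/// Illustrative only; not a type exported by `sensorlm`.
struct MaxAbsObserver {
    max_abs: f32,
}

impl MaxAbsObserver {
    fn new() -> Self {
        MaxAbsObserver { max_abs: 0.0 }
    }

    /// Fold one calibration batch into the running maximum |x|.
    fn observe(&mut self, batch: &[f32]) {
        for &x in batch {
            self.max_abs = self.max_abs.max(x.abs());
        }
    }

    /// Symmetric scale that maps [-max_abs, max_abs] onto [-127, 127].
    /// Falls back to 1.0 if every observed value was zero.
    fn scale(&self) -> f32 {
        if self.max_abs == 0.0 {
            1.0
        } else {
            self.max_abs / 127.0
        }
    }
}

fn main() {
    let mut obs = MaxAbsObserver::new();
    // Two small calibration batches of activations.
    obs.observe(&[0.5, -1.27, 0.3]);
    obs.observe(&[1.0, -0.2]);
    println!("max_abs = {}, scale = {}", obs.max_abs, obs.scale());
}
```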

§Supported schemes

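| Scheme | Memory saving (vs FP32) | Quality loss |
|---|---|---|
| FP16 (half precision) | 2× | Negligible |
| INT8 symmetric per-tensor | ~4× | Small (~0.5–1 % on retrieval tasks) |
| INT8 symmetric per-channel | Slightly under 4× (one extra scale per channel) | Very small (~0.1–0.3 %) |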
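The gap between the two INT8 rows comes from how many scales are kept. With symmetric quantisation, each weight becomes q = clamp(round(x / s), -127, 127) with s = max|x| / 127. The weight is stored in 1 byte instead of FP32's 4. Per-tensor uses one s for the whole matrix. Per-channel uses one s per output row, so a row of small weights keeps its resolution instead of collapsing onto a few levels. The sketch below compares both on a toy matrix. The function names are hypothetical and are not the `int8` module's API. The [-127, 127] range is one common symmetric convention, assumed here.

```rust
/// Symmetric scale: the largest |x| maps to 127; all-zero input falls back to 1.0.
fn symmetric_scale(values: &[f32]) -> f32 {
    let max_abs = values.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
}

/// q = clamp(round(x / scale), -127, 127)
fn quantize_with(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Per-tensor: a single scale shared by every row of the weight matrix.
fn quantize_per_tensor(w: &[Vec<f32>]) -> (Vec<Vec<i8>>, f32) {
    let all: Vec<f32> = w.iter().flatten().copied().collect();
    let scale = symmetric_scale(&all);
    (w.iter().map(|row| quantize_with(row, scale)).collect(), scale)
}

/// Per-channel: one scale per output channel (one row of `w`).
fn quantize_per_channel(w: &[Vec<f32>]) -> (Vec<Vec<i8>>, Vec<f32>) {
    let scales: Vec<f32> = w.iter().map(|row| symmetric_scale(row)).collect();
    let q = w
        .iter()
        .zip(&scales)
        .map(|(row, &s)| quantize_with(row, s))
        .collect();
    (q, scales)
}

/// Largest |x - scale * q| over the matrix, given one scale per row.
fn max_abs_error(w: &[Vec<f32>], q: &[Vec<i8>], row_scales: &[f32]) -> f32 {
    let mut worst = 0.0f32;
    for ((row, qrow), &s) in w.iter().zip(q).zip(row_scales) {
        for (&x, &qi) in row.iter().zip(qrow) {
            worst = worst.max((x - s * qi as f32).abs());
        }
    }
    worst
}

fn main() {
    // Row 0 holds large weights and row 1 small ones, so a shared scale
    // leaves row 1 with only a handful of INT8 levels.
    let w = vec![vec![1.0, -0.6], vec![0.01, 0.04]];
    let (qt, st) = quantize_per_tensor(&w);
    let (qc, sc) = quantize_per_channel(&w);
    let err_t = max_abs_error(&w, &qt, &vec![st; w.len()]);
    let err_c = max_abs_error(&w, &qc, &sc);
    println!("per-tensor  q = {:?}, max error = {:.5}", qt, err_t);
    println!("per-channel q = {:?}, max error = {:.5}", qc, err_c);
}
```

In this toy case the second row quantises to `[1, 5]` under a shared scale but `[32, 127]` with its own scale. That lower reconstruction error is the kind of effect the quality-loss column summarises.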

§Workflow

```rust
use sensorlm::quantization::int8::{quantize_model_weights, QuantizedModel};
use std::path::Path;

// Pseudo-code: `extract_linear_weights` is a placeholder for your own
// model-specific code that pulls the linear-layer weights out of a trained model.
// let weights = extract_linear_weights(&trained_model);
// let qm = quantize_model_weights(config_json, weights.into_iter());
// qm.save(Path::new("model_int8.json")).unwrap();
```
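The workflow above depends on `sensorlm` signatures that are not shown on this page. The following self-contained sketch has the same overall shape: take each weight matrix, quantise it per channel, and account for the storage saved. `QuantizedLayer` and `quantize_layer` are hypothetical stand-ins, not the crate's `QuantizedModel` or `quantize_model_weights`. The weights are deterministic dummy values in place of a trained model, and JSON saving is omitted.

```rust
/// Hypothetical stand-in for one quantised linear layer (not sensorlm's type).
struct QuantizedLayer {
    name: String,
    rows: usize,
    cols: usize,
    data: Vec<i8>,    // row-major INT8 weights
    scales: Vec<f32>, // one symmetric scale per output row
}

impl QuantizedLayer {
    /// Storage for INT8 weights plus FP32 per-row scales, in bytes.
    fn size_bytes(&self) -> usize {
        self.data.len() + self.scales.len() * std::mem::size_of::<f32>()
    }
}

/// Per-channel symmetric INT8 quantisation of a row-major `rows x cols` matrix.
fn quantize_layer(name: &str, rows: usize, cols: usize, w: &[f32]) -> QuantizedLayer {
    assert_eq!(w.len(), rows * cols, "weight buffer does not match shape");
    let mut data = Vec::with_capacity(w.len());
    let mut scales = Vec::with_capacity(rows);
    for r in 0..rows {
        let row = &w[r * cols..(r + 1) * cols];
        let max_abs = row.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
        data.extend(row.iter().map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8));
        scales.push(scale);
    }
    QuantizedLayer { name: name.to_string(), rows, cols, data, scales }
}

fn main() {
    // (name, rows, cols) for two hypothetical linear layers.
    let shapes: [(&str, usize, usize); 2] = [("encoder.proj", 64, 128), ("head", 16, 64)];
    let (mut fp32_bytes, mut int8_bytes) = (0usize, 0usize);
    for (name, rows, cols) in shapes {
        // Deterministic dummy weights standing in for a trained model.
        let w: Vec<f32> = (0..rows * cols).map(|i| ((i % 17) as f32 - 8.0) / 10.0).collect();
        let layer = quantize_layer(name, rows, cols, &w);
        fp32_bytes += w.len() * std::mem::size_of::<f32>();
        int8_bytes += layer.size_bytes();
        println!("{} ({}x{}): {} bytes", layer.name, layer.rows, layer.cols, layer.size_bytes());
    }
    println!(
        "FP32 {} bytes -> INT8 {} bytes ({:.2}x smaller)",
        fp32_bytes,
        int8_bytes,
        fp32_bytes as f64 / int8_bytes as f64
    );
}
```

The reported ratio comes out slightly under 4× because each output row also stores a 4-byte FP32 scale. This is the per-channel overhead noted in the schemes table.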

§Modules

int8
INT8 post-training quantisation (PTQ).