Function quantize 

pub fn quantize(v: &[f32]) -> (Vec<i8>, f32)

Quantize an FP32 vector to ternary trits (each stored as an i8 in {-1, 0, +1}) using BitNet absmean scaling.

Returns (trits, scale) where scale = mean(|v_i|).
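
The page does not show the function body, so the following is only a minimal sketch of how absmean ternary quantization is commonly done, not this crate's actual implementation. It assumes the usual BitNet b1.58 recipe: divide each element by the mean absolute value, round to the nearest integer, and clip to [-1, 1]. The name `quantize_sketch`, the `EPS` guard against division by zero, and the handling of empty input are all hypothetical choices made for illustration.

```rust
/// Hypothetical sketch of absmean ternary quantization.
/// Not the crate's real source: `EPS` and the empty-input behavior are assumptions.
pub fn quantize_sketch(v: &[f32]) -> (Vec<i8>, f32) {
    // Assumption: an empty input yields no trits and a zero scale.
    if v.is_empty() {
        return (Vec::new(), 0.0);
    }

    // scale = mean(|v_i|), as stated in the function description.
    let scale = v.iter().map(|x| x.abs()).sum::<f32>() / v.len() as f32;

    // Assumption: a small epsilon keeps the division finite for an all-zero vector.
    const EPS: f32 = 1e-8;

    // Round each scaled value to the nearest integer, then clip to {-1, 0, +1}.
    let trits = v
        .iter()
        .map(|&x| (x / (scale + EPS)).round().clamp(-1.0, 1.0) as i8)
        .collect();

    (trits, scale)
}
```

Under these assumptions, `quantize_sketch(&[0.5, -1.5, 0.1, 2.0])` gives a scale of about 1.025 and the trits `[0, -1, 0, 1]`. An approximate reconstruction of each element is then `trit as f32 * scale`.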