Lossy conversion from floating point to a smaller integer type with a fixed range.
§Quantize an f64 to a byte and back again
use numquant::{Quantize, Quantized, U8};
let original = 500.0;
// Quantize the value into a byte.
// Quantization supports inputs between 0 and 1000.
let quantized = Quantized::<U8<0, 1000>>::from_f64(original);
// Convert it back to an f64
let dequantized = quantized.to_f64();
// The conversion isn't lossless, but the dequantized value is close to the original:
approx::assert_abs_diff_eq!(original, dequantized, epsilon = U8::<0, 1000>::max_error());
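To make the round trip above more concrete, here is a minimal std-only sketch of linear quantization to a byte. It is not numquant's implementation: the helper names `quantize_u8` and `dequantize_u8` are hypothetical, and round-to-nearest is an assumption made for illustration.

```rust
/// Hypothetical helper (not part of numquant): clamps `value` to [min, max]
/// and maps it linearly onto 0..=255, rounding to the nearest integer.
fn quantize_u8(value: f64, min: f64, max: f64) -> u8 {
    let clamped = value.clamp(min, max);
    let normalized = (clamped - min) / (max - min); // now in 0.0..=1.0
    (normalized * f64::from(u8::MAX)).round() as u8
}

/// Hypothetical helper: maps a byte back into [min, max].
fn dequantize_u8(q: u8, min: f64, max: f64) -> f64 {
    min + f64::from(q) / f64::from(u8::MAX) * (max - min)
}

fn main() {
    let (min, max) = (0.0, 1000.0);
    let original = 250.0;

    let q = quantize_u8(original, min, max);
    let back = dequantize_u8(q, min, max);

    // With round-to-nearest, the error is at most half a quantization step.
    let half_step = (max - min) / f64::from(u8::MAX) / 2.0;
    assert!((original - back).abs() <= half_step);

    // Inputs outside the range are clamped to its ends.
    assert_eq!(quantize_u8(1500.0, min, max), u8::MAX);
    assert_eq!(quantize_u8(-5.0, min, max), 0);

    println!("{original} -> {q} -> {back}");
}
```

Under these assumptions, the half-step bound plays the same role as the `max_error()` value used in the assertion above, though the crate may compute its bound differently.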
Modules§
- linear
- For quantizing values linearly to a range.
Structs§
- IntRange
- Quantizes/dequantizes to a value between 0 and Q_MAX stored in type T. The range for the unquantized value is between MIN and MAX. Values outside of this are clamped.
- Quantized
- Contains a quantized value.
Traits§
- Quantize
- Trait for quantizing and dequantizing values.
Type Aliases§
- U8
- Quantizes/dequantizes to a value stored in a u8, using the full range of the u8. The range for the unquantized value is between MIN and MAX. Values outside of this are clamped.
- U16
- Quantizes/dequantizes to a value stored in a u16, using the full range of the u16. The range for the unquantized value is between MIN and MAX. Values outside of this are clamped.
- U32
- Quantizes/dequantizes to a value stored in a u32, using the full range of the u32. The range for the unquantized value is between MIN and MAX. Values outside of this are clamped.
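The three aliases trade storage size for precision. As a rough illustration, the sketch below computes the worst-case round-trip error for each width, assuming a linear mapping onto the full integer range with round-to-nearest. The helper `half_step` is hypothetical and is not taken from numquant, whose `max_error()` may be defined differently.

```rust
/// Hypothetical helper (not part of numquant): worst-case round-trip error
/// for a linear quantizer mapping [min, max] onto the integers 0..=q_max,
/// assuming round-to-nearest. This is half of one quantization step.
fn half_step(min: f64, max: f64, q_max: u64) -> f64 {
    (max - min) / q_max as f64 / 2.0
}

fn main() {
    let (min, max) = (0.0, 1000.0);

    let e8 = half_step(min, max, u64::from(u8::MAX));
    let e16 = half_step(min, max, u64::from(u16::MAX));
    let e32 = half_step(min, max, u64::from(u32::MAX));

    // Each wider integer type shrinks the error bound for the same range.
    assert!(e8 > e16 && e16 > e32);

    println!("u8:  {e8:e}");
    println!("u16: {e16:e}");
    println!("u32: {e32:e}");
}
```

Under these assumptions, a wider type is worth its extra bytes only when the range between MIN and MAX is large relative to the precision the application needs.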