Crate numquant


Lossy conversion from floating point to a smaller integer type with a fixed range.

Quantize an `f64` to a byte and back again

```rust
use numquant::{Quantize, Quantized, U8};

let original = 500.0;
// Quantize the value into a byte.
// Quantization supports inputs between 0 and 1000.
let quantized = Quantized::<U8<0, 1000>>::from_f64(original);
// Convert it back to an f64.
let dequantized = quantized.to_f64();
// The conversion isn't lossless, but the dequantized value is close to the original:
approx::assert_abs_diff_eq!(original, dequantized, epsilon = U8::<0, 1000>::max_error());
```
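
The tolerance used in the example comes from `max_error()`. As a rough guide to how large such a bound might be, the sketch below computes the spacing between adjacent quantized values and half of that spacing. It assumes evenly spaced codes and round-to-nearest quantization; `step_size` and `half_step` are hypothetical helpers written for illustration, not part of numquant, and the crate's own `max_error()` may be defined differently.

```rust
/// Hypothetical helper (not part of numquant): width of one quantization step
/// when the range [min, max] is spread evenly over the integer codes 0..=q_max.
fn step_size(min: f64, max: f64, q_max: u32) -> f64 {
    (max - min) / q_max as f64
}

/// Hypothetical helper: worst-case round-trip error for an in-range input,
/// assuming the quantizer rounds to the nearest code.
fn half_step(min: f64, max: f64, q_max: u32) -> f64 {
    step_size(min, max, q_max) / 2.0
}
```

For the range 0 to 1000 stored in a byte, `step_size(0.0, 1000.0, 255)` is about 3.92, so under these assumptions a round trip would be off by at most about 1.96. Passing 65535 instead of 255 shows how much a 16-bit code narrows that bound.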

Modules

linear

For quantizing values linearly to a range.

Structs

IntRange

Quantizes/dequantizes to a value between 0 and `Q_MAX`, stored in type `T`. The unquantized value ranges from `MIN` to `MAX`; inputs outside that range are clamped.

Quantized

Contains a quantized value.
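
The behaviour described for `IntRange` (clamp to the input range, then map linearly onto integer codes) can be sketched in plain Rust. This is a minimal illustration fixed to `u8`, assuming round-to-nearest; `quantize_u8` and `dequantize_u8` are hypothetical names and this is not the crate's actual implementation.

```rust
/// Hypothetical stand-in for illustration (not numquant's code):
/// map `value` from [min, max] onto the codes 0..=255.
fn quantize_u8(value: f64, min: f64, max: f64) -> u8 {
    // Inputs outside [min, max] are clamped to the nearest bound.
    let clamped = value.clamp(min, max);
    // Scale to 0.0..=1.0, then to 0.0..=255.0, and round to the nearest code.
    let normalized = (clamped - min) / (max - min);
    (normalized * u8::MAX as f64).round() as u8
}

/// Hypothetical inverse: map a code in 0..=255 back into [min, max].
fn dequantize_u8(code: u8, min: f64, max: f64) -> f64 {
    min + (code as f64 / u8::MAX as f64) * (max - min)
}
```

With a range of 0 to 1000, an input of -5.0 maps to code 0 and an input of 2000.0 maps to code 255, so out-of-range values dequantize to the range bounds rather than to their original magnitudes.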

Traits

Quantize

Trait for quantizing and dequantizing values.

Type Definitions

U8

Quantizes/dequantizes to a value stored in a `u8`, using the full range of the `u8`. The unquantized value ranges from `MIN` to `MAX`; inputs outside that range are clamped.

U16

Quantizes/dequantizes to a value stored in a `u16`, using the full range of the `u16`. The unquantized value ranges from `MIN` to `MAX`; inputs outside that range are clamped.

U32