Expand description
f64 → f32 quantization utilities for packed export.
When converting a trained SGBT (f64 precision) to the packed format (f32), thresholds and leaf values are quantized. This module provides validation to ensure the precision loss is acceptable.
Constants§
- DEFAULT_
TOLERANCE - Maximum acceptable absolute difference between f64 and f32 representations.
Functions§
- max_
quantization_ error - Compute the maximum absolute quantization error across a slice of f64 values.
- quantize_
leaf - Quantize a leaf value with learning rate baked in.
- quantize_
threshold - Quantize an f64 threshold to f32, returning the f32 value.
- within_
tolerance - Check whether quantizing
valueto f32 stays within tolerance.