Module quantize

f64 → f32 quantization utilities for packed export.

When a trained SGBT (f64 precision) is converted to the packed format (f32), its thresholds and leaf values are quantized from f64 to f32. This module provides the quantization helpers and the checks that confirm the resulting precision loss stays within an acceptable tolerance.
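
To make the precision loss concrete, here is a minimal sketch of the round trip that this kind of validation is concerned with: narrow an f64 to f32, widen it back, and measure the absolute difference. The helper name `round_trip_error` is hypothetical and is not part of this module.

```rust
/// Hypothetical helper (not part of the crate): absolute error introduced
/// by narrowing an f64 to f32 and widening it back to f64.
fn round_trip_error(value: f64) -> f64 {
    let narrowed = value as f32; // rounds to the nearest representable f32
    (value - f64::from(narrowed)).abs()
}
```

Values that are exactly representable in f32, such as 0.5, round-trip with zero error, while a value like 0.1 picks up a small nonzero error.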

Constants

DEFAULT_TOLERANCE
Maximum acceptable absolute difference between f64 and f32 representations.

Functions

max_quantization_error
Compute the maximum absolute quantization error across a slice of f64 values.
quantize_leaf
Quantize a leaf value with learning rate baked in.
quantize_threshold
Quantize an f64 threshold to f32, returning the f32 value.
within_tolerance
Check whether quantizing value to f32 stays within tolerance.
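
The listing above gives only one-line summaries, so the following is a standalone sketch of how functions with these descriptions could behave. The signatures, parameter names, and the order of operations (multiplying by the learning rate in f64 before narrowing) are assumptions made for illustration, not the crate's actual definitions. `ASSUMED_TOLERANCE` is a placeholder, since the real value of DEFAULT_TOLERANCE is not shown here.

```rust
/// Placeholder only: the real DEFAULT_TOLERANCE value is not shown in the docs.
const ASSUMED_TOLERANCE: f64 = 1e-6;

/// Assumed shape: narrow a split threshold from f64 to f32.
fn quantize_threshold(threshold: f64) -> f32 {
    threshold as f32
}

/// Assumed shape: scale the leaf by the learning rate in f64, then narrow,
/// so the packed value already has the learning rate baked in.
fn quantize_leaf(leaf_value: f64, learning_rate: f64) -> f32 {
    (learning_rate * leaf_value) as f32
}

/// Assumed shape: true when the f64 -> f32 -> f64 round trip changes
/// `value` by at most `tolerance` (absolute difference).
fn within_tolerance(value: f64, tolerance: f64) -> bool {
    let error = (value - f64::from(value as f32)).abs();
    error <= tolerance
}

/// Assumed shape: largest absolute round-trip error over a slice.
/// Returns 0.0 for an empty slice.
fn max_quantization_error(values: &[f64]) -> f64 {
    values
        .iter()
        .map(|&v| (v - f64::from(v as f32)).abs())
        .fold(0.0_f64, f64::max)
}
```

Under these assumptions, an exporter could call `max_quantization_error` on all thresholds and scaled leaf values and compare the result against the tolerance before writing the packed model. Because the check is on absolute difference, large-magnitude values are more likely to exceed a fixed tolerance than small ones.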