Q16.16 signed fixed-point arithmetic.
Why fixed-point: the CPU reference path and the CUDA kernels must produce byte-identical outputs cell-for-cell so the hash chain in the case file is the same regardless of whether evidence was produced on the host or the device. Floating point makes byte-equivalence fragile across compilers, drivers, fused-multiply-add behavior, and reduction order. Q16.16 with explicit rounding sidesteps all of that.
Why Q16.16 specifically: 16.16 strikes a balance between dynamic range (i32 covers roughly ±32_767 in the integer half) and precision (1/65_536 in the fractional half) that fits this workload — latencies in milliseconds, error-rate fractions, and smoothed-residual magnitudes all sit comfortably inside that band once inputs are clamped at the boundary.
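The range and resolution described above can be made concrete with a minimal sketch. The helper names `q16_from_int_clamped` and `q16_to_f64` and the constants are hypothetical, chosen for illustration; they are not taken from this crate's API. The sketch assumes the raw representation is an `i32` whose low 16 bits hold the fraction.

```rust
/// Number of fractional bits in Q16.16.
const FRAC_BITS: u32 = 16;
/// Raw bit pattern for 1.0 (hypothetical constant name).
const ONE: i32 = 1 << FRAC_BITS;

/// Hypothetical boundary helper: turn a whole-number input (for example a
/// latency in milliseconds) into raw Q16.16 bits, clamping to the integer
/// range that the upper 16 bits of an i32 can hold.
fn q16_from_int_clamped(v: i64) -> i32 {
    let clamped = v.clamp(-32_768, 32_767) as i32;
    // Both extremes fit in i32 after scaling, so this cannot overflow.
    clamped * ONE
}

/// Convert raw Q16.16 bits to f64 for display only. The deterministic
/// pipeline itself would stay in integer math.
fn q16_to_f64(raw: i32) -> f64 {
    raw as f64 / ONE as f64
}
```

Under these assumptions the smallest positive step is one raw unit, 1/65_536, and any input outside roughly ±32_767 is pinned to the nearest representable bound instead of wrapping.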
Why these specific operations: sat_add, sat_sub, sat_mul, sat_div,
and abs are the only operations the downstream pipeline needs. No FMA,
no SIMD intrinsics. The same scalar code must run on the CPU and inside a
CUDA kernel, so the implementation is restricted to plain integer math
that both backends can express identically.
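As an illustration of what "plain integer math" can look like, here is a sketch of four of the listed operations on raw `i32` bits. The signatures are assumptions, not this crate's actual API. The behavior of `sat_div` (truncation toward zero, saturation on division by zero) is also an assumption, since the text above only specifies rounding for multiplication.

```rust
/// Raw bit pattern for 1.0 in Q16.16 (hypothetical constant name).
const ONE: i32 = 1 << 16;

/// Saturating add: clamps to i32::MIN / i32::MAX instead of wrapping.
fn sat_add(a: i32, b: i32) -> i32 {
    a.saturating_add(b)
}

/// Saturating subtract.
fn sat_sub(a: i32, b: i32) -> i32 {
    a.saturating_sub(b)
}

/// Absolute value; i32::MIN has no positive counterpart, so it saturates
/// to i32::MAX rather than overflowing.
fn abs(a: i32) -> i32 {
    a.saturating_abs()
}

/// Saturating divide. Assumptions for this sketch: the quotient truncates
/// toward zero, x/0 saturates in the direction of x's sign, and 0/0 is 0.
fn sat_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        return match a.signum() {
            1 => i32::MAX,
            -1 => i32::MIN,
            _ => 0,
        };
    }
    // Pre-scale the numerator by 2^16 in i64 so the quotient is Q16.16.
    let wide = (a as i64) * 65_536 / (b as i64);
    wide.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}
```

Everything here is scalar integer arithmetic with no branches that depend on platform behavior, which is the property the paragraph above relies on for matching host and device results.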
Rounding rule: multiplication widens to i64, then applies round-half-to-even (banker’s rounding) at bit 15 before shifting right by 16. This rule is symmetric, deterministic across architectures, and reduces the bias that round-half-up would inject into long EWMA recurrences.
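The rounding rule can be sketched as follows. This is an illustrative reading of the rule on raw `i32` bits, not necessarily the crate's exact code; the signature and the final clamp to the `i32` range are assumptions implied by the `sat_` prefix.

```rust
/// Sketch of a Q16.16 multiply with round-half-to-even.
fn sat_mul(a: i32, b: i32) -> i32 {
    // The product of two i32 values always fits in i64 (Q32.32 here).
    let wide = (a as i64) * (b as i64);

    // Arithmetic shift rounds toward negative infinity (floor).
    let floor = wide >> 16;
    // The 16 discarded bits, always in 0..=65_535, even for negative inputs.
    let rem = wide & 0xFFFF;
    // Bit 15 set with nothing below it is an exact tie.
    const HALF: i64 = 1 << 15;

    let rounded = if rem > HALF || (rem == HALF && (floor & 1) == 1) {
        floor + 1 // above half, or a tie whose floor is odd
    } else {
        floor // below half, or a tie whose floor is already even
    };

    rounded.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}
```

With raw operands, `sat_mul(3, 0x8000)` is an exact tie at 1.5 raw units and rounds to 2, while `sat_mul(5, 0x8000)` is a tie at 2.5 and also rounds to 2. Negating one operand negates the result, which is the symmetry the rule is chosen for.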
Structs
- Q16: Signed Q16.16 fixed-point value.