Module f16

IEEE 754 binary16 half-precision (f16) floating-point type, with promotion to f64 for arithmetic.

§Design

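f16 values are promoted to f64 before entering the binned accumulation path. The promotion is exact, because every binary16 value (including subnormals) is representable in f64. Subnormal handling is preserved by the bin 0 logic of the BinnedAccumulator. Arithmetic is performed in f64, and the result is narrowed back to f16 only when it is stored.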
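The narrowing step is the only place where rounding occurs, so it is worth seeing in isolation. The following is a minimal sketch, not this module's implementation: `pow2`, `round_half_even`, `f64_to_f16_bits`, and `sum_then_narrow` are hypothetical helpers, round-to-nearest-even is assumed as the rounding mode, and a plain f64 sum stands in for the BinnedAccumulator, whose API is not shown on this page.

```rust
/// Exact power of two 2^k as an f64 (valid for -1022 <= k <= 1023).
fn pow2(k: i32) -> f64 {
    f64::from_bits(((k + 1023) as u64) << 52)
}

/// Round a small non-negative value to the nearest integer, ties to even.
fn round_half_even(v: f64) -> f64 {
    let floor = v.floor();
    let frac = v - floor; // exact for the small magnitudes used here
    if frac > 0.5 || (frac == 0.5 && floor % 2.0 == 1.0) {
        floor + 1.0
    } else {
        floor
    }
}

/// Narrow an f64 to a binary16 bit pattern (hypothetical helper,
/// assuming round-to-nearest-even).
fn f64_to_f16_bits(x: f64) -> u16 {
    if x.is_nan() {
        return 0x7e00; // a quiet NaN
    }
    let sign: u16 = if x.is_sign_negative() { 0x8000 } else { 0 };
    let a = x.abs();
    if a >= 65520.0 {
        // At or past the midpoint above 65504: rounds to infinity.
        return sign | 0x7c00;
    }
    // Unbiased f64 exponent, clamped to the smallest f16 normal exponent so
    // that subnormals are measured in fixed units of 2^-24.
    let e = ((((a.to_bits() >> 52) & 0x7ff) as i32) - 1023).max(-14);
    // Number of f16 ulps (2^(e-10)) in `a`, rounded: always in 0..=2048.
    let q = round_half_even(a * pow2(10 - e)) as u16;
    // For normals, q carries the implicit leading 1 (1024), which adds 1 to
    // the exponent field; a carry to 2048 bumps the exponent once more.
    sign | ((((e + 14) as u16) << 10) + q)
}

/// Sum already-promoted values in f64 and narrow once at the end.
fn sum_then_narrow(promoted: &[f64]) -> u16 {
    // Plain f64 addition is a stand-in for the BinnedAccumulator.
    let total: f64 = promoted.iter().sum();
    f64_to_f16_bits(total)
}
```

With the inputs 2048, 1, and 1, `sum_then_narrow` yields the bit pattern `0x6801` (2050). Narrowing after each addition would instead stay at `0x6800` (2048), because 2049 is not representable in f16 and rounds back to 2048 each time.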

§IEEE 754 binary16 Layout

Bit 15:     sign
Bits 14-10: exponent (5 bits, bias = 15)
Bits 9-0:   mantissa (10 bits)

Range: ±65504 (max normal), ±6.1e-5 (min positive normal, 2^-14), ±5.96e-8 (min positive subnormal, 2^-24)
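To make the layout concrete, the sketch below decodes a raw binary16 bit pattern into an f64 using only the field positions and bias listed above. The names `pow2` and `f16_bits_to_f64` are hypothetical and are not taken from this module; the F16 struct may expose its conversion differently.

```rust
/// Exact power of two 2^k as an f64 (valid for -1022 <= k <= 1023).
fn pow2(k: i32) -> f64 {
    f64::from_bits(((k + 1023) as u64) << 52)
}

/// Decode a binary16 bit pattern into an f64 (hypothetical helper).
/// The conversion is exact for every finite input.
fn f16_bits_to_f64(bits: u16) -> f64 {
    let sign = if bits & 0x8000 != 0 { -1.0 } else { 1.0 }; // bit 15
    let exponent = i32::from((bits >> 10) & 0x1f); // bits 14-10, bias 15
    let mantissa = f64::from(bits & 0x03ff); // bits 9-0
    let magnitude = match exponent {
        // Zero and subnormals: no implicit leading 1, fixed scale 2^-24.
        0 => mantissa * pow2(-24),
        // All-ones exponent: infinity if the mantissa is zero, else NaN.
        0x1f if mantissa == 0.0 => f64::INFINITY,
        0x1f => f64::NAN,
        // Normals: (1 + mantissa / 1024) * 2^(exponent - 15).
        e => (1024.0 + mantissa) * pow2(e - 25),
    };
    sign * magnitude
}
```

For example, `0x3c00` decodes to 1.0, `0x7bff` to 65504 (the largest finite value), `0x0400` to 2^-14 (the smallest positive normal), and `0x0001` to 2^-24 (the smallest positive subnormal).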

Structs§

F16
IEEE 754 binary16 half-precision float.

Functions§

f16_binned_dot
Dot product of two f16 slices, accumulated in f64 via BinnedAccumulator.
f16_binned_sum
Sum f16 values by promoting to f64 and using BinnedAccumulator.
f16_matmul
Matrix multiply for f16 arrays, computing in f64 via BinnedAccumulator.