Expand description
BFloat16 (bf16) floating-point support
Implements the Google Brain bfloat16 format used extensively in ML training. BF16 has the same exponent range as f32 (8 bits) but reduced mantissa (7 bits), making it ideal for training where range matters more than precision.
Layout: 1 sign bit, 8 exponent bits, 7 mantissa bits. Range: same as f32 (±3.4×10³⁸), precision: ~2 decimal digits.
Structs§
- BFloat16
- BFloat16 — Google Brain’s 16-bit floating-point format.
Functions§
- bf16_
dot - Dot product of two bf16 slices, accumulated in f32.
- bf16_
gemm - Mixed-precision GEMM: C = A * B with bf16 inputs and f32 accumulation. A is (m × k), B is (k × n), C is (m × n).
- bf16_
gemv - Matrix-vector multiply: y = A * x, with bf16 inputs and f32 accumulation. A is (rows × cols) row-major, x is (cols,), y is (rows,).
- bf16_
to_ f32_ slice - Convert a bf16 slice to f32.
- f32_
to_ bf16_ slice - Convert an f32 slice to bf16.