Module bfloat16

BFloat16 (bf16) floating-point support

Implements the Google Brain bfloat16 format, widely used in ML training. BF16 keeps the same 8-bit exponent as f32, and therefore the same dynamic range, but shortens the mantissa from 23 bits to 7. This suits training workloads, where range matters more than precision.
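
Because bf16 shares f32's sign bit and exponent field, a bf16 value is simply the upper 16 bits of an f32. The following minimal sketch shows the round trip on raw `u16` bit patterns. The helpers `f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical and are not this module's API. Rounding to nearest with ties to even is a common choice, but this page does not confirm that `BFloat16` uses it.

```rust
/// Hypothetical helper: round an f32 to the nearest bf16 bit pattern
/// (round-to-nearest, ties-to-even). Assumed behavior, not this module's API.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep sign and top mantissa bits, and force the quiet-NaN bit so the
        // truncated pattern cannot collapse into infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest retained bit: values above halfway round up,
    // and exact halfway cases round to the even neighbour.
    let lsb = (bits >> 16) & 1;
    ((bits + 0x7FFF + lsb) >> 16) as u16
}

/// Hypothetical helper: widening bf16 to f32 is exact (pad 16 zero bits).
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

fn main() {
    // Range is preserved: a huge f32 stays finite and close after the round trip.
    let big = 1.0e38_f32;
    let big_rt = bf16_bits_to_f32(f32_to_bf16_bits(big));
    println!("{big} -> {big_rt}");

    // Precision is not: 1 + 2^-10 needs more than 8 significant bits, so it
    // rounds back to 1.0.
    let fine = 1.0_f32 + 2.0_f32.powi(-10);
    println!("{fine} -> {}", bf16_bits_to_f32(f32_to_bf16_bits(fine)));
}
```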

Layout: 1 sign bit, 8 exponent bits, 7 mantissa bits (8 significant bits counting the implicit leading 1). Range: same as f32 (±3.4×10³⁸); precision: about 2 decimal digits.
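
The layout above can be decoded by hand. This sketch splits a raw bf16 bit pattern into its three fields and rebuilds a normal value as sign × (1 + mantissa/128) × 2^(exponent − 127), using the same bias of 127 as f32. `bf16_fields` and `decode_normal` are hypothetical names for illustration only, and zero, subnormals, infinities, and NaN are deliberately not handled.

```rust
/// Hypothetical helper: split a raw bf16 bit pattern into
/// (sign, biased exponent, 7-bit mantissa).
fn bf16_fields(bits: u16) -> (u16, u16, u16) {
    let sign = bits >> 15;             // 1 bit
    let exponent = (bits >> 7) & 0xFF; // 8 bits, bias 127 as in f32
    let mantissa = bits & 0x7F;        // 7 stored bits
    (sign, exponent, mantissa)
}

/// Decode a normal (finite, non-zero, non-subnormal) bf16 value by hand.
fn decode_normal(bits: u16) -> f32 {
    let (s, e, m) = bf16_fields(bits);
    assert!(e != 0 && e != 0xFF, "only normal values are handled here");
    let sign = if s == 1 { -1.0 } else { 1.0 };
    // The implicit leading 1 gives 8 significant bits in total.
    let significand = 1.0 + (m as f32) / 128.0;
    sign * significand * 2.0_f32.powi(e as i32 - 127)
}

fn main() {
    // 0xC0A0: sign 1, exponent 129, mantissa 32 -> -(1 + 32/128) * 2^2 = -5
    println!("{:?} -> {}", bf16_fields(0xC0A0), decode_normal(0xC0A0));
    // 0x7F7F: exponent 254, mantissa 127 -> largest finite bf16, about 3.39e38
    println!("{}", decode_normal(0x7F7F));
}
```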

Structs

BFloat16: Google Brain’s 16-bit floating-point format.

Functions

bf16_dot: Dot product of two bf16 slices, accumulated in f32.
bf16_gemm: Mixed-precision GEMM: C = A * B with bf16 inputs and f32 accumulation. A is (m × k), B is (k × n), C is (m × n).
bf16_gemv: Matrix-vector multiply: y = A * x, with bf16 inputs and f32 accumulation. A is (rows × cols) row-major, x is (cols,), y is (rows,).
bf16_to_f32_slice: Convert a bf16 slice to f32.
f32_to_bf16_slice: Convert an f32 slice to bf16.
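
The dot, GEMV, and GEMM functions listed above share one pattern: widen each bf16 operand to f32, multiply, and keep the running sum in f32. The following is a minimal sketch of that pattern, not this module's implementation. It uses raw `u16` bit patterns instead of `BFloat16` and assumes row-major storage for every matrix, although the docs state row-major only for `bf16_gemv`. The names `to_f32`, `from_f32`, `to_bf16_vec`, `dot_sketch`, `gemv_sketch`, and `gemm_sketch` are hypothetical.

```rust
/// Widen one bf16 bit pattern to f32 (exact).
fn to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Narrow f32 to bf16 bits, rounding to nearest with ties to even (assumed).
fn from_f32(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        return ((bits >> 16) as u16) | 0x0040; // keep it a quiet NaN
    }
    ((bits + 0x7FFF + ((bits >> 16) & 1)) >> 16) as u16
}

/// Slice conversion in the spirit of f32_to_bf16_slice (signature assumed).
fn to_bf16_vec(xs: &[f32]) -> Vec<u16> {
    xs.iter().map(|&v| from_f32(v)).collect()
}

/// Dot product: each product and the running sum are kept in f32.
fn dot_sketch(a: &[u16], b: &[u16]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| to_f32(x) * to_f32(y)).sum()
}

/// y = A * x, with A (rows x cols) row-major, x of length cols, y of length rows.
fn gemv_sketch(a: &[u16], x: &[u16], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(a.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows).map(|r| dot_sketch(&a[r * cols..(r + 1) * cols], x)).collect()
}

/// C = A * B, with A (m x k), B (k x n), C (m x n), all row-major; C is f32.
fn gemm_sketch(a: &[u16], b: &[u16], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            // Widen A[i][p] once, then stream across row p of B.
            let aip = to_f32(a[i * k + p]);
            for j in 0..n {
                c[i * n + j] += aip * to_f32(b[p * n + j]);
            }
        }
    }
    c
}

fn main() {
    // A is 2 x 3 and x has 3 entries; every value is exact in bf16.
    let a = to_bf16_vec(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
    let x = to_bf16_vec(&[1.0, 0.5, -1.0]);
    println!("{:?}", gemv_sketch(&a, &x, 2, 3)); // [-1.0, 0.5]

    // B is 3 x 2, so C = A * B is 2 x 2.
    let b = to_bf16_vec(&[1.0, 0.0, 0.0, 1.0, 1.0, 1.0]);
    println!("{:?}", gemm_sketch(&a, &b, 2, 3, 2)); // [4.0, 5.0, 10.0, 11.0]

    // Why accumulate in f32: sum 512 products of 1 * 1.
    let ones = vec![from_f32(1.0); 512];
    let f32_acc = dot_sketch(&ones, &ones);
    // Rounding the running sum to bf16 after every step stalls at 256,
    // since 257 needs 9 significant bits and the tie rounds to even.
    let bf16_acc = ones
        .iter()
        .fold(0u16, |acc, &v| from_f32(to_f32(acc) + to_f32(v) * to_f32(v)));
    println!("f32 accumulator: {f32_acc}, bf16 accumulator: {}", to_f32(bf16_acc)); // 512, 256
}
```

The last case illustrates the reason for f32 accumulation: with only 8 significant bits, a bf16 running sum stops growing once each new term falls below half a unit in the last place.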