Precision selection for graph execution.
Each backend can compile a graph at f32 (the default, accurate) or f16 (half precision, with up to 2× peak FLOPs and half the memory traffic on supported hardware). The IR remains dtype-agnostic; the backend decides how to materialize buffers and which kernels to pick.
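As a rough illustration of how a backend might turn a precision choice into buffer sizes, here is a minimal sketch. The variant names `F32` and `F16`, the `bytes_per_element` method, and the `buffer_bytes` helper are assumptions made for this example; the actual `Precision` enum may be shaped differently.

```rust
/// Hypothetical mirror of the `Precision` enum; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Precision {
    /// Full precision, treated as the default here.
    #[default]
    F32,
    /// Half precision.
    F16,
}

impl Precision {
    /// Size in bytes of one element stored at this precision.
    pub fn bytes_per_element(self) -> usize {
        match self {
            Precision::F32 => 4,
            Precision::F16 => 2,
        }
    }
}

/// Hypothetical helper: bytes a backend would allocate for a dense tensor
/// of the given shape when materializing it at `precision`.
pub fn buffer_bytes(shape: &[usize], precision: Precision) -> usize {
    shape.iter().product::<usize>() * precision.bytes_per_element()
}
```

Under these assumptions, a `[1024, 768]` tensor takes half as many bytes at `Precision::F16` as at `Precision::F32`, which is where the reduced memory traffic comes from.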
Mixed precision: f16 inference typically keeps reductions (LayerNorm mean/var, attention softmax) in f32 to avoid catastrophic accuracy loss, while matmul and element-wise ops run in f16.
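The sketch below shows why reductions are usually kept in f32. Stable Rust has no built-in `f16` type, so it emulates half precision crudely by keeping only the top 10 mantissa bits of an `f32`; this truncates rather than rounds and ignores f16's narrower exponent range. All function names are hypothetical and not part of this crate.

```rust
/// Crude f16 emulation (an assumption for illustration): keep the sign,
/// exponent, and top 10 mantissa bits of an f32, clearing the low 13 bits.
fn truncate_to_f16_mantissa(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_E000)
}

/// Mean with the running sum squeezed to emulated f16 after every add.
fn mean_f16_accum(xs: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for &x in xs {
        sum = truncate_to_f16_mantissa(sum + truncate_to_f16_mantissa(x));
    }
    truncate_to_f16_mantissa(sum / xs.len() as f32)
}

/// Mean over the same emulated-f16 inputs, but accumulated in f32.
fn mean_f32_accum(xs: &[f32]) -> f32 {
    let sum: f32 = xs.iter().map(|&x| truncate_to_f16_mantissa(x)).sum();
    sum / xs.len() as f32
}
```

For 4096 inputs all equal to 1.0, the emulated f16 accumulator stalls at 2048 because the next representable value is 2050, so `mean_f16_accum` returns 0.5 while `mean_f32_accum` returns 1.0. A LayerNorm mean computed the first way would be off by a factor of two.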
Enums
- Precision
- Numeric precision for graph compilation.