Module precision

Expand description

Precision selection for graph execution.

Each backend can compile a graph at f32 (default — accurate) or f16 (half precision — 2× peak FLOPs and ½ memory bandwidth on supported hardware). The IR remains dtype-agnostic; the backend decides how to materialize buffers and pick kernels.

Mixed precision: f16 inference typically keeps reductions (LayerNorm mean/var, attention softmax) in f32 to avoid catastrophic accuracy loss while keeping matmul + element-wise in f16.

Enums§

Precision: Numeric precision for graph compilation.

Module precision

Module precision Copy item path

Enums§

Module precision