Kernel Fusion Engine
Automatically detects and fuses element-wise / pointwise kernel sequences to eliminate intermediate memory allocations and round-trips. This mirrors the kernel fusion passes in TensorRT, XLA, and TVM.
Fusion rules:
- Element-wise ops (add, mul, relu, etc.) can always fuse.
- Reduction followed by broadcast can fuse (vertical fusion).
- Producer-consumer pairs with matching shapes can fuse (horizontal).
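As a rough illustration of the first rule, the sketch below contrasts running a chain of element-wise ops one at a time (one intermediate buffer per op) with running the same chain in a single fused pass. The names `ElemOp`, `run_unfused`, and `run_fused` are hypothetical and are not part of this module's API; the code only shows why fusing pointwise ops removes intermediate allocations, under the assumption that every op maps one input element to one output element.

```rust
/// Hypothetical element-wise op set, for illustration only
/// (not the types exposed by this module).
#[derive(Clone, Copy, Debug)]
enum ElemOp {
    AddScalar(f32),
    MulScalar(f32),
    Relu,
}

impl ElemOp {
    /// Apply the op to a single element.
    fn apply(self, x: f32) -> f32 {
        match self {
            ElemOp::AddScalar(c) => x + c,
            ElemOp::MulScalar(c) => x * c,
            ElemOp::Relu => x.max(0.0),
        }
    }
}

/// Unfused execution: one full pass over the data and one freshly
/// allocated intermediate buffer per op.
fn run_unfused(input: &[f32], ops: &[ElemOp]) -> Vec<f32> {
    let mut current = input.to_vec();
    for op in ops {
        current = current.iter().map(|&x| op.apply(x)).collect();
    }
    current
}

/// Fused execution: a single pass in which each element flows through
/// the whole op chain, so only the output buffer is allocated.
fn run_fused(input: &[f32], ops: &[ElemOp]) -> Vec<f32> {
    input
        .iter()
        .map(|&x| ops.iter().fold(x, |acc, op| op.apply(acc)))
        .collect()
}
```

Both functions compute the same values; the fused version simply does so without materializing the result of each intermediate op, which is the saving that fusion of element-wise sequences targets.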
Structs§
- FusedKernel - A fused kernel: a sequence of operations executed as one kernel.
- FusionAnalyzer - Fusion analysis engine that detects fusable patterns.
- FusionNode - A node in the fusion graph.
- FusionResult - Result of fusion analysis.