MIR fusion passes and fused-op decomposition.
Pattern-matching fusion (FuseMatMulBiasAct, FuseSwiGLU, …) and
the inverse unfuse_fused_for_autodiff rewrite used before autodiff.
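To make the fuse / unfuse relationship concrete, here is a minimal sketch on a toy straight-line program. It is not this crate's IR or API: ToyOp, fuse, unfuse, and the parameter ids w and b are hypothetical names invented for illustration, and the single MatMul -> AddBias -> Relu pattern only loosely mirrors what a pass such as FuseMatMulBiasAct might match.

```rust
/// Toy op set for a straight-line program: each op consumes the previous
/// op's output. `w` and `b` are hypothetical parameter ids, not real MIR values.
#[derive(Debug, Clone, PartialEq)]
enum ToyOp {
    MatMul { w: usize },
    AddBias { b: usize },
    Relu,
    /// Fused form of MatMul -> AddBias -> Relu.
    FusedMatMulBiasRelu { w: usize, b: usize },
}

/// Pattern-matching fusion: replace every MatMul -> AddBias -> Relu window
/// with a single fused op and copy everything else through unchanged.
fn fuse(ops: &[ToyOp]) -> Vec<ToyOp> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        if let [ToyOp::MatMul { w }, ToyOp::AddBias { b }, ToyOp::Relu, ..] = &ops[i..] {
            out.push(ToyOp::FusedMatMulBiasRelu { w: *w, b: *b });
            i += 3; // skip the three ops that were just fused
        } else {
            out.push(ops[i].clone());
            i += 1;
        }
    }
    out
}

/// Inverse rewrite: expand each fused op back into its primitives, so that a
/// later stage (for example autodiff) only ever sees primitive ops.
fn unfuse(ops: &[ToyOp]) -> Vec<ToyOp> {
    let mut out = Vec::new();
    for op in ops {
        match op {
            ToyOp::FusedMatMulBiasRelu { w, b } => {
                out.push(ToyOp::MatMul { w: *w });
                out.push(ToyOp::AddBias { b: *b });
                out.push(ToyOp::Relu);
            }
            other => out.push(other.clone()),
        }
    }
    out
}
```

In this sketch, unfuse(&fuse(&program)) returns the original program whenever the input contains no fused ops, which is the property that lets a decomposition step run safely before differentiation.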
Re-exports
pub use control_flow::LowerControlFlow;
pub use control_flow::inline_if;
pub use control_flow::inline_subgraph_into;
pub use control_flow::inline_subgraph_into_outputs;
pub use control_flow::unroll_while;
pub use fk_fusion::DecomposeFusionRegions;
pub use fk_fusion::FuseBatchPreprocess;
pub use fk_fusion::FuseRegionPrologue;
pub use fk_fusion::MarkBatchSliceRegions;
pub use fk_fusion::MarkTransformRegions;
pub use fk_graphs::batch_narrow_relu_primitive_graph;
pub use fk_graphs::batch_narrow_relu_regions_graph;
pub use fk_graphs::nchw;
pub use fk_graphs::resize_relu_graph;
pub use fk_graphs::resize_relu_region_graph;
pub use fusion::FuseAttentionBlock;
pub use fusion::FuseMatMulBiasAct;
pub use fusion::FuseResidualLN;
pub use fusion::FuseResidualRmsNorm;
pub use fusion::FuseRmsNormReshape;
pub use fusion::FuseSwiGLU;
pub use fusion::FuseSwiGLUDualMatmul;
pub use fusion::FuseTransformerLayer;
pub use fusion::MarkElementwiseRegions;
pub use fusion::UnfuseElementwiseRegions;
pub use fusion::clip_elementwise_regions;
pub use fusion_fragment::FusionFragment;
pub use fusion_fragment::FusionRole;
pub use fusion_fragment::fusion_fragments;
pub use fusion_fragment::is_registered_transform_op;
pub use fusion_fragment::prologue_for_transform_op;
pub use fusion_fragment::register_fusion_fragment;
pub use fusion_fragment::transform_chain_eligible;
pub use fusion_report::FusionReport;
pub use fusion_report::MissReason;
pub use fusion_report::MissedFusion;
pub use limits::FusionLimits;
pub use limits::active_fusion_limits;
pub use limits::with_fusion_limits;
pub use lower_backward_ops::LowerBackwardOps;
pub use lower_dot_general::LowerDotGeneral;
pub use lower_logical_kernels::lower_logical_kernels;
pub use lower_loss_ops::LowerSoftmaxCrossEntropy;
pub use lower_reduce_axes::LowerNonLastAxisReduce;
pub use lower_vae_ops::LowerBatchNormInference;
pub use lower_vae_ops::LowerGroupNorm;
pub use lower_vae_ops::LowerResizeNearest2x;
pub use pass::Pass;
pub use pass::run_passes;
pub use unfuse::unfuse_fused_for_autodiff;
Modules
- control_flow
  Control-flow lowering passes: Op::If → Where + inlined branches; Op::While → bounded unroll of body replicas.
- fk_fusion
  FKL-inspired transform / prologue / batch fusion passes.
- fk_graphs
  Shared FKL-style benchmark / test graphs.
- fusion
  Fusion passes: pattern-match and replace subgraphs with fused ops.
- fusion_fragment
  FKL-style fusion fragment registry - extensible op roles for region passes.
- fusion_report
  Fusion diagnostics: what fused, what missed, and why.
- graph_rewrite
  Shared graph rewriter for fusion passes.
- limits
  Per-backend caps for fused IR (elementwise region chains, etc.).
- lower_backward_ops
  Lower dedicated backward ops (ReluBackward, ActivationBackward) to primitives (Compare, Where, Binary, Activation) for backends that do not implement closed-form gradient kernels (e.g. Metal).
- lower_dot_general
  Lower Op::DotGeneral to primitive ops (MatMul + Transpose + Reshape).
- lower_logical_kernels
  Lower logical kernels to common IR when native backend ops are unavailable.
- lower_loss_ops
  Lower SoftmaxCrossEntropyWithLogits / SoftmaxCrossEntropyBackward to primitives for backends (CUDA, Metal) that lack native kernels.
- lower_reduce_axes
  Lower Op::Reduce on non-last axes (and multi-axis reduce) for backends that only implement reduction along the trailing dimension (e.g. wgpu).
- lower_vae_ops
  Lower VAE-specific ops (GroupNorm, BatchNormInference, ResizeNearest2x) to primitives.
- pass
  Pass infrastructure: trait + pipeline runner.
- unfuse
  Decompose tier-2 fused MIR ops into primitives for autodiff and backends.
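The modules above combine a pass trait, a pipeline runner, and lowering passes that rewrite dedicated ops into primitives. The sketch below shows how those pieces could fit together on a toy graph. Every name in it (ToyNode, ToyPass, run_toy_passes, LowerToyReluBackward, eval_last) is hypothetical, and the real Pass trait and run_passes signatures are not shown on this page and may differ. The lowering is modeled on the ReluBackward → Compare + Where description, with simplified stand-in primitives.

```rust
/// Hypothetical toy graph: nodes are stored in topological order and refer
/// to their operands by index into the same vector.
#[derive(Debug, Clone, PartialEq)]
enum ToyNode {
    Input(Vec<f32>),
    /// Dedicated backward op: grad[i] where x[i] > 0, else 0.
    ReluBackward { x: usize, grad: usize },
    /// Stand-in compare primitive: 1.0 where x[i] > 0, else 0.0.
    GreaterThanZero { x: usize },
    /// Stand-in select primitive: a[i] where mask[i] != 0, else 0.0.
    WhereOrZero { mask: usize, a: usize },
}

/// Hypothetical pass interface: a named graph-to-graph rewrite.
trait ToyPass {
    fn name(&self) -> &'static str;
    fn run(&self, graph: Vec<ToyNode>) -> Vec<ToyNode>;
}

/// Run passes in order, feeding each pass the previous pass's output.
fn run_toy_passes(mut graph: Vec<ToyNode>, passes: &[&dyn ToyPass]) -> Vec<ToyNode> {
    for pass in passes {
        graph = pass.run(graph);
    }
    graph
}

/// Lowers ReluBackward into the two stand-in primitives.
struct LowerToyReluBackward;

impl ToyPass for LowerToyReluBackward {
    fn name(&self) -> &'static str {
        "lower-toy-relu-backward"
    }

    fn run(&self, graph: Vec<ToyNode>) -> Vec<ToyNode> {
        let mut out = Vec::new();
        // remap[old_index] = index of the node that now produces that value.
        let mut remap = Vec::with_capacity(graph.len());
        for node in graph {
            match node {
                ToyNode::Input(v) => out.push(ToyNode::Input(v)),
                ToyNode::ReluBackward { x, grad } => {
                    out.push(ToyNode::GreaterThanZero { x: remap[x] });
                    let mask = out.len() - 1;
                    out.push(ToyNode::WhereOrZero { mask, a: remap[grad] });
                }
                ToyNode::GreaterThanZero { x } => {
                    out.push(ToyNode::GreaterThanZero { x: remap[x] })
                }
                ToyNode::WhereOrZero { mask, a } => {
                    out.push(ToyNode::WhereOrZero { mask: remap[mask], a: remap[a] })
                }
            }
            remap.push(out.len() - 1);
        }
        out
    }
}

/// Reference interpreter: evaluates every node and returns the last value.
fn eval_last(graph: &[ToyNode]) -> Vec<f32> {
    let mut vals: Vec<Vec<f32>> = Vec::with_capacity(graph.len());
    for node in graph {
        let v = match node {
            ToyNode::Input(v) => v.clone(),
            ToyNode::ReluBackward { x, grad } => vals[*x].iter().zip(&vals[*grad])
                .map(|(xi, gi)| if *xi > 0.0 { *gi } else { 0.0 }).collect(),
            ToyNode::GreaterThanZero { x } => vals[*x].iter()
                .map(|xi| if *xi > 0.0 { 1.0 } else { 0.0 }).collect(),
            ToyNode::WhereOrZero { mask, a } => vals[*mask].iter().zip(&vals[*a])
                .map(|(mi, ai)| if *mi != 0.0 { *ai } else { 0.0 }).collect(),
        };
        vals.push(v);
    }
    vals.pop().unwrap_or_default()
}
```

Because a lowering inserts extra nodes, the sketch rebuilds the graph and remaps operand indices instead of editing in place. Comparing eval_last before and after the pipeline is a cheap way to check that a lowering preserves semantics.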