Struct FuseTransformerLayer


pub struct FuseTransformerLayer;


Fuses an entire BERT-style transformer layer (attention block, residual add + LayerNorm, FFN, residual add + LayerNorm) into one Op::FusedTransformerLayer node.
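To make the composition concrete, here is a minimal reference sketch of what the fused node computes. It assumes the post-LN arrangement named above: each residual + LayerNorm step normalizes `sublayer_output + skip` and then applies gamma and beta. The attention block and FFN are passed in as closures because only their placement matters here. `Rows`, `residual_layer_norm`, and `transformer_layer` are illustrative names, not part of rlx.

```rust
/// Hidden states as one Vec<f32> per token (illustrative type, not rlx's).
type Rows = Vec<Vec<f32>>;

/// Residual add followed by LayerNorm over one row:
/// out = (x + skip - mean) / sqrt(var + eps) * gamma + beta
fn residual_layer_norm(x: &[f32], skip: &[f32], gamma: &[f32], beta: &[f32], eps: f32) -> Vec<f32> {
    let summed: Vec<f32> = x.iter().zip(skip).map(|(a, b)| a + b).collect();
    let n = summed.len() as f32;
    let mean = summed.iter().sum::<f32>() / n;
    let var = summed.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    summed
        .iter()
        .zip(gamma.iter().zip(beta))
        .map(|(v, (g, b))| (v - mean) * inv_std * g + b)
        .collect()
}

/// Post-LN layer: h1 = LN1(attention(x) + x); out = LN2(ffn(h1) + h1).
/// `attention` and `ffn` stand in for the fused attention block and the
/// two FusedMatMulBiasAct nodes; ln1/ln2 are (gamma, beta, eps).
fn transformer_layer(
    hidden: &[Vec<f32>],
    attention: impl Fn(&[Vec<f32>]) -> Rows,
    ffn: impl Fn(&[Vec<f32>]) -> Rows,
    ln1: (&[f32], &[f32], f32),
    ln2: (&[f32], &[f32], f32),
) -> Rows {
    let attn_out = attention(hidden);
    // First residual + LayerNorm, row by row (skip = the layer input).
    let h1: Rows = attn_out
        .iter()
        .zip(hidden)
        .map(|(a, x)| residual_layer_norm(a, x, ln1.0, ln1.1, ln1.2))
        .collect();
    let ffn_out = ffn(h1.as_slice());
    // Second residual + LayerNorm, with h1 as the skip input.
    ffn_out
        .iter()
        .zip(&h1)
        .map(|(f, h)| residual_layer_norm(f, h, ln2.0, ln2.1, ln2.2))
        .collect()
}
```

With unit gamma and zero beta, each output row has mean close to 0 and variance close to 1, which is a quick sanity check for any backend that un-fuses the node.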

Pattern matched by this pass. It must run after FuseMatMulBiasAct, FuseResidualLN, and FuseAttentionBlock, because it matches the fused nodes those passes produce (order matters):

  skip ──┬─→ FusedAttentionBlock(qkv_w, out_w, mask, qkv_b, out_b) ─→ attn_out
         └─→ FusedResidualLN(attn_out, skip, ln1_g, ln1_b) ─→ h1
                                                               ├─→ FusedMatMulBiasAct(fc1_w, fc1_b, GeLU) ─→ ffn_int
                                                               │                                              ↓
                                                               │           FusedMatMulBiasAct(fc2_w, fc2_b, None) ─→ ffn_out
                                                               └────────────────────→ FusedResidualLN(ffn_out, h1, ln2_g, ln2_b) ─→ out
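The wiring above can be checked by walking backwards from the final FusedResidualLN. The sketch below does that on a toy graph. `Op`, `Act`, `Node`, and `match_layer` are hypothetical stand-ins for rlx's real graph types, and weight and bias operands are left out so only the data edges are compared. A real pass would presumably also need to confirm that the intermediate nodes have no other consumers before replacing them; that check is omitted here.

```rust
/// Hypothetical activation tag, mirroring GeLU / None in the diagram.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Act {
    None,
    GeLU,
}

/// Hypothetical op kinds: only what the matcher needs.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Input,
    FusedAttentionBlock,
    FusedResidualLN,
    FusedMatMulBiasAct(Act),
}

/// A node and the ids of its data inputs (weights and biases omitted).
struct Node {
    op: Op,
    inputs: Vec<usize>,
}

/// Walks back from a candidate final FusedResidualLN and returns the ids
/// [attn, ln1, fc1, fc2, ln2] if the five nodes are wired as in the diagram.
fn match_layer(nodes: &[Node], out: usize) -> Option<[usize; 5]> {
    let ln2 = &nodes[out];
    if ln2.op != Op::FusedResidualLN {
        return None;
    }
    // ln2 = ResidualLN(ffn_out, h1)
    let (fc2, h1) = (ln2.inputs[0], ln2.inputs[1]);
    if nodes[fc2].op != Op::FusedMatMulBiasAct(Act::None) {
        return None;
    }
    // fc1 must read h1, the same value ln2 adds back as its residual.
    let fc1 = nodes[fc2].inputs[0];
    if nodes[fc1].op != Op::FusedMatMulBiasAct(Act::GeLU) || nodes[fc1].inputs[0] != h1 {
        return None;
    }
    if nodes[h1].op != Op::FusedResidualLN {
        return None;
    }
    // h1 = ResidualLN(attn_out, skip); the attention block must read the same skip.
    let (attn, skip) = (nodes[h1].inputs[0], nodes[h1].inputs[1]);
    if nodes[attn].op != Op::FusedAttentionBlock || nodes[attn].inputs[0] != skip {
        return None;
    }
    Some([attn, h1, fc1, fc2, out])
}
```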

All five nodes collapse into a single FusedTransformerLayer { num_heads, head_dim, intermediate_size, eps1, eps2, activation, has_bias: true } node. Its 14 inputs follow the layout consumed by rlx-mlx’s lowering at rlx-mlx/src/lower.rs:1528, where hidden is the layer input (skip in the diagram):

[hidden, qkv_w, qkv_b, out_w, out_b, ln1_g, ln1_b, fc1_w, fc1_b, fc2_w, fc2_b, ln2_g, ln2_b, mask]
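One detail worth noticing: FusedAttentionBlock lists its operands as (qkv_w, out_w, mask, qkv_b, out_b), while the fused layout pairs each weight with its bias and moves mask to the end. The sketch below shows that reordering. `Matched`, `assemble`, and `LAYER_INPUTS` are hypothetical names, and the sketch assumes the operands can be gathered per matched node in the order the diagram lists them.

```rust
/// Input order of FusedTransformerLayer, as listed above.
const LAYER_INPUTS: [&str; 14] = [
    "hidden", "qkv_w", "qkv_b", "out_w", "out_b", "ln1_g", "ln1_b",
    "fc1_w", "fc1_b", "fc2_w", "fc2_b", "ln2_g", "ln2_b", "mask",
];

/// Hypothetical bundle of operands collected from the five matched nodes.
struct Matched<T> {
    hidden: T,
    attn: [T; 5], // FusedAttentionBlock order: qkv_w, out_w, mask, qkv_b, out_b
    ln1: [T; 2],  // ln1_g, ln1_b
    fc1: [T; 2],  // fc1_w, fc1_b
    fc2: [T; 2],  // fc2_w, fc2_b
    ln2: [T; 2],  // ln2_g, ln2_b
}

/// Reorders the gathered operands into the 14-input fused layout.
fn assemble<T: Copy>(m: &Matched<T>) -> [T; 14] {
    let [qkv_w, out_w, mask, qkv_b, out_b] = m.attn;
    [
        m.hidden, qkv_w, qkv_b, out_w, out_b,
        m.ln1[0], m.ln1[1], m.fc1[0], m.fc1[1],
        m.fc2[0], m.fc2[1], m.ln2[0], m.ln2[1],
        mask, // mask moves from the attention block's middle slot to the end
    ]
}
```

Keeping the layout in one constant gives the pass and a backend lowering a single place to agree on input positions.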

This pass uses the same threshold as FuseAttentionBlock (RLX_FUSE_ATTN_THRESHOLD, default 64). Backends that don’t natively support FusedTransformerLayer un-fuse it back into primitive ops at compile time; backends that do (such as MLX) can emit one monolithic kernel per layer.
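The threshold lookup can be sketched as follows. The environment variable name and the default of 64 come from the text above. Treating an unset or unparsable value as "use the default" is an assumption, and because this page does not say which quantity is compared against the threshold, the sketch stops at resolving the value. `parse_threshold` and `fuse_threshold` are hypothetical helper names.

```rust
use std::env;

/// Default taken from the documentation above.
const DEFAULT_FUSE_ATTN_THRESHOLD: usize = 64;

/// Parses a raw setting; unset or unparsable values fall back to the
/// default (an assumption, not confirmed rlx behavior).
fn parse_threshold(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_FUSE_ATTN_THRESHOLD)
}

/// Resolves the threshold from the process environment.
fn fuse_threshold() -> usize {
    let raw = env::var("RLX_FUSE_ATTN_THRESHOLD").ok();
    parse_threshold(raw.as_deref())
}
```

Splitting parsing from the environment read keeps the fallback logic testable without mutating process state.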


Trait Implementations

impl Pass for FuseTransformerLayer

fn name(&self) -> &str

fn run(&self, graph: Graph) -> Graph
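Because the pattern only exists after the three prerequisite passes have rewritten the graph, a pipeline has to apply them in order. The sketch below mirrors the two `Pass` methods listed here (the real trait may define more) with a stand-in `Graph` that only records which passes ran. `Graph`, `Named`, and `run_pipeline` are hypothetical.

```rust
/// Hypothetical stand-in for rlx's Graph: it only records which passes ran.
#[derive(Debug, Default)]
struct Graph {
    applied: Vec<String>,
}

/// Mirrors the two methods listed under `impl Pass`; the real trait may have more.
trait Pass {
    fn name(&self) -> &str;
    fn run(&self, graph: Graph) -> Graph;
}

/// Hypothetical placeholder pass that just logs its name into the graph.
struct Named(&'static str);

impl Pass for Named {
    fn name(&self) -> &str {
        self.0
    }
    fn run(&self, mut graph: Graph) -> Graph {
        graph.applied.push(self.name().to_string());
        graph
    }
}

/// Applies passes strictly in list order, threading the graph through each one.
/// FuseTransformerLayer must come last: it matches nodes the earlier passes create.
fn run_pipeline(passes: &[Box<dyn Pass>], mut graph: Graph) -> Graph {
    for pass in passes {
        graph = pass.run(graph);
    }
    graph
}
```

Taking and returning `Graph` by value, as `run` does, lets each pass rewrite the graph without shared mutable state, and boxing passes as `dyn Pass` keeps the pipeline a plain ordered list.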

Auto Trait Implementations

impl Freeze for FuseTransformerLayer

impl RefUnwindSafe for FuseTransformerLayer

impl Send for FuseTransformerLayer

impl Sync for FuseTransformerLayer

impl Unpin for FuseTransformerLayer

impl UnsafeUnpin for FuseTransformerLayer

impl UnwindSafe for FuseTransformerLayer

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
