Struct FuseTransformerLayer 

pub struct FuseTransformerLayer;

Fuses an entire BERT-style transformer layer (attention block + residual+LN + FFN + residual+LN) into one Op::FusedTransformerLayer node.

Pattern, matched after FuseMatMulBiasAct, FuseResidualLN, and FuseAttentionBlock have run (pass order matters):

  skip ──┬─→ FusedAttentionBlock(qkv_w, out_w, mask, qkv_b, out_b) ─→ attn_out
         └─→ FusedResidualLN(attn_out, skip, ln1_g, ln1_b) ─→ h1
                                                               ├─→ FusedMatMulBiasAct(fc1_w, fc1_b, GeLU) ─→ ffn_int
                                                               │                                              ↓
                                                               │           FusedMatMulBiasAct(fc2_w, fc2_b, None) ─→ ffn_out
                                                               └────────────────────→ FusedResidualLN(ffn_out, h1, ln2_g, ln2_b) ─→ out

All five nodes collapse into a single FusedTransformerLayer { num_heads, head_dim, intermediate_size, eps1, eps2, activation, has_bias: true } with the 14-input layout consumed by rlx-mlx’s lowering at rlx-mlx/src/lower.rs:1528: [hidden, qkv_w, qkv_b, out_w, out_b, ln1_g, ln1_b, fc1_w, fc1_b, fc2_w, fc2_b, ln2_g, ln2_b, mask].
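The 14-input layout above is positional, so a consumer has to agree on the slot order. As a minimal sketch of that ordering, the enum below gives each slot a name; `LayerInput`, `LAYER_INPUT_ORDER`, and `input_of` are hypothetical helpers written for illustration and are not part of the crate's API.

```rust
/// Hypothetical names for the 14 input slots, in the order listed above.
/// Discriminants follow declaration order, so `Hidden` is 0 and `Mask` is 13.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LayerInput {
    Hidden,
    QkvW,
    QkvB,
    OutW,
    OutB,
    Ln1G,
    Ln1B,
    Fc1W,
    Fc1B,
    Fc2W,
    Fc2B,
    Ln2G,
    Ln2B,
    Mask,
}

/// The full slot order, mirroring the documented input list.
const LAYER_INPUT_ORDER: [LayerInput; 14] = [
    LayerInput::Hidden,
    LayerInput::QkvW,
    LayerInput::QkvB,
    LayerInput::OutW,
    LayerInput::OutB,
    LayerInput::Ln1G,
    LayerInput::Ln1B,
    LayerInput::Fc1W,
    LayerInput::Fc1B,
    LayerInput::Fc2W,
    LayerInput::Fc2B,
    LayerInput::Ln2G,
    LayerInput::Ln2B,
    LayerInput::Mask,
];

impl LayerInput {
    /// Position of this slot in the fused node's input list.
    fn index(self) -> usize {
        self as usize
    }
}

/// Looks up one slot in an input list (for example, a list of node ids).
/// Returns `None` unless the list has exactly 14 entries.
fn input_of<T: Copy>(inputs: &[T], slot: LayerInput) -> Option<T> {
    if inputs.len() != LAYER_INPUT_ORDER.len() {
        return None;
    }
    Some(inputs[slot.index()])
}
```

Naming the slots keeps index arithmetic out of the lowering code; the length check is an assumption about how a consumer might want to reject malformed nodes.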

The fusion threshold is the same one FuseAttentionBlock uses (RLX_FUSE_ATTN_THRESHOLD, default 64). Backends without native support for FusedTransformerLayer un-fuse it back into primitives at compile time; backends with native support (MLX) can emit one monolithic kernel per layer.
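As a rough illustration of an environment-variable threshold with a default, the sketch below reads RLX_FUSE_ATTN_THRESHOLD and falls back to 64. The function names, the `usize` type, and the fallback on unparsable values are assumptions; the crate's actual parsing may differ.

```rust
use std::env;

/// Default taken from the documentation above.
const DEFAULT_FUSE_ATTN_THRESHOLD: usize = 64;

/// Parses a raw setting, falling back to the default when the value is
/// missing or not a non-negative integer (assumed behavior).
fn parse_threshold(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_FUSE_ATTN_THRESHOLD)
}

/// Reads the threshold from the process environment.
fn fuse_attn_threshold() -> usize {
    parse_threshold(env::var("RLX_FUSE_ATTN_THRESHOLD").ok().as_deref())
}
```

Splitting the parsing from the environment lookup keeps the fallback logic testable without mutating process-wide state.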

Trait Implementations§

impl Pass for FuseTransformerLayer

fn name(&self) -> &str

Human-readable name for logging.

fn run(&self, graph: Graph) -> Graph

Transform the graph. Returns a new graph (or the same if no changes).
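To make the `Pass` contract concrete, here is a toy sketch that uses the two method signatures shown above. `Graph` is a stand-in (a flat list of op names rather than a real dataflow graph), and `ToyFuseLayer` and `LAYER_PATTERN` are hypothetical; the real pass matches on node connectivity, not on adjacency in a list.

```rust
/// Stand-in graph: a flat list of op names (an assumption for illustration).
#[derive(Debug, Clone, PartialEq, Default)]
struct Graph {
    ops: Vec<String>,
}

/// Trait shape copied from the signatures documented above.
trait Pass {
    fn name(&self) -> &str;
    fn run(&self, graph: Graph) -> Graph;
}

/// The five already-fused ops that make up one layer, in order.
const LAYER_PATTERN: [&str; 5] = [
    "FusedAttentionBlock",
    "FusedResidualLN",
    "FusedMatMulBiasAct",
    "FusedMatMulBiasAct",
    "FusedResidualLN",
];

/// Toy pass: collapses each run of the five ops into one op name.
struct ToyFuseLayer;

impl Pass for ToyFuseLayer {
    fn name(&self) -> &str {
        "toy_fuse_transformer_layer"
    }

    fn run(&self, graph: Graph) -> Graph {
        let n = LAYER_PATTERN.len();
        let mut out = Vec::with_capacity(graph.ops.len());
        let mut i = 0;
        while i < graph.ops.len() {
            // Compare the next five op names against the pattern.
            let matches = graph.ops.len() - i >= n
                && graph.ops[i..i + n]
                    .iter()
                    .map(String::as_str)
                    .eq(LAYER_PATTERN.iter().copied());
            if matches {
                out.push("FusedTransformerLayer".to_string());
                i += n;
            } else {
                out.push(graph.ops[i].clone());
                i += 1;
            }
        }
        Graph { ops: out }
    }
}
```

Because `run` takes the graph by value and returns one, passes compose by feeding each result into the next. In this toy, a graph whose attention ops have not yet been fused into `FusedAttentionBlock` comes back unchanged, which is the effect of running the pass too early.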

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.