Struct FuseAttentionBlock 

pub struct FuseAttentionBlock;

Fuses matmul(QKV) → narrow(Q,K,V) → [rope] → attention → matmul(out) into a single FusedAttentionBlock when batch*seq is small.
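The chain described above can be illustrated with a small pattern matcher. This is only a sketch under simplifying assumptions: the `Op` enum and `match_attention_chain` function are hypothetical names, the graph is flattened to a linear sequence of operators, the three narrow outputs (Q, K, V) are collapsed into one step, and the bracketed rope step is treated as optional. The pass's actual matching logic is not shown in this documentation and may differ.

```rust
/// Toy operator kinds. Hypothetical stand-ins, not the crate's real node types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    MatMul,
    Narrow,
    Rope,
    Attention,
    Other,
}

/// Returns how many leading ops form the fusable chain
/// matmul -> narrow -> [rope] -> attention -> matmul, or None if there is no match.
fn match_attention_chain(ops: &[Op]) -> Option<usize> {
    match ops {
        // Variant with the rope step present.
        [Op::MatMul, Op::Narrow, Op::Rope, Op::Attention, Op::MatMul, ..] => Some(5),
        // Variant with the rope step absent.
        [Op::MatMul, Op::Narrow, Op::Attention, Op::MatMul, ..] => Some(4),
        _ => None,
    }
}
```

A caller would replace the matched prefix with a single fused node only when the size check on batch*seq also passes.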

The optimizer auto-detects the batch size from the graph's input shapes. For small inputs (batch*seq ≤ 64 by default), the intermediate tensors fit in L1 cache, so a single monolithic kernel is faster than separate BLAS calls.

The threshold is configurable via RLX_FUSE_ATTN_THRESHOLD (default: 64).
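The size check can be sketched as follows. This assumes that RLX_FUSE_ATTN_THRESHOLD is read as a process environment variable and that a missing or unparsable value falls back to the default of 64. Neither behavior is confirmed by this page, and `parse_threshold`, `threshold_from_env`, and `should_fuse` are hypothetical helper names, not part of the crate's API.

```rust
use std::env;

/// Default threshold stated in the documentation.
const DEFAULT_THRESHOLD: usize = 64;

/// Parses a raw threshold string. Falling back to the default on a missing
/// or invalid value is an assumption, not documented behavior.
fn parse_threshold(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_THRESHOLD)
}

/// Reads the threshold, assuming RLX_FUSE_ATTN_THRESHOLD is an environment variable.
fn threshold_from_env() -> usize {
    parse_threshold(env::var("RLX_FUSE_ATTN_THRESHOLD").ok().as_deref())
}

/// Fusion applies when batch*seq is at or below the threshold.
/// saturating_mul avoids overflow on very large shapes.
fn should_fuse(batch: usize, seq: usize, threshold: usize) -> bool {
    batch.saturating_mul(seq) <= threshold
}
```

With the default threshold, a shape of batch 4 and seq 16 (product 64) would qualify for fusion, while batch 2 and seq 64 (product 128) would not.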

Trait Implementations

impl Pass for FuseAttentionBlock

fn name(&self) -> &str

Human-readable name for logging.

fn run(&self, graph: Graph) -> Graph

Transform the graph. Returns a new graph (or the same if no changes).
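Because run takes the graph by value and returns a graph, passes compose naturally into a pipeline. The sketch below illustrates that shape only. `Graph` here is a hypothetical stand-in struct, the `Pass` trait is redeclared locally with just the two methods listed above (the real trait may have more), and `NoOpPass` and `run_pipeline` are invented for illustration.

```rust
/// Hypothetical stand-in for the crate's graph type.
#[derive(Debug, Clone, PartialEq)]
struct Graph {
    nodes: Vec<String>,
}

/// Local mirror of the two method signatures listed above.
trait Pass {
    fn name(&self) -> &str;
    fn run(&self, graph: Graph) -> Graph;
}

/// Placeholder pass that returns the graph unchanged,
/// matching the "or the same if no changes" case.
struct NoOpPass;

impl Pass for NoOpPass {
    fn name(&self) -> &str {
        "no_op"
    }

    fn run(&self, graph: Graph) -> Graph {
        graph
    }
}

/// Runs passes in order, threading the graph by value and
/// recording each pass name, as a logger might.
fn run_pipeline(passes: &[Box<dyn Pass>], mut graph: Graph) -> (Graph, Vec<String>) {
    let mut log = Vec::new();
    for pass in passes {
        log.push(pass.name().to_string());
        graph = pass.run(graph);
    }
    (graph, log)
}
```

Taking ownership of the graph lets a pass either rebuild it or hand the same value back without cloning.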

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.