pub enum TensorOp {
    MulMat,
    Add,
    RmsNorm,
    Rope,
    SoftMax,
    Mul,
    Silu,
    Copy,
    None,
}
Tensor operation types for decoder inference.
Each variant maps to exactly one kernel dispatch. The goal is to express an entire transformer layer as roughly six operations:
- RmsNorm (pre-attention)
- QKV+Attention (fused projection + attention + output projection)
- Residual add
- RmsNorm (pre-FFN)
- FFN (gate+up+swiglu+down fused)
- Residual add
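For comparison, the sketch below spells out what the FFN half of a layer looks like when it is built only from the primitive variants of this enum, with no fusion. It is an illustration under stated assumptions, not this crate's scheduler: the enum is copied locally with assumed derives, and `unfused_ffn_block` and `dispatch_count` are hypothetical helpers that do not appear in the documented API. The op order follows the standard SwiGLU formulation, `down(silu(gate(x)) * up(x))`.

```rust
// Local copy of the enum from this page; the derives are assumed for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TensorOp {
    MulMat,
    Add,
    RmsNorm,
    Rope,
    SoftMax,
    Mul,
    Silu,
    Copy,
    None,
}

/// Hypothetical helper: the unfused op sequence for the FFN half of a layer,
/// including its pre-FFN norm and the trailing residual add.
pub fn unfused_ffn_block() -> Vec<TensorOp> {
    vec![
        TensorOp::RmsNorm, // pre-FFN normalization
        TensorOp::MulMat,  // gate projection
        TensorOp::MulMat,  // up projection
        TensorOp::Silu,    // activation applied to the gate output
        TensorOp::Mul,     // element-wise gate * up
        TensorOp::MulMat,  // down projection
        TensorOp::Add,     // residual add
    ]
}

/// Hypothetical helper: counts kernel dispatches under the
/// one-variant-one-dispatch rule. `None` nodes are leaves and dispatch nothing.
pub fn dispatch_count(ops: &[TensorOp]) -> usize {
    ops.iter().filter(|op| **op != TensorOp::None).count()
}
```

Under these assumptions the unfused FFN half alone costs seven dispatches, whereas the fused plan listed above covers the same work with three entries (RmsNorm, FFN, Residual add).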
Variants§
MulMat
Matrix multiply (Q4K dequant + GEMV for the matrix-vector case, or cuBLASLt GEMM)
Add
Element-wise add (residual connections)
RmsNorm
RMS normalization
Rope
Rotary position embedding
SoftMax
Softmax (attention scores)
Mul
Element-wise multiply (SwiGLU gate)
Silu
SiLU activation
Copy
Memory copy (KV cache scatter)
None
No-op (input tensor, leaf node)
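As a reading aid, the math behind three of these variants can be sketched as scalar CPU functions. These are reference definitions only, assuming the textbook formulas; the actual kernels are GPU dispatches and may differ in details such as a learned RmsNorm weight, the epsilon value, or data layout. The function names and the `eps` parameter are hypothetical and are not part of this crate.

```rust
/// Reference RMS normalization: x / sqrt(mean(x^2) + eps).
/// No learned scale is applied here; `eps` is an assumed stabilizer.
pub fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * scale).collect()
}

/// Reference SiLU: x * sigmoid(x), written as x / (1 + e^-x).
pub fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Reference softmax over one row of attention scores.
/// The row maximum is subtracted first for numerical stability.
pub fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

With these definitions, the SwiGLU gate that `Mul` and `Silu` implement together is `silu(gate) * up`, computed element by element.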
Trait Implementations§
impl Copy for TensorOp
impl Eq for TensorOp
impl StructuralPartialEq for TensorOp
Auto Trait Implementations§
impl Freeze for TensorOp
impl RefUnwindSafe for TensorOp
impl Send for TensorOp
impl Sync for TensorOp
impl Unpin for TensorOp
impl UnsafeUnpin for TensorOp
impl UnwindSafe for TensorOp
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.