Struct IMoELayer


pub struct IMoELayer { /* private fields */ }


┌──────────────┐┌────────────────────────┐┌────────────────────────┐
│ hiddenStates ││selectedExpertsForTokens││scoresForSelectedExperts│
└──────────────┘└────────────────────────┘└────────────────────────┘
       │                    │                    │
       │                    │                    │
┌───────────────────────────────────────────────────────────────────────────────────┐
│                                                                                   │
│  ┌──────────────────────────┐                        ┌──────────────────────────┐ │
│  │      │  Expert 0   │     │         MOE            │      │  Expert i   │     │ │
│  │      │             │     │                        │      │             │     │ │
│  │  ┌────────┐    ┌────────┐│                        │  ┌────────┐    ┌────────┐│ │
│  │  │ fcGate │    │  fcUp  ││                        │  │ fcGate │    │  fcUp  ││ │
│  │  │        │    │        ││                        │  │        │    │        ││ │
│  │  └───┬────┘    └────┬───┘│                        │  └───┬────┘    └────┬───┘│ │
│  │      │              │    │                        │      │              │    │ │
│  │ ┌──────────┐        │    │                        │ ┌──────────┐        │    │ │
│  │ │activation│        │    │                        │ │activation│        │    │ │
│  │ └────┬─────┘        │    │                        │ └────┬─────┘        │    │ │
│  │      │              │    │       .......          │      │              │    │ │
│  │      └──────┬───────┘    │                        │      └──────┬───────┘    │ │
│  │             │            │                        │             │            │ │
│  │         ┌────────┐       │                        │         ┌────────┐       │ │
│  │         │  mul   │       │                        │         │  mul   │       │ │
│  │         └───┬────┘       │                        │         └───┬────┘       │ │
│  │             │            │                        │             │            │ │
│  │         ┌───▼────┐       │                        │         ┌───▼────┐       │ │
│  │         │ fcDown │       │                        │         │ fcDown │       │ │
│  │         └───┬────┘       │                        │         └───┬────┘       │ │
│  │             │            │                        │             │            │ │
│  │         ┌───▼────┐       │                        │         ┌───▼────┐       │ │
│  │         │output 0│       │                        │         │output i│       │ │
│  │         └───┬────┘       │                        │         └───┬────┘       │ │
│  └─────────────┼────────────┘                        └─────────────┼────────────┘ │
│                │                                                   │              │
│                └───────────────────┬───────────────────────────────┘              │
│                                    │                                              │
│                                    ▼                                              │
│                            ┌───────────────┐                                      │
│                            │  weightedSum  │                                      │
│                            └───────┬───────┘                                      │
└────────────────────────────────────│──────────────────────────────────────────────┘
                                     ▼
                             ┌───────────────┐
                             │   moeOutput   │
                             └───────────────┘


An MoE layer in a network definition. A Mixture of Experts (MoE) is a collection of experts, each specializing in a different subset of the input data. The key idea is a router that activates only the experts needed for a given input, rather than engaging the entire network for every token. In this layer the routing decision is made outside the layer: the selected experts and their scores are supplied as inputs.

Definitions used in the MoE layer:

fcGate, fcUp, fcDown: three linear layers, each computing fc(x) = x · w + b, where x is the input, w is the weight, b is the bias, and · is matrix multiplication
activation: the activation function applied to the fcGate output
mul: the element-wise multiplication of the fcUp output and the activated fcGate output
weightedSum: the score-weighted sum of the selected experts' outputs
moeOutput: the output of the MoE layer
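For reference, the linear layer fc can be written out directly for a single token. This is a minimal plain-Rust sketch, not the TensorRT kernel: it assumes w is stored row-major as [inDim, outDim] (matching a per-expert [hiddenSize, moeInterSize] slice), and the name `linear` is hypothetical.

```rust
/// Hypothetical reference linear layer for one token: y = x · w + b.
/// Assumes `w` is row-major [in_dim, out_dim]; `b`, if present, has out_dim elements.
fn linear(x: &[f32], w: &[f32], b: Option<&[f32]>, in_dim: usize, out_dim: usize) -> Vec<f32> {
    assert_eq!(x.len(), in_dim, "input length must equal in_dim");
    assert_eq!(w.len(), in_dim * out_dim, "weight must be [in_dim, out_dim]");
    // Start from the bias, or from zeros when the layer has no bias.
    let mut y = match b {
        Some(bias) => {
            assert_eq!(bias.len(), out_dim, "bias must have out_dim elements");
            bias.to_vec()
        }
        None => vec![0.0; out_dim],
    };
    // Accumulate row k of w scaled by x[k]: y[j] += x[k] * w[k][j].
    for (k, &xk) in x.iter().enumerate() {
        let row = &w[k * out_dim..(k + 1) * out_dim];
        for (yj, &wkj) in y.iter_mut().zip(row) {
            *yj += xk * wkj;
        }
    }
    y
}
```

For x = [1, 2] and w = [[1, 0, 2], [0, 1, 3]] with no bias, this yields [1, 2, 8].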

Each expert is a GLU (gated linear unit) block made up of fcGate, fcUp, activation, mul, and fcDown.

Definitions and abbreviations:

batchSize: batch size
seqLen: sequence length
hiddenSize: the size of the hidden states
numExperts: the number of experts in the MoE layer
moeInterSize: the intermediate size of the MoE layer
topK: the number of experts to select for each token

This layer takes three activation inputs:

hiddenStates: the hidden states of the layer, with shape [batchSize, seqLen, hiddenSize]
selectedExpertsForTokens: the IDs of the topK experts selected for each token, with shape [batchSize, seqLen, topK]
scoresForSelectedExperts: the scores (scaling weights) of the selected experts for each token, with shape [batchSize, seqLen, topK]

The MoE layer uses the selected expert IDs and their corresponding scores to compute the output.
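To see how the two routing inputs line up, the sketch below borrows one token's topK expert IDs and scores. It assumes both tensors are contiguous and row-major and that expert IDs are stored as i32; `token_routing` and `ids_in_range` are hypothetical helpers, not part of the binding.

```rust
/// Hypothetical helper: borrow the routing entries of token (b, s).
/// Assumes selectedExpertsForTokens and scoresForSelectedExperts are
/// contiguous row-major [batchSize, seqLen, topK] buffers with i32 IDs.
fn token_routing<'a>(
    selected: &'a [i32],
    scores: &'a [f32],
    seq_len: usize,
    top_k: usize,
    b: usize,
    s: usize,
) -> (&'a [i32], &'a [f32]) {
    assert_eq!(selected.len(), scores.len(), "both inputs share one shape");
    // Flat offset of element (b, s, 0) in a row-major [batch, seq, topK] buffer.
    let start = (b * seq_len + s) * top_k;
    (&selected[start..start + top_k], &scores[start..start + top_k])
}

/// Every selected ID must index an existing expert: 0 <= id < numExperts.
fn ids_in_range(selected: &[i32], num_experts: usize) -> bool {
    selected.iter().all(|&id| id >= 0 && (id as usize) < num_experts)
}
```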

The weights in the MoE layer:

fcGateWeights with shape [numExperts, hiddenSize, moeInterSize]: the weight matrix for fcGate
fcUpWeights with shape [numExperts, hiddenSize, moeInterSize]: the weight matrix for fcUp
fcDownWeights with shape [numExperts, moeInterSize, hiddenSize]: the weight matrix for fcDown
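Each stacked weight tensor holds one matrix per expert along its first axis. A hypothetical sketch of the expected shapes and of borrowing a single expert's matrix, assuming contiguous row-major storage (`MoeDims` and `expert_matrix` are illustrative names, not API):

```rust
/// Sizes named in the text (illustrative struct, not part of the binding).
#[derive(Debug, Clone, Copy)]
struct MoeDims {
    num_experts: usize,
    hidden_size: usize,
    moe_inter_size: usize,
}

impl MoeDims {
    /// Expected shapes of fcGateWeights, fcUpWeights, and fcDownWeights, in that order.
    fn weight_shapes(&self) -> [[usize; 3]; 3] {
        let (e, h, i) = (self.num_experts, self.hidden_size, self.moe_inter_size);
        [[e, h, i], [e, h, i], [e, i, h]]
    }
}

/// Borrow expert `e`'s [rows, cols] matrix from a stacked
/// [numExperts, rows, cols] buffer, assuming contiguous row-major storage.
fn expert_matrix(stacked: &[f32], e: usize, rows: usize, cols: usize) -> &[f32] {
    let size = rows * cols;
    assert!(stacked.len() >= (e + 1) * size, "expert index out of range");
    &stacked[e * size..(e + 1) * size]
}
```

For fcGate and fcUp, call `expert_matrix` with rows = hiddenSize and cols = moeInterSize; for fcDown, swap them.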

Several optional inputs are supported:

fcGateBias: the bias for fcGate, with shape [numExperts, moeInterSize]
fcUpBias: the bias for fcUp, with shape [numExperts, moeInterSize]
fcDownBias: the bias for fcDown, with shape [numExperts, hiddenSize]. All biases are unset by default; either set all three biases or none of them.
activation: the activation type for the MoE layer; currently only SILU is supported.
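The all-or-none bias rule and the bias shapes can be checked before building the layer. A minimal sketch that takes each bias's shape (or None when unset); `check_biases` is a hypothetical helper, and the shape rules are the ones listed above.

```rust
/// Hypothetical pre-flight check for the optional biases.
/// Returns Ok(true) if all three are set with valid shapes,
/// Ok(false) if none is set, and Err otherwise.
fn check_biases(
    gate: Option<&[usize]>,
    up: Option<&[usize]>,
    down: Option<&[usize]>,
    num_experts: usize,
    hidden_size: usize,
    moe_inter_size: usize,
) -> Result<bool, String> {
    match (gate, up, down) {
        // No biases: every fc layer computes x · w only.
        (None, None, None) => Ok(false),
        (Some(g), Some(u), Some(d)) => {
            let inter = [num_experts, moe_inter_size];
            let hidden = [num_experts, hidden_size];
            if g != &inter[..] || u != &inter[..] {
                return Err(format!("fcGate/fcUp bias must be {:?}", inter));
            }
            if d != &hidden[..] {
                return Err(format!("fcDown bias must be {:?}", hidden));
            }
            Ok(true)
        }
        // Any partial combination violates the all-or-none rule.
        _ => Err("set all three biases or none of them".to_string()),
    }
}
```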

MoE computation: for each token, the MoE layer computes its output as follows:

Input processing:

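Receive hiddenStates, with shape [batchSize, seqLen, hiddenSize]
Receive selectedExpertsForTokens, with shape [batchSize, seqLen, topK]
Receive scoresForSelectedExperts, with shape [batchSize, seqLen, topK]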

Expert computation for each token:

output_i = fcDown(fcUp(hiddenStates) ⊙ activation(fcGate(hiddenStates)))

where ⊙ is element-wise multiplication (the mul step) and i is the ID of a selected expert.
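A plain-Rust reference for one expert applied to one token, assuming the SILU activation (silu(x) = x · sigmoid(x)), no biases, and row-major per-expert weights. `silu`, `fc`, and `glu_expert` are illustrative names; this is a sketch for checking results, not how the layer is implemented.

```rust
/// SiLU activation: silu(x) = x * sigmoid(x) = x / (1 + e^(-x)).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Bias-free linear layer for one token: y = x · w, with `w` row-major [x.len(), out_dim].
fn fc(x: &[f32], w: &[f32], out_dim: usize) -> Vec<f32> {
    assert_eq!(w.len(), x.len() * out_dim, "weight must be [x.len(), out_dim]");
    let mut y = vec![0.0f32; out_dim];
    for (k, &xk) in x.iter().enumerate() {
        for (j, yj) in y.iter_mut().enumerate() {
            *yj += xk * w[k * out_dim + j];
        }
    }
    y
}

/// Hypothetical reference for one GLU expert on one token:
/// output = fcDown(fcUp(x) ⊙ silu(fcGate(x))).
/// `w_gate`/`w_up` are this expert's [hiddenSize, moeInterSize] matrices,
/// `w_down` is its [moeInterSize, hiddenSize] matrix; biases are omitted.
fn glu_expert(x: &[f32], w_gate: &[f32], w_up: &[f32], w_down: &[f32], inter: usize) -> Vec<f32> {
    let hidden = x.len();
    let gate = fc(x, w_gate, inter);
    let up = fc(x, w_up, inter);
    // mul: element-wise product of the up projection and the activated gate.
    let h: Vec<f32> = up.iter().zip(&gate).map(|(u, g)| u * silu(*g)).collect();
    fc(&h, w_down, hidden)
}
```

Because the gate path passes through the activation, a gate pre-activation of zero silences that intermediate channel regardless of the fcUp value.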

Expert output aggregation: for each token, first select the experts that need to be activated, then:

Compute the output of each selected expert, using the expert IDs in selectedExpertsForTokens.
Compute the weighted sum of those outputs, using the scores in scoresForSelectedExperts.
Final output for the token: moeOutput = Σ score_i · output_i, summed over the topK experts i selected for that token. The MoE output has the same shape as hiddenStates, [batchSize, seqLen, hiddenSize].
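The aggregation step for a single token can be sketched as follows. The expert is passed in as a closure so the sketch stays independent of how output_i is computed; `moe_token` is a hypothetical name, and expert IDs are taken as usize for simplicity.

```rust
/// Hypothetical per-token aggregation: moeOutput = Σ score_i · output_i.
/// `expert_ids` and `scores` are this token's topK routing entries;
/// `run_expert(id, x)` computes output_i for expert `id`.
fn moe_token<F>(x: &[f32], expert_ids: &[usize], scores: &[f32], run_expert: F) -> Vec<f32>
where
    F: Fn(usize, &[f32]) -> Vec<f32>,
{
    assert_eq!(expert_ids.len(), scores.len(), "one score per selected expert");
    // The output has the same size as the input hidden state.
    let mut out = vec![0.0f32; x.len()];
    for (&id, &score) in expert_ids.iter().zip(scores) {
        let y = run_expert(id, x);
        assert_eq!(y.len(), x.len(), "expert output must be hiddenSize long");
        for (o, v) in out.iter_mut().zip(&y) {
            *o += score * v;
        }
    }
    out
}
```

For example, with a toy expert that scales its input by (id + 1), selecting experts [0, 2] with scores [0.5, 0.25] for x = [1, 2] gives 0.5 · [1, 2] + 0.25 · [3, 6] = [1.25, 2.5].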

The MoE layer requires a Blackwell or Thor GPU (SM 10.x or SM 11.x); SM 12.x is not currently supported. Performance is limited when seqLen > 16.
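A hypothetical pre-flight check mirroring these constraints. It assumes the caller already knows the GPU's compute-capability major version (e.g. 10 for SM 10.x); how that value is queried is outside this sketch, and `moe_support` is an illustrative name.

```rust
/// Hypothetical classification of a target against the stated requirements.
#[derive(Debug, PartialEq)]
enum MoeSupport {
    Supported,
    /// Supported, but performance is limited because seqLen > 16.
    SupportedSlow,
    /// Not SM 10.x or SM 11.x (this includes SM 12.x).
    Unsupported,
}

fn moe_support(sm_major: u32, seq_len: usize) -> MoeSupport {
    match sm_major {
        10 | 11 if seq_len > 16 => MoeSupport::SupportedSlow,
        10 | 11 => MoeSupport::Supported,
        _ => MoeSupport::Unsupported,
    }
}
```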

Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.



Implementations

impl IMoELayer

pub fn setGatedWeights( self: Pin<&mut IMoELayer>, fcGateWeights: Pin<&mut ITensor>, fcUpWeights: Pin<&mut ITensor>, fcDownWeights: Pin<&mut ITensor>, activationType: MoEActType, )

pub fn setGatedBiases( self: Pin<&mut IMoELayer>, fcGateBiases: Pin<&mut ITensor>, fcUpBiases: Pin<&mut ITensor>, fcDownBiases: Pin<&mut ITensor>, )

pub fn setActivationType(self: Pin<&mut IMoELayer>, activationType: MoEActType)

pub fn getActivationType(self: &IMoELayer) -> MoEActType

pub fn setQuantizationStatic( self: Pin<&mut IMoELayer>, fcDownActivationScale: Pin<&mut ITensor>, dataType: DataType, )

pub fn setQuantizationDynamicDblQ( self: Pin<&mut IMoELayer>, fcDownActivationDblQScale: Pin<&mut ITensor>, dataType: DataType, blockShape: &Dims64, dynQOutputScaleType: DataType, )

pub fn setQuantizationToType(self: Pin<&mut IMoELayer>, type_: DataType)

pub fn getQuantizationToType(self: &IMoELayer) -> DataType

pub fn setQuantizationBlockShape(self: Pin<&mut IMoELayer>, blockShape: &Dims64)

pub fn getQuantizationBlockShape(self: &IMoELayer) -> Dims64

pub fn setDynQOutputScaleType(self: Pin<&mut IMoELayer>, type_: DataType)

pub fn getDynQOutputScaleType(self: &IMoELayer) -> DataType

pub fn setSwigluParams( self: Pin<&mut IMoELayer>, limit: f32, alpha: f32, beta: f32, )

pub fn setSwigluParamLimit(self: Pin<&mut IMoELayer>, limit: f32)

pub fn getSwigluParamLimit(self: &IMoELayer) -> f32

pub fn setSwigluParamAlpha(self: Pin<&mut IMoELayer>, alpha: f32)

pub fn getSwigluParamAlpha(self: &IMoELayer) -> f32

pub fn setSwigluParamBeta(self: Pin<&mut IMoELayer>, beta: f32)

pub fn getSwigluParamBeta(self: &IMoELayer) -> f32

pub fn setInput( self: Pin<&mut IMoELayer>, index: i32, tensor: Pin<&mut ITensor>, )

Trait Implementations

impl AsLayer for IMoELayer

fn as_layer(&self) -> &ILayer

fn as_layer_pin_mut(&mut self) -> Pin<&mut ILayer>

impl AsLayerTyped for IMoELayer

const TYPE: LayerType = LayerType::kMOE

impl AsRef<ILayer> for IMoELayer

fn as_ref(self: &IMoELayer) -> &ILayer

impl ExternType for IMoELayer

type Id = (n, v, i, n, f, e, r, _1, (), I, M, o, E, L, a, y, e, r)

type Kind = Opaque

impl MakeCppStorage for IMoELayer

unsafe fn allocate_uninitialized_cpp_storage() -> *mut IMoELayer

unsafe fn free_uninitialized_cpp_storage(arg0: *mut IMoELayer)

Auto Trait Implementations

impl !Freeze for IMoELayer

impl !RefUnwindSafe for IMoELayer

impl !Send for IMoELayer

impl !Sync for IMoELayer

impl Unpin for IMoELayer

impl UnsafeUnpin for IMoELayer

impl UnwindSafe for IMoELayer

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
