Module runners

Model runners for different architectures with sparse inference support

Structs

BertEmbeddings
BertLayer
BertModel
GatedConv1d
GroupedQueryAttention
LFM2Layer
LFM2Model
LlamaAttention
LlamaLayer
LlamaMLP
LlamaModel: Llama model for sparse inference
LowRankPredictor: Low-rank predictor for neuron activation prediction
MultiHeadAttention
Pooler
SparseFfn
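
This page does not show how LowRankPredictor or SparseFfn are implemented, so the following is only a minimal sketch of the general technique their names suggest. It assumes a predictor that scores each hidden neuron as (x · P) · Q with a small rank r, and a feed-forward pass that evaluates only the neurons predicted active. Every name here (`ToyLowRankPredictor`, `predict_active`, `sparse_ffn`, the `threshold` field, the ReLU activation) is hypothetical and is not taken from this crate.

```rust
/// Hypothetical low-rank activation predictor (NOT this crate's `LowRankPredictor`).
/// Scores are computed as (x * P) * Q, where P is d x r and Q is r x n with r << d.
struct ToyLowRankPredictor {
    p: Vec<Vec<f32>>, // d rows, r columns
    q: Vec<Vec<f32>>, // r rows, n columns
    threshold: f32,   // assumed cutoff above which a neuron counts as active
}

impl ToyLowRankPredictor {
    /// Returns the indices of neurons whose predicted score exceeds `threshold`.
    fn predict_active(&self, x: &[f32]) -> Vec<usize> {
        let r = self.q.len();
        let n = self.q.first().map_or(0, |row| row.len());

        // Project the input into the rank-r space: z = x * P.
        let mut z = vec![0.0f32; r];
        for (xi, row) in x.iter().zip(&self.p) {
            for (zj, pij) in z.iter_mut().zip(row) {
                *zj += xi * pij;
            }
        }

        // Expand to one score per hidden neuron: scores = z * Q.
        let mut scores = vec![0.0f32; n];
        for (zj, row) in z.iter().zip(&self.q) {
            for (sk, qjk) in scores.iter_mut().zip(row) {
                *sk += zj * qjk;
            }
        }

        scores
            .iter()
            .enumerate()
            .filter(|(_, s)| **s > self.threshold)
            .map(|(i, _)| i)
            .collect()
    }
}

/// Hypothetical sparse feed-forward pass: only the neurons listed in `active`
/// are evaluated. `w_up` has one row per hidden neuron (n x d) and `w_down`
/// has one row per hidden neuron (n x d_out). ReLU is an assumed activation.
fn sparse_ffn(x: &[f32], w_up: &[Vec<f32>], w_down: &[Vec<f32>], active: &[usize]) -> Vec<f32> {
    let d_out = w_down.first().map_or(0, |row| row.len());
    let mut out = vec![0.0f32; d_out];
    for &i in active {
        // Pre-activation of neuron i, followed by ReLU.
        let pre: f32 = w_up[i].iter().zip(x).map(|(w, xi)| w * xi).sum();
        let h = pre.max(0.0);
        // Accumulate this neuron's contribution to the output.
        for (o, w) in out.iter_mut().zip(&w_down[i]) {
            *o += h * w;
        }
    }
    out
}
```

The predictor costs roughly d·r + r·n multiplications instead of the d·n needed to compute every pre-activation exactly, which is the usual motivation for this kind of design. Because the prediction is approximate, the sparse output can differ from the dense one whenever an active neuron is missed.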

Enums

SparseModel

Traits

ModelRunner: Trait for running inference on models