Model runners for different architectures with sparse inference support
Structs§
- BertEmbeddings
- BertLayer
- BertModel
- GatedConv1d
- GroupedQueryAttention
- LFM2Layer
- LFM2Model
- LlamaAttention
- LlamaLayer
- LlamaMLP
- LlamaModel: Llama model for sparse inference
- LowRankPredictor: Low-rank predictor for neuron activation prediction
- MultiHeadAttention
- Pooler
- SparseFfn
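
To make the idea behind a low-rank activation predictor and a sparse feed-forward layer concrete, here is a minimal sketch. It is not this crate's API: `SketchPredictor`, `predict_active`, and `sparse_ffn` are hypothetical names, the weight layouts are assumptions, and ReLU is assumed as the activation. The predictor scores each hidden neuron through two small matrices, and the feed-forward pass then computes only the neurons predicted to be active.

```rust
/// Hypothetical low-rank predictor (not the crate's `LowRankPredictor`).
/// Scores are computed as Q * (P * x), which is cheap when `rank` is small.
pub struct SketchPredictor {
    /// Down-projection, assumed shape [rank][input_dim].
    pub p: Vec<Vec<f32>>,
    /// Up-projection, assumed shape [hidden_dim][rank].
    pub q: Vec<Vec<f32>>,
}

/// Dense matrix-vector product over row-major nested vectors.
fn matvec(m: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

impl SketchPredictor {
    /// Returns the indices of neurons whose predicted score exceeds `threshold`.
    pub fn predict_active(&self, x: &[f32], threshold: f32) -> Vec<usize> {
        let z = matvec(&self.p, x);
        let scores = matvec(&self.q, &z);
        scores
            .into_iter()
            .enumerate()
            .filter(|&(_, s)| s > threshold)
            .map(|(i, _)| i)
            .collect()
    }
}

/// Sparse feed-forward pass that touches only the `active` neurons.
/// `w_up` is assumed to be [hidden_dim][input_dim]; `w_down` is assumed to be
/// stored per neuron as [hidden_dim][output_dim].
pub fn sparse_ffn(
    w_up: &[Vec<f32>],
    w_down: &[Vec<f32>],
    x: &[f32],
    active: &[usize],
) -> Vec<f32> {
    let out_dim = w_down.first().map_or(0, |row| row.len());
    let mut out = vec![0.0_f32; out_dim];
    for &i in active {
        // Pre-activation of neuron i, followed by ReLU.
        let h: f32 = w_up[i].iter().zip(x).map(|(a, b)| a * b).sum();
        let h = h.max(0.0);
        // Accumulate this neuron's contribution to the output.
        for (o, w) in out.iter_mut().zip(&w_down[i]) {
            *o += h * w;
        }
    }
    out
}
```

With ReLU, a neuron the predictor skips contributes nothing only if its true pre-activation is non-positive, so the sparse result matches the dense one exactly when the predictor misses no active neuron.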
Enums§
Traits§
- ModelRunner: Trait for running inference on models
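
A shared runner trait lets calling code treat different architectures uniformly. The sketch below illustrates that pattern only: the real `ModelRunner` signature is not shown in this listing, so `SketchRunner`, its `forward` method, the toy `Scale` and `Relu` types, and `run_all` are all hypothetical.

```rust
/// Hypothetical stand-in for a model-runner trait; the actual `ModelRunner`
/// methods are not documented on this page.
pub trait SketchRunner {
    /// Maps an input vector to an output vector.
    fn forward(&self, input: &[f32]) -> Vec<f32>;
}

/// Toy "model" that multiplies every element by a constant.
pub struct Scale(pub f32);

/// Toy "model" that applies ReLU element-wise.
pub struct Relu;

impl SketchRunner for Scale {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|&v| v * self.0).collect()
    }
}

impl SketchRunner for Relu {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|&v| v.max(0.0)).collect()
    }
}

/// Runs the stages in order, feeding each output into the next stage.
pub fn run_all(stages: &[Box<dyn SketchRunner>], input: &[f32]) -> Vec<f32> {
    stages
        .iter()
        .fold(input.to_vec(), |acc, stage| stage.forward(&acc))
}
```

Trait objects are used here so stages of different types can share one collection; generics would avoid dynamic dispatch when the concrete type is known at compile time.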