Crate axonml_llm

Nine LLM architectures for the AxonML framework.

Complete pure-Rust implementations:

- GPT-2 — decoder-only transformer
- LLaMA — split-halves RoPE + GQA + SwiGLU
- Mistral — sliding-window attention
- Phi — partial RoPE + GELU
- BERT — bidirectional encoder with classification/MLM heads
- SSM/Mamba — selective S6 scan + depthwise conv + SSMForCausalLM
- Hydra — hybrid SSM + windowed attention
- Chimera — sparse MoE + differential attention
- Trident — 1.58-bit ternary TernaryLinear, RoPE + GQA + ReLU²-gated FFN + SubLN, graph-preserving RepeatKVBackward, configs for 1B/3B/smoke

Shared building blocks: attention, RMSNorm, RotaryEmbedding, embedding, text generation (top-k/top-p/temperature), a HuggingFace weight loader, and a pretrained model hub.
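Among the shared building blocks, RMSNorm rescales activations by their root mean square — unlike LayerNorm, there is no mean subtraction or bias. A minimal self-contained sketch of the math in plain Rust (the crate's actual `RMSNorm` API may differ; this only illustrates the computation):

```rust
/// RMSNorm: y = x * g / sqrt(mean(x^2) + eps).
/// No mean-centering, unlike LayerNorm — only a scale by the RMS and a learned gain.
fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq: f32 = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * inv_rms * g).collect()
}

fn main() {
    let x = [1.0_f32, 2.0, 3.0, 4.0];
    let g = [1.0_f32; 4];
    // mean(x^2) = 7.5, so each element is divided by sqrt(7.5) ≈ 2.7386.
    let y = rms_norm(&x, &g, 1e-6);
    println!("{:?}", y);
}
```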

§File

crates/axonml-llm/src/lib.rs

§Author

Andrew Jewell Sr. — AutomataNexus LLC (ORCID: 0009-0005-2158-7060)

§Updated

April 14, 2026 11:15 PM EST

§Disclaimer

Use at own risk. This software is provided “as is”, without warranty of any kind, express or implied. The author and AutomataNexus shall not be held liable for any damages arising from the use of this software.

Re-exports§

pub use attention::CausalSelfAttention;
pub use attention::FlashAttention;
pub use attention::FlashAttentionConfig;
pub use attention::KVCache;
pub use attention::LayerKVCache;
pub use attention::MultiHeadSelfAttention;
pub use attention::scaled_dot_product_attention;
pub use bert::Bert;
pub use bert::BertForMaskedLM;
pub use bert::BertForSequenceClassification;
pub use chimera::ChimeraConfig;
pub use chimera::ChimeraModel;
pub use config::BertConfig;
pub use config::GPT2Config;
pub use config::TransformerConfig;
pub use embedding::BertEmbedding;
pub use embedding::GPT2Embedding;
pub use embedding::PositionalEmbedding;
pub use embedding::TokenEmbedding;
pub use error::LLMError;
pub use error::LLMResult;
pub use generation::GenerationConfig;
pub use generation::TextGenerator;
pub use gpt2::GPT2;
pub use gpt2::GPT2LMHead;
pub use hf_loader::HFLoader;
pub use hf_loader::load_llama_from_hf;
pub use hf_loader::load_mistral_from_hf;
pub use hub::PretrainedLLM;
pub use hub::download_weights as download_llm_weights;
pub use hub::llm_registry;
pub use hydra::HydraConfig;
pub use hydra::HydraModel;
pub use llama::LLaMA;
pub use llama::LLaMAConfig;
pub use llama::LLaMAForCausalLM;
pub use mistral::Mistral;
pub use mistral::MistralConfig;
pub use mistral::MistralForCausalLM;
pub use phi::Phi;
pub use phi::PhiConfig;
pub use phi::PhiForCausalLM;
pub use ssm::SSMBlock;
pub use ssm::SSMConfig;
pub use ssm::SSMForCausalLM;
pub use state_dict::LoadResult;
pub use state_dict::LoadStateDict;
pub use tokenizer::HFTokenizer;
pub use tokenizer::SpecialTokens;
pub use transformer::TransformerBlock;
pub use transformer::TransformerDecoder;
pub use transformer::TransformerEncoder;
pub use trident::TridentConfig;
pub use trident::TridentModel;
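The re-exported `scaled_dot_product_attention` implements the standard softmax(QKᵀ/√d)·V primitive. As a self-contained sketch of that math for a single query and one head (plain Rust; the crate's real function signature is not assumed here):

```rust
/// softmax(q·kᵀ / √d) · V for one query vector over `n` key/value pairs (one head).
fn attend(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let d = q.len() as f32;
    // Scaled dot-product scores against every key.
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / d.sqrt())
        .collect();
    // Numerically stable softmax: subtract the max before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Output is the attention-weighted average of the value vectors.
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / sum) * x;
        }
    }
    out
}

fn main() {
    // Two identical keys → uniform weights → output is the mean of the values.
    let out = attend(
        &[1.0, 0.0],
        &[vec![1.0, 0.0], vec![1.0, 0.0]],
        &[vec![2.0], vec![4.0]],
    );
    println!("{:?}", out);
}
```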

Modules§

attention
Attention Mechanisms Module
bert
BERT Model — Encoder-Only Transformer with Task Heads
chimera
Chimera — Mixture of Experts + Differential Attention Small Language Model
config
Model Configuration — Transformer, BERT, and GPT-2 Hyperparameters
embedding
Embedding Module — Token, Position, Segment, and Sinusoidal Embeddings
error
LLM Error Types — Failure Modes for Transformer Operations
generation
Text Generation — Decoding Strategies for Autoregressive LMs
gpt2
GPT-2 Model — Decoder-Only Transformer with LM Head
hf_loader
HuggingFace Model Loader
hub
LLM Model Hub — Pretrained Language Model Weights
hydra
Hydra — Hybrid SSM + Sparse Attention Small Language Model
llama
LLaMA — Large Language Model Meta AI
mistral
Mistral — Efficient LLM Architecture
phi
Phi — Microsoft’s Small Language Models
ssm
State Space Model (SSM) — Mamba-style Selective Scan
state_dict
State Dictionary Loading
tokenizer
HuggingFace Tokenizer Support
transformer
Transformer Building Blocks — Layer Norm, FFN, Encoder / Decoder Blocks
trident
Trident — 1.58-bit Ternary Weight Small Language Model
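Trident's `TernaryLinear` constrains weights to {-1, 0, +1}, i.e. log₂(3) ≈ 1.58 bits per weight. A self-contained sketch of absmean ternary quantization in the style of BitNet b1.58 — an assumption about the scheme; the crate's actual `TernaryLinear` may quantize differently:

```rust
/// Quantize weights to {-1, 0, +1} with an absmean scale (BitNet-b1.58 style):
///   scale = mean(|w|);  q_i = clamp(round(w_i / scale), -1, +1).
/// The dequantized weight is approximately q_i * scale. NOTE: this scheme is an
/// illustrative assumption, not necessarily what Trident's TernaryLinear does.
fn ternary_quantize(w: &[f32]) -> (Vec<i8>, f32) {
    let scale = w.iter().map(|x| x.abs()).sum::<f32>() / w.len() as f32;
    let q = w
        .iter()
        .map(|x| (x / (scale + 1e-8)).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

fn main() {
    // Large weights snap to ±1, small weights to 0.
    let (q, scale) = ternary_quantize(&[0.8, -0.05, -0.9, 0.1]);
    println!("q = {:?}, scale = {}", q, scale);
}
```

Ternary weights turn the matrix multiply inside a linear layer into additions and subtractions (plus one scale per tensor), which is the main efficiency argument for 1.58-bit models.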