LARGE — Lightweight Architecture for Running Generative Engines.
An educational, from-scratch LLM inference engine written in Rust, targeting CPU inference on Qwen3-0.6B using the GGUF model format.
Modules
- gguf — GGUF file format parser with memory-mapped tensor access
- tensor — Dequantization and math operations (mat-vec, RMSNorm, RoPE, etc.)
- tokenizer — GPT-2 style byte-level BPE tokenizer
- model — Qwen3 transformer model (GQA, SwiGLU, KV cache)
- sampler — Token sampling strategies (greedy, temperature, top-p)
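To illustrate the kind of primitive the tensor module provides, here is a minimal sketch of RMSNorm, the normalization used throughout Qwen3-style transformers. This is not the crate's actual API — the function name and signature are assumptions for illustration; the real implementation may operate in-place or on quantized buffers.

```rust
/// Hypothetical RMSNorm sketch: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// Not the crate's real signature — shown only to illustrate the operation.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    // Mean of squared activations across the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Single reciprocal square root shared by every element.
    let scale = 1.0 / (mean_sq + eps).sqrt();
    // Normalize, then apply the learned per-channel gain.
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let w = [1.0f32; 4];
    // mean(x^2) = 7.5, so each element is divided by sqrt(7.5) ≈ 2.7386
    let y = rms_norm(&x, &w, 1e-6);
    println!("{:?}", y);
}
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which keeps the kernel a single pass over the hidden vector — one reason it is popular in CPU-oriented inference engines.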