LARGE — Lightweight Architecture for Running Generative Engines.
An educational, from-scratch LLM inference engine written in Rust, targeting CPU inference on Qwen3-0.6B using the GGUF model format.
Modules
- gguf — GGUF file format parser with memory-mapped tensor access
- tensor — Dequantization and math operations (mat-vec, RMSNorm, RoPE, etc.)
- tokenizer — GPT-2 style byte-level BPE tokenizer
- model — Qwen3 transformer model (GQA, SwiGLU, KV cache)
- sampler — Token sampling strategies (greedy, temperature, top-p)
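To illustrate the kind of primitive the tensor module provides, here is a minimal sketch of RMSNorm, the normalization used throughout Qwen3-style transformers. This is not the crate's actual API — the function name and signature are assumptions for illustration; the real implementation may operate in-place or on quantized buffers.

```rust
/// Hypothetical RMSNorm sketch: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// Not the crate's real signature — shown only to illustrate the operation.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    // Mean of squared activations across the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Single reciprocal square root shared by every element.
    let scale = 1.0 / (mean_sq + eps).sqrt();
    // Normalize, then apply the learned per-channel gain.
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let w = [1.0f32; 4];
    // mean(x^2) = 7.5, so each element is divided by sqrt(7.5) ≈ 2.7386
    let y = rms_norm(&x, &w, 1e-6);
    println!("{:?}", y);
}
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which keeps the kernel a single pass over the hidden vector — one reason it is popular in CPU-oriented inference engines.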