Crate large

LARGE — Lightweight Architecture for Running Generative Engines.

An educational, from-scratch LLM inference engine written in Rust, targeting CPU inference on Qwen3-0.6B using the GGUF model format.

§Modules

  • gguf — GGUF file format parser with memory-mapped tensor access
  • tensor — Dequantization and math operations (mat-vec, RMSNorm, RoPE, etc.)
  • tokenizer — GPT-2 style byte-level BPE tokenizer
  • model — Qwen3 transformer model (GQA, SwiGLU, KV cache)
  • sampler — Token sampling strategies (greedy, temperature, top-p)
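As an illustration of the math behind the tensor module, RMSNorm normalizes a vector by the root of its mean squared value and applies a learned per-element weight. This is a minimal sketch of the standard operation; the function name and signature are assumptions for illustration, not the crate's actual API.

```rust
/// Root-mean-square layer normalization, as used in Qwen-style transformers.
/// y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
/// (Illustrative sketch; not necessarily the signature exposed by `large::tensor`.)
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    // Mean of squared elements.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Single scale factor shared by all elements.
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(weight)
        .map(|(v, w)| v * scale * w)
        .collect()
}
```

Unlike LayerNorm, RMSNorm skips mean subtraction, which saves a pass over the data and is the variant Qwen3 uses.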

Modules§

  • gguf — GGUF file format parser.
  • model — Qwen3 transformer model for inference.
  • sampler — Token sampling strategies for autoregressive generation.
  • tensor — Tensor operations for LLM inference.
  • tokenizer — BPE tokenizer for Qwen3 models.
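One of the strategies the sampler module lists is top-p (nucleus) sampling: keep only the smallest set of highest-probability tokens whose cumulative probability reaches `p`, then renormalize before drawing. A minimal sketch of that filtering step follows; the function name and return type are illustrative assumptions, not the crate's actual API.

```rust
/// Top-p (nucleus) filtering sketch: returns the retained (token_id, prob)
/// pairs with probabilities renormalized to sum to 1.
/// (Illustrative; not necessarily the interface of `large::sampler`.)
fn top_p_filter(probs: &[f32], p: f32) -> Vec<(usize, f32)> {
    // Sort token indices by descending probability.
    let mut sorted: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    sorted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Keep tokens until cumulative probability reaches p.
    let mut kept = Vec::new();
    let mut cum = 0.0;
    for (i, pr) in sorted {
        kept.push((i, pr));
        cum += pr;
        if cum >= p {
            break;
        }
    }

    // Renormalize the surviving mass so it sums to 1.
    let total: f32 = kept.iter().map(|&(_, pr)| pr).sum();
    kept.into_iter().map(|(i, pr)| (i, pr / total)).collect()
}
```

Greedy decoding is the degenerate case (always take the argmax), and temperature scaling is typically applied to the logits before this filtering step.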