
# Roadmap

This project is a Rust implementation that follows the book **"Build a Large Language Model (From Scratch)"** by Sebastian Raschka as its primary guide.

---

## Goal 1 — Tokenization

Implement and benchmark several byte-pair encoding (BPE) variants before moving on to the later goals. The aim is to understand the tradeoffs in tokenizer design and to carry a solid, tested implementation forward.

### Milestones

- [ ] Naive BPE — character-level pair merging, no special tokens
- [ ] BPE with special tokens (`<|endoftext|>`, `<|unk|>`)
- [ ] BPE with pre-tokenization regex (GPT-2 style split before byte encoding)
- [ ] Byte-level BPE (no true unknowns, full UTF-8 coverage)
- [ ] Benchmark all variants against each other (see `docs/benchmarks/tokenizing/`)
- [ ] Pick one implementation to carry forward into training
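
As a starting point for the first milestone, here is a minimal sketch of naive character-level BPE training. It is an illustration, not this crate's actual API: the function names `count_pairs`, `merge_pair`, and `train_naive_bpe` are hypothetical, and the tie-breaking rule (lexicographically smallest pair wins) and the stop condition (no pair occurs at least twice) are assumptions made for determinism.

```rust
use std::collections::HashMap;

/// Count adjacent symbol pairs in a token sequence.
fn count_pairs(tokens: &[String]) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for w in tokens.windows(2) {
        *counts.entry((w[0].clone(), w[1].clone())).or_insert(0) += 1;
    }
    counts
}

/// Replace each non-overlapping occurrence of `pair` (scanning left to right)
/// with the concatenation of its two symbols.
fn merge_pair(tokens: &[String], pair: &(String, String)) -> Vec<String> {
    let mut out = Vec::with_capacity(tokens.len());
    let mut i = 0;
    while i < tokens.len() {
        if i + 1 < tokens.len() && tokens[i] == pair.0 && tokens[i + 1] == pair.1 {
            out.push(format!("{}{}", pair.0, pair.1));
            i += 2;
        } else {
            out.push(tokens[i].clone());
            i += 1;
        }
    }
    out
}

/// Learn up to `num_merges` merge rules from `text`, starting from single
/// characters. Returns the ordered merge rules and the final token sequence.
/// Hypothetical helper: no special tokens, no pre-tokenization.
fn train_naive_bpe(text: &str, num_merges: usize) -> (Vec<(String, String)>, Vec<String>) {
    let mut tokens: Vec<String> = text.chars().map(|c| c.to_string()).collect();
    let mut merges = Vec::new();
    for _ in 0..num_merges {
        // Pick the most frequent pair; on ties, prefer the lexicographically
        // smallest pair so the result does not depend on HashMap ordering.
        let best = count_pairs(&tokens)
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)));
        match best {
            // Assumption: stop once no pair repeats, since merging it gains nothing.
            Some((pair, count)) if count >= 2 => {
                tokens = merge_pair(&tokens, &pair);
                merges.push(pair);
            }
            _ => break,
        }
    }
    (merges, tokens)
}
```

Calling `train_naive_bpe("aaabdaaabac", 3)` learns the merges `("a", "a")`, `("a", "b")`, `("aa", "ab")` in that order and leaves the tokens `["aaab", "d", "aaab", "a", "c"]`. The later milestones would change the starting alphabet (bytes instead of characters) and where merges are allowed to cross (pre-tokenization boundaries, special tokens), while the merge loop itself stays largely the same.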

---

## Goal 2 — Attention Mechanism

_Placeholder — to be defined after Goal 1 is complete._

---

## Goal 3 — GPT Model Architecture

_Placeholder — to be defined after Goal 2 is complete._

---

## Goal 4 — Pre-training

_Placeholder — to be defined after Goal 3 is complete._

---

## Goal 5 — Fine-tuning & Alignment

_Placeholder — to be defined after Goal 4 is complete._