# Planned Project Structure
```
src/
├── models/ # LLM model definitions and sizes
│   ├── architecture/ # transformer blocks, attention, FFN, embeddings
│   └── sizes/ # config presets: small, medium, large
│
├── training/ # training pipeline
│   ├── optimizer/ # AdamW, schedulers
│   └── loss/ # cross-entropy, perplexity
│
├── inference/ # inference engine
│   ├── sampler/ # greedy, top-k, top-p, temperature
│   └── cache/ # KV cache
│
├── data/ # data loading and preprocessing
│   ├── loader/ # file readers, streaming datasets
│   └── pipeline/ # batching, shuffling, tokenization wiring
│
├── tokenizers/ # tokenizer implementations
│   ├── benchmark/
│   └── versions/
│
├── utils/ # shared utilities
│   └── cli/ # CLI parsing (clap)
│
└── apps/ # end-user applications
    ├── agent/ # local LLM agent (REPL / tool use)
    └── server/ # SSH server, HTTP API
```
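
As a rough illustration of how this layout could map onto Rust modules, here is a minimal sketch of a crate root with inline modules named after the directories above. It assumes a single library crate; in practice each `mod` would probably live in its own file or folder. The `Preset` enum and the `greedy` function are hypothetical placeholders, not part of the plan: they only show where size presets and a greedy sampler might sit.

```rust
// Hypothetical crate root (e.g. src/lib.rs) mirroring the planned directories.
// Inline modules are used only to keep the sketch in one file.

pub mod models {
    pub mod architecture {} // transformer blocks, attention, FFN, embeddings
    pub mod sizes {
        /// Preset names from the tree. The fields behind each preset are not
        /// specified in the plan, so none are invented here.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Preset {
            Small,
            Medium,
            Large,
        }
    }
}

pub mod training {
    pub mod optimizer {} // AdamW, schedulers
    pub mod loss {} // cross-entropy, perplexity
}

pub mod inference {
    pub mod sampler {
        /// Greedy sampling (assumed signature): returns the index of the
        /// largest logit, or `None` for an empty slice.
        pub fn greedy(logits: &[f32]) -> Option<usize> {
            logits
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(index, _)| index)
        }
    }
    pub mod cache {} // KV cache
}

pub mod data {
    pub mod loader {} // file readers, streaming datasets
    pub mod pipeline {} // batching, shuffling, tokenization wiring
}

pub mod tokenizers {
    pub mod benchmark {}
    pub mod versions {}
}

pub mod utils {
    pub mod cli {} // CLI parsing (clap)
}

pub mod apps {
    pub mod agent {} // local LLM agent (REPL / tool use)
    pub mod server {} // SSH server, HTTP API
}
```

With this shape, a caller would reach the sampler as `inference::sampler::greedy(&logits)` and a preset as `models::sizes::Preset::Small`. Whether `apps/` ends up as modules or as separate binaries is left open here.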