eld_llm 0.0.1

An LLM built from scratch in Rust.

# BPE Development Benchmarks

## Shakespeare (~1 MB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
||||||||||||||
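Several columns are derived from raw measurements rather than measured directly. The sketch below shows one way to compute them in Rust. It makes three assumptions:

- The three token-length columns describe vocabulary entries measured in bytes.
- Throughput means tokens divided by wall-clock seconds.
- The names `TokenStats`, `token_stats`, `throughput` and `bytes_per_token` are hypothetical and not part of eld_llm.

If "Avg token length" is instead meant per token of the encoded corpus, `bytes_per_token` (corpus size divided by corpus token count) is the matching figure.

```rust
use std::time::Duration;

/// Length statistics over vocabulary entries (hypothetical helper, not eld_llm API).
/// Assumes each token is stored as its raw byte sequence.
#[derive(Debug, PartialEq)]
pub struct TokenStats {
    pub vocab_size: usize,
    pub avg_len: f64,
    pub max_len: usize,
    pub min_len: usize,
}

/// Returns None for an empty vocabulary, where the statistics are undefined.
pub fn token_stats(vocab: &[Vec<u8>]) -> Option<TokenStats> {
    let lens: Vec<usize> = vocab.iter().map(|t| t.len()).collect();
    let max_len = *lens.iter().max()?;
    let min_len = *lens.iter().min()?;
    let total: usize = lens.iter().sum();
    Some(TokenStats {
        vocab_size: lens.len(),
        avg_len: total as f64 / lens.len() as f64,
        max_len,
        min_len,
    })
}

/// Tokens per second for `tokens` processed in `elapsed`; None if no time elapsed.
pub fn throughput(tokens: usize, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        Some(tokens as f64 / secs)
    } else {
        None
    }
}

/// Average bytes per token over an encoded corpus; None if no tokens were produced.
pub fn bytes_per_token(corpus_bytes: usize, token_count: usize) -> Option<f64> {
    if token_count == 0 {
        None
    } else {
        Some(corpus_bytes as f64 / token_count as f64)
    }
}
```

For example, a vocabulary of `a`, `th` and `the` yields an average length of 2.0 bytes, a maximum of 3 and a minimum of 1. Encoding 1,000 tokens in 500 ms gives a throughput of 2,000 tokens/sec.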

## Gutenberg sample (~10 MB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
||||||||||||||

## Wikipedia 1% (~100 MB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
||||||||||||||

## OpenWebText sample (~1 GB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
||||||||||||||
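The timing columns are reported in nanoseconds. A per-call average such as "Avg encode time (ns)" can be estimated by timing repeated calls with `std::time::Instant` and dividing by the run count. This is a minimal sketch under that assumption, not necessarily how these numbers are collected. `mean_time_ns` is a hypothetical helper.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `f` `runs` times and returns the mean wall-clock time per run in nanoseconds
/// (integer division, so the result is truncated). Hypothetical helper, not eld_llm API.
/// Returns None when `runs` is zero.
pub fn mean_time_ns<T, F: FnMut() -> T>(runs: u32, mut f: F) -> Option<u128> {
    if runs == 0 {
        return None;
    }
    let start = Instant::now();
    for _ in 0..runs {
        // black_box discourages the optimizer from discarding the call's result.
        black_box(f());
    }
    Some(start.elapsed().as_nanos() / u128::from(runs))
}
```

For instance, `mean_time_ns(100, || encode(&text))` would report the mean time over 100 calls. Here `encode` and `text` are placeholders for whatever function and input are being benchmarked. Wall-clock averages like this include loop overhead and are sensitive to system noise. A dedicated harness such as the Criterion crate gives more robust statistics.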