eld_llm 0.0.1 - Docs.rs

# BPE Development Benchmarks

## Shakespeare (~1 MB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| — | — | — | — | — | — | — | — | — | — | — | — | — |

## Gutenberg sampler (~10 MB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| — | — | — | — | — | — | — | — | — | — | — | — | — |

## Wikipedia 1% (~100 MB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| — | — | — | — | — | — | — | — | — | — | — | — | — |

## OpenWebText sample (~1 GB)

| Version | Corpus size (bytes) | Training time (ns) | Peak memory (MB) | Vocab size | Avg token length | Max token length | Min token length | Avg encode time (ns) | Avg decode time (ns) | Encode throughput (tokens/sec) | Decode throughput (tokens/sec) | Corpus token count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| — | — | — | — | — | — | — | — | — | — | — | — | — |
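
The derived columns in the tables above (token length statistics and throughput) follow from two raw measurements: the byte length of each token and the wall-clock time of a run. The following is a minimal sketch of that arithmetic, not the crate's actual benchmark harness. `TokenStats`, `token_stats`, `throughput_tokens_per_sec`, and `timed` are hypothetical names introduced here for illustration, and the sketch assumes token lengths are measured in bytes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical summary of the token-length columns.
/// Field names mirror the table headers, not any eld_llm API.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenStats {
    pub token_count: usize,
    pub avg_token_len: f64,
    pub max_token_len: usize,
    pub min_token_len: usize,
}

/// Computes count, average, maximum, and minimum token length
/// from a slice of per-token lengths (assumed to be in bytes).
/// Returns `None` for an empty slice, where no average exists.
pub fn token_stats(token_lens: &[usize]) -> Option<TokenStats> {
    if token_lens.is_empty() {
        return None;
    }
    let total: usize = token_lens.iter().sum();
    Some(TokenStats {
        token_count: token_lens.len(),
        avg_token_len: total as f64 / token_lens.len() as f64,
        max_token_len: *token_lens.iter().max()?,
        min_token_len: *token_lens.iter().min()?,
    })
}

/// Throughput in tokens per second: tokens processed divided by
/// elapsed seconds. Returns 0.0 when no time elapsed, to avoid
/// dividing by zero.
pub fn throughput_tokens_per_sec(token_count: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return 0.0;
    }
    token_count as f64 / secs
}

/// Runs a closure once and returns its result together with the
/// wall-clock time it took, measured with a monotonic clock.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Wrapping a training or encode call in `timed` gives a `Duration` whose `as_nanos()` value corresponds to the nanosecond columns, and passing the resulting token count and duration to `throughput_tokens_per_sec` gives the throughput columns. Whether the "Avg encode time" and "Avg decode time" columns are averaged per call or per token is not stated in the tables, so the sketch leaves that choice to the caller.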