tokstream

A token streaming simulator powered by Hugging Face tokenizers. It downloads a tokenizer from the Hugging Face (HF) Hub and emits tokens at a target rate, with live stats comparing target and actual throughput.

Highlights

Rust CLI with high‑precision pacing (sleep + spin)
Web demo (WASM) and npx executable
Random English / Chinese generation and text replay
Configurable filtering strategy
Target vs actual tokens/sec stats
Workspace layout with reusable core

Project Layout

.
├── crates
│   ├── tokstream-core   # tokenizer engine
│   ├── tokstream-cli    # Rust CLI
│   └── tokstream-wasm   # wasm-bindgen bindings
├── npm                  # npx CLI + web demo
├── bin                  # npm bin entry
├── Cargo.toml           # workspace
├── justfile
├── package.json
├── README.md
└── README_ZH.md

Rust CLI

Quick Start

cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 8
cargo run -p tokstream-cli -- --model gpt2 --mode chinese --rate 8
cargo run -p tokstream-cli -- --model gpt2 --mode text --text "Hello" --repeat 3

Install from crates.io

cargo install tokstream-cli
# or
cargo binstall tokstream-cli

Notes:

The binary name is tokstream after installation.
cargo binstall falls back to compiling from source unless prebuilt release assets are published and the repository field is set in the crate metadata.

Model & Auth

--model <id> HF Hub model id (default: gpt2)
--revision <rev> HF revision (default: main)
--hf-token <token> access token for private models

Modes

--mode <english|chinese|text>
--text <text> text mode input
--text-file <path> text mode input from file
--loop-text loop text forever
--repeat <n> repeat text n times

Rate Control

--rate <n> target tokens/sec
--rate-min <n> min rate for random range
--rate-max <n> max rate for random range
--rate-sample-interval <n> sampling interval for rate range (seconds, default: 1)
--batch <n> tokens emitted per batch
--max-tokens <n> stop after n tokens
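
The relationship between --rate, --batch, and the random rate range can be sketched with a little arithmetic. The helpers below are hypothetical (batch_interval and sample_rate are names invented for this illustration, not tokstream's API), and the sketch assumes that a batch of n tokens is given n / rate seconds and that a rate inside the range is drawn uniformly.

```rust
use std::time::Duration;

/// Time budget for one batch so that `batch` tokens per interval
/// average out to `rate` tokens per second.
/// Hypothetical helper for illustration only.
fn batch_interval(rate: f64, batch: u32) -> Duration {
    assert!(rate > 0.0, "rate must be positive");
    Duration::from_secs_f64(batch as f64 / rate)
}

/// Map a uniform sample `u` in [0, 1) onto the [rate_min, rate_max] range.
/// Assumption: the range is sampled uniformly once per sampling interval.
fn sample_rate(rate_min: f64, rate_max: f64, u: f64) -> f64 {
    rate_min + (rate_max - rate_min) * u
}
```

Under these assumptions, --rate 8 with one token per batch leaves 125 ms per token, and a range of 6 to 12 sampled at its midpoint yields a target of 9 tokens/sec.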

Pacing & Throughput

--pace <strict|sleep> pacing mode (default: strict)
--spin-threshold-us <n> busy‑spin threshold for strict mode
--no-throttle disable pacing (measure max throughput)
--no-output disable stdout output (closer to tokenizer upper bound)
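
A minimal sketch of the sleep + spin idea behind strict pacing, assuming the usual hybrid approach: sleep through most of the interval, then busy-spin for the last few microseconds to absorb OS sleep overshoot. The function wait_until and its parameters are illustrative names and not taken from the tokstream source.

```rust
use std::time::{Duration, Instant};

/// Block until `deadline`.
/// Sleeps while more than `spin_threshold` remains, then busy-spins
/// for the final stretch. Illustrative sketch, not tokstream's code.
fn wait_until(deadline: Instant, spin_threshold: Duration) {
    // Zero if the deadline has already passed.
    let remaining = deadline.saturating_duration_since(Instant::now());
    if remaining > spin_threshold {
        // Coarse wait: the OS may wake us slightly late, which is why
        // the last `spin_threshold` is reserved for spinning.
        std::thread::sleep(remaining - spin_threshold);
    }
    // Fine wait: poll the clock until the deadline is reached.
    while Instant::now() < deadline {
        std::hint::spin_loop();
    }
}
```

A larger spin threshold trades CPU time for tighter timing; a sleep-only mode would correspond to a threshold of zero in this sketch.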

Stats

--no-stats disable stats output (stderr)
--stats-interval <n> stats interval seconds (default: 1)
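
For reference, the target vs actual comparison reduces to dividing the tokens emitted in a window by the window's length. The sketch below assumes that definition; the function names and the line format are hypothetical and do not reproduce tokstream's real stderr output.

```rust
use std::time::Duration;

/// Tokens per second over one stats window.
fn actual_rate(tokens: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        0.0 // avoid dividing by zero on an empty window
    } else {
        tokens as f64 / secs
    }
}

/// Render one stats line (hypothetical format).
fn stats_line(target: f64, tokens: u64, elapsed: Duration) -> String {
    format!(
        "target={:.2} tok/s actual={:.2} tok/s",
        target,
        actual_rate(tokens, elapsed)
    )
}
```

For example, 15 tokens in a 2-second window against a target of 8 gives an actual rate of 7.5 tokens/sec.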

Random Output Filters

--no-skip-special do not skip special tokens
--allow-digits
--allow-punct
--allow-space
--allow-non-ascii
--no-require-letter
--no-require-cjk
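
The flags above are listed without descriptions, so the following is only a guess at how such a filter might behave, inferred from the flag names: by default a random token is rejected if it contains digits, punctuation, whitespace, or non-ASCII characters, and it must contain at least one letter. The Filter struct and accepts function are hypothetical, and --no-require-cjk and --no-skip-special are left out of the sketch.

```rust
/// Hypothetical filter settings mirroring some of the flag names above.
#[derive(Clone, Copy, Default)]
struct Filter {
    allow_digits: bool,
    allow_punct: bool,
    allow_space: bool,
    allow_non_ascii: bool,
    no_require_letter: bool,
}

/// Returns true if a decoded token passes the (assumed) filter rules.
fn accepts(f: Filter, token: &str) -> bool {
    let mut has_letter = false;
    for c in token.chars() {
        if c.is_alphabetic() {
            has_letter = true;
        }
        // Each character class is blocked unless its flag allows it.
        let blocked = (c.is_ascii_digit() && !f.allow_digits)
            || (c.is_ascii_punctuation() && !f.allow_punct)
            || (c.is_whitespace() && !f.allow_space)
            || (!c.is_ascii() && !f.allow_non_ascii);
        if blocked {
            return false;
        }
    }
    // Assumed default: a token needs at least one letter.
    has_letter || f.no_require_letter
}
```

With the default settings in this sketch, "hello" passes while "abc1" and " the" are rejected; enabling allow_digits or allow_space respectively lets them through.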

Seed

--seed <n> random seed
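
To illustrate why a seed makes random runs repeatable, here is a small deterministic generator. It is a textbook xorshift64, chosen only for brevity; which generator tokstream actually uses is not stated here, and XorShift64 and pick are names made up for this sketch.

```rust
/// Minimal xorshift64 generator, used only to illustrate seeding.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pick an index in 0..n (n > 0), e.g. a token id within a vocabulary.
    fn pick(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}
```

Two generators created with the same seed produce the same sequence of picks, which is the property a fixed seed is meant to provide.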

Examples

# Random rate range sampled every 2 seconds
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate-min 6 --rate-max 12 --rate-sample-interval 2

# Text mode from file, repeat 5 times
cargo run -p tokstream-cli -- --model gpt2 --mode text --text-file ./sample.txt --repeat 5

# Infinite loop text
cargo run -p tokstream-cli -- --model gpt2 --mode text --text "Hello" --loop-text

# Throughput upper bound (no throttle, no output)
cargo run -p tokstream-cli -- --model gpt2 --mode english --no-throttle --no-output

npx CLI

Quick Start

npx tokstream@latest --model gpt2 --mode english --rate 8
npx tokstream@latest --web --port 8787

For local development in this repo:

npx . --model gpt2 --mode english --rate 8

Supported Flags (npx)

--model <id>
--revision <rev>
--hf-token <token> (or env HF_TOKEN / HUGGINGFACE_HUB_TOKEN)
--mode <english|chinese|text>
--text <text>
--loop (loop text forever)
--repeat <n>
--rate <n>
--rate-min <n> / --rate-max <n>
--rate-sample-interval <n>
--seed <n>
--max-tokens <n>
--no-skip-special
--allow-digits / --allow-punct / --allow-space / --allow-non-ascii
--no-require-letter / --no-require-cjk
--no-stats / --stats-interval <n>
--no-throttle / --no-output
--web --port <n>

Notes:

--loop-text, --text-file, --batch, --pace, and --spin-threshold-us are Rust‑CLI only.

Web Demo

npx tokstream@latest --web --port 8787
# open http://localhost:8787

While running, you can drag the rate slider or enable random rate range. The page shows target and actual throughput. The output pane is fixed‑height and scrolls independently.

Accuracy Notes

The Rust CLI's strict pacing mode (--pace strict) combines sleep with a short busy-spin for high precision.
Web / npx are best‑effort due to event loop and I/O limits.
If actual throughput stops increasing as you raise the target rate, you have likely hit the tokenizer's limit.
For maximum throughput testing, use the Rust CLI with --no-output --no-throttle.

Build WASM (optional refresh)

npm run build:wasm

WASM artifacts are committed and included in the npm package.

just Recipes

just

Tests

cargo clippy --workspace
cargo nextest run --workspace

License

MIT

tokstream-core 0.1.2