codec-rs
Isomorphic tokenizer + detokenizer for the Codec binary transport protocol — for Rust.
Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path. Pure Rust, with optional reqwest for the map loader and optional tokio for async stream decoding.
The functional twin of @codecai/web (browser/Node), codecai (Python), and Codec.Net (.NET). Same tokenizer dialect maps work everywhere.
Install
# Cargo.toml
[dependencies]
codec-rs = "0.1"
# or, with async streams + map loader:
codec-rs = { version = "0.1", features = ["tokio", "http"] }
# async:
Edition 2021. Stable Rust 1.75+. The http feature is on by default; disable it (default-features = false) if you want the core types without reqwest.
Quick start — decode a stream (sync)
use ;
#
Forwarding IDs to another model (agent-to-agent, same vocab)
When the next consumer of this stream is another model on the same vocab — agent → agent, orchestrator → planner, model → tool that re-feeds the model — you do NOT need a Detokenizer at all. Forward frame.ids directly:
// No Detokenizer constructed: zero UTF-8 reassembly, zero BPE-merge work.
for frame in decode_msgpack_stream
This is the hot-loop fast path for agent mesh code. Skipping detok.render(...) saves ~10-20% client CPU on heavy reply streams (no String allocation, no partial-UTF-8 buffering, no metaspace decode). For cross-vocab handoff use Translator — that case still needs the byte-level path because the two vocabs disagree.
Quick start — decode a stream (async)
Behind the tokio feature flag, the same logic runs over tokio::io::AsyncRead:
use r#decode_msgpack_stream_async;
use ;
use StreamExt;
async
Quick start — encode text (bidirectional path)
When you want zero text on the wire in either direction — agent A's output IDs feeding straight into agent B's input — encode text to IDs locally before sending:
use ;
let map = from_json?;
let tok = pick;
let prompt_ids: = tok.encode;
// Send IDs as a normal OpenAI prompt: int[] (no special endpoint needed).
let body = json!;
For huge prompts (>50K tokens, e.g. RAG with long context), the dedicated /v1/completions/codec endpoint accepts a binary msgpack request body with the same effect. See PROTOCOL.md for both paths.
API
| Type | Purpose |
|---|---|
| `MapLoader::load_blocking(opts)` / `MapLoader::load(opts).await` | Fetch + sha256-verify + cache a dialect map |
| `discover_zstd_dict_blocking(origin, hash)` / `discover_zstd_dict(origin, hash).await` (v0.5) | Resolve a zstd dict at `.well-known/codec/dicts/<sha256-hex>.zstd`. Hash-pin-verified against the URL's path component; hard-fails on 404 / mismatch (no silent fallback). `well_known_dict_url` builds the URL on its own for callers using a custom HTTP stack. |
| `MemoryMapCache` / `MapCache` trait | Default in-memory cache; implement for Redis / disk / IDB |
| `TokenizerMap::from_json(...)` / `TokenizerMap::validate(...)` | Parse + schema check |
| `TokenizerMap::verify_sha256(bytes, expected)` | Standalone sha256 verify |
| `Detokenizer` | Stateful: byte_level + metaspace + byte fallback + partial UTF-8 |
| `Detokenizer::detokenize(map, ids, render_special)` | One-shot for non-streaming use |
| `BPETokenizer` | Pure-Rust BPE: byte_level + metaspace |
| `LongestMatchTokenizer` | Vocab-only fallback for canonical-IR maps |
| `Tokenize::pick(&map)` | Build the right tokenizer for the loaded map |
| `Tokenize::encode(&map, text)` | One-shot helper |
| `decode_msgpack_stream(read)` | `Read` → `Iterator<Item = Result<CodecFrame>>` |
| `decode_protobuf_stream(read)` | Same for length-prefixed protobuf |
| `decode_protobuf_frame(span)` | One-shot frame decoder (no length prefix) |
| `stream::async::decode_msgpack_stream_async(...)` | Async variant (feature `tokio`), returns `Stream<Item = Result<CodecFrame>>` |
| `stream::async::decode_protobuf_stream_async(...)` | Same for protobuf |
| `ToolWatcher` | Detect delimited regions (tool calls, reasoning blocks, vision spans) without decoding |
| `Translator` / `translate_one_shot` / `static_translation_table` | Cross-vocab agent handoff: ids_A → text → ids_B with streaming-safe word-boundary buffering |
Detect tool calls without decoding
Most chat-tuned models delimit tool calls with single-token specials (Qwen <tool_call>/</tool_call>, Llama 3.1+ <|python_tag|>/<|eom_id|>, DeepSeek-R1 <think>/</think>, …). Detecting one is a u32 compare in the hot loop — no detokenize, no string allocation:
use ;
let mut watcher = new?;
for frame in decode_msgpack_stream
Stateful — regions split between network frames buffer until the end marker arrives. Same primitive covers reasoning blocks, multimodal spans, code-interpreter regions — anything delimited by a (start, end) special pair.
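The stateful behavior described above can be illustrated with a minimal, std-only sketch. It is not the crate's `ToolWatcher`: the names `RegionWatcher`, `Event`, and `feed` are hypothetical, and the marker IDs are assumed to be supplied by the caller. It only shows how a region that spans two frames can be buffered with nothing more than `u32` comparisons.

```rust
/// Illustrative only: hypothetical types, not the codec-rs `ToolWatcher` API.
#[derive(Debug, PartialEq)]
pub enum Event {
    /// IDs seen outside any delimited region.
    Passthrough(Vec<u32>),
    /// IDs found between a start and an end marker (markers excluded).
    Region(Vec<u32>),
}

pub struct RegionWatcher {
    start_id: u32,
    end_id: u32,
    inside: bool,
    buffered: Vec<u32>,
}

impl RegionWatcher {
    pub fn new(start_id: u32, end_id: u32) -> Self {
        Self { start_id, end_id, inside: false, buffered: Vec::new() }
    }

    /// Feed one frame's IDs. State carries over to the next call, so a region
    /// opened in one frame can be closed in a later one.
    pub fn feed(&mut self, ids: &[u32]) -> Vec<Event> {
        let mut events = Vec::new();
        let mut outside = Vec::new();
        for &id in ids {
            if self.inside {
                if id == self.end_id {
                    // End marker: emit everything buffered since the start marker.
                    self.inside = false;
                    events.push(Event::Region(std::mem::take(&mut self.buffered)));
                } else {
                    self.buffered.push(id);
                }
            } else if id == self.start_id {
                // Start marker: emit the IDs collected so far, then start buffering.
                if !outside.is_empty() {
                    events.push(Event::Passthrough(std::mem::take(&mut outside)));
                }
                self.inside = true;
            } else {
                outside.push(id);
            }
        }
        if !outside.is_empty() {
            events.push(Event::Passthrough(outside));
        }
        events
    }
}
```

In this sketch, a frame that ends inside a region emits nothing for the buffered part; the `Region` event appears only once a later frame delivers the end marker.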
Cross-vocab agent handoff
When agent A's output feeds agent B as a prompt and the two models have different vocabs, decode-then-reencode through text — without ever putting text on the wire:
use Translator;
let mut tr = new;
for frame in decode_msgpack_stream
// tr.finish() drains the trailing partial-word buffer.
let drain = tr.finish;
Pre-tokenizers split at whitespace, so Translator buffers partial words until a safe boundary arrives. For analysis-only use, static_translation_table(&from, &to) gives a context-free id_A → ids_B lookup.
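To make the word-boundary buffering concrete, here is a small std-only sketch. It is not the crate's `Translator`: `WordBoundaryBuffer`, `push`, and `finish` are hypothetical names, and the rule that the boundary whitespace stays with the following word is an assumption (it matches vocabularies that attach a leading space to tokens). The idea is that only the text before the last whitespace is released for re-encoding, while the unfinished tail waits for more input.

```rust
/// Illustrative only: a hypothetical helper, not the codec-rs `Translator`.
#[derive(Default)]
pub struct WordBoundaryBuffer {
    pending: String,
}

impl WordBoundaryBuffer {
    /// Append decoded text. Returns the prefix that ends just before the last
    /// whitespace character, or `None` if nothing is safe to release yet.
    pub fn push(&mut self, text: &str) -> Option<String> {
        self.pending.push_str(text);
        // Byte index where the last whitespace character starts.
        let cut = self.pending.rfind(char::is_whitespace)?;
        // Keep the whitespace and the partial word after it (assumption: the
        // target vocab attaches leading spaces to the following word).
        let tail = self.pending.split_off(cut);
        let ready = std::mem::replace(&mut self.pending, tail);
        if ready.is_empty() { None } else { Some(ready) }
    }

    /// Drain whatever is left once the stream has ended.
    pub fn finish(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

Each released chunk would then be passed to the target tokenizer; the final `finish` call plays the role of the trailing drain shown earlier.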
Correctness
- Byte-level decode: every vocab token is a sequence of GPT-2-encoded bytes. The `Detokenizer` reverses the byte→unicode table and accumulates bytes across tokens until a complete UTF-8 sequence forms. Tested with 3-byte (`€`) and 4-byte (`🚀`) sequences.
- Metaspace decode: `▁` becomes space; SentencePiece byte-fallback IDs (`<0x00>`–`<0xFF>`) are decoded through the same UTF-8 buffer.
- Partial sequences across frames: `Detokenizer` is stateful. Call `render(ids, DetokenizeOptions { partial: true, .. })` while frames stream, then `partial: false` on the last frame so the buffer flushes. Call `reset()` between conversations.
- BPE merge ordering: greedy by priority, not left-to-right. Matches HuggingFace tokenizers reference behavior. A test fixture verifies this explicitly (`tests/bpe_tests.rs::merges_greedily_by_priority_not_left_to_right`).
- Hash verification uses `sha2::Sha256`. A mismatch returns `LoadError::HashMismatch` (no panic).
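The partial UTF-8 handling in the list above can be sketched with the standard library alone. This is not the crate's `Detokenizer`: `Utf8Accumulator`, `push`, and `flush` are hypothetical names, and replacing invalid bytes with U+FFFD is an assumption of this sketch, not documented crate behavior. It shows how a multi-byte character split across two tokens or frames is held back until its last byte arrives.

```rust
/// Illustrative only: a hypothetical buffer, not the codec-rs `Detokenizer`.
#[derive(Default)]
pub struct Utf8Accumulator {
    pending: Vec<u8>,
}

impl Utf8Accumulator {
    /// Append raw bytes and return all text that is now complete.
    /// A truncated trailing sequence stays buffered for the next call.
    pub fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let mut out = String::new();
        loop {
            // Inspect the buffer first; mutate it only after the borrow ends.
            let step = match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    None
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // Bytes before `valid` are guaranteed to be valid UTF-8.
                    out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
                    Some((valid, e.error_len()))
                }
            };
            match step {
                // Whole buffer was valid: nothing left to hold back.
                None => {
                    self.pending.clear();
                    return out;
                }
                // Input ended mid-character: keep the partial bytes for later.
                Some((valid, None)) => {
                    self.pending.drain(..valid);
                    return out;
                }
                // Invalid bytes (assumption: substitute U+FFFD and continue).
                Some((valid, Some(len))) => {
                    out.push('\u{FFFD}');
                    self.pending.drain(..valid + len);
                }
            }
        }
    }

    /// End of stream: emit whatever remains, lossily, and clear the buffer.
    pub fn flush(&mut self) -> String {
        let s = String::from_utf8_lossy(&self.pending).into_owned();
        self.pending.clear();
        s
    }
}
```

Here `push` corresponds loosely to rendering with `partial: true`, and `flush` to the final `partial: false` call that empties the buffer.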
Map sources
MapLoader::load_blocking / MapLoader::load accept any URL — the sha256 hash is what matters. For curated pre-generated maps:
https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/<family>.json
14 families covering 70+ aliases — see codec-maps for the index.
To generate from a HuggingFace tokenizer.json:
Compression
MapLoader enables transparent decompression for gzip and brotli on its reqwest client, so jsDelivr's Content-Encoding: br (3.4× smaller transfers) works out of the box. For Codec streaming responses, the server negotiates Content-Encoding based on the request's Accept-Encoding; have your reqwest client request Accept-Encoding: zstd, br, gzip and the response stream is decompressed before any decoder sees it.
Feature flags
| Feature | Default | What it enables |
|---|---|---|
| `http` | yes | `MapLoader` (sync + async) via reqwest with rustls |
| `tokio` | no | `stream::async::*` async-stream decoders for any `tokio::io::AsyncRead` |
Disable defaults to drop reqwest:
codec-rs = { version = "0.1", default-features = false }
License
MIT. See LICENSE.