Skip to main content

Module tokens

Module tokens 

Source
Expand description

Token efficiency: count tokens under popular agentic tokenizers and model the four cost terms an agent pays per task.

An agent’s total cost is not just the characters it types. It is: standing_context (the schema/cheatsheet it must carry to use the program, re-sent each turn) + input (the program it writes) + output (what it reads back) + retries (re-dos from ambiguity/failure). A representation that golfs input while inflating standing_context can be a net loss — so this module counts all four and amortizes over a session.

Structs§

AgentCost
The four token-cost terms an agent pays per task. All in tokens.
CacheReport
How much a representation benefits from API prompt-caching: the per-turn prompt splits into a stable, cacheable prefix and a variable remainder. With caching the prefix is paid once at write price and thereafter at the cheap read price — so a representation with a large stable prefix is far cheaper per session.
Comparison
The result of comparing two programs (e.g. two encodings of the same task).
Program
A program representation to evaluate for token efficiency.
ScalingReport
How a program’s output token cost grows with result size — the curve that matters at agent scale, not a single-size sample. Fit from samples at several sizes: a marginal per_item cost (slope) and a fixed_overhead (intercept).

Enums§

Model
A popular agentic AI system, identified by its tokenizer family.

Constants§

CACHE_READ_MULT
Default prompt-cache pricing multiplier for a cache read: 0.1× the base price.
CACHE_WRITE_MULT
Default prompt-cache pricing multiplier for writing the cache (Anthropic-style): 1.25× the base token price.

Functions§

assess_cache
Model prompt-cache savings for a prefix/variable token split over turns, using the default CACHE_WRITE_MULT/CACHE_READ_MULT multipliers.
assess_scaling
Measure output-token scaling: render the program’s output at each of sizes items, count tokens with count, and fit a line. produce(n) returns the representative output for n result items.
cacheable_prefix_tokens
Tokens in the longest common character prefix shared by every prompt in prompts — an approximation of the cache-eligible region across turns, counted with count. Empty input or no shared prefix → 0.
compare
Compare two programs under model, amortized over turns.
evaluate
Evaluate one program’s cost terms under model.
evaluate_all
Evaluate a program across every supported model.
evaluate_with
Evaluate a program with a custom token counter — any Fn(&str) -> usize, such as a host application’s exact tokenizer or a model not in Model. This lets the crate’s cost model work with any tokenizer, not just the built-in set.
heuristic_tokens
A labeled, deterministic token heuristic that tracks real BPE counts within ~10–20% for code-like text. Used when a real BPE isn’t available (no real-tokens feature, or Claude). Rules: each run of letters/digits is one token; an underscore separates snake_case subwords (each its own ~token, as real tokenizers usually split, e.g. file_read ≈ 2) but is not itself counted; every other non-whitespace punctuation/symbol char counts ~1.
rank
Rank N programs by their amortized session cost under model (cheapest first). Returns (index_into_programs, total_tokens) pairs sorted ascending by total — the N-way generalization of compare. Ties keep input order (stable sort).
rank_with
Like rank, but with a custom token counter (see evaluate_with). Returns (index, total_tokens) pairs sorted cheapest-first; ties keep input order.