Token efficiency: count tokens under popular agentic tokenizers and model the four cost terms an agent pays per task.
An agent’s total cost is not just the characters it types. It is:
standing_context (the schema/cheatsheet it must carry to use the program,
re-sent each turn) + input (the program it writes) + output (what it reads
back) + retries (re-dos from ambiguity/failure). A representation that golfs
input while inflating standing_context can be a net loss — so this module
counts all four and amortizes over a session.
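The four-term accounting above can be sketched as follows. This is an illustration only: `CostSketch`, its field names, and the rule that a session costs `turns` times the per-turn sum are assumptions made for the example, not the crate's actual `AgentCost` definition or amortization rule.

```rust
/// Hypothetical stand-in for the four cost terms, all in tokens.
/// The crate's real `AgentCost` may use different fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CostSketch {
    /// Schema/cheatsheet carried in context, re-sent each turn.
    pub standing_context: usize,
    /// The program the agent writes.
    pub input: usize,
    /// What the agent reads back.
    pub output: usize,
    /// Re-dos caused by ambiguity or failure.
    pub retries: usize,
}

impl CostSketch {
    /// Tokens paid for one turn: the plain sum of the four terms.
    pub fn per_turn(&self) -> usize {
        self.standing_context + self.input + self.output + self.retries
    }

    /// Assumed amortization: every turn pays the full per-turn sum.
    pub fn session_total(&self, turns: usize) -> usize {
        self.per_turn() * turns
    }
}
```

Under these assumptions, an encoding that trims `input` from 50 to 20 tokens but grows `standing_context` from 200 to 400 pays 530 tokens per turn instead of 360, which is the net loss the description warns about.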
Structs§
- AgentCost - The four token-cost terms an agent pays per task. All in tokens.
- CacheReport - How much a representation benefits from API prompt-caching: the per-turn prompt splits into a stable, cacheable prefix and a variable remainder. With caching, the prefix is paid once at the write price and thereafter at the cheap read price, so a representation with a large stable prefix is far cheaper per session.
- Comparison - The result of comparing two programs (e.g. two encodings of the same task).
- Program - A program representation to evaluate for token efficiency.
- ScalingReport - How a program’s output token cost grows with result size. This is the curve that matters at agent scale, rather than a single-size sample. It is fit from samples at several sizes as a marginal `per_item` cost (slope) plus a `fixed_overhead` (intercept).
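The slope/intercept fit described for ScalingReport can be illustrated with an ordinary least-squares line. The names `LineFit`, `fit_line`, and `sample_and_fit` are hypothetical, and least squares is an assumption here; the crate may fit its line differently.

```rust
/// Hypothetical result of a line fit; not the crate's `ScalingReport`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LineFit {
    /// Marginal tokens per result item (slope).
    pub per_item: f64,
    /// Tokens paid regardless of result size (intercept).
    pub fixed_overhead: f64,
}

/// Ordinary least-squares fit over `(size, tokens)` samples.
/// Returns `None` with fewer than two samples or when all sizes are equal.
pub fn fit_line(samples: &[(usize, usize)]) -> Option<LineFit> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean_x = samples.iter().map(|&(x, _)| x as f64).sum::<f64>() / n;
    let mean_y = samples.iter().map(|&(_, y)| y as f64).sum::<f64>() / n;

    let mut sxy = 0.0;
    let mut sxx = 0.0;
    for &(x, y) in samples {
        let dx = x as f64 - mean_x;
        sxy += dx * (y as f64 - mean_y);
        sxx += dx * dx;
    }
    if sxx == 0.0 {
        return None; // A vertical line has no defined slope.
    }
    let per_item = sxy / sxx;
    Some(LineFit {
        per_item,
        fixed_overhead: mean_y - per_item * mean_x,
    })
}

/// Render output at each size with `produce`, count tokens with `count`,
/// then fit a line through the resulting samples.
pub fn sample_and_fit(
    sizes: &[usize],
    produce: impl Fn(usize) -> String,
    count: impl Fn(&str) -> usize,
) -> Option<LineFit> {
    let samples: Vec<(usize, usize)> = sizes
        .iter()
        .map(|&n| (n, count(&produce(n))))
        .collect();
    fit_line(&samples)
}
```

Sampling at widely spaced sizes (for example 1, 10, and 100 items) keeps the slope estimate from being dominated by the fixed overhead.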
Enums§
- Model - A popular agentic AI system, identified by its tokenizer family.
Constants§
- CACHE_READ_MULT - Default prompt-cache pricing multiplier for a cache read: 0.1× the base token price.
- CACHE_WRITE_MULT - Default prompt-cache pricing multiplier for writing the cache (Anthropic-style): 1.25× the base token price.
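To make the two multipliers concrete, here is a minimal sketch of the session arithmetic they imply. The local constants repeat the documented default values, while `uncached_cost`, `cached_cost`, and the exact formula (prefix written once, read on every later turn, variable remainder always at full price) are assumptions about how such a model could be computed, not the crate's `assess_cache` implementation.

```rust
/// Local copies of the documented defaults, named differently on purpose.
const WRITE_MULT: f64 = 1.25;
const READ_MULT: f64 = 0.1;

/// Token-equivalent cost of a session with no caching:
/// every turn pays the full prefix plus the variable remainder.
pub fn uncached_cost(prefix: usize, variable: usize, turns: usize) -> f64 {
    ((prefix + variable) * turns) as f64
}

/// Assumed cached model: the prefix is written once on the first turn,
/// read at the discounted rate on each later turn, and the variable
/// remainder is always paid at the base price.
pub fn cached_cost(prefix: usize, variable: usize, turns: usize) -> f64 {
    if turns == 0 {
        return 0.0;
    }
    let write = prefix as f64 * WRITE_MULT;
    let reads = prefix as f64 * READ_MULT * (turns - 1) as f64;
    let rest = (variable * turns) as f64;
    write + reads + rest
}
```

With a 1000-token prefix, a 200-token remainder, and 10 turns, this gives 12000 token-equivalents uncached against roughly 4150 cached. For a single turn, caching costs more (1450 against 1200) because of the write premium.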
Functions§
- assess_cache - Model prompt-cache savings for a `prefix`/`variable` token split over `turns`, using the default `CACHE_WRITE_MULT`/`CACHE_READ_MULT` multipliers.
- assess_scaling - Measure output-token scaling: render the program’s output at each of `sizes` items, count tokens with `count`, and fit a line. `produce(n)` returns the representative output for `n` result items.
- cacheable_prefix_tokens - Tokens in the longest common character prefix shared by every prompt in `prompts`, counted with `count`. This approximates the cache-eligible region across turns. Returns 0 for empty input or when no prefix is shared.
- compare - Compare two programs under `model`, amortized over `turns`.
- evaluate - Evaluate one program’s cost terms under `model`.
- evaluate_all - Evaluate a program across every supported model.
- evaluate_with - Evaluate a program with a custom token counter: any `Fn(&str) -> usize`, such as a host application’s exact tokenizer or a model not in `Model`. This lets the crate’s cost model work with any tokenizer, not just the built-in set.
- heuristic_tokens - A labeled, deterministic token heuristic that tracks real BPE counts within ~10–20% for code-like text. Used when a real BPE isn’t available (no `real-tokens` feature, or Claude). Rules: each run of letters/digits is one token; an underscore separates `snake_case` subwords (each its own ~token, as real tokenizers usually split, e.g. `file_read` ≈ 2) but is not itself counted; every other non-whitespace punctuation/symbol char counts ~1.
- rank - Rank N programs by their amortized session cost under `model` (cheapest first). Returns `(index_into_programs, total_tokens)` pairs sorted ascending by total. This is the N-way generalization of `compare`. Ties keep input order (stable sort).
- rank_with - Like `rank`, but with a custom token counter (see `evaluate_with`). Returns `(index, total_tokens)` pairs sorted cheapest-first; ties keep input order.
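The rules listed for heuristic_tokens, and the common-prefix idea behind cacheable_prefix_tokens, can be sketched as below. Both functions are written from the descriptions above under hypothetical names (`heuristic_sketch`, `common_prefix_tokens`). They are not the crate's source, and edge cases such as non-ASCII text may be handled differently there.

```rust
/// Sketch of the documented rules: each run of letters/digits is one
/// token, `_` splits runs without being counted, whitespace is free,
/// and every other character counts as one token.
pub fn heuristic_sketch(text: &str) -> usize {
    let mut tokens = 0;
    let mut in_run = false;
    for ch in text.chars() {
        if ch.is_alphanumeric() {
            if !in_run {
                tokens += 1; // First character of a new run.
                in_run = true;
            }
        } else {
            in_run = false;
            if ch != '_' && !ch.is_whitespace() {
                tokens += 1; // Punctuation or symbol.
            }
        }
    }
    tokens
}

/// Count the tokens in the longest character prefix shared by all
/// prompts, using any `Fn(&str) -> usize` counter.
pub fn common_prefix_tokens(prompts: &[&str], count: impl Fn(&str) -> usize) -> usize {
    let Some((first, rest)) = prompts.split_first() else {
        return 0; // Empty input.
    };
    // Byte length of the shared prefix, always on a char boundary of `first`.
    let mut prefix_len = first.len();
    for other in rest {
        let shared: usize = first
            .chars()
            .zip(other.chars())
            .take_while(|(a, b)| a == b)
            .map(|(a, _)| a.len_utf8())
            .sum();
        prefix_len = prefix_len.min(shared);
    }
    count(&first[..prefix_len])
}
```

Passing the counter as a closure follows the `Fn(&str) -> usize` shape described for evaluate_with, so an exact tokenizer could be substituted for the heuristic without changing the prefix logic.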