Skip to main content

Module pricing

Module pricing 

Source
Expand description

Cost kernel — the API-equivalent pricing math, ported 1:1 from usage_signal.py (the FALLBACK_PRICING table, model matcher, _tiered, turn_cost, turn_cache_savings). This is the first slice of folding the Python aggregator into Rust (ROADMAP E1). It is PURE — no I/O, no clock — so it is pinned by a golden fixture generated from the Python (tests/), guaranteeing byte-for-byte cost parity (see docs/ai/COST_MODEL.md).

Rates are USD per token. The bundled table mirrors the LiteLLM dataset ccusage uses; the live LiteLLM fetch + 24h cache (the I/O half of load_pricing) is a later slice — until then this is the offline table, which is exactly what CONTEXTBAR_PRICING_OFFLINE=1 selects in the Python.

Structs§

Rate
One model’s per-token rates. None = the category isn’t billed / has no tier (e.g. OpenAI models have no cache-write cw; flat models have no *_200k). Mirrors the Python short-rate dict keys exactly.

Constants§

TIER_THRESHOLD
Anthropic’s long-context tier threshold: tokens strictly above this in a category bill at the *_200k rate (when the model carries one).

Statics§

FALLBACK_PRICING
Bundled offline rate table (USD/token), captured from LiteLLM — verbatim from FALLBACK_PRICING in usage_signal.py. Order is preserved so the longest-prefix match in match_pricing is deterministic.

Functions§

fallback_table
The bundled offline table as a Table — the deterministic baseline that CONTEXTBAR_PRICING_OFFLINE=1 selects in the Python, and what the golden tests pin against.
load_pricing
Live + cached pricing resolution (native only — needs HTTP + filesystem). Mirrors the Python load_pricing: fresh 24h cache → live LiteLLM fetch (then cache) → stale cache → bundled fallback. CONTEXTBAR_PRICING_OFFLINE forces the offline (fallback) path.
match_pricing
Resolve a transcript model id onto a rate using table. None when unpriceable (cost 0 — an honest undercount, never a crash). Mirrors match_pricing(model, table).
normalize_model
Normalize a transcript model id: lowercase, strip provider prefixes, drop the 1M-context tag (pricing is identical to the base model).
tiered
Anthropic >200K tiering for one token category (ccusage-compatible).
turn_cache_savings
NET USD that prompt caching saved this turn (can be slightly negative on a write-heavy turn). Mirrors turn_cache_savings.
turn_cost
Estimated USD for one turn given its rate + token buckets. Arg order mirrors the Python turn_cost(rate, inp, cache_create, cache_read, outp).

Type Aliases§

Table
A resolved rate table (the bundled fallback merged with any live/cached LiteLLM rates). Keyed by normalized model id.