llm-budget-window
Time-windowed token + USD budget for LLM calls.
token-budget-pool caps total spend across concurrent tasks. This crate adds
a time axis: cap spend per minute, per hour, per day, or any combination.
Each recorded call is timestamped; old entries fall out of the window
automatically.
Install
[dependencies]
llm-budget-window = "0.1"
Use
use std::time::Duration;
use ;
let bw = new;
match bw.record
Both axes are optional per window. Leave one unset for unbounded:
new.with_token_cap // tokens only
new.with_usd_cap // usd only
new // counter only
Atomic semantics: a call to record(t, u) either commits to ALL windows
or commits to none. If any window would breach, no window is updated.
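The all-or-nothing rule can be illustrated with a small standalone sketch. This is not the crate's implementation: `Window`, `fits`, `evict`, and `record_all` are hypothetical names chosen for the example, and the per-window sums are recomputed on each call for clarity rather than speed. It shows a check phase over every window followed by a commit phase that runs only if no window would breach.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// One sliding window (hypothetical type); a cap of `None` means unbounded.
struct Window {
    span: Duration,
    token_cap: Option<u64>,
    usd_cap: Option<f64>,
    entries: VecDeque<(Instant, u64, f64)>,
}

impl Window {
    fn new(span: Duration, token_cap: Option<u64>, usd_cap: Option<f64>) -> Self {
        Window { span, token_cap, usd_cap, entries: VecDeque::new() }
    }

    /// Drop entries that have aged out of the window as of `now`.
    fn evict(&mut self, now: Instant) {
        while let Some(&(ts, _, _)) = self.entries.front() {
            if now.duration_since(ts) >= self.span {
                self.entries.pop_front();
            } else {
                break;
            }
        }
    }

    /// Would adding (tokens, usd) keep this window within both caps?
    fn fits(&self, tokens: u64, usd: f64) -> bool {
        let used_tokens: u64 = self.entries.iter().map(|e| e.1).sum();
        let used_usd: f64 = self.entries.iter().map(|e| e.2).sum();
        self.token_cap.map_or(true, |cap| used_tokens + tokens <= cap)
            && self.usd_cap.map_or(true, |cap| used_usd + usd <= cap)
    }
}

/// Check every window first, then commit to all of them or to none.
/// On breach, returns the index of the first window that would overflow.
fn record_all(
    windows: &mut [Window],
    now: Instant,
    tokens: u64,
    usd: f64,
) -> Result<(), usize> {
    for w in windows.iter_mut() {
        w.evict(now);
    }
    if let Some(i) = windows.iter().position(|w| !w.fits(tokens, usd)) {
        return Err(i); // nothing has been pushed yet, so no rollback is needed
    }
    for w in windows.iter_mut() {
        w.entries.push_back((now, tokens, usd));
    }
    Ok(())
}
```

Separating the check from the commit means a rejected call leaves every window exactly as it was, apart from the eviction of entries that had already expired.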
Memory
Each window keeps a VecDeque<(timestamp, tokens, usd)> of records that
haven't aged out yet. A 1-day window with 10 calls/sec carries ~864k
entries; a 1-minute window with the same rate carries 600. Set the
windows you actually need.
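The entry counts above are just call rate times window length. A rough sizing helper might look like the following; `live_entries` and `approx_bytes` are hypothetical helpers, and the `(Instant, u64, f64)` tuple is an assumed layout for the `(timestamp, tokens, usd)` record, so the byte figure is an estimate only and varies by platform.

```rust
use std::mem::size_of;
use std::time::{Duration, Instant};

/// Steady-state number of live entries: call rate times window length.
fn live_entries(calls_per_sec: u64, window: Duration) -> u64 {
    calls_per_sec * window.as_secs()
}

/// Rough payload size for that many entries, assuming each record is an
/// (Instant, u64, f64) tuple. Ignores VecDeque spare capacity.
fn approx_bytes(entries: u64) -> u64 {
    entries * size_of::<(Instant, u64, f64)>() as u64
}
```

For example, `live_entries(10, Duration::from_secs(86_400))` gives 864000 and `live_entries(10, Duration::from_secs(60))` gives 600, matching the figures in the paragraph above.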
What it does NOT do
- No persistence. Counts live in process. For multi-process budgets, use a Redis ZSET with timestamp scores.
- No automatic backoff. On breach, your caller decides what to do (wait, fall back to cheaper model, skip).
- No async runtime lock-in. The internal lock is a std::sync::Mutex, held for microseconds only.
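Since backoff is left to the caller, a breach handler has to be written on the application side. The sketch below shows one possible policy covering the three options named above (wait, fall back, skip). Everything in it is hypothetical: the `Action` enum, the `on_breach` function, and in particular the `retry_in` hint, which assumes the caller can estimate when the window will free up. The crate may not report that.

```rust
use std::time::Duration;

/// What the caller decides to do after a budget breach (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    Wait(Duration),
    Fallback(&'static str),
    Skip,
}

/// One possible policy: wait if the estimated delay is short enough,
/// otherwise fall back to a cheaper model if one is configured, else skip.
fn on_breach(
    retry_in: Option<Duration>,
    max_wait: Duration,
    cheaper_model: Option<&'static str>,
) -> Action {
    match (retry_in, cheaper_model) {
        (Some(delay), _) if delay <= max_wait => Action::Wait(delay),
        (_, Some(model)) => Action::Fallback(model),
        _ => Action::Skip,
    }
}
```

Keeping the policy in a pure function like this makes it easy to unit-test separately from the budget tracker and from whatever async runtime drives the calls.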
License
MIT OR Apache-2.0
Composes with
token-budget-pool for total-spend caps,
claude-cost / openai-cost / gemini-cost / bedrock-cost for the USD calculation,
and llm-retry + llm-circuit-breaker for the resilience layer.