llm-budget-window

Time-windowed token + USD budget for LLM calls.

token-budget-pool caps total spend across concurrent tasks. This crate adds a time axis: cap spend per minute, per hour, per day, or any combination. Each recorded call is timestamped; old entries fall out of the window automatically.

Install

[dependencies]
llm-budget-window = "0.1"

Use

use std::time::Duration;
use llm_budget_window::{BudgetWindows, Window};

let bw = BudgetWindows::new(vec![
    Window::new("per_minute", Duration::from_secs(60))
        .with_token_cap(50_000)
        .with_usd_cap(1.0),
    Window::new("per_hour", Duration::from_secs(3600))
        .with_usd_cap(10.0),
    Window::new("per_day", Duration::from_secs(86_400))
        .with_usd_cap(100.0),
]);

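// Estimated cost of the upcoming call (example values).
let tokens = 1_200;
let usd = 0.03;

match bw.record(tokens, usd) {
    Ok(()) => {
        // recorded in every window; call the LLM
    }
    Err(breach) => {
        // some window's cap would be exceeded; nothing was recorded, so back off
        eprintln!("budget breached: window {}, axis {}", breach.window_name, breach.axis);
    }
}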

Both axes are optional per window. Leave an axis unset to make it unbounded:

Window::new("min", Duration::from_secs(60)).with_token_cap(50_000)   // tokens only
Window::new("hour", Duration::from_secs(3600)).with_usd_cap(10.0)   // usd only
Window::new("any", Duration::from_secs(60))                          // no caps: counter only

Atomic semantics: a call to record(t, u) either commits to ALL windows or commits to none. If any window would breach, no window is updated.
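
One way to get this all-or-nothing behaviour is a two-phase check-then-commit: check every window first without mutating anything, and only write once all checks have passed. The sketch below illustrates the idea under simplifying assumptions. WindowState, Breach, and record_all are hypothetical names invented for this example, not the crate's internals, and running totals stand in for the timestamped records.

```rust
// Hypothetical types for illustration only; not the crate's actual internals.
#[derive(Debug, PartialEq)]
struct Breach {
    window_name: &'static str,
    axis: &'static str,
}

struct WindowState {
    name: &'static str,
    token_cap: Option<u64>, // None = unbounded
    usd_cap: Option<f64>,   // None = unbounded
    tokens_used: u64,
    usd_used: f64,
}

fn record_all(windows: &mut [WindowState], tokens: u64, usd: f64) -> Result<(), Breach> {
    // Phase 1: check every window without mutating anything.
    for w in windows.iter() {
        if let Some(cap) = w.token_cap {
            if w.tokens_used + tokens > cap {
                return Err(Breach { window_name: w.name, axis: "tokens" });
            }
        }
        if let Some(cap) = w.usd_cap {
            if w.usd_used + usd > cap {
                return Err(Breach { window_name: w.name, axis: "usd" });
            }
        }
    }
    // Phase 2: every check passed, so commit to all windows.
    for w in windows.iter_mut() {
        w.tokens_used += tokens;
        w.usd_used += usd;
    }
    Ok(())
}
```

Because the first loop returns before any write happens, a breach in the last window leaves the earlier windows untouched, even though they passed their own checks.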

Memory

Each window keeps a VecDeque<(timestamp, tokens, usd)> of the records that have not yet aged out. At 10 calls/sec, a 1-day window holds about 864,000 entries (86,400 s × 10) and a 1-minute window holds 600. Configure only the windows you actually need.
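
To make the aging-out behaviour concrete, here is a minimal sketch of a single sliding window backed by a VecDeque. SlidingWindow and its methods are hypothetical names for this example, and the eviction rule (drop an entry once its age reaches the window span) is an assumption; the crate's own boundary handling may differ.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical single-window tracker; names are illustrative only.
struct SlidingWindow {
    span: Duration,
    entries: VecDeque<(Instant, u64, f64)>, // (timestamp, tokens, usd)
}

impl SlidingWindow {
    fn new(span: Duration) -> Self {
        Self { span, entries: VecDeque::new() }
    }

    /// Drop entries whose age is at least `span`, oldest first.
    fn evict(&mut self, now: Instant) {
        while let Some(&(ts, _, _)) = self.entries.front() {
            if now.duration_since(ts) >= self.span {
                self.entries.pop_front();
            } else {
                break; // entries are in time order, so the rest are newer
            }
        }
    }

    /// Evict stale entries, then append the new record.
    fn push(&mut self, now: Instant, tokens: u64, usd: f64) {
        self.evict(now);
        self.entries.push_back((now, tokens, usd));
    }

    /// Evict stale entries, then sum what remains in the window.
    fn totals(&mut self, now: Instant) -> (u64, f64) {
        self.evict(now);
        self.entries
            .iter()
            .fold((0, 0.0), |(t, u), &(_, tok, usd)| (t + tok, u + usd))
    }
}
```

Memory use is proportional to the number of live entries, which is why a long window at a high call rate is the expensive case.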

What it does NOT do

No persistence. Counts live in-process. For multi-process budgets, use a Redis ZSET with timestamp scores.
No automatic backoff. On breach, the caller decides what to do (wait, fall back to a cheaper model, skip).
No async runtime lock-in. The internal lock is a std::sync::Mutex held for only microseconds.
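
Since backoff is left to the caller, a small wrapper is often enough. The sketch below shows one possible approach, a blocking retry with exponential delay. retry_with_backoff is a hypothetical helper, not part of this crate, and it is generic over any fallible closure so that it makes no assumptions about the crate's error type.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: run `attempt` until it succeeds or `max_tries` is reached,
/// doubling the delay after each failure. Always tries at least once.
/// Returns the last error if every try fails.
fn retry_with_backoff<E>(
    mut attempt: impl FnMut() -> Result<(), E>,
    max_tries: u32,
    initial_delay: Duration,
) -> Result<(), E> {
    let mut delay = initial_delay;
    let mut tries = 0;
    loop {
        match attempt() {
            Ok(()) => return Ok(()),
            Err(e) => {
                tries += 1;
                if tries >= max_tries {
                    return Err(e);
                }
                sleep(delay); // blocks the current thread
                delay = delay.saturating_mul(2);
            }
        }
    }
}
```

With this crate, the closure would presumably be something like || bw.record(tokens, usd). Retrying is safe because a breached record commits nothing. In async code, swap the blocking sleep for the runtime's own timer.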

License

MIT OR Apache-2.0

Composes with token-budget-pool for total-spend caps, claude-cost / openai-cost / gemini-cost / bedrock-cost for the USD calculation, and llm-retry + llm-circuit-breaker for the resilience layer.

llm-budget-window 0.1.0
