llm-budget-window 0.1.0

Time-windowed token + USD budget. Define multiple rolling windows (e.g. $5/minute, $100/day) and reject when any window's cap would be breached. Thread-safe, zero deps.

llm-budget-window


Time-windowed token + USD budget for LLM calls.

token-budget-pool caps total spend across concurrent tasks. This crate adds a time axis: cap spend per minute, per hour, per day, or any combination. Each recorded call is timestamped; old entries fall out of the window automatically.

Install

[dependencies]
llm-budget-window = "0.1"

Use

use std::time::Duration;
use llm_budget_window::{BudgetWindows, Window};

let bw = BudgetWindows::new(vec![
    Window::new("per_minute", Duration::from_secs(60))
        .with_token_cap(50_000)
        .with_usd_cap(1.0),
    Window::new("per_hour", Duration::from_secs(3600))
        .with_usd_cap(10.0),
    Window::new("per_day", Duration::from_secs(86_400))
        .with_usd_cap(100.0),
]);

// Estimated cost of the upcoming call (example values).
let tokens = 1_200;
let usd = 0.02;

match bw.record(tokens, usd) {
    Ok(()) => {
        // call the LLM
    }
    Err(breach) => {
        // some window's cap would be exceeded; back off
        eprintln!("budget breached on {} axis {}", breach.window_name, breach.axis);
    }
}

Both caps are optional per window. Leave an axis unset to make it unbounded:

Window::new("min", Duration::from_secs(60)).with_token_cap(50_000)   // tokens only
Window::new("hour", Duration::from_secs(3600)).with_usd_cap(10.0)   // usd only
Window::new("any", Duration::from_secs(60))                          // counter only

Atomic semantics: a call to record(t, u) either commits to ALL windows or commits to none. If any window would breach, no window is updated.
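
The all-or-nothing rule can be pictured as a two-pass check-then-commit. The following is a minimal sketch, not the crate's source: the `Win` and `Breach` types, the string-valued `axis`, and the use of running totals in place of timestamped records are all assumptions made for illustration.

```rust
/// Running totals for one window (time-based eviction omitted for brevity).
#[derive(Debug)]
struct Win {
    name: &'static str,
    token_cap: Option<u64>, // None = unbounded
    usd_cap: Option<f64>,   // None = unbounded
    tokens: u64,
    usd: f64,
}

/// Which window and axis would have been exceeded (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Breach {
    window_name: &'static str,
    axis: &'static str,
}

/// All-or-nothing record: validate against every window first, then commit.
fn record(wins: &mut [Win], tokens: u64, usd: f64) -> Result<(), Breach> {
    // Pass 1: check only, mutate nothing.
    for w in wins.iter() {
        if w.token_cap.map_or(false, |cap| w.tokens + tokens > cap) {
            return Err(Breach { window_name: w.name, axis: "tokens" });
        }
        if w.usd_cap.map_or(false, |cap| w.usd + usd > cap) {
            return Err(Breach { window_name: w.name, axis: "usd" });
        }
    }
    // Pass 2: every window accepted, so commit to all of them.
    for w in wins.iter_mut() {
        w.tokens += tokens;
        w.usd += usd;
    }
    Ok(())
}
```

Because the first pass returns before anything is mutated, a rejected call leaves every window's totals exactly as they were, including windows that would have accepted it.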

Memory

Each window keeps a VecDeque<(timestamp, tokens, usd)> of records that haven't aged out yet. A 1-day window at 10 calls/sec carries ~864k entries; a 1-minute window at the same rate carries 600. Configure only the windows you actually need.
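
The entry counts above are just call rate times window length. The sketch below shows that arithmetic next to a toy rolling window that evicts from the front of a `VecDeque`. It is an illustration only: the `Rolling` type, the `steady_state_entries` helper, and the choice to evict records that are exactly one window old are assumptions, not the crate's internals.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// One rolling window holding (timestamp, tokens, usd) records.
struct Rolling {
    span: Duration,
    entries: VecDeque<(Instant, u64, f64)>,
}

impl Rolling {
    fn new(span: Duration) -> Self {
        Rolling { span, entries: VecDeque::new() }
    }

    /// Drop records from the front that are at least `span` old.
    fn evict(&mut self, now: Instant) {
        while let Some(&(ts, _, _)) = self.entries.front() {
            if now.duration_since(ts) >= self.span {
                self.entries.pop_front();
            } else {
                break; // records are in time order, so the rest are newer
            }
        }
    }

    /// Evict, then append. `now` is a parameter so time can be simulated.
    fn push(&mut self, now: Instant, tokens: u64, usd: f64) {
        self.evict(now);
        self.entries.push_back((now, tokens, usd));
    }
}

/// Steady-state entry count: call rate times window length.
fn steady_state_entries(calls_per_sec: u64, span: Duration) -> u64 {
    calls_per_sec * span.as_secs()
}
```

Feeding this sketch one record every 100 ms of simulated time into a 60-second window levels off at 600 retained entries, matching the figure quoted above.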

What it does NOT do

  • No persistence. Counts live in process. For multi-process budgets, use a Redis ZSET with timestamp scores.
  • No automatic backoff. On breach, your caller decides what to do (wait, fall back to cheaper model, skip).
  • No async runtime lock-in. The internal lock is a std::sync::Mutex, held only for microseconds.
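
To illustrate the last point, here is a rough sketch of a budget guarded by a plain `std::sync::Mutex` and shared across OS threads. `SharedBudget`, `try_record`, and `run_workers` are hypothetical stand-ins written for this example; the crate's own types and locking details may differ.

```rust
use std::sync::Mutex;

/// Stand-in for a shared budget: one token cap behind a std Mutex.
struct SharedBudget {
    cap: u64,
    used: Mutex<u64>,
}

impl SharedBudget {
    fn new(cap: u64) -> Self {
        SharedBudget { cap, used: Mutex::new(0) }
    }

    /// Check and commit under the lock. The guard is dropped on return,
    /// so the lock is never held across the (slow) LLM call itself.
    fn try_record(&self, tokens: u64) -> bool {
        let mut used = self.used.lock().unwrap();
        if *used + tokens > self.cap {
            return false;
        }
        *used += tokens;
        true
    }
}

/// Spawn `threads` workers that each attempt `attempts` one-token records.
/// Returns how many records were accepted in total.
fn run_workers(budget: &SharedBudget, threads: usize, attempts: usize) -> usize {
    std::thread::scope(|s| {
        // Collect the handles first so all workers run concurrently.
        let handles: Vec<_> = (0..threads)
            .map(|_| s.spawn(move || (0..attempts).filter(|_| budget.try_record(1)).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

Since the critical section is only a comparison and an addition, a blocking mutex is short-lived enough to call from synchronous code or from inside an async task without tying the design to a particular runtime.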

License

MIT OR Apache-2.0

Composes with token-budget-pool for total-spend caps, claude-cost / openai-cost / gemini-cost / bedrock-cost for the USD calculation, and llm-retry + llm-circuit-breaker for the resilience layer.