# llm-budget-window
**Time-windowed token + USD budget for LLM calls.**

[`token-budget-pool`](https://crates.io/crates/token-budget-pool) caps
total spend across concurrent tasks. This crate adds a time axis: cap
spend per minute, per hour, per day, or any combination. Each recorded
call is timestamped; old entries fall out of the window automatically.
## Install
```toml
[dependencies]
llm-budget-window = "0.1"
```
## Use
```rust
use std::time::Duration;
use llm_budget_window::{BudgetWindows, Window};

// Each window has a name, a duration, and optional token / USD caps.
let bw = BudgetWindows::new(vec![
    Window::new("per_minute", Duration::from_secs(60))
        .with_token_cap(50_000)
        .with_usd_cap(1.0),
    Window::new("per_hour", Duration::from_secs(3600))
        .with_usd_cap(10.0),
    Window::new("per_day", Duration::from_secs(86_400))
        .with_usd_cap(100.0),
]);

// Illustrative values: estimated tokens and USD cost of the upcoming call.
let tokens = 1_200;
let usd = 0.018;

match bw.record(tokens, usd) {
    Ok(()) => {
        // call the LLM
    }
    Err(breach) => {
        // some window's cap would be exceeded; back off
        eprintln!("budget breached on {} axis {}", breach.window_name, breach.axis);
    }
}
```
Both axes are optional per window. Leave either cap (or both) unset to make that axis unbounded:
```rust
Window::new("min", Duration::from_secs(60)).with_token_cap(50_000) // tokens only
Window::new("hour", Duration::from_secs(3600)).with_usd_cap(10.0) // usd only
Window::new("any", Duration::from_secs(60)) // counter only
```
Atomic semantics: a call to `record(t, u)` either commits to ALL windows
or commits to none. If any window would breach, no window is updated.
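
One common way to get this all-or-nothing behavior is a two-phase check-then-commit: first test every window without mutating anything, then update all of them only if every check passed. The sketch below illustrates that pattern with hypothetical types (`SketchWindow`, `Breach`, `record_all`) and running totals instead of timestamped records; it is not the crate's actual source.

```rust
// Hypothetical types for illustration only; not the crate's internals.
#[derive(Debug, PartialEq)]
struct Breach {
    window_name: &'static str,
    axis: &'static str,
}

struct SketchWindow {
    name: &'static str,
    token_cap: Option<u64>, // None means unbounded on this axis
    usd_cap: Option<f64>,   // None means unbounded on this axis
    tokens_used: u64,
    usd_used: f64,
}

/// Phase 1 checks every window read-only; phase 2 commits to all of them.
/// If any check fails, the function returns before phase 2 and no window changes.
fn record_all(windows: &mut [SketchWindow], tokens: u64, usd: f64) -> Result<(), Breach> {
    // Phase 1: validate against every cap without mutating state.
    for w in windows.iter() {
        if let Some(cap) = w.token_cap {
            if w.tokens_used + tokens > cap {
                return Err(Breach { window_name: w.name, axis: "tokens" });
            }
        }
        if let Some(cap) = w.usd_cap {
            if w.usd_used + usd > cap {
                return Err(Breach { window_name: w.name, axis: "usd" });
            }
        }
    }
    // Phase 2: every window accepted the call, so commit everywhere.
    for w in windows.iter_mut() {
        w.tokens_used += tokens;
        w.usd_used += usd;
    }
    Ok(())
}
```

In a concurrent setting both phases would need to run under the same lock; otherwise another thread could commit between the check and the update.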
## Memory
Each window keeps a `VecDeque<(timestamp, tokens, usd)>` of records that
haven't aged out yet, so memory grows with call rate times window length.
A 1-day window at a steady 10 calls/sec holds about 864,000 entries
(10 × 86,400); a 1-minute window at the same rate holds 600. Configure
only the windows you actually need.
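
To make the memory behavior concrete, here is a minimal sketch of a timestamped log with front eviction, plus the rate-times-length arithmetic. `SlidingLog` and `steady_state_entries` are hypothetical names, and the caller passes `now` explicitly so the logic is easy to test; the crate's real implementation may differ.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Hypothetical sketch of a single window's log; not the crate's source.
struct SlidingLog {
    span: Duration,
    entries: VecDeque<(Instant, u64, f64)>, // (timestamp, tokens, usd)
}

impl SlidingLog {
    fn new(span: Duration) -> Self {
        Self { span, entries: VecDeque::new() }
    }

    /// Pop entries from the front while they are at least `span` old.
    /// Entries are pushed in time order, so the oldest is always at the front.
    fn evict(&mut self, now: Instant) {
        while let Some(&(ts, _, _)) = self.entries.front() {
            if now.saturating_duration_since(ts) >= self.span {
                self.entries.pop_front();
            } else {
                break;
            }
        }
    }

    /// Evict aged-out entries, then append the new record.
    fn push(&mut self, now: Instant, tokens: u64, usd: f64) {
        self.evict(now);
        self.entries.push_back((now, tokens, usd));
    }

    /// Sum tokens and USD over the entries currently held.
    fn totals(&self) -> (u64, f64) {
        self.entries
            .iter()
            .fold((0, 0.0), |(t, u), &(_, tokens, usd)| (t + tokens, u + usd))
    }
}

/// Steady-state entry count: call rate times window length in seconds.
fn steady_state_entries(calls_per_sec: u64, span: Duration) -> u64 {
    calls_per_sec * span.as_secs()
}
```

Under these assumptions, `steady_state_entries(10, Duration::from_secs(86_400))` gives 864,000 and the same rate over 60 seconds gives 600, matching the figures above.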
## What it does NOT do
- No persistence. Counts live in-process only. For multi-process budgets,
use a Redis ZSET with timestamp scores.
- No automatic backoff. On a breach, the caller decides what to do
(wait, fall back to a cheaper model, or skip the call).
- No async runtime lock-in. The internal lock is a `std::sync::Mutex`,
held only for microseconds.
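
Since backoff is left to the caller, one option for the "wait" case is a small retry wrapper with a doubling delay. The helper below, `record_with_backoff`, is a hypothetical sketch and not part of this crate; it takes any closure returning `Result<(), E>`, so it makes no assumptions about the crate's error type.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: call `try_record` until it succeeds or
/// `max_attempts` tries have been made, doubling the delay after each failure.
/// Always makes at least one attempt and returns the last error on exhaustion.
fn record_with_backoff<E>(
    mut try_record: impl FnMut() -> Result<(), E>,
    max_attempts: u32,
    initial_delay: Duration,
) -> Result<(), E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match try_record() {
            Ok(()) => return Ok(()),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                // Blocking sleep; an async caller would use its runtime's timer instead.
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

With this crate, the closure would wrap the `record` call, for example `record_with_backoff(|| bw.record(tokens, usd), 5, Duration::from_secs(1))`. Because old entries age out of each window over time, a later attempt can succeed where an earlier one was refused.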
## License
MIT OR Apache-2.0

This crate composes with
[`token-budget-pool`](https://crates.io/crates/token-budget-pool) for
total-spend caps,
[`claude-cost`](https://crates.io/crates/claude-cost) /
[`openai-cost`](https://crates.io/crates/openai-cost) /
[`gemini-cost`](https://crates.io/crates/gemini-cost) /
[`bedrock-cost`](https://crates.io/crates/bedrock-cost) for the USD
calculation, and
[`llm-retry`](https://crates.io/crates/llm-retry) +
[`llm-circuit-breaker`](https://crates.io/crates/llm-circuit-breaker)
for the resilience layer.