# llm-budget-window

[![Crates.io](https://img.shields.io/crates/v/llm-budget-window.svg)](https://crates.io/crates/llm-budget-window)
[![Documentation](https://docs.rs/llm-budget-window/badge.svg)](https://docs.rs/llm-budget-window)
[![CI](https://github.com/MukundaKatta/llm-budget-window/actions/workflows/ci.yml/badge.svg)](https://github.com/MukundaKatta/llm-budget-window/actions/workflows/ci.yml)
[![License](https://img.shields.io/crates/l/llm-budget-window.svg)](https://crates.io/crates/llm-budget-window)

**Time-windowed token + USD budget for LLM calls.**

[`token-budget-pool`](https://crates.io/crates/token-budget-pool) caps
total spend across concurrent tasks. This crate adds a time axis: cap
spend per minute, per hour, per day, or any combination. Each recorded
call is timestamped; old entries fall out of the window automatically.

## Install

```toml
[dependencies]
llm-budget-window = "0.1"
```

## Use

```rust
use std::time::Duration;
use llm_budget_window::{BudgetWindows, Window};

let bw = BudgetWindows::new(vec![
    Window::new("per_minute", Duration::from_secs(60))
        .with_token_cap(50_000)
        .with_usd_cap(1.0),
    Window::new("per_hour", Duration::from_secs(3600))
        .with_usd_cap(10.0),
    Window::new("per_day", Duration::from_secs(86_400))
        .with_usd_cap(100.0),
]);

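// Estimated size and cost of the call about to be made (example values).
let (tokens, usd) = (1_200, 0.02);

match bw.record(tokens, usd) {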
    Ok(()) => {
        // call the LLM
    }
    Err(breach) => {
        // some window's cap would be exceeded; back off
        eprintln!("budget breached on {} axis {}", breach.window_name, breach.axis);
    }
}
```

Both caps are optional per window. An axis with no cap set is unbounded:

```rust
use std::time::Duration;
use llm_budget_window::Window;

let tokens_only = Window::new("min", Duration::from_secs(60)).with_token_cap(50_000);
let usd_only = Window::new("hour", Duration::from_secs(3600)).with_usd_cap(10.0);
let uncapped = Window::new("any", Duration::from_secs(60)); // counter only
```

Atomic semantics: a call to `record(t, u)` either commits to ALL windows
or commits to none. If any window would breach, no window is updated.
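
One way to get this all-or-nothing behaviour is a two-phase pass: check every window first, then write to all of them only if every check passed. The sketch below illustrates the idea under stated assumptions. `SketchWindow` and `record_all` are hypothetical names, not the crate's actual types, and the real implementation may differ.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one window; not the crate's real type.
struct SketchWindow {
    name: &'static str,
    span: Duration,
    token_cap: Option<u64>,
    usd_cap: Option<f64>,
    entries: VecDeque<(Instant, u64, f64)>,
}

impl SketchWindow {
    fn new(name: &'static str, span: Duration, token_cap: Option<u64>, usd_cap: Option<f64>) -> Self {
        Self { name, span, token_cap, usd_cap, entries: VecDeque::new() }
    }

    /// Drop entries that are at least `span` old relative to `now`.
    fn evict(&mut self, now: Instant) {
        while let Some(&(ts, _, _)) = self.entries.front() {
            if now.saturating_duration_since(ts) >= self.span {
                self.entries.pop_front();
            } else {
                break;
            }
        }
    }

    /// Return the axis that would be exceeded by adding this call, if any.
    fn would_breach(&self, tokens: u64, usd: f64) -> Option<&'static str> {
        let used_tokens: u64 = self.entries.iter().map(|e| e.1).sum();
        let used_usd: f64 = self.entries.iter().map(|e| e.2).sum();
        if let Some(cap) = self.token_cap {
            if used_tokens + tokens > cap {
                return Some("tokens");
            }
        }
        if let Some(cap) = self.usd_cap {
            if used_usd + usd > cap {
                return Some("usd");
            }
        }
        None
    }
}

/// Commit the call to every window, or to none of them.
/// On failure, returns (window name, axis) for the first window that would breach.
fn record_all(
    windows: &mut [SketchWindow],
    now: Instant,
    tokens: u64,
    usd: f64,
) -> Result<(), (&'static str, &'static str)> {
    // Phase 1: age out old entries and check every window; the new call is not written yet.
    for w in windows.iter_mut() {
        w.evict(now);
        if let Some(axis) = w.would_breach(tokens, usd) {
            return Err((w.name, axis));
        }
    }
    // Phase 2: every window accepted the call, so write it everywhere.
    for w in windows.iter_mut() {
        w.entries.push_back((now, tokens, usd));
    }
    Ok(())
}
```

The point of splitting the phases is that a breach found in a later window cannot leave an earlier window already charged for the call.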

## Memory

Each window keeps a `VecDeque<(timestamp, tokens, usd)>` of records that
haven't aged out yet. A 1-day window at 10 calls/sec holds ~864k
entries; a 1-minute window at the same rate holds 600. Configure only
the windows you actually need.
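
The figures above are just call rate multiplied by window length. A rough sizing helper might look like the following sketch. The `Entry` layout `(Instant, u64, f64)` is an assumption about the element types, and `steady_state_entries` and `approx_bytes` are hypothetical helpers, not part of the crate.

```rust
use std::mem::size_of;
use std::time::{Duration, Instant};

/// Assumed shape of one record: (timestamp, tokens, usd).
type Entry = (Instant, u64, f64);

/// Entries a window holds at a steady call rate: rate x window length.
fn steady_state_entries(calls_per_sec: u64, window: Duration) -> u64 {
    calls_per_sec * window.as_secs()
}

/// Approximate payload size of those entries, ignoring VecDeque's spare capacity.
/// The result is platform-dependent because the size of `Instant` varies.
fn approx_bytes(entries: u64) -> u64 {
    entries * size_of::<Entry>() as u64
}
```

For example, `steady_state_entries(10, Duration::from_secs(86_400))` gives 864,000 and `steady_state_entries(10, Duration::from_secs(60))` gives 600, matching the numbers quoted above.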

## What it does NOT do

- No persistence. Counts live in process. For multi-process budgets, use
  a Redis ZSET with timestamp scores.
- No automatic backoff. On breach, your caller decides what to do
  (wait, fall back to cheaper model, skip).
- No async runtime lock-in. The internal lock is a `std::sync::Mutex`,
  held only for microseconds.
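
Since breach handling is left to the caller, here is a minimal sketch of one possible "wait" policy. `retry_with_wait` is a hypothetical helper, not part of this crate; the closure stands in for a call such as `bw.record(tokens, usd)`, and the fixed wait is an arbitrary choice.

```rust
use std::thread;
use std::time::Duration;

/// Run `attempt` until it succeeds or `max_tries` attempts have failed,
/// sleeping for `wait` between attempts. Always tries at least once.
fn retry_with_wait<F, E>(mut attempt: F, max_tries: u32, wait: Duration) -> Result<(), E>
where
    F: FnMut() -> Result<(), E>,
{
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(()) => return Ok(()),
            // Out of attempts: hand the last error back to the caller.
            Err(e) if tries >= max_tries => return Err(e),
            // Still over budget: wait for old entries to age out, then retry.
            Err(_) => thread::sleep(wait),
        }
    }
}
```

This version blocks the current thread, so in async code the runtime's own sleep would be the better fit. Falling back to a cheaper model or skipping the call are equally valid policies.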

## License

MIT OR Apache-2.0

Composes with
[`token-budget-pool`](https://crates.io/crates/token-budget-pool) for
total-spend caps,
[`claude-cost`](https://crates.io/crates/claude-cost) /
[`openai-cost`](https://crates.io/crates/openai-cost) /
[`gemini-cost`](https://crates.io/crates/gemini-cost) /
[`bedrock-cost`](https://crates.io/crates/bedrock-cost) for the USD
calculation, and
[`llm-retry`](https://crates.io/crates/llm-retry) +
[`llm-circuit-breaker`](https://crates.io/crates/llm-circuit-breaker)
for the resilience layer.