throttle-net 0.6.0

Available now (v0.6):

Token-bucket throttling — smooth refill with burst headroom; lock-free accounting (one atomic compare-and-swap per acquire)
Exact sliding-window-log — when you need no boundary burst at all, an exact alternative that composes everywhere the bucket does
Wait, don't reject — the outbound default is acquire().await, which paces the caller; try_acquire() is there when you need the non-blocking answer
Cost-aware acquisition — acquire_with_cost(n) — not every request weighs one unit
Multi-dimensional limits — enforce req/min AND input-tokens/min AND output-tokens/min at once; the killer feature for LLM APIs
Composition — hybrid (must pass all), per-key (independent state per tenant), and layered (global / per-key / per-endpoint) limiters, combined without the call site changing
Bounded memory — per-key state is sharded and evicted (idle TTL + hard cap), so a flood of unique keys hits a ceiling instead of growing without limit
Retry + backoff — constant / linear / exponential backoff with full, equal, or decorrelated jitter; a retry policy with per-error classification; Retry-After parsed and honored
Circuit breaker — closed / open / half-open recovery; wraps any limiter and fails fast when open, without consuming it
Queueing — a bounded, deadline-aware, priority queue with fair-across-keys scheduling and reject / drop-oldest / drop-lowest-priority overflow
Adaptive concurrency — AIMD and Vegas-style controllers that discover the right in-flight limit from outcome feedback, slowing down when a downstream struggles with no explicit signal, bounded by a floor and a hard ceiling
Provider-aware — parse x-ratelimit-* / retry-after headers from OpenAI, Anthropic, GitHub, Stripe, AWS, or the RFC draft; reconcile your limiter with the server's view; start from LLM tier presets
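
To make the "exact sliding-window-log" entry above concrete, here is a minimal std-only sketch of the idea: keep the timestamp of every admitted request and admit a new one only if fewer than the limit fall inside the trailing window. The type SlidingWindowLog and its method names are hypothetical and chosen for illustration; they are not throttle-net's API, and the crate's real implementation may differ.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical illustration type, not part of throttle-net's API.
pub struct SlidingWindowLog {
    limit: usize,
    window: Duration,
    admitted: VecDeque<Instant>, // timestamps of admitted requests, oldest first
}

impl SlidingWindowLog {
    pub fn new(limit: usize, window: Duration) -> Self {
        Self { limit, window, admitted: VecDeque::with_capacity(limit) }
    }

    /// Admit a request at `now` if fewer than `limit` requests were admitted
    /// in the trailing `window`. The clock is a parameter so the logic can be
    /// tested without sleeping.
    pub fn try_acquire_at(&mut self, now: Instant) -> bool {
        // Drop entries that have aged out of the window.
        while let Some(&oldest) = self.admitted.front() {
            if now.duration_since(oldest) >= self.window {
                self.admitted.pop_front();
            } else {
                break;
            }
        }
        if self.admitted.len() < self.limit {
            self.admitted.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Because every admission is logged, no stretch of time as long as the window can ever contain more than the limit, which is the "no boundary burst" property. The cost is memory proportional to the limit.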

On the roadmap:

Observability (v0.7) — metrics and tracing, zero-cost when disabled
Runtime-agnostic (v0.8) — tokio today, with async-std and smol planned

Installation

[dependencies]
throttle-net = "0.6"

# Or, to enable optional features, use this line instead of the one above:
throttle-net = { version = "0.6", features = ["circuit-breaker", "adaptive", "provider-llm"] }

Quick start

Pace your outbound calls so you never overwhelm a downstream:

use throttle_net::Throttle;

#[tokio::main]
async fn main() -> Result<(), throttle_net::ThrottleError> {
    // 100 requests per second, bursting up to 100.
    let throttle = Throttle::per_second(100);

    throttle.acquire().await?; // returns as soon as a token is free
    // ... call the downstream ...
    Ok(())
}
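
For intuition about how a pacing acquire can decide how long to wait, here is a rough std-only sketch of token-bucket arithmetic: refill in proportion to elapsed time, cap at the burst size, and turn any shortfall into a sleep duration. The Bucket type and reserve method are hypothetical names used only for this illustration; throttle-net's internals (which the feature list describes as lock-free) are not shown here and may work differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical illustration type; not throttle-net's implementation.
pub struct Bucket {
    capacity: f64,       // burst headroom
    refill_per_sec: f64, // steady-state rate
    tokens: f64,         // may dip below zero while callers are waiting
    last: Instant,       // when `tokens` was last brought up to date
}

impl Bucket {
    /// Starts full, so the first `capacity` units are admitted immediately.
    pub fn new(refill_per_sec: f64, capacity: f64, now: Instant) -> Self {
        Self { capacity, refill_per_sec, tokens: capacity, last: now }
    }

    fn refill(&mut self, now: Instant) {
        if now <= self.last {
            return; // ignore clock readings that are not newer
        }
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
    }

    /// Reserve `cost` tokens and return how long the caller should sleep
    /// before proceeding. Zero means "go now".
    pub fn reserve(&mut self, cost: f64, now: Instant) -> Duration {
        self.refill(now);
        self.tokens -= cost; // a negative balance is a debt repaid by future refill
        if self.tokens >= 0.0 {
            Duration::ZERO
        } else {
            Duration::from_secs_f64(-self.tokens / self.refill_per_sec)
        }
    }
}
```

With a rate of 100 per second and a capacity of 100, the first 100 reservations made at the same instant return a zero wait, and the next one is told to wait roughly 10 ms.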

Budget an LLM provider across several limits at once — requests, input tokens, and output tokens:

use std::time::Duration;
use throttle_net::{MultiLimiter, Throttle};

#[tokio::main]
async fn main() -> Result<(), throttle_net::ThrottleError> {
    let minute = Duration::from_secs(60);
    let limiter = MultiLimiter::builder()
        .dimension("requests", Throttle::per_duration(60, minute))
        .dimension("input_tokens", Throttle::per_duration(100_000, minute))
        .dimension("output_tokens", Throttle::per_duration(20_000, minute))
        .build();

    // Admitted only when every budget can afford this call.
    limiter
        .acquire_costs(&[("requests", 1), ("input_tokens", 1500), ("output_tokens", 200)])
        .await?;
    Ok(())
}
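
The "admitted only when every budget can afford this call" rule is an all-or-nothing check. A minimal sketch of that rule, assuming plain counters with no refill, might look like the following. Budgets and its methods are hypothetical names for illustration and are not MultiLimiter's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical illustration type, not throttle-net's MultiLimiter.
pub struct Budgets {
    remaining: HashMap<&'static str, u64>,
}

impl Budgets {
    pub fn new(limits: &[(&'static str, u64)]) -> Self {
        Self { remaining: limits.iter().copied().collect() }
    }

    /// Debit every named dimension, or none of them.
    /// On rejection, returns a dimension that could not afford its cost.
    pub fn try_acquire_costs(
        &mut self,
        costs: &[(&'static str, u64)],
    ) -> Result<(), &'static str> {
        // Sum costs per dimension so a repeated name is checked as a whole.
        let mut totals: HashMap<&'static str, u64> = HashMap::new();
        for &(name, cost) in costs {
            let sum = totals.entry(name).or_insert(0);
            *sum = sum.saturating_add(cost);
        }
        // Pass 1: verify every dimension before touching any state.
        for (&name, &cost) in &totals {
            match self.remaining.get(name) {
                Some(&left) if left >= cost => {}
                _ => return Err(name), // unknown dimension or insufficient budget
            }
        }
        // Pass 2: every check passed, so these subtractions cannot underflow.
        for (name, cost) in totals {
            if let Some(left) = self.remaining.get_mut(name) {
                *left -= cost;
            }
        }
        Ok(())
    }

    pub fn remaining(&self, name: &str) -> Option<u64> {
        self.remaining.get(name).copied()
    }
}
```

Checking before debiting matters: if the output-token budget is exhausted, the request and input-token budgets must not be charged for a call that never goes out.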

Throttle independently per tenant, with bounded memory:

use throttle_net::PerKey;

#[tokio::main]
async fn main() -> Result<(), throttle_net::ThrottleError> {
    // 100 requests per second, per tenant.
    let limiter: PerKey<String> = PerKey::per_second(100);
    limiter.acquire(&"tenant:42".to_string()).await?;
    Ok(())
}
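
"Bounded memory" here means per-key state is evicted by idle TTL and by a hard cap. The sketch below shows one simple way those two rules can combine, using a single map and a linear scan. BoundedKeys is a hypothetical name, and this is only an illustration of the policy: the crate's own store is described as sharded and is not reproduced here.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical illustration type, not throttle-net's per-key store.
pub struct BoundedKeys<V> {
    max_keys: usize,
    idle_ttl: Duration,
    entries: HashMap<String, (V, Instant)>, // value plus last-touched time
}

impl<V> BoundedKeys<V> {
    pub fn new(max_keys: usize, idle_ttl: Duration) -> Self {
        assert!(max_keys > 0, "the cap must allow at least one key");
        Self { max_keys, idle_ttl, entries: HashMap::new() }
    }

    /// Fetch the state for `key`, creating it with `make` if absent.
    /// Eviction runs only when a new key is about to be inserted.
    pub fn get_or_insert_with(
        &mut self,
        key: &str,
        now: Instant,
        make: impl FnOnce() -> V,
    ) -> &mut V {
        if !self.entries.contains_key(key) {
            // Rule 1: drop everything idle for at least the TTL.
            let ttl = self.idle_ttl;
            self.entries.retain(|_, (_, seen)| now.duration_since(*seen) < ttl);
            // Rule 2: still at the cap, so evict the least recently touched key.
            if self.entries.len() >= self.max_keys {
                let oldest = self
                    .entries
                    .iter()
                    .min_by_key(|(_, (_, seen))| *seen)
                    .map(|(k, _)| k.clone());
                if let Some(k) = oldest {
                    self.entries.remove(&k);
                }
            }
            self.entries.insert(key.to_owned(), (make(), now));
        }
        let slot = self.entries.get_mut(key).expect("present or just inserted");
        slot.1 = now; // touching a key refreshes its idle timer
        &mut slot.0
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The linear scans keep the sketch short; a production store would avoid them, but the observable behavior is the same: a flood of unique keys never pushes the map past max_keys.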

Stack scopes — an overall ceiling, a per-tenant share, and a per-endpoint cap:

use throttle_net::{Layered, PerKey, Throttle};

#[tokio::main]
async fn main() -> Result<(), throttle_net::ThrottleError> {
    let layered = Layered::<String>::builder()
        .global(Throttle::per_second(1000))
        .per_key(PerKey::per_second(100))
        .per_endpoint(PerKey::per_second(50))
        .build();

    layered
        .acquire(&"tenant:42".to_string(), &"/v1/chat".to_string())
        .await?;
    Ok(())
}

Retry a flaky call with jittered backoff, honoring a server Retry-After:

use std::time::Duration;
use throttle_net::{Backoff, Retry, RetryAction, parse_retry_after};

struct Rejected { retry_after: Option<String> }

#[tokio::main]
async fn main() {
    // Exponential from 100ms, doubling, capped at 5s, decorrelated jitter (the default).
    let retry = Retry::new(Backoff::default().with_max(Duration::from_secs(5))).max_attempts(5);

    let result: Result<&str, Rejected> = retry
        .run(
            || async { Err(Rejected { retry_after: None }) }, // your fallible call
            |err: &Rejected| match err.retry_after.as_deref().and_then(parse_retry_after) {
                Some(after) => RetryAction::RetryAfter(after), // honor the server's hint
                None => RetryAction::Retry,                    // else use the backoff
            },
        )
        .await;
    let _ = result;
}
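
The delay arithmetic behind "exponential, capped, with jitter" is small enough to sketch directly. The functions below are hypothetical helpers written for this illustration, not the crate's Backoff API. They take the random draw as a parameter (unit, expected in [0, 1)) so the sketch needs no RNG dependency. The decorrelated variant follows the commonly cited formulation: draw between the base and three times the previous delay, then cap.

```rust
use std::time::Duration;

/// Capped exponential delay for a zero-based attempt: min(max, base * 2^attempt).
pub fn exponential(base: Duration, max: Duration, attempt: u32) -> Duration {
    // Saturate the factor instead of overflowing on very large attempts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Full jitter: scale the ceiling by a uniform draw, giving [0, ceiling).
pub fn full_jitter(ceiling: Duration, unit: f64) -> Duration {
    ceiling.mul_f64(unit.clamp(0.0, 1.0))
}

/// Decorrelated jitter: draw from [base, 3 * previous], capped at `max`.
pub fn decorrelated(base: Duration, max: Duration, previous: Duration, unit: f64) -> Duration {
    // Upper end of the draw range, kept within [base, max] when base <= max.
    let hi = previous.saturating_mul(3).min(max).max(base);
    let span = hi - base;
    (base + span.mul_f64(unit.clamp(0.0, 1.0))).min(max)
}
```

With a 100 ms base and a 5 s cap, exponential yields 100 ms, 200 ms, 400 ms, and so on until it reaches the cap at the seventh attempt (6.4 s, clamped to 5 s). Jitter then spreads callers out so that clients that failed together do not retry together.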

Wrap a flaky downstream in a circuit breaker (needs the circuit-breaker feature):

use std::time::Duration;
use throttle_net::{CircuitBreaker, Throttle, Trip};

#[tokio::main]
async fn main() {
    let breaker = CircuitBreaker::builder()
        .trip(Trip::Consecutive(5))           // open after 5 failures in a row
        .cooldown(Duration::from_secs(10))
        .build(Throttle::per_second(100));

    match breaker.acquire().await {
        Ok(permit) => {
            // ... call the downstream ...
            let ok = true;
            if ok { permit.success() } else { permit.failure() }
        }
        Err(_shed) => { /* breaker open: fail fast */ }
    }
}
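
The closed / open / half-open cycle can be pictured as a small state machine. The sketch below assumes a consecutive-failure trip rule like the one in the example above and lets every call through while half-open, which is a simplification (real breakers often admit only a limited number of probes). Breaker and State are hypothetical names for illustration, not the crate's CircuitBreaker.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,   // normal operation
    Open,     // failing fast until the cooldown elapses
    HalfOpen, // probing: one failure reopens, one success closes
}

/// Hypothetical illustration type, not throttle-net's CircuitBreaker.
pub struct Breaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
    state: State,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, consecutive_failures: 0, opened_at: None, state: State::Closed }
    }

    /// Should a call be allowed at `now`? Moves Open to HalfOpen once the
    /// cooldown has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == State::Open {
            match self.opened_at {
                Some(at) if now.duration_since(at) >= self.cooldown => {
                    self.state = State::HalfOpen;
                }
                _ => return false, // still cooling down: fail fast
            }
        }
        true
    }

    pub fn on_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
        self.state = State::Closed;
    }

    pub fn on_failure(&mut self, now: Instant) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        // A failed probe reopens immediately; otherwise trip at the threshold.
        if self.state == State::HalfOpen || self.consecutive_failures >= self.threshold {
            self.state = State::Open;
            self.opened_at = Some(now);
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```

Note that allow returns false without touching any limiter state, which mirrors the "fails fast when open, without consuming it" behavior described in the feature list.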

Stay in sync with a provider's own rate-limit headers, and start from a tier preset (needs the provider-llm feature):

use throttle_net::presets;
use throttle_net::provider::HeaderProfile;

#[tokio::main]
async fn main() -> Result<(), throttle_net::ThrottleError> {
    let limiter = presets::anthropic::tier_2(); // requests + input/output token budgets

    // ... after a response, reconcile with what the server reported ...
    let headers = [
        ("anthropic-ratelimit-requests-remaining", "12"),
        ("anthropic-ratelimit-tokens-remaining", "40000"),
    ];
    let info = HeaderProfile::ANTHROPIC.parse(&headers);
    // With a Throttle in scope (say `throttle`), info.sync_requests(&throttle)
    // drains it to the server's reported count.
    let _ = info;

    limiter.acquire_costs(&[("requests", 1), ("input_tokens", 1500)]).await?;
    Ok(())
}
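
As a rough picture of what "reconcile with what the server reported" involves, the sketch below reads a remaining-count header and lowers a local count to match. Both functions are hypothetical helpers for illustration; they are not HeaderProfile or sync_requests. The lower-only policy is an assumption based on the word "drains" above.

```rust
/// Look up a header by name (ASCII case-insensitive) and parse it as a count.
/// Returns None if the header is missing or not an unsigned integer.
pub fn header_u64(headers: &[(&str, &str)], name: &str) -> Option<u64> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .and_then(|(_, v)| v.trim().parse().ok())
}

/// Assumed policy: only ever lower the local count, never raise it, so a
/// stale or optimistic header cannot grant more than the local limiter would.
pub fn reconcile(local_remaining: u64, server_remaining: Option<u64>) -> u64 {
    server_remaining.map_or(local_remaining, |s| s.min(local_remaining))
}
```

For example, with a local count of 30 and a server-reported remaining of 12, reconcile returns 12. With no header present, the local count is left unchanged.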

Full runnable examples live in examples/:

cargo run --example llm_budget                                       # multi-dimensional LLM budgets
cargo run --example retry_backoff                                    # retry with backoff + Retry-After
cargo run --example circuit_breaker      --features circuit-breaker  # trip, shed, recover
cargo run --example adaptive_concurrency --features adaptive         # learn the limit from feedback
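
The adaptive example is not reproduced in this README, so here is a bare-bones sketch of the AIMD idea it relies on: raise the in-flight limit by a fixed step on success, and cut it multiplicatively on a sign of overload, always staying between a floor and a ceiling. Aimd is a hypothetical name, and the step of one and the halving factor are assumptions chosen for illustration, not the crate's defaults.

```rust
/// Hypothetical illustration type, not throttle-net's adaptive controller.
pub struct Aimd {
    limit: usize,
    floor: usize,
    ceiling: usize,
}

impl Aimd {
    pub fn new(initial: usize, floor: usize, ceiling: usize) -> Self {
        assert!(floor >= 1 && floor <= ceiling, "need 1 <= floor <= ceiling");
        Self { limit: initial.clamp(floor, ceiling), floor, ceiling }
    }

    /// Additive increase: one more in-flight slot per success, up to the ceiling.
    pub fn on_success(&mut self) {
        self.limit = (self.limit + 1).min(self.ceiling);
    }

    /// Multiplicative decrease: halve on overload, never below the floor.
    /// What counts as overload (timeout, error, latency spike) is the caller's call.
    pub fn on_overload(&mut self) {
        self.limit = (self.limit / 2).max(self.floor);
    }

    /// Current in-flight limit.
    pub fn limit(&self) -> usize {
        self.limit
    }
}
```

Slow growth and fast shrinkage is the point: the controller probes upward cautiously and backs off sharply as soon as the downstream shows strain.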

Performance

Local criterion means (cargo bench --bench throttle_bench, Windows x86_64, Rust stable):

Single-throttle try_acquire (uncontended): ~27 ns — one atomic compare-and-swap
Per-key lookup, 10 000 live keys: ~70 ns — hash, shard read lock, map get, acquire
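
For readers unfamiliar with the "one atomic compare-and-swap" pattern, the sketch below shows lock-free token accounting with std atomics: read the count, compute the new value, and publish it only if nobody else changed it in between. AtomicTokens is a hypothetical type for illustration. It omits refill entirely and is not the crate's implementation, so it says nothing about how the benchmark numbers above are achieved.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical illustration type; refill is deliberately left out.
pub struct AtomicTokens {
    available: AtomicU64,
}

impl AtomicTokens {
    pub fn new(tokens: u64) -> Self {
        Self { available: AtomicU64::new(tokens) }
    }

    /// Take `cost` tokens if available, without locking.
    /// Uncontended, the loop body normally runs once: one load, one CAS.
    pub fn try_acquire(&self, cost: u64) -> bool {
        let mut current = self.available.load(Ordering::Relaxed);
        loop {
            let next = match current.checked_sub(cost) {
                Some(n) => n,
                None => return false, // not enough tokens: reject, change nothing
            };
            match self.available.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                // Lost a race (or a spurious failure): retry with the fresh value.
                Err(observed) => current = observed,
            }
        }
    }

    pub fn available(&self) -> u64 {
        self.available.load(Ordering::Relaxed)
    }
}
```

Under contention, the CAS guarantees that tokens are never double-spent: if eight threads race for 5000 tokens, exactly 5000 acquisitions succeed in total.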

Where it fits

throttle-net is the outbound resilience layer. It is used by:

rate-net — the inbound counterpart; throttle-net is outbound
pack-io / network-protocol — clients that call rate-limited downstreams
AVA / agent-provider — LLM API budgeting with multi-dimensional token limits
Hive DB — cluster RPC backpressure and downstream protection

It stays foreign-compatible, and it aims to be the obvious default for "I need to call an external API in Rust and not get banned."

Contributing

Before opening a PR, cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean. Hot-path changes require a criterion benchmark; correctness-critical paths require property and/or loom tests.