Available now (v0.6):
- Token-bucket throttling — smooth refill with burst headroom; lock-free accounting (one atomic compare-and-swap per acquire)
- Exact sliding-window-log — when you need no boundary burst at all, an exact alternative that composes everywhere the bucket does
- Wait, don't reject — the outbound default is acquire().await, which paces the caller; try_acquire() is there when you need the non-blocking answer
- Cost-aware acquisition — acquire_with_cost(n), because not every request weighs one unit
- Multi-dimensional limits — enforce req/min AND input-tokens/min AND output-tokens/min at once; the killer feature for LLM APIs
- Composition — hybrid (must pass all), per-key (independent state per tenant), and layered (global / per-key / per-endpoint) limiters, combined without the call site changing
- Bounded memory — per-key state is sharded and evicted (idle TTL + hard cap), so a flood of unique keys hits a ceiling instead of growing without limit
- Retry + backoff — constant / linear / exponential backoff with full, equal, or decorrelated jitter; a retry policy with per-error classification; Retry-After parsed and honored
- Circuit breaker — closed / open / half-open recovery; wraps any limiter and fails fast when open, without consuming it
- Queueing — a bounded, deadline-aware, priority queue with fair-across-keys scheduling and reject / drop-oldest / drop-lowest-priority overflow
- Adaptive concurrency — AIMD and Vegas-style controllers that discover the right in-flight limit from outcome feedback, slowing down when a downstream struggles with no explicit signal, bounded by a floor and a hard ceiling
- Provider-aware — parse x-ratelimit-* / retry-after headers from OpenAI, Anthropic, GitHub, Stripe, AWS, or the RFC draft; reconcile your limiter with the server's view; start from LLM tier presets
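The token-bucket and cost-aware bullets above can be pictured with a small std-only sketch. This is not throttle-net's implementation: `SketchBucket`, `refill`, and `available` are hypothetical names, and refill is driven by hand rather than by a clock. It only illustrates how a compare-and-swap loop can take `cost` tokens without a lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical bucket for illustration; not a throttle-net type.
pub struct SketchBucket {
    capacity: u64,
    tokens: AtomicU64,
}

impl SketchBucket {
    /// Start full, so an initial burst of `capacity` units is allowed.
    pub fn new(capacity: u64) -> Self {
        Self { capacity, tokens: AtomicU64::new(capacity) }
    }

    /// Take `cost` tokens without blocking; false means "not enough right now".
    pub fn try_acquire_with_cost(&self, cost: u64) -> bool {
        let mut current = self.tokens.load(Ordering::Relaxed);
        loop {
            if current < cost {
                return false;
            }
            // Uncontended, this compare-and-swap succeeds on the first try.
            match self.tokens.compare_exchange_weak(
                current,
                current - cost,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(observed) => current = observed, // lost a race, retry
            }
        }
    }

    /// Add tokens, never exceeding the configured capacity.
    pub fn refill(&self, amount: u64) {
        let cap = self.capacity;
        let _ = self
            .tokens
            .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |t| {
                Some(t.saturating_add(amount).min(cap))
            });
    }

    /// Tokens currently available.
    pub fn available(&self) -> u64 {
        self.tokens.load(Ordering::Relaxed)
    }
}
```

A real bucket would derive the refill amount from elapsed time; keeping it manual here makes the accounting easy to follow and to test.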
On the roadmap:
- Observability (v0.7) — metrics and tracing, zero-cost when disabled
- Runtime-agnostic (v0.8) — tokio today, with async-std and smol planned
Installation
[dependencies]
throttle-net = "0.6"
# Optional features:
throttle-net = { version = "0.6", features = ["circuit-breaker", "adaptive", "provider-llm"] }
Quick start
Pace your outbound calls so you never overwhelm a downstream:
use Throttle;
async
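To make "pace the caller instead of rejecting" concrete, here is a minimal std-only sketch of the idea. `Pacer` and `reserve` are hypothetical names, not throttle-net APIs; time is passed in explicitly as an offset from an arbitrary start so the behavior is deterministic, and burst headroom is left out for brevity.

```rust
use std::time::Duration;

/// Hypothetical pacer: hands out evenly spaced slots and tells the caller
/// how long to wait for its slot, rather than refusing the call.
pub struct Pacer {
    interval: Duration,  // spacing between permits
    next_slot: Duration, // earliest offset at which the next permit is free
}

impl Pacer {
    /// `rate` permits per second; `rate` must be non-zero.
    pub fn per_second(rate: u32) -> Self {
        Self {
            interval: Duration::from_secs(1) / rate,
            next_slot: Duration::ZERO,
        }
    }

    /// Reserve the next permit and return the delay the caller should
    /// sleep before proceeding (zero if a slot is already free).
    pub fn reserve(&mut self, now: Duration) -> Duration {
        let slot = self.next_slot.max(now);
        self.next_slot = slot + self.interval;
        slot - now
    }
}
```

In async code the returned delay would be handed to the runtime's sleep function, so the caller slows down to the permitted rate instead of receiving an error.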
Budget an LLM provider across several limits at once — requests, input tokens, and output tokens:
use Duration;
use ;
async
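The key property of a multi-dimensional limit is that a request is admitted only if it fits every dimension, and a rejected request consumes nothing. A rough std-only sketch of that rule follows; `Cost` and `MultiBudget` are hypothetical names, and window refill is omitted.

```rust
/// Hypothetical cost of one call across three dimensions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cost {
    pub requests: u64,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical budget for the current window; not a throttle-net type.
pub struct MultiBudget {
    remaining: Cost,
}

impl MultiBudget {
    pub fn new(limits: Cost) -> Self {
        Self { remaining: limits }
    }

    /// All-or-nothing: deduct from every dimension, or from none.
    pub fn try_acquire(&mut self, cost: Cost) -> bool {
        let fits = cost.requests <= self.remaining.requests
            && cost.input_tokens <= self.remaining.input_tokens
            && cost.output_tokens <= self.remaining.output_tokens;
        if fits {
            self.remaining.requests -= cost.requests;
            self.remaining.input_tokens -= cost.input_tokens;
            self.remaining.output_tokens -= cost.output_tokens;
        }
        fits
    }

    /// What is left in each dimension.
    pub fn remaining(&self) -> Cost {
        self.remaining
    }
}
```

Checking all dimensions before deducting avoids the failure mode where a request burns its request-count slot and is then refused on tokens.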
Throttle independently per tenant, with bounded memory:
use PerKey;
async
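As an illustration of per-key state with a memory ceiling, the sketch below keeps one counter per key, drops keys that have been idle longer than a TTL, and evicts the least recently seen key when a hard cap is reached. `PerKeySketch` is a hypothetical name; it uses a single map and linear scans where the crate is described as sharded, each key gets a simple fixed quota rather than a refilling bucket, and `max_keys` is assumed to be at least 1.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-key limiter; not a throttle-net type.
pub struct PerKeySketch {
    per_key_limit: u32,
    idle_ttl: Duration,
    max_keys: usize,
    // key -> (permits used, last time the key was seen)
    state: HashMap<String, (u32, Duration)>,
}

impl PerKeySketch {
    pub fn new(per_key_limit: u32, idle_ttl: Duration, max_keys: usize) -> Self {
        Self { per_key_limit, idle_ttl, max_keys, state: HashMap::new() }
    }

    /// `now` is an offset from an arbitrary start, passed in for determinism.
    pub fn try_acquire(&mut self, key: &str, now: Duration) -> bool {
        // Idle TTL: forget keys that have not been seen recently.
        let ttl = self.idle_ttl;
        self.state.retain(|_, entry| now.saturating_sub(entry.1) <= ttl);

        // Hard cap: make room by evicting the least recently seen key.
        if !self.state.contains_key(key) && self.state.len() >= self.max_keys {
            let oldest = self
                .state
                .iter()
                .min_by_key(|(_, entry)| entry.1)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.state.remove(&k);
            }
        }

        let limit = self.per_key_limit;
        let entry = self.state.entry(key.to_string()).or_insert((0, now));
        entry.1 = now;
        if entry.0 < limit {
            entry.0 += 1;
            true
        } else {
            false
        }
    }

    /// Number of keys currently tracked; never exceeds `max_keys`.
    pub fn live_keys(&self) -> usize {
        self.state.len()
    }
}
```

The trade-off of any eviction scheme is that an evicted key starts fresh when it returns, so the cap bounds memory at the cost of occasionally forgetting a tenant's usage.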
Stack scopes — an overall ceiling, a per-tenant share, and a per-endpoint cap:
use ;
async
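One way to picture stacked scopes is a limiter that asks every layer whether it has room and only then consumes from all of them. The sketch below assumes hypothetical `Limiter`, `Fixed`, and `AllOf` types and simple fixed quotas; it is not how throttle-net composes limiters internally.

```rust
/// Hypothetical minimal limiter interface for the sketch.
pub trait Limiter {
    fn has_room(&self) -> bool;
    fn consume(&mut self);
}

/// A fixed quota, standing in for a global, per-tenant, or per-endpoint limit.
pub struct Fixed {
    remaining: u32,
}

impl Fixed {
    pub fn new(limit: u32) -> Self {
        Self { remaining: limit }
    }
}

impl Limiter for Fixed {
    fn has_room(&self) -> bool {
        self.remaining > 0
    }
    fn consume(&mut self) {
        self.remaining -= 1;
    }
}

/// Passes only when every layer passes.
pub struct AllOf {
    layers: Vec<Box<dyn Limiter>>,
}

impl AllOf {
    pub fn new() -> Self {
        Self { layers: Vec::new() }
    }

    /// Add another scope; the call site of `try_acquire` does not change.
    pub fn layer(mut self, limiter: impl Limiter + 'static) -> Self {
        self.layers.push(Box::new(limiter));
        self
    }

    /// Check every layer first, then consume from all of them.
    pub fn try_acquire(&mut self) -> bool {
        if self.layers.iter().all(|l| l.has_room()) {
            for l in self.layers.iter_mut() {
                l.consume();
            }
            true
        } else {
            false
        }
    }
}
```

With layers of 3 (global), 2 (tenant), and 5 (endpoint), the tightest layer decides: two calls pass and the third is refused without touching the other two quotas.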
Retry a flaky call with jittered backoff, honoring a server Retry-After:
use Duration;
use ;
async
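The delay calculation behind jittered backoff can be sketched in a few lines. The function below is an illustration under stated assumptions, not the crate's retry policy: `backoff_delay` is a hypothetical name, the random value is supplied by the caller (the standard library has no random number generator), and only the delay-in-seconds form of Retry-After is handled, not the HTTP-date form.

```rust
use std::time::Duration;

/// Exponential backoff with full jitter, deferring to the server when it
/// sends a Retry-After value in seconds.
///
/// `unit_random` is expected to be a uniform sample in [0, 1]; it must not
/// be NaN.
pub fn backoff_delay(
    attempt: u32,
    base: Duration,
    cap: Duration,
    unit_random: f64,
    retry_after: Option<&str>,
) -> Duration {
    // The server's instruction takes precedence over our own schedule.
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Duration::from_secs(secs);
    }
    // base * 2^attempt, saturating and capped so it cannot grow unbounded.
    let ceiling = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    // Full jitter: pick a point between zero and the ceiling.
    ceiling.mul_f64(unit_random.clamp(0.0, 1.0))
}
```

Full jitter spreads retries across the whole interval, which helps keep many clients that failed together from retrying together.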
Wrap a flaky downstream in a circuit breaker (needs the circuit-breaker feature):
use Duration;
use ;
async
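The closed / open / half-open cycle is a small state machine, sketched below with std only. `Breaker`, `allow`, and `record` are hypothetical names rather than the crate's API; time is passed in explicitly, and the half-open state admits any number of probes, which a production breaker would usually restrict.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker for illustration; not a throttle-net type.
pub struct Breaker {
    failure_threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Duration,
    state: State,
}

impl Breaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            failures: 0,
            opened_at: Duration::ZERO,
            state: State::Closed,
        }
    }

    /// Should a call be attempted at `now`? Open fails fast until the
    /// cooldown has elapsed, then moves to half-open to probe.
    pub fn allow(&mut self, now: Duration) -> bool {
        if self.state == State::Open {
            if now.saturating_sub(self.opened_at) >= self.cooldown {
                self.state = State::HalfOpen;
            } else {
                return false;
            }
        }
        true
    }

    /// Report the outcome of a call that was allowed.
    pub fn record(&mut self, success: bool, now: Duration) {
        if success {
            self.failures = 0;
            self.state = State::Closed;
        } else {
            self.failures += 1;
            // A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == State::HalfOpen || self.failures >= self.failure_threshold {
                self.state = State::Open;
                self.opened_at = now;
            }
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```

Calling `allow` before touching the wrapped limiter is what lets an open breaker fail fast without consuming any of the limiter's budget.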
Stay in sync with a provider's own rate-limit headers, and start from a tier preset (needs the provider-llm feature):
use presets;
use HeaderProfile;
async
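Reconciling with a server's view can be as simple as reading the "remaining" header and never believing you have more budget than the server reports. The sketch below is an assumption about what such reconciliation might look like, not the crate's HeaderProfile logic: `header_u64` and `reconcile` are hypothetical helpers, headers are modeled as plain string pairs, and the header name is supplied by the caller because it differs between providers.

```rust
/// Find a header by case-insensitive name and parse it as an integer.
pub fn header_u64(headers: &[(&str, &str)], name: &str) -> Option<u64> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .and_then(|(_, v)| v.trim().parse::<u64>().ok())
}

/// Conservative reconciliation: take the smaller of the local estimate and
/// the server-reported remaining budget. If the header is missing or
/// malformed, keep the local estimate.
pub fn reconcile(local_remaining: u64, headers: &[(&str, &str)], remaining_header: &str) -> u64 {
    match header_u64(headers, remaining_header) {
        Some(server_remaining) => local_remaining.min(server_remaining),
        None => local_remaining,
    }
}
```

For example, with a GitHub-style `x-ratelimit-remaining: 12` header and a local estimate of 40, the reconciled value is 12; with a local estimate of 5 it stays 5.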
Full runnable examples live in examples/.
Performance
Local criterion means (cargo bench --bench throttle_bench, Windows x86_64, Rust stable):
- Single-throttle try_acquire (uncontended): ~27 ns — one atomic compare-and-swap
- Per-key lookup, 10 000 live keys: ~70 ns — hash, shard read lock, map get, acquire
Where It Fits
throttle-net is the outbound resilience layer. It is used by:
- rate-net — the inbound counterpart; throttle-net is outbound
- pack-io / network-protocol — clients that call rate-limited downstreams
- AVA / agent-provider — LLM API budgeting with multi-dimensional token limits
- Hive DB — cluster RPC backpressure and downstream protection
It stays foreign-compatible: the obvious default for "I need to call an external API in Rust and not get banned."
Contributing
Before opening a PR, cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean. Hot-path changes require a criterion benchmark; correctness-critical paths require property and/or loom tests.