grate-limiter
Predict limits before providers enforce them.
grate-limiter is an anticipatory rate-limit orchestration engine for multi-provider systems. It routes traffic intelligently across providers by continuously learning their health, quota usage, and reliability — preventing rate-limit errors before they happen.
What it is
- A predictive provider orchestration engine
- An adaptive quota router with anticipatory exhaustion detection
- A provider health tracker with EWMA-based scoring
What it is NOT
- Not a retry library
- Not a proxy or gateway
- Not a simple rate limiter
Features
- Anticipatory routing: predicts quota exhaustion before it happens using burn rate analysis
- Multiple quota strategies: token bucket, sliding window, fixed window, concurrency limiter
- Health engine: EWMA-based scoring with exponential decay and automatic cooldowns
- Weighted composite scoring: configurable weights for quota, health, priority, and latency
- Deterministic testing: MockClock for fully reproducible test scenarios
- Zero-allocation hot paths: designed for <10µs select latency
- Thread-safe: lock-free atomic operations on the hot path
- Explainable decisions: full score breakdown with every selection
Quick Start
Add to your Cargo.toml:
[dependencies]
grate-limiter = "0.1"
use *;
// Create the engine
let engine = new;
// Register providers with their quotas
engine.upsert_provider;
engine.upsert_provider;
// Register a capability
engine.upsert_capability;
// Select the best provider
let decision = engine.select.unwrap;
println!;
// Report what happened
engine.observe.unwrap;
The same flow with the TypeScript port:
import { GrateLimiter, Dimension, Window, StatusClass } from '@dev-kasibhatla/grate-limiter';
const engine = new GrateLimiter();
engine.upsertProvider({
name: 'openai',
quotas: [{ dimension: Dimension.Requests, limit: 5000, window: Window.Minute }],
priority: 10, cooldownSeconds: 30,
});
engine.upsertProvider({
name: 'anthropic',
quotas: [{ dimension: Dimension.Requests, limit: 3000, window: Window.Minute }],
priority: 8, cooldownSeconds: 30,
});
engine.upsertCapability({
name: 'chat-completion',
providers: [
{ provider: 'openai', priority: 10 },
{ provider: 'anthropic', priority: 8 },
],
});
const decision = engine.select('chat-completion');
console.log(`Use: ${decision.provider} (score: ${decision.score.toFixed(2)})`);
engine.observe({
provider: 'openai', capability: 'chat-completion',
usage: { requests: 1, tokens: 1200 },
outcome: { status: StatusClass.Success, latencyMs: 830 },
});
How It Works
Scoring Algorithm
Each provider is scored using a weighted composite of four factors:
| Component | Weight | Description |
|---|---|---|
| Quota | 40% | Remaining capacity + anticipatory exhaustion penalty |
| Health | 35% | EWMA-based health score from observed outcomes |
| Priority | 20% | Capability-level provider preference |
| Latency | 5% | Normalized response time |
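As a rough illustration of how the weights in the table could combine, here is a minimal sketch. It assumes each component is already normalized to the range 0.0 to 1.0 with higher meaning better, and that the composite is a plain weighted sum. The names `Components`, `Weights`, `composite_score`, and `pick_best` are hypothetical and are not taken from the crate's API.

```rust
/// Per-provider inputs. Assumption: each value is normalized to 0.0..=1.0
/// and higher is better (so a slow provider has a low `latency` value).
#[derive(Debug, Clone, Copy)]
pub struct Components {
    pub quota: f64,
    pub health: f64,
    pub priority: f64,
    pub latency: f64,
}

/// Weights for the four factors. Hypothetical type, not the crate's.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub quota: f64,
    pub health: f64,
    pub priority: f64,
    pub latency: f64,
}

impl Default for Weights {
    /// Default weights from the table: 40% / 35% / 20% / 5%.
    fn default() -> Self {
        Weights { quota: 0.40, health: 0.35, priority: 0.20, latency: 0.05 }
    }
}

/// Weighted sum of the four components (assumed combination rule).
pub fn composite_score(c: Components, w: Weights) -> f64 {
    w.quota * c.quota + w.health * c.health + w.priority * c.priority + w.latency * c.latency
}

/// Index of the highest-scoring candidate, or `None` for an empty slice.
pub fn pick_best(candidates: &[Components], w: Weights) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (i, composite_score(*c, w)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With these weights, a provider with top priority but only 10% of its quota left can lose to a slightly lower-priority provider with ample quota, because quota carries the largest weight.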
Anticipatory Routing
Instead of the traditional approach:
send → fail with 429 → retry elsewhere
grate-limiter does:
predict nearing exhaustion → avoid provider before failure
The engine tracks burn rate and predicts time-to-exhaustion. Providers with imminent exhaustion are aggressively deprioritized even if they still have remaining capacity.
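The burn-rate idea can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual formula: burn rate is taken as the average consumption since the window started, and the penalty scales the remaining fraction by how early exhaustion is projected relative to the window reset. `QuotaWindow` and all function names are hypothetical.

```rust
/// Snapshot of one quota window. All field names are hypothetical.
pub struct QuotaWindow {
    pub limit: f64,        // units allowed per window
    pub used: f64,         // units consumed so far in this window
    pub elapsed_secs: f64, // time since the window started
    pub window_secs: f64,  // total window length
}

/// Average burn rate so far, in units per second.
pub fn burn_rate(q: &QuotaWindow) -> f64 {
    if q.elapsed_secs <= 0.0 { 0.0 } else { q.used / q.elapsed_secs }
}

/// Projected seconds until the quota runs out at the current burn rate.
/// Returns `None` when nothing is being consumed.
pub fn time_to_exhaustion(q: &QuotaWindow) -> Option<f64> {
    let rate = burn_rate(q);
    if rate <= 0.0 {
        return None;
    }
    Some((q.limit - q.used).max(0.0) / rate)
}

/// Quota score in 0.0..=1.0: the remaining fraction, scaled down when
/// exhaustion is projected to occur before the window resets.
pub fn quota_score(q: &QuotaWindow) -> f64 {
    if q.limit <= 0.0 {
        return 0.0;
    }
    let remaining = ((q.limit - q.used) / q.limit).clamp(0.0, 1.0);
    let until_reset = (q.window_secs - q.elapsed_secs).max(0.0);
    match time_to_exhaustion(q) {
        // Exhaustion comes first: penalize in proportion to how soon.
        Some(tte) if tte < until_reset => remaining * (tte / until_reset),
        // The window resets first (or nothing is burning): no penalty.
        _ => remaining,
    }
}
```

For example, a provider that has used 4000 of 5000 requests after 20 seconds of a 60-second window still has 20% capacity left, but at 200 requests per second it would run out in 5 seconds, so this sketch scores it at 0.025 rather than 0.2.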
Health Engine
Provider health is tracked via EWMA (Exponentially Weighted Moving Average):
- Successes boost health slightly
- 429s apply a major penalty (-0.25)
- 403s apply a severe penalty (-0.50)
- 5xx/timeouts apply moderate penalties
- Health recovers toward 1.0 over time as penalties decay
- Consecutive failures trigger exponential cooldowns
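The rules in this list can be approximated with a small state machine. The sketch below is a simplification, not the crate's implementation: it applies the penalties as additive deltas instead of a full EWMA, and the success boost (+0.05), the 5xx/timeout penalty (-0.15), the half-life recovery, and the doubling cooldown schedule are all assumed values. Only the -0.25 and -0.50 penalties come from the list above. `Outcome` and `Health` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    RateLimited, // HTTP 429
    Forbidden,   // HTTP 403
    ServerError, // 5xx or timeout
}

pub struct Health {
    pub score: f64, // 0.0 (unusable) ..= 1.0 (fully healthy)
    pub consecutive_failures: u32,
}

impl Health {
    pub fn new() -> Self {
        Health { score: 1.0, consecutive_failures: 0 }
    }

    /// Apply one observed outcome to the score and failure streak.
    pub fn observe(&mut self, outcome: Outcome) {
        let delta = match outcome {
            Outcome::Success => 0.05,      // assumed "slight boost"
            Outcome::RateLimited => -0.25, // from the list above
            Outcome::Forbidden => -0.50,   // from the list above
            Outcome::ServerError => -0.15, // assumed "moderate penalty"
        };
        self.score = (self.score + delta).clamp(0.0, 1.0);
        if outcome == Outcome::Success {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures += 1;
        }
    }

    /// Move the score back toward 1.0: the gap to 1.0 halves every
    /// `half_life_secs` (assumed recovery model).
    pub fn recover(&mut self, dt_secs: f64, half_life_secs: f64) {
        let keep = 0.5_f64.powf(dt_secs / half_life_secs);
        self.score = 1.0 - (1.0 - self.score) * keep;
    }

    /// Cooldown that doubles with each consecutive failure, capped at
    /// `max_secs` (assumed schedule: base, 2x base, 4x base, ...).
    pub fn cooldown_secs(&self, base_secs: f64, max_secs: f64) -> f64 {
        if self.consecutive_failures == 0 {
            return 0.0;
        }
        let doublings = (self.consecutive_failures - 1).min(16) as i32;
        (base_secs * 2f64.powi(doublings)).min(max_secs)
    }
}
```

Under these assumptions, a 429 followed by a 403 drops a healthy provider from 1.0 to 0.25 and, with a 30-second base, puts it on a 60-second cooldown. One half-life later the score has recovered to 0.625.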
HTTP Server
For use from non-Rust services, the grate-limiter-server crate provides a REST API:
| Endpoint | Method | Description |
|---|---|---|
| /providers | POST | Register/update a provider |
| /capabilities | POST | Register/update a capability |
| /select | POST | Select the best provider |
| /observe | POST | Report an observation |
| /metrics | GET | Engine metrics |
| /health | GET | Health check |
Packages
| Package | Registry | Description |
|---|---|---|
| grate-limiter | crates.io | Core Rust library |
| grate-limiter-server | crates.io | HTTP server |
| grate-limiter-simulation | crates.io | Simulation framework |
| grate-limiter | PyPI | Python port |
| @dev-kasibhatla/grate-limiter | npm | TypeScript port |
Examples
Real-World Use Cases
- AI APIs — OpenAI / Anthropic / Gemini load balancing
- Web scraping — Multi-proxy vendor rotation
- SMS delivery — Twilio / MessageBird / Nexmo failover
- CAPTCHA solving — Multi-provider orchestration
- Search APIs — SerpAPI / BrightData balancing
Deterministic Testing
Use MockClock for reproducible tests:
use ;
use std::sync::Arc;
let clock = new;
let config = default.with_clock;
let engine = new;
// Time only advances when you say so
clock.advance_ms;
clock.advance_secs;
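To show why an injectable clock makes quota logic reproducible, here is a self-contained sketch of the pattern. It does not use the crate's MockClock. `Clock`, `ManualClock`, and `TokenBucket` are hypothetical stand-ins written for this example, and the token bucket is a simplified version of that quota strategy.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Time source abstraction so tests can control the passage of time.
pub trait Clock: Send + Sync {
    fn now_ms(&self) -> u64;
}

/// Hypothetical test clock: starts at 0 and only moves when told to.
#[derive(Default)]
pub struct ManualClock {
    now_ms: AtomicU64,
}

impl ManualClock {
    pub fn advance_ms(&self, ms: u64) {
        self.now_ms.fetch_add(ms, Ordering::SeqCst);
    }
}

impl Clock for ManualClock {
    fn now_ms(&self) -> u64 {
        self.now_ms.load(Ordering::SeqCst)
    }
}

/// Simplified token bucket that reads time only through `Clock`.
pub struct TokenBucket {
    clock: Arc<dyn Clock>,
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_ms: u64,
}

impl TokenBucket {
    /// Starts full, with `capacity` tokens available.
    pub fn new(clock: Arc<dyn Clock>, capacity: f64, refill_per_sec: f64) -> Self {
        let last_ms = clock.now_ms();
        TokenBucket { clock, capacity, refill_per_sec, tokens: capacity, last_ms }
    }

    /// Refill based on elapsed clock time, then try to take `n` tokens.
    pub fn try_take(&mut self, n: f64) -> bool {
        let now = self.clock.now_ms();
        let elapsed_ms = now.saturating_sub(self.last_ms) as f64;
        let refill = elapsed_ms * self.refill_per_sec / 1000.0;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_ms = now;
        if self.tokens >= n {
            self.tokens -= n;
            true
        } else {
            false
        }
    }
}
```

In a test, a bucket with capacity 2 and a refill of 1 token per second accepts two requests, rejects the third, and accepts exactly one more after `advance_ms(1000)`. Because the clock never moves on its own, the result is the same on every run regardless of machine speed.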
Benchmarks
Run benchmarks locally:
Performance targets:
| Metric | Target |
|---|---|
| select() p99 | <50µs |
| observe() p99 | <20µs |
| Memory per provider | <4KB |
| Routing determinism | 100% |
Roadmap
- Core anticipatory engine
- Token bucket / sliding window / fixed window / concurrency quotas
- EWMA health scoring with cooldowns
- HTTP server
- Simulation framework
- Property-based testing
- Python port (native) — PyPI
- JavaScript/TypeScript port (native) — npm
- Cross-language conformance tests
- Distributed state (Redis backend)
- Persistent snapshots
- WASM builds
- Adaptive hidden-quota learning
Contributing
See CONTRIBUTING.md for guidelines.
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.