Quota
A high-performance in-memory rate limiter for Rust, using a mix of Leaky Token Bucket & GCRA.
Quick Comparison
Benchmark numbers are Criterion-reported ns/op from the local harness in ./benches/limiters.rs, which runs 256 Tokio tasks on 16 Tokio worker threads against pre-warmed allow-path state. They are wall-clock elapsed time divided by total operations across all concurrent tasks, not single-core instruction latency. To avoid confusion, I also list the more familiar op/s throughput next to each ns/op figure.
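The op/s figures follow directly from the ns/op figures, since one second is 1e9 ns. A minimal sketch of that conversion (the helper name `mops_per_sec` is made up for illustration and is not part of the benchmark harness):

```rust
/// Convert a per-operation time in nanoseconds into millions of operations per second.
/// ops/s = 1e9 / ns_per_op, so Mop/s = 1e3 / ns_per_op.
fn mops_per_sec(ns_per_op: f64) -> f64 {
    1_000.0 / ns_per_op
}
```

For example, `mops_per_sec(46.94)` is roughly 21.3, which the table rounds to 21 Mop/s. Because the ns/op value is aggregate wall-clock time over all tasks, the result is aggregate throughput across the 16 worker threads, not per-thread throughput.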
| Axis | quota | governor | flux_limiter | tokio_rate_limit | ratelimit | leaky_bucket |
|---|---|---|---|---|---|---|
| Same-Key Throughput | 46.94 ns (21 Mop/s) | 107.49 ns (9 Mop/s) | 201.96 ns (5 Mop/s) | 112.23 ns (9 Mop/s) | Not keyed | Not keyed |
| Distributed-Key Throughput | 1.95 ns (513 Mop/s) | 7.73 ns (129 Mop/s) | 11.11 ns (90 Mop/s) | 9.04 ns (110 Mop/s) | Not keyed | Not keyed |
| Single Limiter Throughput | 47.34 ns (21 Mop/s) | 74.82 ns (13 Mop/s) | Uses keyed path: 201.96 ns hot | Uses keyed path: 112.23 ns hot | 82.65 ns | 159.45 ns |
| Refill Interval | Yes: set_refill_interval | No: rate period is GCRA cell spacing, not batch refill ticks | No: only rate_nanos + burst tolerance | No: rate/sec + burst; elapsed refill | No: scaled continuous refill | Yes: refill(...) + interval(...) |
| Algorithm | GCRA pool; legacy direct token counter still present | GCRA | GCRA | Token bucket default; custom algorithms | Token bucket | Token/leaky bucket |
| Keyed Limiting | Yes | Yes | Yes | Yes | No | No |
| Direct / Global Limiting | Yes, via QuotaPool single key; legacy Quota counter | Yes | No direct type; use constant key | No direct type; use constant key | Yes | Yes |
| Weighted Costs | Yes | Yes: check_n | No: one request per call | Yes: check_with_cost | Yes: try_wait_n | Yes: acquire(n) / try_acquire(n) |
| Async Wait / Backpressure API | No | Yes: until_ready style APIs | No | Yes: acquire and acquire_timeout | No; returns retry duration | Yes: acquire futures |
| Nonblocking Try API | Yes: check / consume | Yes: check / check_n | Yes: check_request | Yes: check, try_acquire, try_acquire_n | Yes: try_wait_n | Yes: try_acquire |
| Built-In Web Middleware | No | No framework middleware; has stream/sink helpers and middleware hooks | No | Yes: Axum/Tower/Tonic | No | No |
| Denial Metadata | Available tokens only | Retry time + optional state middleware | Retry-after, remaining, reset | Retry-after, remaining, limit, reset | Retry duration | Boolean or wait future |
| Key Cleanup / Eviction | Manual remove | retain_recent + shrink_to_fit | cleanup_stale_clients | TTL on algorithm types | Not keyed | Not keyed |
| Custom Clock | No public clock injection | Yes | Yes | No | Yes | No |
| no_std Support | No | Yes | No | No | Yes | Crate is no_std, but depends on Tokio timing for operation |
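Several rows above mention GCRA, weighted costs, and retry-after metadata. As a rough illustration of how those fit together, here is a minimal single-threaded GCRA sketch. It is not taken from quota or any other crate in the table: the type `Gcra`, the method `check_at`, and the choice to pass time in as plain nanoseconds are all assumptions made for the example.

```rust
/// Illustrative GCRA state (hypothetical; not the API of any crate in the table).
struct Gcra {
    /// T: nanoseconds between conforming requests at the sustained rate.
    emission_interval_ns: u64,
    /// burst * T: how far the theoretical arrival time may run ahead of "now".
    burst_window_ns: u64,
    /// TAT: theoretical arrival time of the next conforming request.
    tat_ns: u64,
}

impl Gcra {
    /// Assumes `requests_per_period` is nonzero.
    fn new(period_ns: u64, requests_per_period: u64, burst: u64) -> Self {
        let t = period_ns / requests_per_period;
        Self { emission_interval_ns: t, burst_window_ns: t * burst, tat_ns: 0 }
    }

    /// Returns Ok(()) if `cost` units are allowed at `now_ns`,
    /// otherwise Err(nanoseconds until the same request would conform).
    /// A cost larger than the burst size can never conform.
    fn check_at(&mut self, now_ns: u64, cost: u64) -> Result<(), u64> {
        // Advance the TAT as if the request were accepted.
        let new_tat = self.tat_ns.max(now_ns) + cost * self.emission_interval_ns;
        // new_tat >= now_ns by construction, so this subtraction cannot underflow.
        let ahead = new_tat - now_ns;
        if ahead > self.burst_window_ns {
            return Err(ahead - self.burst_window_ns);
        }
        self.tat_ns = new_tat;
        Ok(())
    }
}
```

With `Gcra::new(1_000_000_000, 10, 3)` (10 requests per second, burst of 3), three calls at time 0 are allowed and the fourth is denied with a retry hint of 100 ms. The whole state is one timestamp per limiter, which is why GCRA needs no periodic refill task.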
Quick Axum Example using quota
use ;
use ;
use ;
type Limiter = ;
async
async
API
We provide 3 essential primitives: standalone Quota, QuotaPolicy, and a GCRA-based QuotaPool.
QuotaPool defaults to QuotaKey, an owned heap String.
If your quota identity is already a compact ID, interned symbol, or another key shape,
use QuotaPool<K> and construct it with QuotaPool::<K>::with_key_type(...).
Use QuotaPool::with_capacity(...) and pool.insert_keys(...) when the key set is known ahead of traffic; that keeps the hot request path on borrowed-key lookup instead of insertion.
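To show why pre-registering keys helps, here is a small std-only sketch of a keyed pool whose hot path only performs a borrowed-key lookup. This is not quota's `QuotaPool`: the type `Pool`, its methods, and the plain per-key token counter are hypothetical stand-ins chosen to keep the example short.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical keyed pool, generic over the key type (illustrative only).
struct Pool<K> {
    slots: HashMap<K, AtomicU64>,
}

impl<K: Hash + Eq> Pool<K> {
    /// Register every key up front, each starting with `capacity` tokens.
    fn with_keys(capacity: u64, keys: impl IntoIterator<Item = K>) -> Self {
        let slots = keys
            .into_iter()
            .map(|k| (k, AtomicU64::new(capacity)))
            .collect();
        Self { slots }
    }

    /// Hot path: look the key up by reference, with no insertion and no key allocation.
    /// Returns None for an unknown key, Some(true) if one token was taken.
    fn try_consume<Q>(&self, key: &Q) -> Option<bool>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        let slot = self.slots.get(key)?;
        let taken = slot.fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1));
        Some(taken.is_ok())
    }
}
```

With `K = String`, callers can pass a `&str` straight from a request without allocating; with a compact key such as `u32`, the lookup hashes four bytes instead of a string. Because `try_consume` takes `&self`, the map never needs a write lock once the key set is fixed.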
Example use of the simple Quota (a single 8-byte number in memory):
use Quota;
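For intuition about what an 8-byte counter can do, here is a sketch built on `AtomicU64`. It is an assumption about the general shape, not quota's actual `Quota` type; the name `TokenCounter` and its methods are invented for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a standalone counter: a single atomic u64.
struct TokenCounter(AtomicU64);

impl TokenCounter {
    fn new(tokens: u64) -> Self {
        Self(AtomicU64::new(tokens))
    }

    /// Tokens currently available.
    fn available(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }

    /// Atomically take `n` tokens, or take nothing and return the available count.
    fn try_consume(&self, n: u64) -> Result<(), u64> {
        self.0
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| cur.checked_sub(n))
            .map(|_| ())
    }
}
```

`std::mem::size_of::<TokenCounter>()` is 8, and a denied request gets back only the number of tokens still available, which matches the "Available tokens only" entry in the Denial Metadata row.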
Example of applying a QuotaPolicy with a maximum capacity and a RefillRate:
use ;
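The interaction between a maximum capacity and a refill rate can be sketched without the crate. The struct below is hypothetical: its name and fields (`capacity`, `refill_amount`, `refill_interval_ns`) are illustrative and do not claim to mirror `QuotaPolicy` or `RefillRate`.

```rust
/// Hypothetical policy shape: a cap plus "add N tokens every interval".
#[derive(Clone, Copy)]
struct Policy {
    capacity: u64,
    refill_amount: u64,
    /// Assumed to be nonzero.
    refill_interval_ns: u64,
}

impl Policy {
    /// Token count after `elapsed_ns`, crediting only whole intervals
    /// and never exceeding `capacity`.
    fn refill(&self, tokens: u64, elapsed_ns: u64) -> u64 {
        let ticks = elapsed_ns / self.refill_interval_ns;
        tokens
            .saturating_add(ticks.saturating_mul(self.refill_amount))
            .min(self.capacity)
    }
}
```

With a capacity of 10 and 2 tokens per second, an empty bucket holds 0 tokens after 0.999 s, 2 after 1 s, and 4 after 2.5 s, and it stops at 10 no matter how long it sits idle. This batch-tick behavior is the contrast the Refill Interval row draws against GCRA's continuous cell spacing.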
And now the main QuotaPool:
use ;
use Arc;
use Duration;