quota
A high-performance in-memory rate limiter for Rust, using a mix of Leaky Token Bucket & GCRA.
The main QuotaPool is concurrent by default. It uses DashMap for key routing and a small per-key mutex for exact quota state, so it can be shared as Arc<QuotaPool<_>> without an outer lock.
We provide 3 essential primitives: Quota, QuotaPolicy, and QuotaPool that combines both of them.
QuotaPool defaults to QuotaKey, an owned heap String.
If your quota identity is already a compact ID, interned symbol, or another key shape,
use QuotaPool<K> and construct it with QuotaPool::<K>::with_key_type(...).
Use QuotaPool::with_capacity(...) and pool.insert_keys(...) when the key set is known ahead of traffic; that keeps the hot request path on borrowed-key lookup instead of insertion.
Example use of the simple Quota (A simple 8-byte number in memory):
use Quota;
Example use of applying QuotaPolicy with a maximum capacity and RefillRate:
use ;
And now the main QuotaPool:
use ;
use Arc;
use Duration;
Axum Example
use ;
use ;
use ;
type Limiter = ;
async
async
Technical
This library's QuotaPool specifically (the most complex primitive) has been tested in countless different configurations: MPSC methods (across different runtimes), Global Std Mutex, to ShardedParking Mutex, to thread-routing, and finally to Dashmap + a per-quota lock.
This benchmark measures the QuotaPool::consume method in single-key performance (Hot)
and multi-key performance (Spread) using various attempted implementations:
mode hot ns/op hot Mops/s spread ns/op spread Mops/s
global_std_mutex 209.14 4.78 267.92 3.73
dash_entry_lock 613.79 1.63 8.39 119.12
sharded_parking_mutex 814.90 1.23 49.92 20.03
mpsc_std_sync_channel 1589.35 0.63 1660.51 0.60
mpsc_flume_unbounded 1615.93 0.62 1888.52 0.53
owner_thread_routing 1641.45 0.61 965.08 1.04
mpsc_crossbeam_unbounded 1673.91 0.60 1663.72 0.60
mpsc_std_channel 1686.16 0.59 1743.96 0.57
mpsc_flume_bounded 1696.92 0.59 1625.05 0.62
mpsc_tokio_unbounded 1764.91 0.57 1682.05 0.59
mpsc_crossbeam_bounded 1767.41 0.57 1622.89 0.62
mpsc_tokio_bounded_blocking 2110.07 0.47 2024.87 0.49
MPSC is clearly out of this discussion. I also attempted an "Atomic" per quota, but since Refilling + Consuming was not exactly an atomic operation, I had to do a multi-atomic structure.
(re-ran benchmark with considered methods)
mode hot ns/op hot Mops/s spread ns/op spread Mops/s
atomic_multi_relaxed_refill 156.97 6.37 116.87 8.56
global_std_mutex 218.57 4.58 259.62 3.85
dash_entry_lock 461.07 2.17 9.25 108.10
sharded_parking_mutex 685.26 1.46 43.83 22.81
owner_thread_routing 2238.35 0.45 973.36 1.03
Multi-atomic was very stable in both hot/spread.
But "Hot" is not a realistic workload. Because real-world traffic is skewed and distributed, Spread performance is the only thing that actually matters here.
This is exactly why we chose Dash + Entry Lock for quota's QuotaPool. It's orders of magnitudes faster than our hottest implementation, and unless more than 26.1% of your requests are hitting the same exact key (break-even), Dashmap + Entry Lock is technically the superior choice here for its traffic-spread performance.
In some cases, that 'blazingly fast' spread traffic performance (12x than atomic performance) can tank certain attacks: e.g. If your API is getting DdoS'd, and you are rate-limiting by IP, you are going to get attacked from multiple other IPs as well, not one. It's common for DdoS attacks to come from multi-IP botnets. Same thing goes if you are rate-limiting malicious user IDs, you're likely to get attacked by multiple other user IDs as well, either by the same individual or your service generally needs protection from countless traffic by actual users.
Since quota sits at the application layer, you can do more nuanced domain-specific rate-limiting than what a firewall or even a request gateway would.