# Quota
A high-performance in-memory rate limiter for Rust, using a mix of Leaky Token Bucket & GCRA.
## Quick Comparison
> Benchmark numbers are Criterion-reported ns/op from the local harness in `./benches/limiters.rs`, which runs 256 Tokio tasks on 16 Tokio worker threads against pre-warmed allow-path state. Each figure is wall-clock elapsed time divided by the total number of operations across all concurrent tasks, not single-core instruction latency. To make the numbers easier to read, the equivalent throughput in operations per second is shown in parentheses (for example, 46.94 ns/op is about 21 Mop/s).
| **Same-Key Throughput** | **46.94 ns (21 Mop/s)** | 107.49 ns (9 Mop/s) | 201.96 ns (5 Mop/s) | 112.23 ns (9 Mop/s) | Not keyed | Not keyed |
| **Distributed-Key Throughput** | **1.95 ns (513 Mop/s)** | 7.73 ns (129 Mop/s) | 11.11 ns (90 Mop/s) | 9.04 ns (110 Mop/s) | Not keyed | Not keyed |
| **Single Limiter Throughput** | **47.34 ns (21 Mop/s)** | 74.82 ns (13 Mop/s) | Uses keyed path: 201.96 ns hot | Uses keyed path: 112.23 ns hot | 82.65 ns | 159.45 ns |
| **Refill Interval** | Yes: `set_refill_interval` | No: rate period is GCRA cell spacing, not batch refill ticks | No: only `rate_nanos` + burst tolerance | No: rate/sec + burst; elapsed refill | No: scaled continuous refill | Yes: `refill(...)` + `interval(...)` |
| **Algorithm** | GCRA pool; legacy direct token counter still present | GCRA | GCRA | Token bucket default; custom algorithms | Token bucket | Token/leaky bucket |
| **Keyed Limiting** | Yes | Yes | Yes | Yes | No | No |
| **Direct / Global Limiting** | Yes, via `QuotaPool` single key; legacy `Quota` counter | Yes | No direct type; use constant key | No direct type; use constant key | Yes | Yes |
| **Weighted Costs** | Yes | Yes: `check_n` | No: one request per call | Yes: `check_with_cost` | Yes: `try_wait_n` | Yes: `acquire(n)` / `try_acquire(n)` |
| **Async Wait / Backpressure API** | No | Yes: `until_ready` style APIs | No | Yes: `acquire` and `acquire_timeout` | No; returns retry duration | Yes: `acquire` futures |
| **Nonblocking Try API** | Yes: `check` / `consume` | Yes: `check` / `check_n` | Yes: `check_request` | Yes: `check`, `try_acquire`, `try_acquire_n` | Yes: `try_wait_n` | Yes: `try_acquire` |
| **Built-In Web Middleware** | No | No framework middleware; has stream/sink helpers and middleware hooks | No | Yes: Axum/Tower/Tonic | No | No |
| **Denial Metadata** | Available tokens only | Retry time + optional state middleware | Retry-after, remaining, reset | Retry-after, remaining, limit, reset | Retry duration | Boolean or wait future |
| **Key Cleanup / Eviction** | Manual `remove` | `retain_recent` + `shrink_to_fit` | `cleanup_stale_clients` | TTL on algorithm types | Not keyed | Not keyed |
| **Custom Clock** | No public clock injection | Yes | Yes | No | Yes | No |
| **`no_std` Support** | No | Yes | No | No | Yes | Crate is `no_std`, but depends on Tokio timing for operation |
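
Several rows above refer to GCRA and to "cell spacing" plus burst tolerance. For readers who have not met the algorithm, here is a minimal single-key sketch using only the standard library. It is not the implementation of `quota` or of any other crate in the table; the `Gcra` type, its fields, and `gcra_example` are hypothetical names, and the clock is passed in as plain nanoseconds to keep the example deterministic.

```rust
use std::time::Duration;

/// Hypothetical single-key GCRA state (illustration only).
struct Gcra {
    /// Theoretical arrival time (TAT) of the next conforming request, in ns.
    tat_nanos: u64,
    /// Emission interval: the time "cost" of one request (1 / rate), in ns.
    emission_nanos: u64,
    /// Maximum number of requests allowed back to back.
    burst: u64,
}

impl Gcra {
    fn new(emission: Duration, burst: u64) -> Self {
        Self { tat_nanos: 0, emission_nanos: emission.as_nanos() as u64, burst }
    }

    /// Returns `Ok(())` if the request conforms, or `Err(retry_after)` if not.
    fn check(&mut self, now_nanos: u64) -> Result<(), Duration> {
        // Each accepted request pushes the TAT forward by one emission interval.
        let new_tat = self.tat_nanos.max(now_nanos) + self.emission_nanos;
        // The TAT may run ahead of `now` by at most `burst` emission intervals.
        let limit = now_nanos + self.emission_nanos * self.burst;
        if new_tat > limit {
            Err(Duration::from_nanos(new_tat - limit))
        } else {
            self.tat_nanos = new_tat;
            Ok(())
        }
    }
}

/// Ten requests per second with a burst of three, driven by a fake clock.
fn gcra_example() -> Vec<bool> {
    let mut limiter = Gcra::new(Duration::from_millis(100), 3);
    let t100ms = 100_000_000;
    vec![
        limiter.check(0).is_ok(),      // burst slot 1
        limiter.check(0).is_ok(),      // burst slot 2
        limiter.check(0).is_ok(),      // burst slot 3
        limiter.check(0).is_ok(),      // denied: burst exhausted
        limiter.check(t100ms).is_ok(), // one slot has drained after 100 ms
        limiter.check(t100ms).is_ok(), // denied again
    ]
}
```

The only per-key state is one timestamp, which is why GCRA-style limiters tend to fit in a single machine word per key.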
### Quick Axum Example using `quota`
```rust
use axum::{Router, extract::{Path, State}, http::StatusCode, routing::get};
use quota::{QuotaPolicy, QuotaPool, RefillRate};
use std::{net::SocketAddr, sync::Arc};

type Limiter = Arc<QuotaPool<String>>;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0)
        .set_refill_rate(RefillRate::per_sec(1));
    let limiter = Arc::new(QuotaPool::new(policy, 10));

    let app = Router::new()
        .route("/{key}", get(limit))
        .with_state(limiter);

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    axum::serve(tokio::net::TcpListener::bind(addr).await?, app).await?;
    Ok(())
}

// Each path segment is treated as its own rate-limit key.
async fn limit(State(limiter): State<Limiter>, Path(key): Path<String>) -> StatusCode {
    match limiter.consume(key.as_str(), 1) {
        Ok(_) => StatusCode::OK,
        Err(_) => StatusCode::TOO_MANY_REQUESTS,
    }
}
```
### API
We provide 3 essential primitives: standalone `Quota`, `QuotaPolicy`, and a GCRA-based `QuotaPool`.
`QuotaPool` defaults to `QuotaKey`, an owned heap `String`.
If your quota identity is already a compact ID, interned symbol, or another key shape,
use `QuotaPool<K>` and construct it with `QuotaPool::<K>::with_key_type(...)`.
Use `QuotaPool::with_capacity(...)` and `pool.insert_keys(...)` when the key set is known ahead of traffic; that keeps the hot request path on borrowed-key lookup instead of insertion.
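
To illustrate why pre-inserting keys helps, here is a rough sketch built on `std::collections::HashMap` rather than on `quota` itself. The `Counters` type and its methods are hypothetical. It shows only the general idea: a lookup with a borrowed `&str` needs no allocation, while inserting an unknown key has to build an owned `String` first.

```rust
use std::collections::HashMap;

/// Hypothetical keyed counter table (not quota's internals).
struct Counters {
    map: HashMap<String, u64>,
}

impl Counters {
    /// Pre-sizes the table and inserts the known keys before traffic arrives.
    fn with_keys(keys: &[&str], initial: u64) -> Self {
        let mut map = HashMap::with_capacity(keys.len());
        for key in keys {
            map.insert((*key).to_owned(), initial);
        }
        Self { map }
    }

    /// Hot path: borrowed-key lookup, no allocation.
    /// Returns `None` if the key was never inserted.
    fn consume_existing(&mut self, key: &str) -> Option<bool> {
        let tokens = self.map.get_mut(key)?;
        if *tokens > 0 {
            *tokens -= 1;
            Some(true)
        } else {
            Some(false)
        }
    }

    /// Slow path: an unknown key must be copied into an owned `String`.
    fn consume_or_insert(&mut self, key: &str, initial: u64) -> bool {
        if let Some(allowed) = self.consume_existing(key) {
            return allowed;
        }
        self.map.insert(key.to_owned(), initial.saturating_sub(1));
        initial > 0
    }
}
```

With every key inserted ahead of time, requests only ever take the `consume_existing` branch.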
Example use of the standalone `Quota` (a plain 8-byte counter in memory):
```rust
use quota::Quota;

fn main() {
    let quota = Quota::with_initial_tokens(10);
    let mut results = vec![];
    for _ in 0..100 {
        results.push(quota.consume(1)); // 10..9..8..7..6..5..4..3..2..1..Err
    }
    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 10); // 10 Ok: 10..=1
    assert_eq!(results.iter().filter(|r| r.is_err()).count(), 90); // 90 Err: rate-limited
}
```
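As a rough illustration of how an 8-byte counter can be consumed from many threads without a lock, here is a sketch built on `std::sync::atomic::AtomicU64`. It is an assumption about the general technique, not a description of how `Quota` is implemented; `AtomicTokens` and `contended_grants` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical lock-free token counter (illustration only).
struct AtomicTokens(AtomicU64);

impl AtomicTokens {
    fn new(tokens: u64) -> Self {
        Self(AtomicU64::new(tokens))
    }

    /// Returns `Ok(remaining)` on success or `Err(available)` when denied.
    fn consume(&self, cost: u64) -> Result<u64, u64> {
        self.0
            // Retries a compare-and-swap until the subtraction applies cleanly,
            // or gives up when fewer than `cost` tokens are left.
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |current| {
                current.checked_sub(cost)
            })
            .map(|previous| previous - cost)
    }
}

/// Four threads race for ten tokens; returns how many requests were granted.
fn contended_grants() -> usize {
    let tokens = AtomicTokens::new(10);
    thread::scope(|scope| {
        let handles: Vec<_> = (0..4)
            .map(|_| scope.spawn(|| (0..50).filter(|_| tokens.consume(1).is_ok()).count()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<usize>()
    })
}
```

However the threads interleave, exactly ten requests are granted in total, because a token can only be taken by a successful compare-and-swap.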
Example of applying a `QuotaPolicy` with a maximum capacity and a `RefillRate`:
```rust
use quota::{Quota, QuotaPolicy, RefillRate};

fn main() {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0) // Maximum capacity a Quota is capped to on each tick
        .set_refill_rate(RefillRate::per_micro(100.0)); // Refill rate: 100 tokens/µs = 0.1 tokens/ns
    // `tick` takes `&mut Quota`, so the binding must be mutable.
    let mut quota = Quota::with_initial_tokens(10);
    let mut results = vec![];
    for _ in 0..100 {
        policy.tick(1, &mut quota); // dt = 1ns => 1ns * (0.1 tokens/ns) = 0.1 tokens per tick() call
        results.push(quota.consume(1));
    }
    // 10 initial tokens + 9 refilled over 100 ticks (the first tick is capped at capacity)
    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 19);
    assert_eq!(results.iter().filter(|r| r.is_err()).count(), 81);
}
```
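The 19/81 split can be checked by hand with a small stand-alone model of the same arithmetic. The sketch below is not `quota` code: `Bucket`, `SCALE`, and `simulate` are hypothetical, and tokens are stored in thousandths so that 0.1 token per tick is exact rather than subject to floating-point rounding.

```rust
/// Tokens are stored in thousandths of a token (fixed point).
const SCALE: u64 = 1_000;

/// Hypothetical token bucket used only to replay the arithmetic.
struct Bucket {
    milli_tokens: u64,
    capacity_milli: u64,
    /// Refill in milli-tokens per nanosecond (100 = 0.1 token/ns).
    refill_milli_per_ns: u64,
}

impl Bucket {
    /// Adds `dt_ns * rate` tokens, capped at capacity.
    fn tick(&mut self, dt_ns: u64) {
        let added = dt_ns * self.refill_milli_per_ns;
        self.milli_tokens = (self.milli_tokens + added).min(self.capacity_milli);
    }

    /// Takes `cost` whole tokens if available.
    fn consume(&mut self, cost: u64) -> bool {
        let needed = cost * SCALE;
        if self.milli_tokens >= needed {
            self.milli_tokens -= needed;
            true
        } else {
            false
        }
    }
}

/// Replays 100 rounds of "tick 1 ns, then consume 1"; returns (ok, denied).
fn simulate() -> (usize, usize) {
    let mut bucket = Bucket {
        milli_tokens: 10 * SCALE,
        capacity_milli: 10 * SCALE,
        refill_milli_per_ns: 100,
    };
    let ok = (0..100)
        .filter(|_| {
            bucket.tick(1);
            bucket.consume(1)
        })
        .count();
    (ok, 100 - ok)
}
```

In this model the first tick adds nothing because the bucket is already full, rounds 1 to 11 succeed, and after that one request succeeds every ten ticks (rounds 21, 31, and so on up to 91), which gives 19 grants.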
And now the main `QuotaPool`:
```rust
use quota::{RefillRate, QuotaPolicy, QuotaPool};
use std::sync::Arc;
use std::time::Duration;

fn main() {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0)
        .set_refill_rate(RefillRate::per_sec(3))
        // No refill tick is applied until at least this much time has passed since the previous tick.
        .set_refill_interval(Duration::from_secs(1));

    // QuotaPool reads the system clock and ticks each quota with the time elapsed since its last tick.
    // `QuotaPolicy::set_refill_interval` suppresses a tick while that elapsed time is shorter than the refill interval.
    let pool = Arc::new(QuotaPool::with_capacity(policy, 10, 1));

    let mut results = vec![];
    for _ in 0..100 {
        results.push(pool.consume("testing", 1));
    }
}
```
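To make the refill-interval behaviour described in the comments above more concrete, here is a stand-alone sketch of one plausible reading: a tick is skipped while less than the interval has elapsed, and otherwise refills for the full elapsed time. This is an assumption for illustration, not the actual `QuotaPool` logic; `GatedBucket` and `gated_example` are hypothetical, and the clock is passed in as nanoseconds so the result is deterministic.

```rust
use std::time::Duration;

/// Hypothetical bucket with an interval-gated refill (illustration only).
struct GatedBucket {
    /// Tokens in thousandths, to keep the arithmetic exact.
    milli_tokens: u64,
    capacity_milli: u64,
    rate_per_sec: u64,
    interval_nanos: u64,
    last_tick_nanos: u64,
}

impl GatedBucket {
    fn new(capacity: u64, rate_per_sec: u64, interval: Duration) -> Self {
        Self {
            milli_tokens: capacity * 1_000,
            capacity_milli: capacity * 1_000,
            rate_per_sec,
            interval_nanos: interval.as_nanos() as u64,
            last_tick_nanos: 0,
        }
    }

    fn tick(&mut self, now_nanos: u64) {
        let elapsed = now_nanos.saturating_sub(self.last_tick_nanos);
        if elapsed < self.interval_nanos {
            return; // gate: not enough time since the last tick
        }
        // elapsed_ns * tokens/s = elapsed * rate / 1e9 tokens = elapsed * rate / 1e6 milli-tokens
        let added = elapsed * self.rate_per_sec / 1_000_000;
        self.milli_tokens = (self.milli_tokens + added).min(self.capacity_milli);
        self.last_tick_nanos = now_nanos;
    }

    fn consume(&mut self, now_nanos: u64, cost: u64) -> bool {
        self.tick(now_nanos);
        let needed = cost * 1_000;
        if self.milli_tokens >= needed {
            self.milli_tokens -= needed;
            true
        } else {
            false
        }
    }
}

/// Capacity 10, 3 tokens/s, 1 s interval: grants at t = 0, t = 0.5 s, t = 1 s.
fn gated_example() -> (usize, bool, usize) {
    let mut bucket = GatedBucket::new(10, 3, Duration::from_secs(1));
    let at_start = (0..100).filter(|_| bucket.consume(0, 1)).count();
    let at_half_second = bucket.consume(500_000_000, 1);
    let at_one_second = (0..5).filter(|_| bucket.consume(1_000_000_000, 1)).count();
    (at_start, at_half_second, at_one_second)
}
```

Under these assumptions the initial burst grants 10 requests, nothing is granted at 0.5 s because the tick is gated, and 3 requests are granted once a full second has passed.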