Quota

A high-performance in-memory rate limiter for Rust that combines a leaky token bucket with GCRA (the generic cell rate algorithm).

Quick Comparison

Benchmark numbers are Criterion-reported ns/op from the local harness: 256 Tokio tasks on 16 Tokio worker threads, with pre-warmed allow-path state. Each figure is wall-clock elapsed time divided by the total number of operations across all concurrent tasks, not single-core instruction latency, so lower is better.

Axis	`quota`	`governor`	`flux_limiter`	`tokio_rate_limit`	`ratelimit`	`leaky_bucket`
Same-Key Throughput (ns/op)	46.94 ns	107.49 ns	201.96 ns	112.23 ns	Not keyed	Not keyed
Distributed-Key Throughput (ns/op)	1.95 ns	7.73 ns	11.11 ns	9.04 ns	Not keyed	Not keyed
Single Limiter Throughput (ns/op)	47.34 ns	74.82 ns	Uses keyed path: 201.96 ns hot	Uses keyed path: 112.23 ns hot	82.65 ns	159.45 ns
Refill Interval	Yes: `set_refill_interval`	No: rate period is GCRA cell spacing, not batch refill ticks	No: only `rate_nanos` + burst tolerance	No: rate/sec + burst; elapsed refill	No: scaled continuous refill	Yes: `refill(...)` + `interval(...)`
Algorithm	GCRA pool; legacy direct token counter still present	GCRA	GCRA	Token bucket default; custom algorithms	Token bucket	Token/leaky bucket
Keyed Limiting	Yes	Yes	Yes	Yes	No	No
Direct / Global Limiting	Yes, via `QuotaPool` single key; legacy `Quota` counter	Yes	No direct type; use constant key	No direct type; use constant key	Yes	Yes
Weighted Costs	Yes	Yes: `check_n`	No: one request per call	Yes: `check_with_cost`	Yes: `try_wait_n`	Yes: `acquire(n)` / `try_acquire(n)`
Async Wait / Backpressure API	No	Yes: `until_ready` style APIs	No	Yes: `acquire` and `acquire_timeout`	No; returns retry duration	Yes: `acquire` futures
Nonblocking Try API	Yes: `check` / `consume`	Yes: `check` / `check_n`	Yes: `check_request`	Yes: `check`, `try_acquire`, `try_acquire_n`	Yes: `try_wait_n`	Yes: `try_acquire`
Built-In Web Middleware	No	No framework middleware; has stream/sink helpers and middleware hooks	No	Yes: Axum/Tower/Tonic	No	No
Denial Metadata	Available tokens only	Retry time + optional state middleware	Retry-after, remaining, reset	Retry-after, remaining, limit, reset	Retry duration	Boolean or wait future
Key Cleanup / Eviction	Manual `remove`	`retain_recent` + `shrink_to_fit`	`cleanup_stale_clients`	TTL on algorithm types	Not keyed	Not keyed
Custom Clock	No public clock injection	Yes	Yes	No	Yes	No
`no_std` Support	No	Yes	No	No	Yes	Crate is `no_std`, but depends on Tokio timing for operation
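
To make the measurement convention behind the throughput rows concrete, here is a minimal std-only sketch of how an aggregate ns/op figure of this kind is derived: wall-clock time for the whole concurrent run divided by the total number of operations. It uses OS threads and an atomic counter as a stand-in workload. The function name `aggregate_ns_per_op` and the workload are illustrative assumptions, not the Criterion/Tokio harness that produced the table.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

/// Runs `workers` threads that each perform `ops_per_worker` operations and
/// returns (total operations, wall-clock nanoseconds / total operations).
fn aggregate_ns_per_op(workers: u64, ops_per_worker: u64) -> (u64, f64) {
    let counter = Arc::new(AtomicU64::new(0));
    let start = Instant::now();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..ops_per_worker {
                    // Stand-in for one limiter check (hypothetical workload).
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // One shared wall-clock interval covers the work of every thread.
    let elapsed_ns = start.elapsed().as_nanos() as f64;
    let total_ops = counter.load(Ordering::Relaxed);
    (total_ops, elapsed_ns / total_ops as f64)
}

fn main() {
    let (total_ops, ns_per_op) = aggregate_ns_per_op(4, 100_000);
    println!("{total_ops} ops, {ns_per_op:.2} ns/op (aggregate)");
}
```

Because the workers run in parallel, an aggregate figure computed this way can be lower than the latency of any single call, which is consistent with the single-digit values in the distributed-key row.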

Quick Axum Example using `quota`

use axum::{Router, extract::{Path, State}, http::StatusCode, routing::get};
use quota::{QuotaPolicy, QuotaPool, RefillRate};
use std::{net::SocketAddr, sync::Arc};

type Limiter = Arc<QuotaPool<String>>;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0)
        .set_refill_rate(RefillRate::per_sec(1));
    
    let limiter = Arc::new(QuotaPool::new(policy, 10));

    let app = Router::new()
        .route("/{key}", get(limit))
        .with_state(limiter);

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    axum::serve(tokio::net::TcpListener::bind(addr).await?, app).await?;
    Ok(())
}

async fn limit(State(limiter): State<Limiter>, Path(key): Path<String>) -> StatusCode {
    match limiter.consume(key.as_str(), 1) {
        Ok(_) => StatusCode::OK,
        Err(_) => StatusCode::TOO_MANY_REQUESTS,
    }
}

API

The crate provides three primitives: a standalone Quota, a QuotaPolicy, and a GCRA-based QuotaPool.
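
For readers unfamiliar with GCRA, the following is a textbook sketch of the algorithm with an explicit clock. It is not quota's internal implementation: `Gcra`, its fields, and `check` are hypothetical names chosen for illustration. The whole state is one timestamp, the "theoretical arrival time" (TAT), which advances by one emission interval per accepted request.

```rust
/// Minimal GCRA state (hypothetical type, not part of the quota crate).
struct Gcra {
    emission_interval_ns: u64, // time "cost" of one request
    burst_tolerance_ns: u64,   // how far ahead of `now` the TAT may run
    tat_ns: u64,               // theoretical arrival time of the next request
}

impl Gcra {
    /// `burst` is the number of requests allowed back to back.
    fn new(emission_interval_ns: u64, burst: u64) -> Self {
        Self {
            emission_interval_ns,
            burst_tolerance_ns: emission_interval_ns * burst.saturating_sub(1),
            tat_ns: 0,
        }
    }

    /// Returns Ok(()) if a request at `now_ns` conforms, or Err(wait_ns)
    /// with the time to wait before retrying.
    fn check(&mut self, now_ns: u64) -> Result<(), u64> {
        // An idle limiter never accumulates credit beyond the burst tolerance.
        let tat = self.tat_ns.max(now_ns);
        let allow_at = tat.saturating_sub(self.burst_tolerance_ns);
        if now_ns < allow_at {
            return Err(allow_at - now_ns);
        }
        self.tat_ns = tat + self.emission_interval_ns;
        Ok(())
    }
}

fn main() {
    // One request per second on average, bursts of up to 3.
    let mut limiter = Gcra::new(1_000_000_000, 3);
    for attempt in 1..=4 {
        println!("attempt {attempt} at t=0: {:?}", limiter.check(0));
    }
    println!("retry at t=1s: {:?}", limiter.check(1_000_000_000));
}
```

With these parameters the first three attempts at t=0 are accepted, the fourth is rejected with a one-second wait, and the retry at t=1s is accepted.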

QuotaPool defaults to QuotaKey, an owned heap String. If your quota identity is already a compact ID, interned symbol, or another key shape, use QuotaPool<K> and construct it with QuotaPool::<K>::with_key_type(...). Use QuotaPool::with_capacity(...) and pool.insert_keys(...) when the key set is known ahead of traffic; that keeps the hot request path on borrowed-key lookup instead of insertion.
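
The benefit of pre-inserting keys can be illustrated with a plain `HashMap`. This is a sketch of the general technique using only the standard library, not quota's data structure: `KeyedCounters`, `with_known_keys`, and `try_consume` are hypothetical names. With owned `String` keys inserted ahead of traffic, the request path can look up by `&str` and never allocates a key.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a keyed pool: owned `String` keys, one counter each.
struct KeyedCounters {
    remaining: HashMap<String, u32>,
}

impl KeyedCounters {
    /// Pre-size the map and insert every known key before traffic arrives.
    fn with_known_keys(keys: &[&str], initial: u32) -> Self {
        let mut remaining = HashMap::with_capacity(keys.len());
        for key in keys {
            // The only place a `String` is allocated.
            remaining.insert((*key).to_owned(), initial);
        }
        Self { remaining }
    }

    /// Hot path: looks up by borrowed `&str`, so no `String` is built per request.
    /// Returns `None` for unknown keys instead of inserting them.
    fn try_consume(&mut self, key: &str) -> Option<bool> {
        let slot = self.remaining.get_mut(key)?;
        if *slot == 0 {
            return Some(false);
        }
        *slot -= 1;
        Some(true)
    }
}

fn main() {
    let mut counters = KeyedCounters::with_known_keys(&["alice", "bob"], 2);
    println!("{:?}", counters.try_consume("alice")); // known key, tokens left
    println!("{:?}", counters.try_consume("carol")); // unknown key, no insertion
}
```

The borrowed lookup works because `String` implements `Borrow<str>`, which is what lets `HashMap<String, _>` accept a `&str` in `get_mut`.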

Example use of the standalone Quota (a simple 8-byte number in memory):

use quota::Quota;

fn main() {
    let quota = Quota::with_initial_tokens(10);

    let mut results = vec![];
    for _ in 0..100 {
        results.push(quota.consume(1)); // 10..9..8..7..6..5..4..3..2..1..Err
    }

    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 10); // 10 Ok: 10..=1
    assert_eq!(results.iter().filter(|r| r.is_err()).count(), 90); // 90 Err: Rate-limited
}

Example of applying a QuotaPolicy with a maximum capacity and a RefillRate:

use quota::{Quota, QuotaPolicy, RefillRate};

fn main() {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0) // Maximum capacity a Quota is clamped to on each tick
        .set_refill_rate(RefillRate::per_micro(100.0)); // Refill rate applied on each tick: 100 tokens/µs = 0.1 tokens/ns

    let mut quota = Quota::with_initial_tokens(10); // `mut` is required because tick() takes `&mut quota`

    let mut results = vec![];
    for _ in 0..100 {
        policy.tick(1, &mut quota); // dt = 1ns => 1ns*(0.1T/ns) = 0.1 tokens per tick() call
        results.push(quota.consume(1));
    }

    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 19);
    assert_eq!(results.iter().filter(|r| r.is_err()).count(), 81);
}
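
The 19/81 split in the assertions above follows from simple arithmetic, which the sketch below replays without the crate. It models the bucket in integer milli-tokens to avoid floating-point rounding and assumes three things about the example's behavior: the refill is applied before each consume, the balance is clamped to capacity, and a consume fails when less than one whole token is available. `simulate` and `MILLI` are hypothetical names.

```rust
/// Fixed-point stand-in for the example above: 1 token = 1_000 milli-tokens.
const MILLI: u64 = 1_000;

/// Replays `iterations` rounds of "tick 1 ns, then consume 1 token"
/// and returns (allowed, denied).
fn simulate(iterations: u32) -> (u32, u32) {
    let capacity = 10 * MILLI;
    let refill_per_tick = MILLI / 10; // 0.1 token per 1 ns tick
    let mut tokens = 10 * MILLI; // starts full
    let (mut ok, mut denied) = (0, 0);

    for _ in 0..iterations {
        // Refill first, clamped to capacity (so the very first tick adds nothing).
        tokens = (tokens + refill_per_tick).min(capacity);
        if tokens >= MILLI {
            tokens -= MILLI;
            ok += 1;
        } else {
            denied += 1;
        }
    }
    (ok, denied)
}

fn main() {
    let (ok, denied) = simulate(100);
    println!("ok = {ok}, denied = {denied}"); // prints "ok = 19, denied = 81"
}
```

Each round nets -0.9 tokens while the bucket has balance, so the first 11 rounds succeed and drain it to zero. After that one token accrues every 10 ticks, giving 8 more successes at rounds 21, 31, ..., 91, for 19 in total.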

And now the main QuotaPool:

use quota::{RefillRate, QuotaPolicy, QuotaPool};
use std::sync::Arc;
use std::time::Duration;

fn main() {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0)
        .set_refill_rate(RefillRate::per_sec(3))
        .set_refill_interval(Duration::from_secs(1)); // Refill in batches: no tick happens until 1s has passed since the last one

    // QuotaPool uses the system clock and ticks each quota with the time elapsed since its previous tick.
    // `QuotaPolicy::set_refill_interval` skips a tick while the time since the last tick is shorter than the refill interval.
    let pool = Arc::new(QuotaPool::with_capacity(policy, 10, 1));

    let mut results = vec![];
    for _ in 0..100 {
        results.push(pool.consume("testing", 1));
    }
}
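
The effect of a refill interval can be sketched with an explicit clock instead of the system clock. This is an illustration of interval-gated (batch) refill under stated assumptions, not the crate's code: `BatchBucket` and its methods are hypothetical, tokens are whole integers, and a skipped tick is assumed to be credited later because the elapsed time keeps growing until a tick is applied.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Hypothetical interval-gated bucket driven by a caller-supplied clock.
struct BatchBucket {
    capacity: u64,
    tokens: u64,
    rate_per_sec: u64,
    interval_ns: u64,
    last_tick_ns: u64,
}

impl BatchBucket {
    fn new(capacity: u64, rate_per_sec: u64, interval_ns: u64) -> Self {
        Self { capacity, tokens: capacity, rate_per_sec, interval_ns, last_tick_ns: 0 }
    }

    /// Refills only if at least `interval_ns` has passed since the last
    /// applied tick. Returns whether a refill happened.
    fn tick(&mut self, now_ns: u64) -> bool {
        let elapsed_ns = now_ns.saturating_sub(self.last_tick_ns);
        if elapsed_ns < self.interval_ns {
            return false; // too early: the tick is skipped entirely
        }
        // Fractional tokens are dropped here for brevity.
        let refill = elapsed_ns * self.rate_per_sec / NANOS_PER_SEC;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_tick_ns = now_ns;
        true
    }

    fn consume(&mut self, now_ns: u64, cost: u64) -> bool {
        self.tick(now_ns);
        if self.tokens < cost {
            return false;
        }
        self.tokens -= cost;
        true
    }
}

fn main() {
    // Capacity 10, 3 tokens/s, refilled at most once per second.
    let mut bucket = BatchBucket::new(10, 3, NANOS_PER_SEC);
    let at_start = (0..12).filter(|_| bucket.consume(0, 1)).count();
    let early = bucket.consume(NANOS_PER_SEC / 2, 1);
    let on_time = bucket.consume(NANOS_PER_SEC, 1);
    println!("{at_start} allowed at t=0, early={early}, on_time={on_time}");
}
```

In this model a drained bucket stays empty at t=0.5s even though 1.5 tokens of rate have nominally accrued, and receives all 3 tokens at once at t=1s.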

quota 0.3.0
