quota 0.3.0

Fastest lane-parallel rate-limiter for key-distributed traffic workloads

Quota

A high-performance in-memory rate limiter for Rust, using a mix of Leaky Token Bucket & GCRA.
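
For readers unfamiliar with GCRA, here is a minimal, generic sketch of the algorithm. It is not the crate's implementation: the `Gcra` type, its fields, and the use of caller-supplied nanosecond timestamps are all assumptions made for illustration.

```rust
/// Minimal GCRA limiter over caller-supplied nanosecond timestamps.
/// Hypothetical type for illustration; not part of the quota crate.
#[derive(Debug)]
pub struct Gcra {
    /// Emission interval T: how many nanoseconds of budget one request costs.
    emission_interval_ns: u64,
    /// Burst tolerance tau: how far the theoretical arrival time may run ahead of now.
    tolerance_ns: u64,
    /// Theoretical arrival time (TAT) of the next conforming request.
    tat_ns: u64,
}

impl Gcra {
    /// `burst` requests may pass back to back; after that, one per emission interval.
    pub fn new(emission_interval_ns: u64, burst: u64) -> Self {
        Self {
            emission_interval_ns,
            tolerance_ns: emission_interval_ns * burst.saturating_sub(1),
            tat_ns: 0,
        }
    }

    /// Returns Ok(()) if the request conforms, or Err(wait_ns) with the
    /// number of nanoseconds until it would conform.
    pub fn check(&mut self, now_ns: u64) -> Result<(), u64> {
        // An idle limiter never accumulates more than the configured burst.
        let tat = self.tat_ns.max(now_ns);
        let earliest = tat.saturating_sub(self.tolerance_ns);
        if now_ns < earliest {
            return Err(earliest - now_ns);
        }
        // Accepting the request pushes the TAT forward by one interval.
        self.tat_ns = tat + self.emission_interval_ns;
        Ok(())
    }
}
```

With `Gcra::new(100, 3)`, three calls to `check(0)` are accepted, the fourth is rejected with a 100 ns wait, and a call at `now_ns = 100` is accepted again. The whole state is a single timestamp per key, which is why GCRA suits keyed limiters.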

Quick Comparison

Benchmark numbers are Criterion-reported ns/op from the local harness, which runs 256 Tokio tasks on 16 Tokio worker threads against pre-warmed allow-path state. Each figure is wall-clock elapsed time divided by the total number of operations across all concurrent tasks, not single-core instruction latency. Keep this in mind when reading the numbers.
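
To make the metric concrete, the sketch below shows how an aggregate ns/op figure is derived and what throughput it implies. The function names and the sample inputs are illustrative assumptions, not taken from the benchmark harness.

```rust
/// Aggregate cost per operation: wall-clock time divided by all operations
/// completed by every concurrent task during that time.
fn ns_per_op(elapsed_ns: f64, tasks: u64, ops_per_task: u64) -> f64 {
    elapsed_ns / (tasks * ops_per_task) as f64
}

/// Aggregate throughput (operations per second) implied by an ns/op figure.
fn ops_per_sec(ns_per_op: f64) -> f64 {
    1e9 / ns_per_op
}
```

For example, 256 tasks each completing a hypothetical 1,000 operations in 512,000 ns of wall-clock time yield 2.0 ns/op. By the same arithmetic, 1.95 ns/op corresponds to roughly 513 million operations per second summed over all worker threads. If that load were spread perfectly evenly over 16 workers, each worker would be spending about 31 ns per operation.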

| Axis | quota | governor | flux_limiter | tokio_rate_limit | ratelimit | leaky_bucket |
|---|---|---|---|---|---|---|
| Same-Key Throughput | 46.94 ns | 107.49 ns | 201.96 ns | 112.23 ns | Not keyed | Not keyed |
| Distributed-Key Throughput | 1.95 ns | 7.73 ns | 11.11 ns | 9.04 ns | Not keyed | Not keyed |
| Single Limiter Throughput | 47.34 ns | 74.82 ns | Uses keyed path: 201.96 ns hot | Uses keyed path: 112.23 ns hot | 82.65 ns | 159.45 ns |
| Refill Interval | Yes: set_refill_interval | No: rate period is GCRA cell spacing, not batch refill ticks | No: only rate_nanos + burst tolerance | No: rate/sec + burst; elapsed refill | No: scaled continuous refill | Yes: refill(...) + interval(...) |
| Algorithm | GCRA pool; legacy direct token counter still present | GCRA | GCRA | Token bucket default; custom algorithms | Token bucket | Token/leaky bucket |
| Keyed Limiting | Yes | Yes | Yes | Yes | No | No |
| Direct / Global Limiting | Yes, via QuotaPool single key; legacy Quota counter | Yes | No direct type; use constant key | No direct type; use constant key | Yes | Yes |
| Weighted Costs | Yes | Yes: check_n | No: one request per call | Yes: check_with_cost | Yes: try_wait_n | Yes: acquire(n) / try_acquire(n) |
| Async Wait / Backpressure API | No | Yes: until_ready style APIs | No | Yes: acquire and acquire_timeout | No; returns retry duration | Yes: acquire futures |
| Nonblocking Try API | Yes: check / consume | Yes: check / check_n | Yes: check_request | Yes: check, try_acquire, try_acquire_n | Yes: try_wait_n | Yes: try_acquire |
| Built-In Web Middleware | No | No framework middleware; has stream/sink helpers and middleware hooks | No | Yes: Axum/Tower/Tonic | No | No |
| Denial Metadata | Available tokens only | Retry time + optional state middleware | Retry-after, remaining, reset | Retry-after, remaining, limit, reset | Retry duration | Boolean or wait future |
| Key Cleanup / Eviction | Manual remove | retain_recent + shrink_to_fit | cleanup_stale_clients | TTL on algorithm types | Not keyed | Not keyed |
| Custom Clock | No public clock injection | Yes | Yes | No | Yes | No |
| no_std Support | No | Yes | No | No | Yes | Crate is no_std, but depends on Tokio timing for operation |

Quick Axum Example using quota

use axum::{Router, extract::{Path, State}, http::StatusCode, routing::get};
use quota::{QuotaPolicy, QuotaPool, RefillRate};
use std::{net::SocketAddr, sync::Arc};

type Limiter = Arc<QuotaPool<String>>;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0)
        .set_refill_rate(RefillRate::per_sec(1));
    
    let limiter = Arc::new(QuotaPool::new(policy, 10));

    let app = Router::new()
        .route("/{key}", get(limit))
        .with_state(limiter);

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    axum::serve(tokio::net::TcpListener::bind(addr).await?, app).await?;
    Ok(())
}

async fn limit(State(limiter): State<Limiter>, Path(key): Path<String>) -> StatusCode {
    match limiter.consume(key.as_str(), 1) {
        Ok(_) => StatusCode::OK,
        Err(_) => StatusCode::TOO_MANY_REQUESTS,
    }
}

API

We provide three essential primitives: a standalone Quota, a QuotaPolicy, and a GCRA-based QuotaPool.

QuotaPool defaults to QuotaKey, an owned heap String. If your quota identity is already a compact ID, interned symbol, or another key shape, use QuotaPool<K> and construct it with QuotaPool::<K>::with_key_type(...). Use QuotaPool::with_capacity(...) and pool.insert_keys(...) when the key set is known ahead of traffic; that keeps the hot request path on borrowed-key lookup instead of insertion.
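
The difference between borrowed-key lookup and insertion on the hot path can be illustrated with a plain `std::collections::HashMap`. This is only a sketch of the idea: the `Pool` type and its methods below are hypothetical stand-ins, not the QuotaPool API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a keyed pool: key -> remaining tokens.
pub struct Pool {
    slots: HashMap<String, u64>,
}

impl Pool {
    /// Pre-size the map and insert every known key before traffic arrives.
    pub fn preloaded(keys: &[&str], tokens: u64) -> Self {
        let mut slots = HashMap::with_capacity(keys.len());
        for key in keys {
            slots.insert((*key).to_owned(), tokens);
        }
        Self { slots }
    }

    /// Hot path: looks up by borrowed &str, so no String is allocated per request.
    /// Returns None for an unknown key instead of inserting it.
    pub fn try_consume(&mut self, key: &str, cost: u64) -> Option<bool> {
        let tokens = self.slots.get_mut(key)?;
        if *tokens >= cost {
            *tokens -= cost;
            Some(true)
        } else {
            Some(false)
        }
    }
}
```

Because `String` implements `Borrow<str>`, `get_mut` accepts a `&str` directly. Inserting a missing key would instead require allocating an owned `String` and possibly growing the table, which is the cost that preloading avoids.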

Example use of the simple Quota (a single 8-byte number in memory):

use quota::Quota;

fn main() {
    let quota = Quota::with_initial_tokens(10);

    let mut results = vec![];
    for _ in 0..100 {
        results.push(quota.consume(1)); // 10..9..8..7..6..5..4..3..2..1..Err
    }

    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 10); // 10 Ok: 10..=1
    assert_eq!(results.iter().filter(|r| r.is_err()).count(), 90); // 90 Err: Rate-limited
}

Example of applying a QuotaPolicy with a maximum capacity and a RefillRate:

use quota::{Quota, QuotaPolicy, RefillRate};

fn main() {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0) // Maximum Capacity to apply to a Quota per tick
        .set_refill_rate(RefillRate::per_micro(100.0)); // Refill Rate to apply to a Quota per tick (0.1T/ns)

    let mut quota = Quota::with_initial_tokens(10); // `mut` is required because tick() takes &mut quota

    let mut results = vec![];
    for _ in 0..100 {
        policy.tick(1, &mut quota); // dt = 1ns => 1ns*(0.1T/ns) = 0.1 tokens per tick() call
        results.push(quota.consume(1));
    }

    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 19);
    assert_eq!(results.iter().filter(|r| r.is_err()).count(), 81);
}

And now the main QuotaPool:

use quota::{RefillRate, QuotaPolicy, QuotaPool};
use std::sync::Arc;
use std::time::Duration;

fn main() {
    let policy = QuotaPolicy::new()
        .set_capacity(10.0)
        .set_refill_rate(RefillRate::per_sec(3))
        .set_refill_interval(Duration::from_secs(1)); // Minimum time that must pass between two refill ticks

    // QuotaPool reads the system clock and ticks each quota by the time elapsed since its previous tick.
    // QuotaPolicy::set_refill_interval skips a tick while the time since the last tick is still below the refill interval.
    let pool = Arc::new(QuotaPool::with_capacity(policy, 10, 1));

    let mut results = vec![];
    for _ in 0..100 {
        results.push(pool.consume("testing", 1));
    }
}
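
The refill-interval gating described in the comments above can be sketched as a standalone bucket. This is an assumption-laden illustration, not the crate's code: `IntervalBucket`, its integer token arithmetic, and the explicit `now_ns` timestamps are all hypothetical.

```rust
/// Hypothetical token bucket whose refill is gated by a minimum interval.
pub struct IntervalBucket {
    tokens: u64,
    capacity: u64,
    rate_per_sec: u64,
    interval_ns: u64,
    last_tick_ns: u64,
}

impl IntervalBucket {
    /// Starts full, with the last tick recorded at time zero.
    pub fn new(capacity: u64, rate_per_sec: u64, interval_ns: u64) -> Self {
        Self { tokens: capacity, capacity, rate_per_sec, interval_ns, last_tick_ns: 0 }
    }

    /// Refill only if at least `interval_ns` has elapsed since the last applied tick.
    /// The refill is elapsed time * rate, capped at capacity. Integer division
    /// drops fractional tokens, which is acceptable for a sketch.
    fn tick(&mut self, now_ns: u64) {
        let elapsed = now_ns.saturating_sub(self.last_tick_ns);
        if elapsed >= self.interval_ns {
            let refill = (elapsed as u128 * self.rate_per_sec as u128 / 1_000_000_000) as u64;
            self.tokens = self.tokens.saturating_add(refill).min(self.capacity);
            self.last_tick_ns = now_ns;
        }
    }

    /// Ticks, then tries to take `cost` tokens; returns whether the request is allowed.
    pub fn consume(&mut self, now_ns: u64, cost: u64) -> bool {
        self.tick(now_ns);
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 10, a rate of 3 per second, and a 1 second interval (the same policy values as the example above), 100 back-to-back requests at time zero grant exactly 10. A request half a second later is still denied because no tick has been applied, and at the one second mark exactly 3 more requests pass.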