Crate mailrs_rate_limit

§mailrs-rate-limit

Token-bucket rate limiting for Rust services. Single async trait (RateLimitStore) over &str keys with unix-second time, plus a bundled in-process reference implementation (InMemoryRateLimitStore) that runs at sub-µs median on a hot key. Bring your own backend (Redis, DynamoDB, memcached, …) by implementing the trait.

Extracted from mailrs so any Rust service — SMTP, HTTP, gRPC, streaming — can lean on the same token-bucket pattern that fronts a production mail server’s connect path, without inheriting opinions about which storage backend, which clock, or which key shape to use.

§Why not `governor`?

governor is excellent for in-process GCRA rate limiting with quotas-as-types. This crate aims at a different niche:

&str keys, not types. Limit per-IP, per-user, per-API-key, per-endpoint, per-tenant — all from one store, picked at the call site. No type-parameter gymnastics.
Pluggable backend. The trait surface is three methods. Wire your own Redis / Kevy / DynamoDB impl in ~80 lines.
Unix-second time at the trait boundary. Portable across processes, languages, and clocks. Backends that prefer monotonic clocks internally (the bundled in-memory impl does) convert at the boundary.
Pure math exposed. evaluate_bucket is the entire token-bucket arithmetic in one function. Call it directly if you want to plug it into a different store shape without going through the trait.

If you want a single-process limiter with statically-typed quotas, use governor. If you want a swappable storage layer with &str keys and a tiny surface, use this crate.

§Highlights

Backend-agnostic trait. Three async methods (check, cleanup_stale, len). Implementations choose their own storage and concurrency primitive.
Bundled in-memory impl. InMemoryRateLimitStore is DashMap-backed; sub-µs check on a hot key (see BUDGETS.md).
Pure math exposed. evaluate_bucket turns (bucket, now, config) → (next_bucket, allowed) with no I/O, no allocations.
Stringly-typed keys. &str accommodates every key shape (IPs, user IDs, API key prefixes, endpoint paths). One .to_string() at the boundary is the only conversion cost.
Perf-gated. tests/perf_gate.rs asserts on µs-budget latency. Documented derivation in BUDGETS.md.
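
To make the "pure math" point concrete, here is a minimal std-only sketch of token-bucket arithmetic in the (bucket, now, config) → (next_bucket, allowed) shape described above. It is not the crate's evaluate_bucket: SketchConfig, SketchBucket, evaluate_sketch, and their field names and types are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the crate's `TokenBucketConfig`; field types are assumed.
#[derive(Clone, Copy, Debug)]
pub struct SketchConfig {
    pub capacity: u32,    // burst size, in tokens
    pub refill_rate: f64, // tokens added per second
}

/// Hypothetical stand-in for the crate's `Bucket`; field names are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SketchBucket {
    pub tokens: f64,
    pub last_refill: u64, // unix seconds
}

/// Refill for the elapsed time, cap at capacity, then try to take one token.
/// No I/O and no allocation: the caller owns loading and storing the bucket.
pub fn evaluate_sketch(
    bucket: SketchBucket,
    now: u64,
    cfg: &SketchConfig,
) -> (SketchBucket, bool) {
    // A clock that stepped backwards is treated as "no time has passed".
    let elapsed = now.saturating_sub(bucket.last_refill) as f64;
    let cap = f64::from(cfg.capacity);
    let refilled = (bucket.tokens + elapsed * cfg.refill_rate).min(cap);

    let allowed = refilled >= 1.0;
    let tokens = if allowed { refilled - 1.0 } else { refilled };

    // Never move the timestamp backwards, or a later call would refill twice.
    let last_refill = now.max(bucket.last_refill);
    (SketchBucket { tokens, last_refill }, allowed)
}
```

Because the function only maps one bucket state to the next, the same arithmetic can sit behind a HashMap, a database row, or a cache entry without change.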

§Quick start

use mailrs_rate_limit::{InMemoryRateLimitStore, RateLimitStore, TokenBucketConfig};

let limiter = InMemoryRateLimitStore::new(TokenBucketConfig {
    capacity: 10,
    refill_rate: 1.0, // one token per second
});

// Inside an async context: allow the first 10 from this client in a burst...
for _ in 0..10 {
    assert!(limiter.check("203.0.113.42").await);
}
// ...then deny.
assert!(!limiter.check("203.0.113.42").await);

// A different client is unaffected.
assert!(limiter.check("198.51.100.7").await);

// Periodically (e.g. once an hour) drop idle keys to bound memory.
let one_hour_ago = std::time::SystemTime::now()
    .duration_since(std::time::UNIX_EPOCH)
    .unwrap()
    .as_secs()
    .saturating_sub(3600);
limiter.cleanup_stale(one_hour_ago).await;

§Trait shape

#[async_trait]
pub trait RateLimitStore: Send + Sync {
    /// Try to consume one token for `key`. Returns `true` if allowed.
    async fn check(&self, key: &str) -> bool;

    /// Remove buckets that haven't been touched since `before_unix_secs`.
    async fn cleanup_stale(&self, before_unix_secs: u64);

    /// Approximate number of tracked keys (metrics).
    async fn len(&self) -> usize;
}

The trait makes no scheduling promises: call cleanup_stale on whatever cadence suits your traffic shape (once an hour is typical for IP-keyed limiters; more often for sliding-window analytics use cases).

§Bringing your own backend

A minimal Redis backend sketch (illustrative — wire your own error handling and connection pooling):

use async_trait::async_trait;
use mailrs_rate_limit::{evaluate_bucket, Bucket, RateLimitStore, TokenBucketConfig};

pub struct RedisRateLimitStore {
    client: redis::Client, // assumption: plain client; swap in a pool (r2d2, bb8, deadpool) as needed
    config: TokenBucketConfig,
}

#[async_trait]
impl RateLimitStore for RedisRateLimitStore {
    async fn check(&self, key: &str) -> bool {
        // 1. WATCH key
        // 2. GET key — parse Bucket from JSON or default to full
        // 3. let (next, allowed) = evaluate_bucket(bucket, now, &self.config);
        // 4. MULTI / SET key with an EX ttl / EXEC (the TTL is what lets Redis expire idle keys)
        // 5. retry on CAS conflict
        todo!()
    }
    async fn cleanup_stale(&self, _before: u64) { /* Redis EXPIRE handles this natively */ }
    async fn len(&self) -> usize { /* DBSIZE or SCAN */ 0 }
}
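
Before committing to a network backend, it can help to exercise the same three-method shape with a std-only store. The sketch below is a synchronous analogue, not the crate's async trait: SyncStoreSketch, MutexStoreSketch, and the injected now clock are hypothetical names introduced here, and the bucket math is inlined so the block stands alone.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical synchronous analogue of the three-method trait surface.
pub trait SyncStoreSketch: Send + Sync {
    fn check(&self, key: &str) -> bool;
    fn cleanup_stale(&self, before_unix_secs: u64);
    fn len(&self) -> usize;
}

/// One mutex around a map of key -> (tokens, last_touched_unix_secs).
pub struct MutexStoreSketch {
    capacity: f64,
    refill_rate: f64,
    now: fn() -> u64, // injected clock (unix seconds) so tests can pin time
    buckets: Mutex<HashMap<String, (f64, u64)>>,
}

impl MutexStoreSketch {
    pub fn new(capacity: u32, refill_rate: f64, now: fn() -> u64) -> Self {
        Self {
            capacity: f64::from(capacity),
            refill_rate,
            now,
            buckets: Mutex::new(HashMap::new()),
        }
    }
}

impl SyncStoreSketch for MutexStoreSketch {
    fn check(&self, key: &str) -> bool {
        let now = (self.now)();
        let mut map = self.buckets.lock().unwrap();
        // Unknown keys start with a full bucket.
        let (tokens, last) = map.get(key).copied().unwrap_or((self.capacity, now));
        let elapsed = now.saturating_sub(last) as f64;
        let refilled = (tokens + elapsed * self.refill_rate).min(self.capacity);
        let allowed = refilled >= 1.0;
        let next = if allowed { refilled - 1.0 } else { refilled };
        map.insert(key.to_string(), (next, now.max(last)));
        allowed
    }

    fn cleanup_stale(&self, before_unix_secs: u64) {
        // Keep only buckets touched at or after the cutoff.
        self.buckets
            .lock()
            .unwrap()
            .retain(|_, (_, last)| *last >= before_unix_secs);
    }

    fn len(&self) -> usize {
        self.buckets.lock().unwrap().len()
    }
}
```

A single mutex serializes every key, so this is a shape check rather than a performance reference; the bundled DashMap-backed store exists precisely to avoid that contention.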

§What’s in the box

TokenBucketConfig — burst size + refill rate.
Bucket — single-bucket state (tokens + last-refill timestamp).
evaluate_bucket — pure token-bucket math.
RateLimitStore — async trait for pluggable storage.
InMemoryRateLimitStore — DashMap-backed reference impl.

§What’s NOT in the box (intentionally)

No backends beyond in-memory. Bring your own Redis, Kevy, DynamoDB, memcached impl. The trait is three methods.
No sliding-window log algorithm. Token bucket only. If you need a sliding window, write your own RateLimitStore implementation around that algorithm.
No hierarchical buckets (burst + sustained pair). Compose two stores at the call site; the crate does not do composition for you.
No per-key config. One config per store instance. If you need tiers (e.g. auth at 10/min, general at 300/min), instantiate two stores.
No cleanup scheduler. Call cleanup_stale from your own background task; the crate does not spawn anything.
No metrics emission. len() is the hook — wire it into your dashboard.

These boundaries keep the crate tiny, framework-free, and trivial to drop in next to whatever async runtime / DI container / metrics stack you already use.
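
Two of the boundaries above (hierarchical buckets and per-key config) come down to holding more than one limiter and choosing between them at the call site. The following std-only sketch shows both patterns under stated assumptions: CheckSketch is a hypothetical synchronous stand-in for a store's check method, FixedAllowance is a test double that never refills, and the "/auth/" path prefix is an invented routing rule.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical synchronous stand-in for a store's `check` method.
pub trait CheckSketch {
    fn check(&self, key: &str) -> bool;
}

/// Test double: each key gets a fixed allowance and never refills.
pub struct FixedAllowance {
    allowance: u32,
    used: Mutex<HashMap<String, u32>>,
}

impl FixedAllowance {
    pub fn new(allowance: u32) -> Self {
        Self { allowance, used: Mutex::new(HashMap::new()) }
    }
}

impl CheckSketch for FixedAllowance {
    fn check(&self, key: &str) -> bool {
        let mut used = self.used.lock().unwrap();
        let n = used.entry(key.to_string()).or_insert(0);
        if *n < self.allowance {
            *n += 1;
            true
        } else {
            false
        }
    }
}

/// Burst + sustained pair: allow only if both limiters allow.
/// `&&` short-circuits, so `sustained` is not charged when `burst` denies.
/// A token already taken from `burst` is not refunded if `sustained` denies.
pub fn check_both(burst: &impl CheckSketch, sustained: &impl CheckSketch, key: &str) -> bool {
    burst.check(key) && sustained.check(key)
}

/// Tiers: one limiter per route class, picked at the call site.
/// The "/auth/" prefix is an illustrative assumption.
pub fn check_tiered(
    auth: &impl CheckSketch,
    general: &impl CheckSketch,
    path: &str,
    key: &str,
) -> bool {
    if path.starts_with("/auth/") {
        auth.check(key)
    } else {
        general.check(key)
    }
}
```

The order of the two checks in check_both decides which limiter absorbs the cost of a denied request, which is one reason the composition is left to the caller.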

§Production reference

mailrs itself uses this crate at three call sites in its SMTP / web stack:

Inbound SMTP connect — capacity 10, refill 1/sec. Throttles abusive connect rates per source IP.
Web auth endpoints — capacity 10, refill ~0.17/sec (10/minute). Slows brute-force password attacks.
Web general API — capacity 300, refill 5/sec (300/minute). Backstop against runaway clients.

All three sit on top of the same InMemoryRateLimitStore shape, keyed by addr.ip().to_string().
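
The key shape and the per-minute figures above can be reproduced with the standard library alone. In this small sketch, ip_key and per_minute are hypothetical helper names, not part of the crate.

```rust
use std::net::SocketAddr;

/// Key a limiter by source IP only, so every port on one host shares a bucket.
pub fn ip_key(addr: &SocketAddr) -> String {
    addr.ip().to_string()
}

/// Convert a per-minute allowance into a per-second refill rate.
pub fn per_minute(tokens: u32) -> f64 {
    f64::from(tokens) / 60.0
}
```

With these, per_minute(10) is roughly 0.17 tokens per second and per_minute(300) is exactly 5.0, matching the web auth and general API rows above.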

§Tested

1.0.0 ships 28 unit tests across the three modules, plus 8 trait-contract tests and 4 perf gates:

Module	Tests	Surface
`config`	3	Default values, clone, zero-refill validity
`token_bucket`	11	Allow / reject / refill / cap / drain / clock backwards / boundary timestamps
`in_memory`	14	Sync + async API, per-key isolation, cleanup, concurrent access
`trait_contract`	8	Suite that any `RateLimitStore` impl should pass
`perf_gate`	4	`evaluate_bucket` + `check` (hot/cold) µs-budget

Run with cargo test -p mailrs-rate-limit.

§Performance

Two layers of measurement:

tests/perf_gate.rs — integration tests that fail CI if any hot path slows past its budget. See BUDGETS.md for the full table.
benches/store.rs — criterion microbenchmarks for detailed regression tracking.
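
For readers unfamiliar with the perf-gate idea, the sketch below shows one way a latency budget can be asserted from a plain test using only std::time::Instant. It is not the contents of tests/perf_gate.rs: median_ns, measure_median_ns, and within_budget are hypothetical helpers, and real gates need budgets derived from measurement as BUDGETS.md describes.

```rust
use std::time::Instant;

/// Median of a set of latencies in nanoseconds (upper middle for even lengths).
pub fn median_ns(samples: &mut [u64]) -> u64 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_unstable();
    samples[samples.len() / 2]
}

/// Time `op` in batches and return the median per-call latency.
/// Batching amortizes the cost of reading the clock over many calls.
pub fn measure_median_ns(mut op: impl FnMut(), batches: usize, calls_per_batch: u32) -> u64 {
    assert!(calls_per_batch > 0, "calls_per_batch must be non-zero");
    let mut samples = Vec::with_capacity(batches);
    for _ in 0..batches {
        let start = Instant::now();
        for _ in 0..calls_per_batch {
            op();
        }
        let nanos = start.elapsed().as_nanos() as u64;
        samples.push(nanos / u64::from(calls_per_batch));
    }
    median_ns(&mut samples)
}

/// The gate itself: a test would assert this and fail CI when it is false.
pub fn within_budget(median: u64, budget_ns: u64) -> bool {
    median <= budget_ns
}
```

Using the median rather than the mean keeps a single scheduler hiccup on a shared CI runner from failing the gate.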

Numbers below are criterion medians, measured with criterion 0.8 on Apple Silicon (M-series), release profile:

Operation	Median	Notes
`evaluate_bucket` (allowed, pure math)	~1.8 ns	branchless refill + decrement
`evaluate_bucket` (denied, no refill)	~2.0 ns	exits without writing tokens
`InMemoryRateLimitStore::check_sync` (hot key)	13-16 ns	`quanta::Clock` + AtomicU64 GCRA-encoded TAT + DashMap shard read lock
`InMemoryRateLimitStore::check` (async, hot key)	~85 ns	+ boxed-future overhead from `async-trait`
`InMemoryRateLimitStore::check_sync` (cold key, first touch)	~280 ns	String alloc + DashMap insert
`cleanup_stale(10k entries, all stale)`	~119 µs	full sweep, retain everything-or-nothing
`cleanup_stale(10k entries, none stale)`	~119 µs	same cost — retain still touches every entry

Run with cargo bench -p mailrs-rate-limit.

§Vs. `governor` 0.10 (the de-facto Rust GCRA crate)

As of 1.0.3 we match or slightly beat governor on the hot path. governor's earlier 2× lead came down to three techniques we had not yet adopted; all three landed in 1.0.3 and are credited to governor's source:

Operation	mailrs-rate-limit 1.0.3	governor 0.10
hot key, allowed	13-16 ns	14-18 ns
cold key first-touch	275-372 ns	290-420 ns

The three things that closed the gap: (1) per-key state collapsed into a single AtomicU64 holding a GCRA-style TAT — DashMap shard read lock + lock-free CAS instead of write-locked entry mutation; (2) quanta::Clock for the time source — u64-backed monotonic Instant without going through Duration (~3-5 ns vs SystemTime::now()’s ~10 ns); (3) nanos_per_token + burst_nanos precomputed at construction so the hot path is integer arithmetic only.

Token-bucket semantics (capacity + refill_rate config) are preserved — GCRA’s TAT is just a more compact encoding of the same state for a uniform-rate bucket. See crates/rate-limit/src/in_memory.rs for the implementation; reproduce numbers with cargo bench -p mailrs-rate-limit --bench compare_governor.
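
The claim that a GCRA TAT (theoretical arrival time) is a compact encoding of a uniform-rate bucket can be illustrated with a std-only sketch. This is not the code in in_memory.rs: GcraParams and GcraCell are hypothetical types, the clock is passed in as nanoseconds instead of coming from quanta, and the sketch assumes refill_rate > 0. The equivalent token count at any moment is capacity - (tat - now) / nanos_per_token.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Precomputed integer parameters so the hot path does no float math.
pub struct GcraParams {
    nanos_per_token: u64, // emission interval
    burst_nanos: u64,     // how far TAT may run ahead of "now"
}

impl GcraParams {
    /// Assumes `refill_rate > 0`; a zero rate needs separate handling.
    pub fn new(capacity: u32, refill_rate: f64) -> Self {
        assert!(refill_rate > 0.0, "sketch assumes a positive refill rate");
        let nanos_per_token = (1e9 / refill_rate) as u64;
        Self {
            nanos_per_token,
            burst_nanos: nanos_per_token.saturating_mul(u64::from(capacity)),
        }
    }
}

/// Per-key state: a single atomic holding the theoretical arrival time.
pub struct GcraCell {
    tat_nanos: AtomicU64,
}

impl GcraCell {
    pub fn new() -> Self {
        Self { tat_nanos: AtomicU64::new(0) }
    }

    /// Try to consume one token at `now_nanos` using a lock-free CAS loop.
    pub fn check(&self, now_nanos: u64, p: &GcraParams) -> bool {
        let mut stored = self.tat_nanos.load(Ordering::Relaxed);
        loop {
            // An idle key (TAT in the past) behaves like a full bucket.
            let tat = stored.max(now_nanos);
            let new_tat = tat.saturating_add(p.nanos_per_token);
            if new_tat - now_nanos > p.burst_nanos {
                return false; // denied: state is left untouched
            }
            match self.tat_nanos.compare_exchange_weak(
                stored,
                new_tat,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => stored = actual, // lost the race; retry
            }
        }
    }
}
```

A denied request performs no write at all, and an allowed one performs a single compare-and-swap, which is why this encoding needs only a shared read lock on the surrounding map.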

§Versioning

1.x follows semver. The stable public surface:

RateLimitStore trait method signatures
TokenBucketConfig struct shape
Bucket struct shape (for backend authors)
evaluate_bucket function signature + algorithm
InMemoryRateLimitStore::new signature

The pure-math evaluate_bucket algorithm is frozen at 1.x — any algorithm change is a major-version bump.

§Stone audit (v3 cycle, 2026-05-25)

Axis	Status
doc	✅ clean (`cargo doc --no-deps -p mailrs-rate-limit`)
test	line cov: 98.6% (`cargo llvm-cov -p mailrs-rate-limit --summary-only`)
bench	✅ 2 file(s) criterion + ✅ 3 gate(s) `perf_gate.rs`
size	release rlib: 121 KB
fuzz	❌ none
mem	dhat profile pending (v3.4 backlog)

§Competitor comparisons (from PERFORMANCE.md)

mailrs-rate-limit vs governor 0.10 (DashMap-backed)

§License

Licensed under either Apache License, Version 2.0 or MIT license at your option.

Token-bucket rate limiting for Rust services.

See the crate-level README for design rationale and worked examples.

Re-exports§

pub use config::TokenBucketConfig;
pub use in_memory::InMemoryRateLimitStore;
pub use store::RateLimitStore;
pub use token_bucket::Bucket;
pub use token_bucket::evaluate_bucket;

Modules§

config: Token-bucket configuration.
in_memory: InMemoryRateLimitStore — DashMap-backed RateLimitStore.
store: RateLimitStore trait — pluggable storage for token-bucket state.
token_bucket: Pure token-bucket arithmetic.