ratelock
A minimal, auditable token bucket rate limiter for Rust.
ratelock is designed as a small core crate you can embed anywhere: API gateways, middleware, embedded services, job schedulers, or any hot path where lock contention and allocations hurt.
Repository: https://github.com/cppNexus/ratelock.git
Problem Statement
Most rate-limit crates are convenient, but frequently include one or more of these trade-offs:
- extra dependencies,
- allocation-heavy internals,
- blocking synchronization primitives,
- tight coupling to async runtimes or framework-specific adapters.
ratelock focuses on a small, deterministic kernel that is easy to audit and benchmark.
Design Goals
- 0 heap allocations in steady-state operations.
- 0 `unsafe` code.
- lock-free internals (`AtomicU64`, no `Mutex`/`RwLock`).
- deterministic refill behavior via pluggable `Clock`.
- `no_std`-compatible core.
- straightforward API: `allow`, `allow_n`, `remaining`, `snapshot`.
Non-Goals
- no async waiting/sleeping.
- no fairness queueing.
- no distributed rate limiting.
- no leaky-bucket/sliding-window algorithms.
- no framework adapters (Tower/Axum/etc.) in this crate.
Minimal Example
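```rust
use ratelock::RateLimiter;

let limiter = RateLimiter::new(100, 20); // capacity=100, refill=20/sec

if limiter.allow() {
    // request admitted
} else {
    // request rejected
}
```

Batch usage:

```rust
use ratelock::RateLimiter;

let limiter = RateLimiter::new(10, 0); // illustrative values: capacity=10, no refill
assert!(limiter.allow_n(3));           // 3 tokens deducted in one atomic step
assert_eq!(limiter.remaining(), 7);
```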
API
- `RateLimiter::new(capacity, refill_per_sec)` (`std` feature only)
- `RateLimiter::with_clock(capacity, refill_per_sec, clock)`
- `ShardedRateLimiter::<_, N>::new(capacity, refill_per_sec)` (`std` feature only)
- `ShardedRateLimiter::<_, N>::with_clock(capacity, refill_per_sec, clock)`
- `allow() -> bool`
- `allow_n(n: u64) -> bool`
- `remaining() -> u64`
- `snapshot() -> Snapshot`
Safety and Correctness Guarantees
- No panics in public operations.
- No allocations in limiter operations.
- No unsafe code.
- `remaining() <= capacity` is always maintained.
- `allow_n` token deduction is atomic: if it returns `false`, tokens are not deducted.
- If a clock returns a timestamp lower than the previous one, refill is skipped (defensive behavior).
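The three guarantees above can be illustrated with a simplified token bucket. This is a minimal sketch, not `ratelock`'s implementation: the `Bucket` type, its `try_take`/`refill` methods, passing `now_nanos` in explicitly, and keeping the refill timestamp in a second atomic are all assumptions made for readability.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical token bucket used only to illustrate the guarantees;
/// it is not ratelock's internal state layout.
pub struct Bucket {
    capacity: u64,
    refill_per_sec: u64,
    tokens: AtomicU64,
    last_nanos: AtomicU64, // time up to which tokens have been credited
}

impl Bucket {
    pub fn new(capacity: u64, refill_per_sec: u64, now_nanos: u64) -> Self {
        Bucket {
            capacity,
            refill_per_sec,
            tokens: AtomicU64::new(capacity),
            last_nanos: AtomicU64::new(now_nanos),
        }
    }

    /// All-or-nothing: `n` tokens are removed in one successful CAS,
    /// or the counter is left untouched and `false` is returned.
    pub fn try_take(&self, n: u64) -> bool {
        self.tokens
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |t| t.checked_sub(n))
            .is_ok()
    }

    /// Credits tokens for the time elapsed since `last_nanos`.
    pub fn refill(&self, now_nanos: u64) {
        let last = self.last_nanos.load(Ordering::Acquire);
        if now_nanos <= last {
            return; // clock stalled or went backwards: skip refill defensively
        }
        let elapsed = (now_nanos - last) as u128;
        let add = u64::try_from(elapsed * self.refill_per_sec as u128 / 1_000_000_000)
            .unwrap_or(u64::MAX);
        if add == 0 {
            return; // less than one whole token elapsed, or refill disabled
        }
        // Advance only by the time the credited tokens represent, so the
        // fractional remainder carries over to the next refill.
        let used = (add as u128 * 1_000_000_000 / self.refill_per_sec as u128) as u64;
        // Claim the interval; if another thread refilled first, do nothing.
        if self
            .last_nanos
            .compare_exchange(last, last + used, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            return;
        }
        let cap = self.capacity;
        // Clamp so that remaining() <= capacity always holds.
        let _ = self.tokens.fetch_update(Ordering::AcqRel, Ordering::Acquire, |t| {
            Some(t.saturating_add(add).min(cap))
        });
    }

    pub fn remaining(&self) -> u64 {
        self.tokens.load(Ordering::Acquire)
    }
}
```

With `Bucket::new(10, 20, 0)`, taking 7 and then asking for 5 fails without touching the remaining 3 tokens; 100 ms later a refill adds 2, a backwards timestamp adds nothing, and a 10 s gap refills only up to the capacity of 10.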
no_std
Default features include `std`.
Build without `std`: `cargo build --no-default-features`
Use `with_clock` in `no_std` mode by providing your own `Clock` implementation.
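On targets without `std::time`, a clock is typically driven by a periodic timer. The sketch below uses only `core` and defines a hypothetical `Clock` trait with a single `now_nanos` method. The trait name matches the README, but its method name, signature, and time unit here are assumptions, so check the crate's actual `Clock` definition before implementing it.

```rust
use core::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for ratelock's `Clock` trait; the real trait's
/// method name, signature, and time unit may differ.
pub trait Clock {
    /// Monotonic timestamp in nanoseconds.
    fn now_nanos(&self) -> u64;
}

/// A clock advanced by an external periodic tick (e.g. a timer interrupt).
/// It uses only `core`, so it also builds in `no_std` crates.
pub struct TickClock {
    ticks: AtomicU64,
    nanos_per_tick: u64,
}

impl TickClock {
    pub const fn new(nanos_per_tick: u64) -> Self {
        TickClock { ticks: AtomicU64::new(0), nanos_per_tick }
    }

    /// Call from the timer interrupt (or a test) to advance time by one tick.
    pub fn tick(&self) {
        self.ticks.fetch_add(1, Ordering::Relaxed);
    }
}

impl Clock for TickClock {
    fn now_nanos(&self) -> u64 {
        self.ticks.load(Ordering::Relaxed).saturating_mul(self.nanos_per_tick)
    }
}
```

Because `new` is a `const fn`, such a clock can live in a `static` (for example `static CLOCK: TickClock = TickClock::new(1_000_000);` for a 1 ms tick) and be handed to `with_clock` in whatever form the real trait requires. A clock like this also makes refill deterministic in tests, since time only moves when `tick` is called.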
Performance
Why is it fast?
The steady-state hot path:
- does not call the clock
- does not perform u128 math
- does not allocate
- performs a single atomic state transition
On the benchmark hardware and compiler configuration listed below, the hot path approaches the cost of a single atomic decrement in optimized builds.
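One way a hot path can skip the clock entirely is to attempt the token decrement first and consult the clock only when the bucket is observed empty. The sketch below shows that ordering; `FastPath`, `allow_with` and the `slow_refill` callback are hypothetical names, and this is not a description of how `ratelock` actually arranges its state.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical token counter whose fast path never consults a clock.
pub struct FastPath {
    tokens: AtomicU64,
}

impl FastPath {
    pub fn new(tokens: u64) -> Self {
        FastPath { tokens: AtomicU64::new(tokens) }
    }

    /// While tokens remain, this is one load plus one successful
    /// compare-and-swap: no clock read, no u128 math, no allocation.
    /// `slow_refill` runs only when the counter is observed empty.
    #[inline]
    pub fn allow_with<F: FnOnce() -> u64>(&self, slow_refill: F) -> bool {
        let mut cur = self.tokens.load(Ordering::Relaxed);
        while cur > 0 {
            match self.tokens.compare_exchange_weak(
                cur,
                cur - 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // lost a race or spurious failure: retry
            }
        }
        self.slow_path(slow_refill)
    }

    /// Cold path. `slow_refill` must return only tokens not handed out before
    /// (e.g. by claiming elapsed clock time atomically) and respect capacity.
    #[cold]
    fn slow_path<F: FnOnce() -> u64>(&self, slow_refill: F) -> bool {
        let added = slow_refill();
        if added == 0 {
            return false;
        }
        // Spend one of the new tokens on this call and publish the rest.
        self.tokens.fetch_add(added - 1, Ordering::AcqRel);
        true
    }
}
```

Marking the refill branch `#[cold]` nudges the compiler to keep the common case (tokens available) small and inlined, which is the property the bullet list above describes.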
Benchmark Summary
Environment: Apple M3 Pro · macOS Tahoe 26.3 · 18 GB RAM · 2026-02-24
| Scenario | Throughput |
|---|---|
| Single-thread hot check | ~520 M ops/s |
| Shared limiter, 4 threads | ~43 M ops/s |
| Shared limiter, 8 threads | ~17 M ops/s |
| Shared limiter, 16 threads | ~9 M ops/s |
| Sharded (64 shards), 4 threads | ~101 M ops/s |
| Sharded (64 shards), 16 threads | ~73 M ops/s |
Full results: see `BENCHMARKS.md`, or open `target/criterion/report/index.html` after running the suite.
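Throughput and per-operation time are reciprocals, so each row can also be read as an average cost per check. A trivial helper (the name `ns_per_op` is illustrative) makes the conversion explicit:

```rust
/// Converts throughput in millions of operations per second into the
/// average time per operation in nanoseconds:
/// 1e9 ns / (mops * 1e6 ops) = 1e3 / mops.
pub fn ns_per_op(million_ops_per_sec: f64) -> f64 {
    1_000.0 / million_ops_per_sec
}
```

For example, ~520 M ops/s in the single-thread row works out to about 1.9 ns per check.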
ratelock vs governor
Numbers from the `multi_thread_governor_compare/*` benchmark group. `governor` version: 0.10.1.

| Scenario | ratelock | governor | Speedup |
|---|---|---|---|
| Single-thread hot check | 524.90 M ops/s | 233.21 M ops/s | 2.25× |
| Shared limiter, 4 threads | 41.99 M ops/s | 26.75 M ops/s | 1.57× |
| Shared limiter, 8 threads | 12.37 M ops/s | 11.10 M ops/s | 1.11× |
| Shared limiter, 16 threads | 5.84 M ops/s | 3.61 M ops/s | 1.62× |
Benchmark Modes
Benchmarks are split into four modes:
| Mode | What it measures | Criterion group |
|---|---|---|
| hot-path | Shared limiter with `refill_per_sec = 0` (pure atomic contention path) | `single_thread_hot_path/*`, `multi_thread_hot_path/*` |
| refill-path | Shared limiter with `refill_per_sec > 0` and manual clock ticks | `single_thread_refill_path/*`, `multi_thread_refill_path/*` |
| sharded-hot-path | N independent limiters selected by hash (reduced contention) | `multi_thread_sharded_hot_path/*` |
| governor-compare | Head-to-head overhead comparison vs `governor` | `single_thread_governor_compare/*`, `multi_thread_governor_compare/*` |

Multi-thread groups are tuned to avoid under-sampling warnings:
- `sample_size(40)`
- `measurement_time(12s)`
- `warm_up_time(3s)`
Benchmarking Guide
- Run the full benchmark suite: `cargo bench`
- Open the Criterion HTML report: `target/criterion/report/index.html`
- Run only a specific group or benchmark by passing a Criterion name filter: `cargo bench -- <group-or-benchmark-name>`
- For stable numbers, run the same benchmark 3–5 times and compare medians.
- CLI-only run (without plots): `cargo bench -- --noplot`
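For the "run 3–5 times and compare medians" step, a small helper such as the hypothetical `median` below (plain Rust, no Criterion API involved) can summarise throughput figures copied from each run. A median is less sensitive than a mean to a single run disturbed by background load.

```rust
/// Median of repeated measurements (e.g. M ops/s from several runs).
/// Returns `None` for an empty slice; NaN values are not expected.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut v = samples.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 1 {
        v[mid]
    } else {
        (v[mid - 1] + v[mid]) / 2.0
    })
}
```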
Benchmark Sanity Notes
- The ~2 ns figure (roughly 1 / 520 M ops/s, from the single-thread row) is the hot path without refill: `refill_per_sec = 0`, no clock advancement, no refill math, large capacity, fully inlined fast path. This approaches the cost of an atomic decrement plus a branch in optimized builds.
- Benchmark code uses referenced/shared limiter instances and `black_box(black_box(limiter).allow())` to prevent call elision.
- The `governor` comparison uses a very high quota (`1_000_000_000/s`) to benchmark check-path overhead rather than throttling behavior.
- Refill-enabled paths are intentionally slower and should be evaluated separately.
- Multi-thread numbers for `multi_thread_hot_path/*` and `multi_thread_refill_path/*` use one shared limiter and include contention. The `governor-compare` group runs its own thread configuration, so its throughput numbers differ from the raw `multi_thread_hot_path/*` results; this is expected.
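The double `black_box` pattern can be reproduced outside Criterion with `std::hint::black_box` in a plain timing loop, which is handy for quick local sanity checks. In this rough sketch, `Counter` is a hypothetical stand-in with an `allow()` method so the snippet compiles without the crate, and `time_allow` is an illustrative name; it is not a substitute for Criterion's statistics.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in exposing an `allow()` method.
pub struct Counter(pub AtomicU64);

impl Counter {
    pub fn allow(&self) -> bool {
        self.0
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |t| t.checked_sub(1))
            .is_ok()
    }
}

/// Times `iters` calls. The inner `black_box` makes the limiter reference
/// opaque, so the compiler cannot reason about its state across iterations;
/// the outer one marks the result as used, so the call is not elided.
pub fn time_allow(limiter: &Counter, iters: u64) -> (u64, Duration) {
    let start = Instant::now();
    let mut allowed = 0u64;
    for _ in 0..iters {
        if black_box(black_box(limiter).allow()) {
            allowed += 1;
        }
    }
    (allowed, start.elapsed())
}
```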
Contention and Scaling
- Designed for an extremely fast single-instance hot path.
- Under contention, throughput degrades due to shared atomic state.
- For per-key/per-tenant limiting, sharding is the recommended pattern.
Built-in sharding helper:
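```rust
use ratelock::ShardedRateLimiter;

const SHARDS: usize = 64;
let limiter = ShardedRateLimiter::<_, SHARDS>::new(100, 20); // illustrative capacity/refill
let allowed = limiter.allow_by_key(&"tenant-42");              // illustrative key
assert!(allowed);
```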
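To make "N independent limiters selected by hash" concrete, here is a self-contained sketch of key-to-shard selection using `std`'s `DefaultHasher`. The `Shards` type and its `shard_of`/`allow_key` methods are hypothetical, the choice of hasher is an assumption, and each shard is a bare token counter rather than a full limiter with refill; `ShardedRateLimiter` may hash and store state differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

/// N independent token counters (N must be > 0). A given key always maps
/// to the same shard, so threads serving different keys rarely contend
/// on the same atomic.
pub struct Shards<const N: usize> {
    tokens: [AtomicU64; N],
}

impl<const N: usize> Shards<N> {
    pub fn new(capacity_per_shard: u64) -> Self {
        Shards {
            tokens: std::array::from_fn(|_| AtomicU64::new(capacity_per_shard)),
        }
    }

    /// Shard index for `key`; the same key yields the same index.
    pub fn shard_of<K: Hash + ?Sized>(key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() % N as u64) as usize
    }

    /// Takes one token from the key's shard, all-or-nothing.
    pub fn allow_key<K: Hash + ?Sized>(&self, key: &K) -> bool {
        self.tokens[Self::shard_of(key)]
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |t| t.checked_sub(1))
            .is_ok()
    }
}
```

Keys that land on the same shard share that shard's budget, so a larger shard count reduces both atomic contention and cross-key interference, at the cost of more memory.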
Comparison (Qualitative)
| Property | ratelock |
|---|---|
| Dependencies | 0 |
| Allocations | 0 |
| Unsafe | 0 |
| no_std core | Yes |
| Lock-free | Yes |
| Built-in sharding | Yes |
| Async waiting | No |
| Framework adapters | No |
Reproducible Quality Checks
Optional:
`MIRIFLAGS="-Zmiri-disable-isolation"`