A token bucket is simple to get working and surprisingly hard to get right — most implementations leak performance to a lock, leak correctness under contention, or leak ergonomics behind a generic builder. better-bucket targets all three at once:
- Lock-free acquire. A single compare_exchange_weak on a packed (tokens, last_refill_tick) word. No Mutex, no RwLock, no parking on the hot path.
- Allocation-free steady state. Acquiring never allocates. A bucket is a small, cache-line-aligned value with no heap tail.
- Lazy refill. Tokens accrue from elapsed monotonic time, computed on access. No timer thread burning a core, no wakeups, no watts spent while idle.
- Overflow-safe. Every refill and capacity computation is checked or saturating. A hostile request count or a multi-day idle gap can't wrap the counter or over-fill the bucket.
- Never over-grants. The core safety invariant, proven under loom and proptest.
- One-line API. The 80% case is a constructor and a method call. No ceremony.
Features
- Token bucket core: lock-free try_acquire / acquire (one compare_exchange_weak on a packed atomic word), allocation-free, cache-line aligned to avoid false sharing between independent buckets
- Lazy refill: tokens accrue from monotonic elapsed time on access; no background threads, no timers
- Overflow-safe math: checked / saturating arithmetic on every refill and capacity path
- Deterministic tests: inject a mockable clock (via clock-lib) and advance time without sleep
- Tier-1 API: Bucket::per_second(n) / Bucket::per_duration(n, dur) for the common case; BucketConfig for full control; a trait for the 1%
- No over-grant guarantee: verified with loom model checking, an allocation audit, a multi-thread stress test, and proptest
- Zero unsafe on the public path
Installation
Add to your Cargo.toml:
[dependencies]
better-bucket = "0.6"
# no_std build (no clock-lib; exposes only VERSION today; see Feature Flags):
better-bucket = { version = "0.6", default-features = false }
Quick Start
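use better_bucket::Bucket;
// 100 tokens per second, bucket holds up to 100.
let bucket = Bucket::per_second(100);
// The 80% case: one call. Returns true if a token was available.
if bucket.try_acquire(1) {
    // Token granted: do the work.
} else {
    // Rate limited: reject or retry later.
}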
That is the whole common case. No builder, no type parameters, no setup.
Configured Buckets (Tier 2)
When you need control over capacity, refill rate, and initial fill independently — for example a large burst ceiling that refills slowly, or a bucket that starts empty — use the builder:
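use better_bucket::Bucket;
use std::time::Duration;
// 500-token burst ceiling, refilling 100 tokens/second, starting empty.
let bucket = Bucket::builder()
    .capacity(500)
    .refill(100, Duration::from_secs(1))
    .initial(0)
    .build()
    .expect("valid bucket configuration");
// Try to take 10 tokens at once.
if bucket.try_acquire(10) {
    // Granted: all 10 tokens were taken together.
}
// How many are available right now (after lazy refill).
let left = bucket.available();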
build() validates the configuration (rejecting zero capacity, zero refill
amount, or zero refill period with a [BucketError]), so an invalid bucket can
never be constructed. For a custom time source, chain .with_clock(...) onto the
built bucket. If you prefer to build the config value yourself, BucketConfig::new
plus Bucket::from_config is the same path without the fluent surface.
Deterministic Testing (mockable clock)
Time-driven code is normally a pain to test — you end up sprinkling sleep
through the suite and hoping. better-bucket lets you inject a manual clock
from clock-lib and advance time
instantly:
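use better_bucket::Bucket;
use clock_lib::ManualClock;
use std::sync::Arc;
use std::time::Duration;
// Share one clock between the test and the bucket via `Arc`.
let clock = Arc::new(ManualClock::new());
let bucket = Bucket::per_second(1).with_clock(clock.clone());
// Drain the bucket.
assert!(bucket.try_acquire(1));
assert!(!bucket.try_acquire(1)); // empty
// Advance one second: no real sleep, fully deterministic.
clock.advance(Duration::from_secs(1));
assert!(bucket.try_acquire(1)); // refilled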
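The injected-clock pattern can be sketched without any external crate. This is an illustration only: `TimeSource` and `FakeClock` are hypothetical names invented for the sketch, not clock-lib's API, and the millisecond resolution is an assumption.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical time-source trait for this sketch (not clock-lib's trait).
pub trait TimeSource {
    /// Milliseconds since an arbitrary, fixed origin.
    fn now_ms(&self) -> u64;
}

/// A clock that only moves when the test tells it to.
#[derive(Default)]
pub struct FakeClock {
    ms: AtomicU64,
}

impl FakeClock {
    /// Move time forward by `by`, saturating instead of wrapping.
    pub fn advance(&self, by: Duration) {
        let by_ms = by.as_millis().min(u64::MAX as u128) as u64;
        // The closure always returns Some, so fetch_update cannot fail.
        let _ = self
            .ms
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |t| {
                Some(t.saturating_add(by_ms))
            });
    }
}

impl TimeSource for FakeClock {
    fn now_ms(&self) -> u64 {
        self.ms.load(Ordering::Acquire)
    }
}
```

Because `advance` takes `&self`, a test can hold one shared handle to the clock while the code under test holds another, which is the role `Arc` plays in the example above.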
Design
Lock-free, allocation-free hot path
The bucket packs its mutable state — current tokens and the last-refill
tick — into a single atomic word. try_acquire is a compare_exchange_weak
loop:
- Load the packed word.
- Compute lazy refill from monotonic elapsed time (saturating).
- If enough tokens, CAS the new (tokens - n, now_tick) in place.
- On CAS failure (another thread won the race), retry with bounded backoff.
There is no lock, no allocation, and no syscall on the success path beyond the monotonic clock read. Independent buckets sit on their own cache lines, so unrelated limiters never falsely share.
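The steps above can be sketched with nothing but `std` atomics. This is a minimal illustration, not the crate's source: `SketchBucket`, the whole-token units, the caller-supplied `now_tick`, and the plain spin hint standing in for bounded backoff are all assumptions made for the sketch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch layout: tokens in the upper 32 bits, tick in the lower 32 bits.
fn pack(tokens: u32, tick: u32) -> u64 {
    ((tokens as u64) << 32) | tick as u64
}

fn unpack(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

/// Illustrative bucket: `rate` tokens accrue per tick, capped at `capacity`.
pub struct SketchBucket {
    state: AtomicU64,
    capacity: u32,
    rate: u32,
}

impl SketchBucket {
    /// Starts full at tick 0.
    pub fn new(capacity: u32, rate: u32) -> Self {
        Self { state: AtomicU64::new(pack(capacity, 0)), capacity, rate }
    }

    /// Try to take `n` tokens at time `now_tick` (supplied by the caller).
    pub fn try_acquire(&self, n: u32, now_tick: u32) -> bool {
        // Step 1: load the packed word.
        let mut current = self.state.load(Ordering::Acquire);
        loop {
            let (tokens, last_tick) = unpack(current);
            // Step 2: lazy refill, saturating so a long gap cannot wrap.
            let elapsed = now_tick.saturating_sub(last_tick);
            let refilled = tokens
                .saturating_add(elapsed.saturating_mul(self.rate))
                .min(self.capacity);
            if refilled < n {
                return false; // not enough tokens; state is left untouched
            }
            // Step 3: CAS the new (tokens - n, tick) in place.
            let next = pack(refilled - n, now_tick.max(last_tick));
            match self.state.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                // Step 4: lost the race (or failed spuriously); retry with
                // the freshly observed word.
                Err(observed) => {
                    current = observed;
                    std::hint::spin_loop();
                }
            }
        }
    }
}
```

Because tokens and tick travel in one word, a successful CAS publishes both together, so no thread can observe refilled tokens paired with a stale tick.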
Lazy refill, no timer thread
Refill is never pushed by a background thread. Tokens are computed from the
elapsed monotonic time at the moment you call try_acquire / available.
An idle bucket costs nothing — no wakeups, no spinning, no watts.
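As a rough sketch of what "computed on access" means, the refill can be written as a pure function of the stored tokens and the elapsed time. The function name, the millitoken units, and the parameter shapes here are assumptions for illustration, not the crate's internals.

```rust
/// Tokens available after `elapsed_ms`, with all token amounts in millitokens.
/// `rate_milli_per_ms` is the refill rate in millitokens per millisecond,
/// which is numerically the same as tokens per second.
pub fn lazily_refilled(
    tokens: u32,
    elapsed_ms: u32,
    rate_milli_per_ms: u32,
    capacity: u32,
) -> u32 {
    // Widen to u64 and saturate so neither step can wrap.
    let accrued = (elapsed_ms as u64).saturating_mul(rate_milli_per_ms as u64);
    let total = (tokens as u64).saturating_add(accrued);
    // Clamp to capacity; the result therefore always fits in a u32.
    total.min(capacity as u64) as u32
}
```

Nothing runs between calls: an idle bucket is just a stored pair of numbers, and the elapsed time is folded in the next time someone asks.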
The no-over-grant invariant
The defining correctness property: across any concurrent interleaving, the total tokens granted never exceed capacity plus the tokens legitimately accrued by refill. This is the property that separates a correct rate limiter from a leaky one. It is checked by the following tests, alongside an allocation audit of the same path:
- loom exhaustively explores the CAS interleavings of concurrent try_acquire calls and asserts no lost update and no over-grant.
- A multi-thread stress test hammers one bucket from many threads and asserts the total granted never exceeds the available tokens.
- An allocation audit runs the acquire path under a counting allocator and asserts zero allocations.
- proptest throws arbitrary sequences of acquires and time advances at the bucket and asserts tokens always stay in [0, capacity] and grants never exceed what refill allows.
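The shape of the multi-thread stress check can be sketched on a plain counter. This is a simplified stand-in, assuming no refill and a bare `AtomicU64` rather than the real bucket; `take_one` and `stress` are hypothetical helpers.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Take one token from a plain counter with a CAS loop (no refill here).
fn take_one(tokens: &AtomicU64) -> bool {
    let mut current = tokens.load(Ordering::Acquire);
    loop {
        if current == 0 {
            return false;
        }
        match tokens.compare_exchange_weak(
            current,
            current - 1,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            Err(observed) => current = observed,
        }
    }
}

/// Hammer one counter from `threads` threads and return the total granted.
pub fn stress(capacity: u64, threads: usize, attempts_per_thread: usize) -> u64 {
    let tokens = Arc::new(AtomicU64::new(capacity));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let tokens = Arc::clone(&tokens);
            thread::spawn(move || {
                (0..attempts_per_thread)
                    .filter(|_| take_one(&tokens))
                    .count() as u64
            })
        })
        .collect();
    // Sum the per-thread grant counts.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

With no refill, the invariant reduces to "total granted never exceeds capacity", so a run such as `stress(1_000, 8, 1_000)` should return at most 1,000 no matter how the threads interleave.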
Packed state and its limits
State is one AtomicU64: the upper 32 bits hold tokens in millitokens (for
sub-token refill resolution), the lower 32 bits hold milliseconds since the
bucket was created. Two consequences follow from that budget:
- Capacity tops out around 4.29 million tokens (u32::MAX millitokens). That is an enormous burst ceiling for rate limiting; larger requests are clamped to it.
- The millisecond counter saturates after ~49.7 days of clock advance, after which refill stalls. Bucket::reset() re-anchors it (and refills to full), so a process that runs longer than that between resets can call it periodically.
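Both limits fall directly out of the 32/32 split, which the following sketch works through. The helper names are hypothetical; only the layout (millitokens in the upper half, milliseconds in the lower half) comes from the description above.

```rust
const MILLITOKENS_PER_TOKEN: u64 = 1_000;
const MS_PER_DAY: f64 = 86_400_000.0;

/// Pack millitokens into the upper 32 bits and elapsed ms into the lower 32.
pub fn pack_state(millitokens: u32, elapsed_ms: u32) -> u64 {
    ((millitokens as u64) << 32) | elapsed_ms as u64
}

/// Inverse of `pack_state`.
pub fn unpack_state(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

/// Largest whole-token capacity a 32-bit millitoken field can represent.
pub fn max_whole_tokens() -> u64 {
    u32::MAX as u64 / MILLITOKENS_PER_TOKEN
}

/// Days of clock advance before a 32-bit millisecond field saturates.
pub fn days_until_saturation() -> f64 {
    u32::MAX as f64 / MS_PER_DAY
}
```

`max_whole_tokens()` evaluates to 4,294,967 and `days_until_saturation()` to roughly 49.71, matching the two figures quoted in the list.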
The acquire path is division-free: the refill rate is precomputed at
construction, so the hot path is one packed-word load, a multiply-and-shift, and
a CAS. On a Ryzen 9 9950X3D the bucket's own accounting measures ~6 ns
(isolated with a mock clock). A real try_acquire adds one monotonic clock read
on top — the dominant cost — for a single-thread figure of ~24 ns, most of
it the Instant::now() call rather than the bucket. Contended throughput scales
with threads; the lock-free CAS has no lock to serialize on.
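One common way to get a division-free hot path is to store the rate as a fixed-point number, so the only division happens at construction. The sketch below assumes a 20-bit fractional shift and hypothetical function names; the crate's actual constants and representation may differ.

```rust
/// Fractional bits in the fixed-point rate (an assumption for this sketch).
const SHIFT: u32 = 20;

/// Computed once at construction: the only division in the sketch.
/// Returns None for a zero period, which a validated config would reject.
pub fn precompute_rate(millitokens_per_period: u32, period_ms: u32) -> Option<u64> {
    ((millitokens_per_period as u64) << SHIFT).checked_div(period_ms as u64)
}

/// Hot path: millitokens accrued over `elapsed_ms`, multiply and shift only.
/// The multiply saturates rather than wraps; the caller clamps to capacity.
pub fn accrued(elapsed_ms: u32, rate_fp: u64) -> u64 {
    (elapsed_ms as u64).saturating_mul(rate_fp) >> SHIFT
}
```

For 100 tokens per second (100,000 millitokens per 1,000 ms), the precomputed rate is exactly `100 << 20`, and `accrued(500, rate)` yields 50,000 millitokens, that is, 50 tokens after half a second. Rates that do not divide evenly round down slightly, which errs on the side of granting fewer tokens.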
vs governor
On the same monotonic clock, better-bucket and governor are tied
(~24 vs ~23 ns, both bounded by the clock read). The bucket's algorithm is at
least as lean — with a cheap clock it runs in ~6 ns, edging governor on its
fast quanta clock (~7 ns). Out of the box, governor is faster end-to-end
(~7 ns) purely because its default quanta clock beats the Instant clock
better-bucket reads through clock-lib — a clock difference, not an algorithm
one. Full numbers, method, and machine details are in
docs/BENCHMARKS.md.
Feature Flags
| Feature | Default | Description |
|---|---|---|
| std | ✅ | Standard library. Off → no_std. |
| clock | ✅ | Pluggable clock-lib time source: monotonic clock + mockable clock for tests. Implies std (clock-lib's Clock is std-gated). |
# no_std build (no clock-lib):
better-bucket = { version = "0.6", default-features = false }
The lock-free accounting core uses only core atomics and is no_std-capable in principle, but the shipped Bucket constructors read time from clock-lib and therefore require the default clock feature (which implies std). A bare no_std build currently exposes only the crate's VERSION; a caller-driven, clock-free time API is a candidate for a future release.
Cross-Platform Support
Tier 1 Support:
- ✅ Linux (x86_64, aarch64)
- ✅ macOS (x86_64, Apple Silicon)
- ✅ Windows (x86_64)
Behavior is identical across all three; the CI matrix runs every target on stable and MSRV. A commit that breaks any platform is a broken commit.
Testing
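# Unit + integration + property tests
cargo test --all-features
# Concurrency model checking (no over-grant under interleaving)
RUSTFLAGS="--cfg loom" cargo test --release
# Benchmarks
cargo bench
# Format + lints (must be clean)
cargo fmt --check
cargo clippy --all-targets --all-features -- -D warnings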
Where It Fits
better-bucket is the single-purpose home for token-bucket math in the
wider library ecosystem. It is consumed by
rate-net — a multi-algorithm,
per-key rate limiter — which uses this crate as its token-bucket strategy
rather than reimplementing the algorithm. better-bucket stays
foreign-compatible: it works perfectly well on its own, with no obligation to
pull in the rest of the family.
Contributing
Contributions are welcome. Before opening a PR, make sure cargo fmt,
cargo clippy --all-targets --all-features -- -D warnings, and
cargo test --all-features are all clean, and that any change touching the
acquire path is accompanied by a benchmark and (where it affects concurrency)
a loom test.