better-bucket 0.6.0

A token bucket is simple to get working and surprisingly hard to get right — most implementations leak performance to a lock, leak correctness under contention, or leak ergonomics behind a generic builder. better-bucket targets all three at once:

Lock-free acquire. A compare_exchange_weak loop on a single packed (tokens, last_refill_tick) word; without contention that is typically one successful CAS. No Mutex, no RwLock, no parking on the hot path.
Allocation-free steady state. Acquiring never allocates. A bucket is a small, cache-line-aligned value with no heap tail.
Lazy refill. Tokens accrue from elapsed monotonic time, computed on access. No timer thread burning a core, no wakeups, no watts spent while idle.
Overflow-safe. Every refill and capacity computation is checked or saturating. A hostile request count or a multi-day idle gap can't wrap the counter or over-fill the bucket.
Never over-grants. The core safety invariant, model-checked under loom and property-tested with proptest.
One-line API. The 80% case is a constructor and a method call. No ceremony.

Features

Token bucket core — lock-free try_acquire / acquire (a compare_exchange_weak loop on one packed atomic word), allocation-free, cache-line aligned to avoid false sharing between independent buckets
Lazy refill — tokens accrue from monotonic elapsed time on access; no background threads, no timers
Overflow-safe math — checked / saturating arithmetic on every refill and capacity path
Deterministic tests — inject a mockable clock (via clock-lib) and advance time without sleep
Tier-1 API — Bucket::per_second(n) / Bucket::per_duration(n, dur) for the common case; BucketConfig for full control; a trait for the 1%
No over-grant guarantee — verified with loom model checking, a multi-thread stress test, and proptest; a separate allocation audit confirms the acquire path never allocates
Zero unsafe on the public path

Installation

Add to your Cargo.toml:

[dependencies]
better-bucket = "0.6"

# For a no_std build (no clock-lib; exposes only VERSION today, see Feature Flags),
# use this line instead of the one above:
# better-bucket = { version = "0.6", default-features = false }

Quick Start

use better_bucket::Bucket;

// 100 tokens per second, bucket holds up to 100.
let bucket = Bucket::per_second(100);

// The 80% case: one call. Returns true if a token was available.
if bucket.try_acquire(1) {
    // allowed — do the work
} else {
    // denied — shed load / return 429 / back off
}

That is the whole common case. No builder, no type parameters, no setup.

Configured Buckets (Tier 2)

When you need control over capacity, refill rate, and initial fill independently — for example a large burst ceiling that refills slowly, or a bucket that starts empty — use the builder:

use better_bucket::Bucket;
use std::time::Duration;

// 500-token burst ceiling, refilling 100 tokens/second, starting empty.
let bucket = Bucket::builder()
    .capacity(500)
    .refill(100, Duration::from_secs(1))
    .initial(0)
    .build()
    .expect("valid configuration");

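// Try to take 10 tokens at once. The bucket starts empty, so an immediate
// request is denied; at 100 tokens/second, 10 tokens accrue after about 100 ms.
if bucket.try_acquire(10) {
    // allowed
}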

// How many are available right now (after lazy refill).
let left = bucket.available();

build() validates the configuration (rejecting zero capacity, zero refill amount, or zero refill period with a BucketError), so an invalid bucket can never be constructed. For a custom time source, chain .with_clock(...) onto the built bucket. If you prefer to build the config value yourself, BucketConfig::new plus Bucket::from_config is the same path without the fluent surface.
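The validation rule can be pictured with a small std-only sketch. ConfigError, ValidConfig, and validate are hypothetical names for illustration, not better-bucket's BucketError or builder; what carries over is the pattern of checking every field once and making the validated type the only thing the rest of the code accepts. Clamping initial to capacity is this sketch's own assumption.

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical error type mirroring the three rejections described above.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    ZeroCapacity,
    ZeroRefillAmount,
    ZeroRefillPeriod,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            ConfigError::ZeroCapacity => "capacity must be non-zero",
            ConfigError::ZeroRefillAmount => "refill amount must be non-zero",
            ConfigError::ZeroRefillPeriod => "refill period must be non-zero",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for ConfigError {}

/// Hypothetical validated config: `validate` is the only way to obtain one.
#[derive(Debug, Clone, Copy)]
pub struct ValidConfig {
    capacity: u32,
    refill_amount: u32,
    refill_period: Duration,
    initial: u32,
}

impl ValidConfig {
    pub fn validate(
        capacity: u32,
        refill_amount: u32,
        refill_period: Duration,
        initial: Option<u32>,
    ) -> Result<Self, ConfigError> {
        if capacity == 0 {
            return Err(ConfigError::ZeroCapacity);
        }
        if refill_amount == 0 {
            return Err(ConfigError::ZeroRefillAmount);
        }
        if refill_period.is_zero() {
            return Err(ConfigError::ZeroRefillPeriod);
        }
        Ok(ValidConfig {
            capacity,
            refill_amount,
            refill_period,
            // Default to a full bucket; this sketch never starts above capacity.
            initial: initial.unwrap_or(capacity).min(capacity),
        })
    }

    pub fn initial(&self) -> u32 {
        self.initial
    }
}
```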

Deterministic Testing (mockable clock)

Time-driven code is normally a pain to test — you end up sprinkling sleep through the suite and hoping. better-bucket lets you inject a manual clock from clock-lib and advance time instantly:

use better_bucket::Bucket;
use clock_lib::ManualClock;
use std::sync::Arc;
use std::time::Duration;

// Share one clock between the test and the bucket via `Arc`.
let clock = Arc::new(ManualClock::new());
let bucket = Bucket::per_second(10).with_clock(Arc::clone(&clock));

// Drain the bucket.
assert!(bucket.try_acquire(10));
assert!(!bucket.try_acquire(1)); // empty

// Advance one second — no real sleep, fully deterministic.
clock.advance(Duration::from_secs(1));
assert!(bucket.try_acquire(10)); // refilled
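The injection pattern behind with_clock can be sketched without clock-lib. MillisClock, TestClock, and SimpleBucket are hypothetical stand-ins (clock-lib's real Clock trait and ManualClock are not reproduced here), and the bucket is deliberately single-threaded (&mut self) so the only thing on display is the clock seam: the bucket asks its clock for the time, and a test clock moves only when told to. It relies on the fact that n tokens per second is exactly n millitokens per millisecond.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical time source: milliseconds on a monotonic timeline.
pub trait MillisClock {
    fn now_ms(&self) -> u64;
}

/// Hypothetical manual clock: time moves only when a test calls `advance`.
#[derive(Default)]
pub struct TestClock {
    ms: AtomicU64,
}

impl TestClock {
    pub fn advance(&self, d: Duration) {
        // Clamp absurd durations to u64, then saturate instead of wrapping.
        let step = d.as_millis().min(u64::MAX as u128) as u64;
        let _ = self
            .ms
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |m| Some(m.saturating_add(step)));
    }
}

impl MillisClock for TestClock {
    fn now_ms(&self) -> u64 {
        self.ms.load(Ordering::Relaxed)
    }
}

/// Lets the test keep one `Arc` while the bucket holds a clone of it.
impl<C: MillisClock + ?Sized> MillisClock for Arc<C> {
    fn now_ms(&self) -> u64 {
        (**self).now_ms()
    }
}

/// Hypothetical single-threaded bucket, generic over its clock.
pub struct SimpleBucket<C: MillisClock> {
    clock: C,
    capacity_milli: u64,
    tokens_per_sec: u64, // equals millitokens accrued per millisecond
    tokens_milli: u64,
    last_ms: u64,
}

impl<C: MillisClock> SimpleBucket<C> {
    /// Starts full, matching the README example that drains 10 tokens at once.
    pub fn per_second(n: u64, clock: C) -> Self {
        let capacity_milli = n.saturating_mul(1000);
        let last_ms = clock.now_ms();
        SimpleBucket { clock, capacity_milli, tokens_per_sec: n, tokens_milli: capacity_milli, last_ms }
    }

    pub fn try_acquire(&mut self, n: u64) -> bool {
        // Lazy refill: everything accrued since the last access, capped at capacity.
        let now = self.clock.now_ms();
        let accrued = now.saturating_sub(self.last_ms).saturating_mul(self.tokens_per_sec);
        self.tokens_milli = self.tokens_milli.saturating_add(accrued).min(self.capacity_milli);
        self.last_ms = now;
        let cost = n.saturating_mul(1000);
        if self.tokens_milli >= cost {
            self.tokens_milli -= cost;
            true
        } else {
            false
        }
    }
}
```

A test drives it the same way as the README example: hand the bucket Arc::clone(&clock), drain it, call clock.advance(...), and acquire again, with no sleep anywhere.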

Design

Lock-free, allocation-free hot path

The bucket packs its mutable state — current tokens and the last-refill tick — into a single atomic word. try_acquire is a compare_exchange_weak loop:

Load the packed word.
Compute lazy refill from monotonic elapsed time (saturating).
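If enough tokens, CAS the new (tokens - n, now_tick) in place; if not, return false without writing.
On CAS failure (another thread won the race, or the weak CAS failed spuriously), retry with bounded backoff from the freshly observed word.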

There is no lock, no allocation, and no syscall on the success path beyond the monotonic clock read. Independent buckets sit on their own cache lines, so unrelated limiters never falsely share.
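The four steps above can be sketched as a self-contained, std-only type. PackedBucket and try_acquire_at are hypothetical names; the layout follows this README's description (millitokens in the upper 32 bits, milliseconds since creation in the lower 32), but to stay clock-agnostic the caller passes the current tick instead of reading a clock, the rate is a whole number of tokens per second, and std::hint::spin_loop stands in for the bounded backoff. It illustrates the technique and is not the crate's source.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Millitokens in the upper 32 bits, tick (ms since creation) in the lower 32.
fn pack(tokens_milli: u32, tick_ms: u32) -> u64 {
    ((tokens_milli as u64) << 32) | tick_ms as u64
}

fn unpack(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

/// Hypothetical lock-free bucket over one packed word.
pub struct PackedBucket {
    state: AtomicU64,
    capacity_milli: u32,
    tokens_per_sec: u32, // equals millitokens accrued per millisecond
}

impl PackedBucket {
    /// Starts full; capacity saturates at u32::MAX millitokens.
    pub fn new(capacity_tokens: u32, tokens_per_sec: u32) -> Self {
        let capacity_milli = capacity_tokens.saturating_mul(1000);
        PackedBucket { state: AtomicU64::new(pack(capacity_milli, 0)), capacity_milli, tokens_per_sec }
    }

    /// `now_ms` is supplied by the caller: milliseconds since the bucket was created.
    pub fn try_acquire_at(&self, n: u32, now_ms: u32) -> bool {
        // A request too large to represent can never be satisfied.
        let cost = match n.checked_mul(1000) {
            Some(c) => c,
            None => return false,
        };
        // Step 1: load the packed word.
        let mut current = self.state.load(Ordering::Acquire);
        loop {
            let (tokens, last) = unpack(current);
            // Step 2: lazy refill, saturating and capped at capacity.
            // A tick older than the stored one counts as zero elapsed time.
            let elapsed = now_ms.saturating_sub(last) as u64;
            let refilled = (tokens as u64)
                .saturating_add(elapsed.saturating_mul(self.tokens_per_sec as u64))
                .min(self.capacity_milli as u64) as u32;
            // Not enough tokens: deny without writing anything.
            if refilled < cost {
                return false;
            }
            // Step 3: publish (tokens - cost, tick) only if the word is unchanged.
            let next = pack(refilled - cost, now_ms.max(last));
            match self
                .state
                .compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Acquire)
            {
                Ok(_) => return true,
                // Step 4: lost a race or failed spuriously; retry on the fresh value.
                Err(actual) => {
                    current = actual;
                    std::hint::spin_loop();
                }
            }
        }
    }
}
```

Storing max(now_ms, last) keeps the tick monotonic when a thread holding an older timestamp loses a race to a newer one, and a denied call writes nothing, so the next caller recomputes refill from the same anchor.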

Lazy refill, no timer thread

Refill is never pushed by a background thread. Tokens are computed from the elapsed monotonic time at the moment you call try_acquire / available. An idle bucket costs nothing — no wakeups, no spinning, no watts.
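As a formula, with T the balance in millitokens, C the capacity, r the refill rate in millitokens per millisecond, and t the tick in milliseconds (notation chosen here for illustration, not taken from the crate), each access computes the current balance as:

```latex
T_{\text{now}} = \min\bigl(C,\; T_{\text{last}} + r\,(t_{\text{now}} - t_{\text{last}})\bigr)
```

With the addition and multiplication saturating, a multi-day idle gap yields a full bucket rather than a wrapped counter.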

The no-over-grant invariant

The defining correctness property: across any concurrent interleaving, the total tokens granted never exceed capacity plus the tokens legitimately accrued by refill. This is the property that separates a correct rate limiter from a leaky one. Three checks target it directly, and a fourth guards the allocation-free hot path:

loom exhaustively explores the CAS interleavings of concurrent try_acquire calls and asserts no lost update and no over-grant.
A multi-thread stress test hammers one bucket from many threads and asserts the total granted never exceeds the starting tokens plus what refill could have added over the run.
An allocation audit runs the acquire path under a counting allocator and asserts zero allocations.
proptest throws arbitrary sequences of acquires and time advances at the bucket and asserts tokens always stay in [0, capacity] and grants never exceed what refill allows.
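The stress-test idea can be sketched in plain std Rust. FixedBucket and stress are hypothetical, and the bucket has no refill, which narrows the invariant to its simplest form: total grants must equal exactly min(capacity, attempts). AtomicU32::fetch_update is built on a compare-and-swap retry loop, so every grant corresponds to one winning CAS.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical refill-free bucket: only the grant path, nothing else.
pub struct FixedBucket {
    tokens: AtomicU32,
}

impl FixedBucket {
    pub fn new(capacity: u32) -> Self {
        FixedBucket { tokens: AtomicU32::new(capacity) }
    }

    /// CAS retry loop via fetch_update; `checked_sub` returns None (deny) when short.
    pub fn try_acquire(&self, n: u32) -> bool {
        self.tokens
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |t| t.checked_sub(n))
            .is_ok()
    }
}

/// Hammer one bucket from `threads` threads and return the total tokens granted.
pub fn stress(capacity: u32, threads: usize, attempts_per_thread: usize) -> u64 {
    let bucket = Arc::new(FixedBucket::new(capacity));
    let granted = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let bucket = Arc::clone(&bucket);
            let granted = Arc::clone(&granted);
            thread::spawn(move || {
                for _ in 0..attempts_per_thread {
                    if bucket.try_acquire(1) {
                        granted.fetch_add(1, Ordering::Relaxed);
                    }
                }
            })
        })
        .collect();
    // join() synchronizes with each worker, so the final load sees every grant.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    granted.load(Ordering::Relaxed)
}
```

Once any attempt is denied the counter has reached zero and can never rise again, so with more attempts than tokens the total must land exactly on capacity; a lost update would push it above.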

Packed state and its limits

State is one AtomicU64: the upper 32 bits hold tokens in millitokens (for sub-token refill resolution), the lower 32 bits hold milliseconds since the bucket was created. Two consequences follow from that budget:

Capacity tops out around 4.29 million tokens (u32::MAX millitokens). That is an enormous burst ceiling for rate limiting; larger requests are clamped to it.
The millisecond counter saturates after ~49.7 days of clock advance, after which refill stalls. Bucket::reset() re-anchors it (and refills to full), so a process that can run longer than that without a reset should call it periodically, at least once every ~49 days.
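Both limits follow directly from the 32-bit halves:

```latex
\frac{2^{32}-1\ \text{millitokens}}{1000\ \text{millitokens/token}} \approx 4{,}294{,}967\ \text{tokens}
\qquad
\frac{2^{32}-1\ \text{ms}}{86{,}400{,}000\ \text{ms/day}} \approx 49.71\ \text{days}
```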

The acquire path is division-free: the refill rate is precomputed at construction, so the hot path is one packed-word load, a multiply-and-shift, and a CAS. On a Ryzen 9 9950X3D the bucket's own accounting measures ~6 ns (isolated with a mock clock). A real try_acquire adds one monotonic clock read on top, and that read dominates: the single-thread figure is ~24 ns, most of it the Instant::now() call rather than the bucket. Under contention there is no lock to serialize on, so independent buckets scale with threads; many threads hammering one bucket still share its single atomic word, and CAS retries on that cache line bound its throughput.
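One way to get a division-free hot path is Q32 fixed point: divide once at construction, then multiply and shift on every call. RefillRate and its Q32.32 encoding are assumptions for illustration; this README does not specify the crate's exact representation. Rounding the precomputed rate down means truncation can only under-grant, never over-grant.

```rust
/// Hypothetical Q32.32 refill rate: millitokens per millisecond, scaled by 2^32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RefillRate {
    q32: u128,
}

impl RefillRate {
    /// `amount` tokens every `period_ms` ms. The one division happens here.
    /// Returns None for a zero period.
    pub fn new(amount: u64, period_ms: u64) -> Option<Self> {
        if period_ms == 0 {
            return None;
        }
        // amount * 1000 * 2^32 stays below 2^106, well inside u128.
        let scaled = ((amount as u128) * 1000) << 32;
        // Integer division rounds down, so the rate never overstates refill.
        Some(RefillRate { q32: scaled / period_ms as u128 })
    }

    /// Hot path: one multiply and one shift, saturating instead of wrapping.
    pub fn accrued_millitokens(&self, elapsed_ms: u64) -> u64 {
        let milli = (elapsed_ms as u128).saturating_mul(self.q32) >> 32;
        milli.min(u64::MAX as u128) as u64
    }
}
```

For example, 1 token every 3 ms accrues 999 millitokens after 3 ms rather than 1000: the lost millitoken is the cost of truncation, and it errs on the safe side.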

vs `governor`

On the same monotonic clock, better-bucket and governor are tied (~24 vs ~23 ns, both bounded by the clock read). The bucket's algorithm is at least as lean: isolated with a mock clock it runs in ~6 ns, against governor's ~7 ns on its fast quanta clock (not a strict like-for-like comparison, since governor's figure still includes a cheap clock read). Out of the box, governor is faster end-to-end (~7 ns) purely because its default quanta clock is cheaper than the Instant clock better-bucket reads through clock-lib: a clock difference, not an algorithm one. Full numbers, method, and machine details are in docs/BENCHMARKS.md.

cargo bench --bench bucket_bench            # better-bucket baselines
cargo bench --features comparison           # + the governor comparison

Feature Flags

Feature	Default	Description
`std`	✅	Standard library. Off → `no_std`.
`clock`	✅	Pluggable `clock-lib` time source: monotonic clock + mockable clock for tests. Implies `std` (clock-lib's `Clock` is std-gated).

# no_std build (no clock-lib):
better-bucket = { version = "0.6", default-features = false }

The lock-free accounting core uses only core atomics and is no_std-capable in principle, but the shipped Bucket constructors read time from clock-lib and therefore require the default clock feature (which implies std). A bare no_std build currently exposes only the crate's VERSION; a caller-driven, clock-free time API is a candidate for a future release.

Cross-Platform Support

Tier 1 Support:

✅ Linux (x86_64, aarch64)
✅ macOS (x86_64, Apple Silicon)
✅ Windows (x86_64)

Behavior is identical across all three; the CI matrix runs every target on stable and MSRV. A commit that breaks any platform is a broken commit.

Testing

# Unit + integration + property tests
cargo test --all-features

# Concurrency model checking (no over-grant under interleaving)
RUSTFLAGS="--cfg loom" cargo test --test loom_acquire

# Benchmarks
cargo bench --bench bucket_bench

# Format + lints (must be clean)
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings

Where It Fits

better-bucket is the single-purpose home for token-bucket math in the wider library ecosystem. It is consumed by rate-net, a multi-algorithm, per-key rate limiter, which uses this crate as its token-bucket strategy rather than reimplementing the algorithm. better-bucket stays standalone: it works perfectly well on its own, with no obligation to pull in the rest of the family.

Contributing

Contributions are welcome. Before opening a PR, make sure cargo fmt, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features are all clean, and that any change touching the acquire path is accompanied by a benchmark and (where it affects concurrency) a loom test.