adaptive-timeout
Adaptive timeout computation based on observed latency percentiles.
This crate provides a mechanism for computing request timeouts that automatically adapt to observed network conditions. The approach is deeply inspired by the adaptive timeout logic in Facebook's LogDevice, generalized into a reusable, domain-agnostic Rust library.
The problem
Fixed timeouts are fragile. Set them too low and you get false positives during transient slowdowns; set them too high and you waste time waiting for genuinely failed requests. Exponential backoff helps with retries but has no awareness of actual network conditions.
How it works
- A LatencyTracker records round-trip times for requests, maintaining per-destination sliding-window histograms of recent latency samples. The tracker is generic over destination and message key types, so it works with any transport or RPC system.
- An AdaptiveTimeout queries the tracker for a high percentile (e.g. P99.99) of recent latencies, multiplies it by a configurable safety factor and by an exponential backoff multiplier based on the attempt number, and clamps the result between a floor and a ceiling.
- When insufficient data is available (cold start or sparse traffic), the system falls back gracefully to pure exponential backoff.
Timeout selection algorithm
For each destination in a request's target set:
timeout = clamp(safety_factor * percentile_estimate * 2^(attempt-1), min, max)
The final timeout is the maximum across all destinations, ensuring it is long enough for the slowest expected peer.
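The selection rule above can be sketched in standalone Rust. This is a minimal illustration under stated assumptions, not the crate's implementation: `SketchConfig`, `timeout_for_destination`, and the free function `select_timeout` are hypothetical names, percentile estimates are passed in directly as `Option<f64>` milliseconds (`None` meaning "not enough samples"), and the cold-start fallback is assumed to back off exponentially from the floor.

```rust
use std::time::Duration;

/// Hypothetical, simplified stand-in for the crate's timeout configuration.
#[derive(Clone, Copy)]
pub struct SketchConfig {
    pub min_timeout_ms: u64,
    pub max_timeout_ms: u64,
    pub safety_factor: f64,
}

/// Timeout in milliseconds for a single destination.
/// `estimate_ms` is the percentile estimate, or `None` when data is insufficient.
pub fn timeout_for_destination(cfg: &SketchConfig, estimate_ms: Option<f64>, attempt: u32) -> u64 {
    // 2^(attempt - 1), attempts numbered from 1; the exponent is capped to keep the math finite.
    let backoff = 2f64.powi(attempt.saturating_sub(1).min(32) as i32);
    let raw_ms = match estimate_ms {
        Some(p) => cfg.safety_factor * p * backoff,
        // Assumed fallback: pure exponential backoff starting from the floor.
        None => cfg.min_timeout_ms as f64 * backoff,
    };
    (raw_ms.round() as u64).clamp(cfg.min_timeout_ms, cfg.max_timeout_ms)
}

/// Final timeout: the maximum across all destinations in the target set.
pub fn select_timeout(cfg: &SketchConfig, estimates_ms: &[Option<f64>], attempt: u32) -> Duration {
    let ms = estimates_ms
        .iter()
        .map(|e| timeout_for_destination(cfg, *e, attempt))
        .max()
        .unwrap_or(cfg.min_timeout_ms); // empty target set: fall back to the floor
    Duration::from_millis(ms)
}
```

With a floor of 10 ms, a ceiling of 60,000 ms, and a safety factor of 2.0, a destination whose estimate is 50 ms gets 100 ms on the first attempt and 400 ms on the third, while a destination without data gets 10 ms and 40 ms respectively.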
Quick start
use ;
use ;
let now = now;
// Create a tracker and timeout selector with default configs.
let mut tracker = default;
let timeout = default;
// Initially there's no data -- we get exponential backoff from min_timeout.
let t = timeout.select_timeout;
assert_eq!;
// Record some latency observations (e.g. from real RPCs).
for i in 0..100u64
// Now the timeout adapts based on observed latencies.
let t = timeout.select_timeout;
assert!;
Custom clocks
All time-dependent types and methods are generic over the Instant trait. You
can supply your own implementation for simulated time, async runtimes, or other
custom clocks:
use Duration;
use Instant;
; // nanoseconds
// Use it with LatencyTracker:
let mut tracker = default;
tracker.record_latency_ms;
When using std::time::Instant (the default), you don't need to specify the
third type parameter at all.
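
To illustrate the idea of a pluggable clock, here is a rough sketch of a simulated instant backed by a nanosecond counter. The trait `InstantLike` and its two methods are assumptions made for this example; the crate's actual Instant trait may declare different methods, so treat this as a shape to adapt rather than a drop-in implementation.

```rust
use std::time::Duration;

/// Hypothetical clock abstraction for this sketch only; the crate's real
/// `Instant` trait may look different.
pub trait InstantLike: Copy + Ord {
    /// Time elapsed from `earlier` to `self`, saturating at zero.
    fn duration_since(&self, earlier: Self) -> Duration;
    /// The instant that lies `d` after `self`.
    fn add_duration(&self, d: Duration) -> Self;
}

/// Simulated instant: nanoseconds since an arbitrary epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SimInstant(pub u64);

impl InstantLike for SimInstant {
    fn duration_since(&self, earlier: Self) -> Duration {
        Duration::from_nanos(self.0.saturating_sub(earlier.0))
    }

    fn add_duration(&self, d: Duration) -> Self {
        // Saturate instead of overflowing on very large durations.
        let nanos = d.as_nanos().min(u64::MAX as u128) as u64;
        SimInstant(self.0.saturating_add(nanos))
    }
}

impl InstantLike for std::time::Instant {
    fn duration_since(&self, earlier: Self) -> Duration {
        self.saturating_duration_since(earlier)
    }

    fn add_duration(&self, d: Duration) -> Self {
        *self + d
    }
}

/// Generic helper that works with either clock: elapsed whole milliseconds.
pub fn elapsed_ms<I: InstantLike>(start: I, end: I) -> u64 {
    end.duration_since(start).as_millis().min(u64::MAX as u128) as u64
}
```

Because a simulated instant is just a number, tests can advance time explicitly and get the same result on every run.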
Architecture
src/
lib.rs Public re-exports, crate-level docs
clock.rs Instant trait (abstracts over time sources)
config.rs TrackerConfig, TimeoutConfig (compact, Copy types)
histogram.rs SlidingWindowHistogram (time-bucketed ring of HdrHistograms)
tracker.rs LatencyTracker<D, M> (per-destination latency tracking)
timeout.rs AdaptiveTimeout (percentile-based timeout selection)
Key design decisions
| Aspect | Choice | Rationale |
|---|---|---|
| Histogram backend | hdrhistogram crate | Proven, widely used, handles wide dynamic ranges natively without log-space transforms |
| Sliding window | Ring of N sub-window histograms with incremental merge | Avoids rebuilding a merged histogram on every percentile query; rotation subtracts expired buckets |
| Duration representation | NonZeroU32 milliseconds in config structs | 4 bytes vs 16 for Duration; TimeoutConfig fits in 24 bytes; hot-path arithmetic stays in integer domain |
| In-flight tracking | HashMap with monotonic sequence counter | O(1) for record_reply (the common path); eviction scan only runs when at capacity (rare) |
| Thread safety | Single-threaded (Send but not Sync) | No synchronization overhead; caller wraps in Mutex/RefCell if sharing is needed |
| Time abstraction | Instant trait (clock::Instant), impl'd for std::time::Instant | Pluggable clocks for simulated time, async runtimes, etc. |
| Time injection | All methods accept an Instant parameter | Deterministic tests without mocking; zero overhead in production |
| Generics | LatencyTracker<D, M, I> over destination, message key, and instant types | Works with any transport layer and clock without coupling |
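
The "ring of sub-windows with incremental merge" row can be illustrated with a simplified sketch. To stay dependency-free it uses plain fixed-width millisecond buckets instead of HdrHistogram, and `SlidingWindowSketch` with its methods is a hypothetical name, not the crate's SlidingWindowHistogram. The point it shows is that the merged view is updated on every record and rotation, so a percentile query never has to re-merge the sub-windows.

```rust
use std::collections::VecDeque;

/// Simplified sliding-window histogram with one bucket per millisecond.
pub struct SlidingWindowSketch {
    sub_windows: VecDeque<Vec<u64>>, // oldest at the front, newest at the back
    merged: Vec<u64>,                // running sum of all live sub-windows
    total: u64,                      // number of samples currently in `merged`
}

impl SlidingWindowSketch {
    pub fn new(num_sub_windows: usize, num_buckets: usize) -> Self {
        assert!(num_sub_windows > 0 && num_buckets > 0);
        Self {
            sub_windows: (0..num_sub_windows).map(|_| vec![0; num_buckets]).collect(),
            merged: vec![0; num_buckets],
            total: 0,
        }
    }

    /// Record one sample into the newest sub-window and into the merged view.
    pub fn record(&mut self, latency_ms: usize) {
        let bucket = latency_ms.min(self.merged.len() - 1); // clamp to the last bucket
        self.sub_windows.back_mut().expect("at least one sub-window")[bucket] += 1;
        self.merged[bucket] += 1;
        self.total += 1;
    }

    /// Expire the oldest sub-window by subtracting its counts from the merged
    /// view, then reuse its storage as the new (empty) newest sub-window.
    pub fn rotate(&mut self) {
        let mut expired = self.sub_windows.pop_front().expect("at least one sub-window");
        for (m, e) in self.merged.iter_mut().zip(expired.iter_mut()) {
            *m -= *e;
            self.total -= *e;
            *e = 0;
        }
        self.sub_windows.push_back(expired);
    }

    /// Smallest bucket whose cumulative count reaches `quantile` (0.0..=1.0)
    /// of all live samples, or `None` when the window is empty.
    pub fn value_at_quantile(&self, quantile: f64) -> Option<usize> {
        if self.total == 0 {
            return None;
        }
        let target = ((quantile * self.total as f64).ceil() as u64).max(1);
        let mut seen: u64 = 0;
        for (value, count) in self.merged.iter().enumerate() {
            seen += *count;
            if seen >= target {
                return Some(value);
            }
        }
        None
    }
}
```

In this sketch a sample stays visible until the ring has rotated `num_sub_windows` times after it was recorded, which is the coarse-grained expiry the table describes.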
Configuration
TrackerConfig (defaults)
| Field | Default | Description |
|---|---|---|
| window_ms | 10,000 (10s) | Sliding window duration |
| num_sub_windows | 10 | Granularity of window expiry |
| min_samples | 30 | Minimum samples before estimates are trusted |
| max_in_flight | 10,000 | Bounded in-flight request tracking |
| significant_value_digits | 2 | HdrHistogram precision (~1%) |
| max_trackable_latency_ms | 60,000 (60s) | Upper clamp for recorded latencies |
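
To make the max_in_flight bound concrete, the following sketch shows one way bounded in-flight tracking with a HashMap and a monotonic sequence counter could work. `InFlightSketch` and its method bodies are hypothetical and only mirror the behaviour described in this README: replies are matched in O(1), and a linear scan for the oldest entry runs only when the map is at capacity. Timestamps are plain milliseconds to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical bounded tracker for outstanding requests.
pub struct InFlightSketch<K> {
    pending: HashMap<K, (u64, u64)>, // key -> (sequence number, send time in ms)
    next_seq: u64,                   // monotonic counter used to find the oldest entry
    max_in_flight: usize,
}

impl<K: Hash + Eq + Clone> InFlightSketch<K> {
    pub fn new(max_in_flight: usize) -> Self {
        assert!(max_in_flight > 0);
        Self { pending: HashMap::new(), next_seq: 0, max_in_flight }
    }

    /// Register a sent request, evicting the oldest pending one if at capacity.
    pub fn record_send(&mut self, key: K, now_ms: u64) {
        if self.pending.len() >= self.max_in_flight && !self.pending.contains_key(&key) {
            // Rare path: O(n) scan for the entry with the smallest sequence number.
            let oldest = self
                .pending
                .iter()
                .min_by_key(|(_, entry)| entry.0)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.pending.remove(&k);
            }
        }
        self.pending.insert(key, (self.next_seq, now_ms));
        self.next_seq += 1;
    }

    /// Common path: O(1) lookup. Returns the round-trip time in milliseconds,
    /// or `None` if the request is unknown or was already evicted.
    pub fn record_reply(&mut self, key: &K, now_ms: u64) -> Option<u64> {
        self.pending
            .remove(key)
            .map(|(_, sent_ms)| now_ms.saturating_sub(sent_ms))
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

A reply for an evicted request simply yields no latency sample, so memory stays bounded even when many requests never receive a reply.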
TimeoutConfig (defaults)
| Field | Default | Description |
|---|---|---|
| min_timeout_ms | 10ms | Floor -- timeout never goes below this |
| max_timeout_ms | 60,000ms | Ceiling -- timeout never exceeds this |
| percentile | 99.99 | Percentile of the latency distribution to use |
| safety_factor | 2.0 | Multiplier on the percentile estimate |
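
The compact representation mentioned under the design decisions can be checked with a small sketch. `TimeoutConfigSketch` is a hypothetical mirror of the four documented fields, with field types assumed from the tables above; it is marked `#[repr(C)]` here only so that the layout is fully specified for the size check.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::time::Duration;

/// Hypothetical mirror of the documented `TimeoutConfig` fields and defaults.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TimeoutConfigSketch {
    pub min_timeout_ms: NonZeroU32,
    pub max_timeout_ms: NonZeroU32,
    pub percentile: f64,
    pub safety_factor: f64,
}

impl Default for TimeoutConfigSketch {
    fn default() -> Self {
        Self {
            min_timeout_ms: NonZeroU32::new(10).expect("10 is non-zero"),
            max_timeout_ms: NonZeroU32::new(60_000).expect("60_000 is non-zero"),
            percentile: 99.99,
            safety_factor: 2.0,
        }
    }
}

impl TimeoutConfigSketch {
    /// Convert to `Duration` only at the API boundary; internal math stays in integers.
    pub fn min_timeout(&self) -> Duration {
        Duration::from_millis(u64::from(self.min_timeout_ms.get()))
    }

    pub fn max_timeout(&self) -> Duration {
        Duration::from_millis(u64::from(self.max_timeout_ms.get()))
    }
}

/// Size of the sketch config in bytes: two 4-byte integers plus two 8-byte floats.
pub fn config_size_bytes() -> usize {
    size_of::<TimeoutConfigSketch>()
}
```

Using NonZeroU32 also means an optional timeout (`Option<NonZeroU32>`) still occupies 4 bytes, since zero serves as the `None` niche.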
Benchmarks
Run with cargo bench:
record_latency_ms ~80 ns/op (steady state, no rotation)
send_reply_cycle ~140 ns/op (record_send + record_reply pair)
percentile_query ~30-73 ns/op (scales with histogram density)
select_timeout (1 dest) ~83 ns/op
select_timeout (10 dests) ~770 ns/op
exponential_backoff_only ~1.5 ns/op (no tracker interaction)
window_rotation ~2.8 us/op (1 sub-window rotation + record)
in_flight_eviction (10k) ~10 us/op (worst case, at capacity)
Minimum Supported Rust Version (MSRV)
Requires Rust 1.92.0 or later.
License
MIT