tower-acc

Adaptive concurrency control for Tower services.

tower-acc dynamically adjusts the number of in-flight requests a service is allowed to handle, based on observed latency. Instead of picking a fixed concurrency limit and hoping it's right, it continuously measures round-trip times and converges on the optimal limit automatically — increasing it when latency is low and decreasing it when queuing is detected.

Why not a static limit?

Tower ships with ConcurrencyLimit, which caps concurrency at a value you choose at startup. That works when the capacity of the downstream service is known and stable, but in practice:

Backends scale up and down.
Dependency latency varies with load.
The "right" limit depends on conditions you can't predict at deploy time.

Setting the limit too low wastes capacity; setting it too high causes queuing, tail-latency spikes, and cascading failures under load. tower-acc removes the guesswork by adapting the limit at runtime.

How it works

The default algorithm is Vegas, inspired by the TCP Vegas congestion control scheme. It works by tracking the minimum observed RTT (the "no-load" baseline) and comparing each request's RTT against it:

Estimate queue depth — limit × (1 − rtt_noload / rtt).
If the queue is short (below alpha) — increase the limit.
If the queue is long (above beta) — decrease the limit.
On errors — decrease immediately.
Periodically probe — reset the baseline to track changing conditions.

All thresholds are configurable via VegasBuilder.

Usage

As a Tower layer

use tower::ServiceBuilder;
use tower_acc::{ConcurrencyLimitLayer, Vegas};

let service = ServiceBuilder::new()
    .layer(ConcurrencyLimitLayer::new(Vegas::default()))
    .service(my_service);

Wrapping a service directly

use tower_acc::{ConcurrencyLimit, Vegas};

let service = ConcurrencyLimit::new(my_service, Vegas::default());

Custom configuration

use tower_acc::{ConcurrencyLimitLayer, Vegas};

let layer = ConcurrencyLimitLayer::new(
    Vegas::builder()
        .initial_limit(20)
        .max_limit(500)
        .smoothing(0.5)
        .build(),
);

Pluggable algorithms

Implement the Algorithm trait to bring your own strategy:

use std::time::Duration;
use tower_acc::Algorithm;

struct MyAlgorithm { /* ... */ }

impl Algorithm for MyAlgorithm {
    fn max_concurrency(&self) -> usize {
        // Return the current concurrency limit.
        todo!()
    }

    fn update(&mut self, rtt: Duration, num_inflight: usize, is_error: bool, is_canceled: bool) {
        // Adjust internal state based on the observed request outcome.
        todo!()
    }
}

Inspiration

This crate is a Rust/Tower port of the ideas from Netflix's concurrency-limits library and the accompanying blog post Performance Under Load. The core insight — applying TCP congestion control theory to request-level concurrency — comes directly from that work.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

tower-acc 0.1.0