pub struct Vegas { /* private fields */ }Expand description
Loss- and delay-based congestion avoidance.
Additive increase, additive decrease. Multiplicative decrease when overload detected.
Estimates queuing delay by comparing the current latency with the minimum observed latency to estimate the number of jobs being queued.
For greater stability consider wrapping with a percentile window sampler. This calculates a percentile (e.g. P90) over a period of time and provides that as a sample. Vegas then compares recent P90 latency with the minimum observed P90. Used this way, Vegas can handle heterogeneous workloads, as long as the percentile latency is fairly stable.
Can fairly distribute concurrency between independent clients as long as there is enough server capacity to handle the requests. That is: as long as the server isn’t overloaded and failing to handle requests as a result.
Inspired by TCP Vegas.
Implementations§
Source§impl Vegas
impl Vegas
pub fn new_with_initial_limit(initial_limit: usize) -> Self
pub fn new(initial_limit: usize, limit_range: RangeInclusive<usize>) -> Self
pub fn with_max_limit(self, max: usize) -> Self
Trait Implementations§
Source§impl LimitAlgorithm for Vegas
impl LimitAlgorithm for Vegas
Source§fn update<'life0, 'async_trait>(
&'life0 self,
sample: Sample,
) -> Pin<Box<dyn Future<Output = usize> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
fn update<'life0, 'async_trait>(
&'life0 self,
sample: Sample,
) -> Pin<Box<dyn Future<Output = usize> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
Vegas algorithm.
Generally applied over a window size of one or two RTTs.
Little’s law: L = λW = concurrency = rate * latency (averages).
The algorithm in terms of rates:
BASE_D = estimated base latency with no queueing
D(w) = observed average latency per job over window w
L(w) = concurrency limit for window w
F(w) = average jobs in flight during window w
L(w) / BASE_D = E = expected rate (no queueing)
F(w) / D(w) = A(w) = actual rate during window w
E - A(w) = DIFF [>= 0]
alpha = low rate threshold: too little queueing
beta = high rate threshold: too much queueing
L(w+1) = L(w) + 1 if DIFF < alpha
- 1 if DIFF > betaOr, using queue size instead of rate:
D(w) - BASE_D = ΔD(w) = extra average latency in window w caused by queueing
A(w) * ΔD(w) = Q(w) = estimated average queue size in window w
alpha = low queueing threshold
beta = high queueing threshold
L(w+1) = L(w) + 1 if Q(w) < alpha
- 1 if Q(w) > beta