muxer
Deterministic, multi-objective routing primitives for "provider selection" problems.
What problem this solves
You have a small set of arms (model versions, inference endpoints, backends, data sources — anything you choose between repeatedly) and calls where you evaluate quality after the fact. After each call you label the result: did it succeed? was the output good enough? was it completely broken? You define those thresholds; muxer tracks the rates and routes future calls accordingly.
Naive approaches fall short in predictable ways: always-best-arm starves new and recovering arms so regressions go undetected; round-robin wastes calls on arms you know are degraded; cooldown-on-failure misses slow quality drift that never triggers a hard error. You want an online policy that:
- explores new or recently-changed arms
- avoids regressions (routes away from arms with rising failure or quality-degradation rates)
- is deterministic by default (same stats/config → same choice), so it's easy to debug
What it is
An Outcome has three caller-defined quality fields plus cost and latency:
- `ok`: the call produced a usable result
- `junk`: quality was below your threshold (low F1, empty extraction, low-confidence score); also set when `hard_junk=true` (hard failure is a subset of junk, tracked and penalized separately)
- `hard_junk`: the call failed entirely (error, timeout, parse failure); implies `junk=true`
- `cost_units`: caller-defined cost proxy (token count, API credits, examples processed, etc.)
- `elapsed_ms`: wall-clock time
The framework is designed for small arm counts (typically 2–10) and moderate window sizes (hundreds to low thousands of observations). The core selection idea is:
- maintain a small sliding window of recent `Outcome`s per arm
- compute a Pareto frontier over ok rate, junk rate, cost, and latency
- pick deterministically via scalarization + a stable tie-break
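As a self-contained illustration of that selection idea (toy `ArmStats` type and illustrative weights, not the crate's API), dominance filtering plus scalarization might look like:

```rust
#[derive(Clone, Copy)]
struct ArmStats { ok_rate: f64, junk_rate: f64, mean_cost: f64, mean_latency_ms: f64 }

// `a` dominates `b` if it is at least as good on every objective
// and strictly better on at least one.
fn dominates(a: &ArmStats, b: &ArmStats) -> bool {
    let ge = a.ok_rate >= b.ok_rate
        && a.junk_rate <= b.junk_rate
        && a.mean_cost <= b.mean_cost
        && a.mean_latency_ms <= b.mean_latency_ms;
    let gt = a.ok_rate > b.ok_rate
        || a.junk_rate < b.junk_rate
        || a.mean_cost < b.mean_cost
        || a.mean_latency_ms < b.mean_latency_ms;
    ge && gt
}

// Keep only non-dominated arms, then scalarize; ties break toward the lower index.
fn pick(arms: &[ArmStats]) -> usize {
    let frontier: Vec<usize> = (0..arms.len())
        .filter(|&i| !(0..arms.len()).any(|j| j != i && dominates(&arms[j], &arms[i])))
        .collect();
    // Illustrative weights: reward ok, penalize junk, cost, and latency.
    let score = |s: &ArmStats| {
        s.ok_rate - 1.5 * s.junk_rate - 0.01 * s.mean_cost - 0.001 * s.mean_latency_ms
    };
    *frontier
        .iter()
        .max_by(|&&i, &&j| {
            score(&arms[i])
                .partial_cmp(&score(&arms[j]))
                .unwrap()
                .then(j.cmp(&i)) // stable tie-break: lower index wins
        })
        .unwrap()
}

fn main() {
    let arms = vec![
        ArmStats { ok_rate: 0.9, junk_rate: 0.0, mean_cost: 1.0, mean_latency_ms: 100.0 },
        ArmStats { ok_rate: 0.9, junk_rate: 0.2, mean_cost: 1.0, mean_latency_ms: 100.0 },
    ];
    // arm 1 is dominated (same everywhere, worse junk rate), so arm 0 wins
    assert_eq!(pick(&arms), 0);
}
```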
This crate also includes:
- a seedable Thompson-sampling policy (`ThompsonSampling`) for cases where you can provide a scalar reward in `[0, 1]` per call
- a seedable EXP3-IX policy (`Exp3Ix`) for more adversarial / fast-shifting reward settings
- (feature `contextual`) a linear contextual bandit policy (`LinUcb`) for per-request routing with feature vectors
- latency guardrails (`LatencyGuardrailConfig` / `apply_latency_guardrail`): hard pre-filter by mean latency, with stop-early semantics for multi-pick loops
- multi-pick selection (`select_mab_k_guardrailed_explain_full` and variants): select up to `k` unique arms per decision with a per-round guardrail loop
- maintenance sampling (`CoverageConfig` / `coverage_pick_under_sampled`): ensure all arms stay measured above a quota; see "Three goals for sampling" for why this matters
- post-detection triage (`WorstFirstConfig` / `worst_first_pick_k`): prioritize the most degraded arms for investigation after monitoring fires
- contextual cell triage (`ContextualCoverageTracker` / `contextual_worst_first_pick_k`): lift triage to `(arm, context-bin)` pairs so localised regressions don't average away
- combined detect + triage sessions (`TriageSession`): wires per-arm CUSUM detection and per-cell investigation into one stateful session
- `softmax_map`: stable score → probability helper for traffic splitting
- `Router`: stateful session that owns all per-arm state and handles the full lifecycle (select/observe/acknowledge_change); supports dynamic arm add/remove and large K
- threshold calibration (`calibrate_cusum_threshold`): Monte Carlo calibration answering "what CUSUM threshold gives `P[alarm within m rounds] ≤ α`?"
- window size guidance (`suggested_window_cap`): SW-UCB-derived `sqrt(throughput / change_rate)`
- control arms (`ControlConfig` / `pick_control_arms`): reserve deterministic-random picks as a selection-bias anchor
- novelty helpers (`novelty_pick_unseen`), prior smoothing (`apply_prior_counts_to_summary`), and pipeline glue (`PipelineOrder` / `PolicyPlan`) for building custom routing harnesses
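The window-cap heuristic listed above is simple enough to sketch directly. This toy function (illustrative name and units, not the crate's `suggested_window_cap` signature) just applies the quoted `sqrt(throughput / change_rate)` formula:

```rust
// Sketch of the SW-UCB-style heuristic: a window of roughly
// sqrt(throughput / change_rate) observations balances averaging noise down
// against forgetting stale, pre-change data.
fn suggested_window(calls_per_day: f64, changes_per_day: f64) -> usize {
    (calls_per_day / changes_per_day).sqrt().ceil() as usize
}

fn main() {
    // 10_000 calls/day, arms change about once a week:
    let w = suggested_window(10_000.0, 1.0 / 7.0);
    println!("suggested window cap: {w}");
}
```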
Which policy should I use?
- `select_mab` (Window + Pareto + scalarization): when you care about multiple objectives at once (success rate, failure rate, quality degradation, cost, latency) and you want deterministic selection with hard constraints.
- `ThompsonSampling`: when you can provide a single reward per call (in `[0, 1]`) and want a classic explore/exploit policy (seedable, optionally decayed).
- `Exp3Ix`: when reward is non-stationary or adversarial-ish and you still want a probabilistic policy (seedable, optionally decayed).
- `LinUcb` (feature `contextual`): when you have a per-request feature vector (e.g. cheap "difficulty" features, embeddings, metadata) and want a contextual policy.
Routing lifecycle
A typical deployment has three modes:
- Normal (`select_mab` / `ThompsonSampling` / `Exp3Ix`): route to the best arm while exploring. This runs on every call.
- Regression investigation (`worst_first_pick_k`): after monitoring fires on an arm, route extra traffic there to characterize the change. `TriageSession` automates the detect → investigate handoff.
- Control (`pick_random_subset`): reserve a small fraction of calls as a random baseline to anchor quality estimates and detect selection bias.
CoverageConfig provides a floor that bridges modes 1 and 3: it ensures no arm is so starved that you'd miss a regression in it.
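A minimal version of such a floor can be sketched without the crate's API (the function and parameter names here are illustrative, not `CoverageConfig` / `coverage_pick_under_sampled` themselves): if any arm's share of recent traffic is below a quota, the most starved such arm gets the next call.

```rust
// If any arm's share of recent traffic falls below `min_share`, return the
// most starved such arm; otherwise fall through to the normal policy (None).
fn coverage_pick(counts: &[u64], min_share: f64) -> Option<usize> {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return Some(0); // nothing measured yet: any arm needs coverage
    }
    counts
        .iter()
        .enumerate()
        .filter(|&(_, &c)| (c as f64) / (total as f64) < min_share)
        .min_by_key(|&(_, &c)| c)
        .map(|(i, _)| i)
}

fn main() {
    // arm 2 has 1% of recent traffic; with a 5% floor it gets the next call
    assert_eq!(coverage_pick(&[500, 490, 10], 0.05), Some(2));
    // all arms above the floor: defer to the main selection policy
    assert_eq!(coverage_pick(&[40, 30, 30], 0.05), None);
}
```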
Three goals for sampling
Every routing decision involves three objectives that generally compete:
- Exploitation — minimize regret; route to the best arm now.
- Estimation — understand each arm's true rates; keep all arms measured.
- Detection — notice when an arm changes; minimize delay between the change and the alarm.
The two clocks. You only observe an arm when you sample it, so there are two notions of time:
- Wall time $t$: global decision steps.
- Sample time $n_k$: observations from arm $k$.
Detection delay in wall time scales as delay_wall ≈ delay_samples / rate_k. CoverageConfig sets a minimum sampling-rate floor, which is the direct lever for bounding wall-clock detection delay.
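The relation is worth seeing as arithmetic. A toy calculation (illustrative function, not a crate API): a detector that needs about 50 post-change samples to alarm, on an arm receiving 2% of traffic, fires roughly 2,500 decisions after the change; raising the floor to 10% cuts that five-fold.

```rust
// delay_wall ≈ delay_samples / rate_k: converting a detector's sample-time
// delay into wall-clock decision steps, given the arm's sampling rate.
fn wall_delay(delay_samples: f64, sampling_rate: f64) -> f64 {
    delay_samples / sampling_rate
}

fn main() {
    // 50-sample detection delay at a 2% sampling rate:
    println!("wall delay: {} decisions", wall_delay(50.0, 0.02));
    // the same detector with a 10% coverage floor:
    println!("wall delay: {} decisions", wall_delay(50.0, 0.10));
}
```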
The non-contextual collapse. In the non-contextual case with a static allocation, estimation error and average detection delay are both $O(1/n_k)$ in the per-arm sample count. They are structurally proportional: the same lever (how often you sample an arm) drives both. This means there is no free lunch between estimation and average detection; the Pareto surface collapses to a 1-D curve parameterized by $n_k$.
The contextual revival. In the contextual regime (LinUcb), routing also depends on a per-request feature vector. Average detection delay remains proportional to estimation (they share the same design-measure sensitivity). But worst-case detection delay — which concentrates on the covariate cell with the fewest observations — is genuinely independent. This is why ContextualCoverageTracker and TriageSession exist: localised regressions in sparse covariate regions need explicit coverage and cell-level triage, not just arm-level monitoring.
For the full treatment and concrete failure modes, see the API docs and examples/EXPERIMENTS.md.
Unified decision records (recommended for logging/replay)
Most production routers want a single "decision object" shape regardless of policy so logging, auditing, and replay don't depend on per-policy conventions. muxer provides a unified Decision envelope with:
- `chosen`: the arm name
- `probs`: optional probability distribution (when a policy has one)
- `notes`: typed audit notes (explore-first, constraint gating, numerical fallback, etc.)
Each policy has a *_decide / decide_* method that returns this.
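A minimal shape for that envelope, as a sketch (field names taken from the list above; the crate's real struct may differ, e.g. its notes are typed rather than plain strings):

```rust
use std::collections::BTreeMap;

// Toy decision record: one log/replay shape regardless of which policy
// produced it.
#[derive(Debug)]
struct Decision {
    chosen: String,
    probs: Option<BTreeMap<String, f64>>,
    notes: Vec<String>, // the crate uses typed notes; strings keep the sketch short
}

fn main() {
    let d = Decision {
        chosen: "arm-a".to_string(),
        probs: Some(BTreeMap::from([
            ("arm-a".to_string(), 0.7),
            ("arm-b".to_string(), 0.3),
        ])),
        notes: vec!["explore-first".to_string()],
    };
    // The same record serializes for logging whether it came from
    // select_mab, Thompson, or EXP3-IX.
    println!("{d:?}");
}
```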
Quick examples
Deterministic multi-objective selection (Pareto + scalarization)
```rust
// NOTE: the import list and summary construction were lost in formatting;
// the comments below mark where they go. See docs.rs for the exact signatures.
use std::collections::BTreeMap;

let arms = vec!["a".to_string(), "b".to_string()];
let mut summaries = BTreeMap::new();
// arm "a": 9/10 ok, 0 junk (quality above threshold)
summaries.insert("a".to_string(), /* summary: 10 calls, 9 ok, 0 junk */);
// arm "b": same ok rate, but 2 results fell below quality threshold
summaries.insert("b".to_string(), /* summary: 10 calls, 9 ok, 2 junk */);
let sel = select_mab(&arms, &summaries /* , config */);
assert_eq!(sel.as_deref(), Some("a")); // lower junk rate wins when all else is equal
```
Online routing loop (Window ingestion)
You maintain a Window per arm, push Outcomes as requests finish, and call select_mab on each decision.
Note: this example simulates an environment and therefore requires --features stochastic if you disabled default features.
Monitored selection (baseline vs recent drift + uncertainty-aware rates)
If you maintain a baseline and recent window per arm for change monitoring, use MonitoredWindow
plus select_mab_monitored_*:
End-to-end router demo (Window + constraints + stickiness + delayed junk)
This combines multiple production patterns in one loop: window ingestion, constraints+weights, stickiness reasons, and delayed junk labeling.
This same scenario has a CI-checked regression test in tests/e2e_metrics.rs and logs whether constraint fallback was used.
Window ingestion with delayed quality labeling
In most real-world routing, quality is known only after processing: you call the arm, receive a response, then score it (compute F1, run a parser, check embedding similarity) and label it junk if it falls below your threshold. The pattern is: push the Outcome immediately with junk: false, then call set_last_junk_level once scoring completes.
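The push-then-relabel pattern can be illustrated with a toy window (this is a sketch, not the crate's `Window` / `set_last_junk_level` API):

```rust
// Minimal observation record and window for the delayed-labeling pattern.
struct Obs { ok: bool, junk: bool }

struct ToyWindow { obs: Vec<Obs> }

impl ToyWindow {
    fn push(&mut self, o: Obs) {
        self.obs.push(o);
    }
    // Relabel the most recent observation once offline scoring completes.
    fn set_last_junk(&mut self, junk: bool) {
        if let Some(last) = self.obs.last_mut() {
            last.junk = junk;
        }
    }
}

fn main() {
    let mut w = ToyWindow { obs: Vec::new() };
    // 1) the call returned: record it immediately, optimistically not junk
    w.push(Obs { ok: true, junk: false });
    // 2) scoring finished: F1 fell below threshold, so relabel
    w.set_last_junk(true);
    assert!(w.obs.last().unwrap().junk);
}
```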
Constraint + trade-off tuning for select_mab
Example showing "constraints first, then weights":
Router — production pattern (monitoring + triage + calibration + snapshot)
Shows: CUSUM threshold calibration, full RouterConfig with monitoring/triage/coverage/control, regression detection, acknowledgment, and snapshot/restore for persistence across restarts.
Router — full lifecycle (select / observe / triage / acknowledge)
This covers: basic two-arm routing, quality divergence, triage detection, acknowledgment, large-K batch exploration (K=20 in 7 rounds with k=3), and dynamic arm management. No --features flag needed.
Multi-pick selection with a latency guardrail
select_mab_k_guardrailed_explain_full selects up to k unique arms per decision, applying
a LatencyGuardrailConfig each round. When combined with MonitoredWindows, use
select_mab_k_guardrailed_monitored_explain_full. log_mab_k_rounds_typed converts the
explanation into compact, log-ready round rows.
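The per-round loop can be sketched independently of the crate's functions (illustrative names and a mean-latency cap standing in for `LatencyGuardrailConfig`): filter by the cap, pick the best survivor, repeat up to `k` times, and stop early once every remaining arm is filtered out.

```rust
// Pick up to `k` unique arms, skipping any arm whose mean latency exceeds
// `cap_ms`. Returns fewer than `k` arms (stop-early) if the guardrail
// filters everything that remains.
fn pick_k_guardrailed(mean_latency_ms: &[f64], score: &[f64], cap_ms: f64, k: usize) -> Vec<usize> {
    let mut picked = Vec::new();
    for _ in 0..k {
        let best = (0..score.len())
            .filter(|i| !picked.contains(i) && mean_latency_ms[*i] <= cap_ms)
            .max_by(|&a, &b| score[a].partial_cmp(&score[b]).unwrap());
        match best {
            Some(i) => picked.push(i),
            None => break, // stop-early: no arm passes the guardrail this round
        }
    }
    picked
}

fn main() {
    let lat = [80.0, 900.0, 120.0];
    let score = [0.7, 0.95, 0.8];
    // arm 1 has the best score but is blocked by the 500 ms guardrail
    assert_eq!(pick_k_guardrailed(&lat, &score, 500.0, 2), vec![2, 0]);
}
```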
Guardrail semantics (soft pre-filter vs hard constraint) are demonstrated in the `guardrail_semantics` example.
Unified decision records
Detect-then-triage (TriageSession)
TriageSession wires per-arm CUSUM detection with per-(arm, context-bin) cell investigation:
```rust
// NOTE: the import list and constructor/call arguments were lost in
// formatting; the comments below mark where they go. See docs.rs for the
// exact signatures.
let arms = vec!["a".to_string(), "b".to_string()];
let mut session = TriageSession::new(/* arms, detection + coverage config */).unwrap();
// Feed observations: arm name, outcome category, feature context.
session.observe(/* "a", outcome category, &features */);
session.observe(/* "b", outcome category, &features */);
// Which arms have CUSUM-alarmed?
let alarmed = session.alarmed_arms();
// Top (arm, bin) cells to route extra investigation traffic to.
let bins = session.tracker.active_bins();
let cells = session.top_alarmed_cells(/* k */);
```
OutcomeIdx::from_outcome(ok, junk, hard_junk) maps a muxer::Outcome triple to the 4-category index space (OK / SOFT_JUNK / HARD_JUNK / FAIL). For the non-contextual case, worst_first_pick_k provides arm-level triage without feature vectors.
EXP3-IX (adversarial bandit) with probabilities
```rust
// NOTE: the import list and constructor/call arguments were lost in
// formatting; see docs.rs for the exact signatures.
let arms = vec!["a".to_string(), "b".to_string()];
let mut ex = Exp3Ix::new(/* arms, config / seed */);
let d = ex.decide().unwrap();
// ... run request with `d.chosen` ...
ex.update_reward(/* &d.chosen, reward */); // reward in [0, 1]
let probs = d.probs.unwrap();
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9); // probabilities form a distribution
```
Note: this example requires --features stochastic if you disabled default features.
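The EXP3-IX update rule itself is compact enough to sketch without the crate (this is the textbook algorithm, not `Exp3Ix`'s internals; `eta` and `gamma` are the usual learning-rate and implicit-exploration parameters):

```rust
// EXP3-IX over losses: play arm i with probability proportional to
// exp(-eta * cum_loss[i]); estimate the chosen arm's loss by importance
// weighting with `gamma` added to the denominator ("implicit exploration"),
// which caps the variance of the loss estimates.
struct Exp3IxSketch { eta: f64, gamma: f64, cum_loss: Vec<f64> }

impl Exp3IxSketch {
    fn probs(&self) -> Vec<f64> {
        let w: Vec<f64> = self.cum_loss.iter().map(|l| (-self.eta * l).exp()).collect();
        let z: f64 = w.iter().sum();
        w.iter().map(|x| x / z).collect()
    }
    fn update(&mut self, chosen: usize, reward: f64) {
        let p = self.probs()[chosen];
        let loss = 1.0 - reward; // rewards in [0, 1] become losses in [0, 1]
        self.cum_loss[chosen] += loss / (p + self.gamma);
    }
}

fn main() {
    let mut ex = Exp3IxSketch { eta: 0.1, gamma: 0.05, cum_loss: vec![0.0; 2] };
    for _ in 0..20 {
        ex.update(0, 1.0); // arm 0 keeps paying off: loss 0, no penalty
        ex.update(1, 0.0); // arm 1 keeps failing: cumulative penalty grows
    }
    let p = ex.probs();
    // mass concentrates on arm 0, but arm 1 keeps nonzero probability
    assert!(p[0] > 0.8 && p[1] > 0.0);
}
```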
Thompson "traffic splitting" selector (mean-softmax allocation)
```rust
// NOTE: the import list and constructor/call arguments were lost in
// formatting; see docs.rs for the exact signatures.
let arms = vec!["a".to_string(), "b".to_string()];
let mut ts = ThompsonSampling::with_seed(/* arms, seed */);
let d = ts.decide_softmax_mean(/* ... */).unwrap();
ts.update_reward(/* &d.chosen, reward */);
let alloc = d.probs.unwrap();
let s: f64 = alloc.values().sum();
assert!((s - 1.0).abs() < 1e-9); // allocation is a distribution over arms
```
Note: this example requires --features stochastic if you disabled default features.
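The mean-softmax idea can be sketched with stdlib only (illustrative functions, not the crate's `decide_softmax_mean`): score each arm by its Beta-posterior mean and map the scores through a numerically stable softmax to get a traffic split.

```rust
// Stable softmax: subtract the max before exponentiating so large scores
// cannot overflow; `temp` controls how sharply traffic concentrates.
fn softmax(scores: &[f64], temp: f64) -> Vec<f64> {
    let m = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let e: Vec<f64> = scores.iter().map(|s| ((s - m) / temp).exp()).collect();
    let z: f64 = e.iter().sum();
    e.iter().map(|x| x / z).collect()
}

// Each arm carries Beta(alpha, beta) success counts; its posterior mean
// alpha / (alpha + beta) is the score fed to the softmax.
fn allocation(alpha_beta: &[(f64, f64)], temp: f64) -> Vec<f64> {
    let means: Vec<f64> = alpha_beta.iter().map(|(a, b)| a / (a + b)).collect();
    softmax(&means, temp)
}

fn main() {
    // arm 0: 9 ok / 1 fail; arm 1: 6 ok / 4 fail (with a Beta(1,1) prior folded in)
    let alloc = allocation(&[(10.0, 2.0), (7.0, 5.0)], 0.1);
    // the better arm gets most, but not all, of the traffic
    assert!(alloc[0] > alloc[1] && alloc[1] > 0.0);
}
```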
Contextual routing (LinUCB)
Notes:
- If you want a probability distribution over arms for this context (e.g. for traffic splitting or logging approximate propensities), use `LinUcb::probabilities(...)` or `LinUcb::decide_softmax_ucb(...)`.
- Algorithm reference: LinUCB (Chu et al., "Contextual bandits with linear payoff functions").
Contextual "propensity logging" example:
Stickiness / switching-cost control
If you want to reduce "flapping" between arms, wrap deterministic selection with StickyMab:
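The idea behind a switching-cost wrapper can be sketched independently of `StickyMab` (illustrative function and margin, not the crate's API): only leave the current arm when a challenger beats it by a margin.

```rust
// Stay on `current` unless the best-scoring arm leads it by at least
// `margin`; this hysteresis prevents flapping on small score fluctuations.
fn sticky_pick(current: Option<usize>, scores: &[f64], margin: f64) -> usize {
    let best = (0..scores.len())
        .max_by(|&a, &b| scores[a].partial_cmp(&scores[b]).unwrap())
        .unwrap();
    match current {
        Some(c) if scores[best] - scores[c] < margin => c, // not worth the switch
        _ => best,
    }
}

fn main() {
    // challenger leads by 0.02, under the 0.05 margin: stay put
    assert_eq!(sticky_pick(Some(0), &[0.80, 0.82], 0.05), 0);
    // lead grows past the margin: switch
    assert_eq!(sticky_pick(Some(0), &[0.80, 0.90], 0.05), 1);
}
```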
Mini-experiments (bandits × monitoring × false alarms)
Runnable probes that make tradeoffs and failure modes explicit — see examples/EXPERIMENTS.md for guided walkthroughs of each:
- `cargo run --example guardrail_semantics`
- `cargo run --example coverage_autotune --features stochastic`
- `cargo run --example free_lunch_investigation --features stochastic`
- `cargo run --example detector_inertia --features stochastic`
- `cargo run --example detector_calibration --features stochastic`
- `cargo run --example bqcd_sampling --features stochastic`
- `cargo run --release --example bqcd_calibrated --features stochastic`
Reusable bits extracted from these experiments live in muxer::monitor, notably:
- `CusumCatBank`: "GLR-lite" robustification via a small bank of CUSUM alternatives.
- `calibrate_threshold_from_max_scores`: threshold calibration from null max-score samples (supports Wilson-conservative mode).
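The one-sided CUSUM idea behind these pieces, plus calibration from null max scores, can be sketched in a few lines (illustrative code, not `CusumCatBank` or `calibrate_threshold_from_max_scores` themselves):

```rust
// One-sided CUSUM: S <- max(0, S + (x - target - slack)); alarm once S > h.
// `slack` absorbs in-control noise so S stays pinned at 0 before a change.
struct Cusum { target: f64, slack: f64, s: f64 }

impl Cusum {
    fn observe(&mut self, x: f64, threshold: f64) -> bool {
        self.s = (self.s + (x - self.target - self.slack)).max(0.0);
        self.s > threshold
    }
}

// Calibrate the threshold as a high quantile of max scores collected from
// null (no-change) runs, so the false-alarm rate is roughly `alpha`.
fn calibrate_from_null_max_scores(mut max_scores: Vec<f64>, alpha: f64) -> f64 {
    max_scores.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((1.0 - alpha) * (max_scores.len() - 1) as f64).round() as usize;
    max_scores[idx]
}

fn main() {
    // junk-rate stream: baseline 0.1, jumps to 0.5 at t = 10
    let mut c = Cusum { target: 0.1, slack: 0.05, s: 0.0 };
    let mut alarm_at = None;
    for t in 0..30 {
        let x = if t < 10 { 0.1 } else { 0.5 };
        if c.observe(x, 1.0) && alarm_at.is_none() {
            alarm_at = Some(t);
        }
    }
    assert_eq!(alarm_at, Some(12)); // ~3 post-change samples of delay here
}
```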
Documentation
Quickstart (Router)
The Router struct owns all per-arm state and handles the full lifecycle in three calls:
```rust
// NOTE: the import list and constructor/call arguments were lost in
// formatting; see docs.rs for the exact signatures.
let arms = vec!["a".to_string(), "b".to_string()];
let mut router = Router::new(/* arms, RouterConfig */).unwrap();
loop {
    // 1) select: let d = router.select(/* k */);
    // 2) run the request against `d.chosen`, score the result, then
    //    observe: router.observe(/* arm, Outcome { .. } */);
    // 3) if a regression alarm fires, triage, then
    //    acknowledge: router.acknowledge_change(/* arm */);
}
```
For larger arm counts, pass k > 1 to batch the initial exploration:
```rust
// K=30 arms, k=3 per round → initial coverage in ~10 rounds.
// NOTE: config arguments were lost in formatting; see docs.rs.
let cfg = RouterConfig::default().with_coverage(/* ... */);
let d = router.select(/* k = 3 */);
```
Usage
```toml
[dependencies]
muxer = "0.3.2"
```
If you only want the deterministic Window + select_mab* core (no stochastic bandits), disable default features:
```toml
[dependencies]
muxer = { version = "0.3.2", default-features = false }
```
Development
```sh
# If you are in a larger Cargo workspace, scope to this package:
cargo test -p muxer

# Microbenches (criterion):
cargo bench -p muxer

# (Optional) Match CI checks:
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test --all-features
```