# muxer
Deterministic, multi-objective routing primitives for “provider selection” problems.
## What problem this solves
You have a small set of arms (providers/models/backends) and repeated calls that produce outcomes (success / 429 / junk) along with cost and latency. You want an online policy that:
- explores new or recently-changed arms
- avoids regressions (junk/429 spikes)
- is deterministic by default (same stats/config → same choice), so it’s easy to debug
## What it is
The core idea is:
- maintain a small sliding window of recent outcomes per provider (ok/429/junk, cost, latency)
- compute a Pareto frontier over the objectives
- pick a single provider deterministically via scalarization + stable tie-break
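The frontier-then-scalarize step can be sketched independently of the crate. The types, field names, and weights below are illustrative, not muxer's API; the point is the three-stage shape (dominance filter, weighted sum, stable tie-break):

```rust
/// One arm's aggregated objectives; lower is better for every field.
/// (Illustrative stand-in for the crate's per-arm summary type.)
#[derive(Clone, Copy)]
struct Objectives { junk_rate: f64, err_429_rate: f64, cost: f64 }

/// `a` dominates `b` if it is no worse on every objective and strictly better on one.
fn dominates(a: Objectives, b: Objectives) -> bool {
    let no_worse = a.junk_rate <= b.junk_rate
        && a.err_429_rate <= b.err_429_rate
        && a.cost <= b.cost;
    let better = a.junk_rate < b.junk_rate
        || a.err_429_rate < b.err_429_rate
        || a.cost < b.cost;
    no_worse && better
}

fn pick(arms: &[(String, Objectives)]) -> Option<String> {
    // 1) Keep only Pareto-optimal arms.
    let frontier: Vec<&(String, Objectives)> = arms
        .iter()
        .filter(|(_, o)| !arms.iter().any(|(_, other)| dominates(*other, *o)))
        .collect();
    // 2) Scalarize with fixed weights; 3) break score ties on name, so the
    //    same stats always yield the same choice.
    frontier
        .into_iter()
        .map(|(name, o)| (name.clone(), 5.0 * o.junk_rate + 2.0 * o.err_429_rate + o.cost))
        .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap().then(a.0.cmp(&b.0)))
        .map(|(name, _)| name)
}

fn main() {
    let arms = vec![
        ("a".to_string(), Objectives { junk_rate: 0.01, err_429_rate: 0.05, cost: 1.0 }),
        ("b".to_string(), Objectives { junk_rate: 0.10, err_429_rate: 0.05, cost: 1.0 }), // dominated by "a"
        ("c".to_string(), Objectives { junk_rate: 0.02, err_429_rate: 0.05, cost: 0.5 }), // on the frontier: cheaper
    ];
    // "b" is dominated; between "a" and "c", the weights favor "c".
    assert_eq!(pick(&arms), Some("c".to_string()));
}
```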
This crate also includes:
- a seedable Thompson-sampling policy (`ThompsonSampling`) for cases where you can provide a scalar reward in `[0, 1]` per call
- a seedable EXP3-IX policy (`Exp3Ix`) for more adversarial / fast-shifting reward settings
- (feature `contextual`) a linear contextual bandit policy (`LinUcb`) for per-request routing with feature vectors
## Which policy should I use?
- `select_mab` (Window + Pareto + scalarization): when you care about multiple objectives at once (success, 429, junk, cost, latency) and you want deterministic selection with hard constraints.
- `ThompsonSampling`: when you can provide a single reward per call (in `[0, 1]`) and want a classic explore/exploit policy (seedable, optionally decayed).
- `Exp3Ix`: when reward is non-stationary / adversarial-ish and you still want a probabilistic policy (seedable, optionally decayed).
- `LinUcb` (feature `contextual`): when you have a per-request feature vector (e.g. cheap “difficulty” features, embeddings, metadata) and want a contextual policy.
## Unified decision records (recommended for logging/replay)
Most production routers want a single “decision object” shape regardless of policy, so logging, auditing, and replay don’t depend on per-policy conventions. muxer provides a unified `Decision` envelope with:

- `chosen`: the arm name
- `probs`: optional probability distribution (when a policy has one)
- `notes`: typed audit notes (explore-first, constraint gating, numerical fallback, etc.)

Each policy has a `*_decide` / `decide_*` method that returns this.
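A minimal shape for such an envelope looks like the following sketch. The field names mirror the description above, but the actual muxer struct (note types, derives) may differ:

```rust
use std::collections::BTreeMap;

/// Policy-agnostic decision record, suitable for structured logging and replay.
/// (Sketch; muxer's real `Decision` uses typed notes rather than strings.)
#[derive(Debug, Clone)]
struct Decision {
    chosen: String,                       // the arm name
    probs: Option<BTreeMap<String, f64>>, // distribution, when the policy has one
    notes: Vec<String>,                   // audit notes (explore-first, gating, ...)
}

fn main() {
    let mut probs = BTreeMap::new();
    probs.insert("fast".to_string(), 0.8);
    probs.insert("slow".to_string(), 0.2);
    let d = Decision {
        chosen: "fast".to_string(),
        probs: Some(probs),
        notes: vec!["explore-first".to_string()],
    };
    // Every policy logs the same shape, so replay tooling needs no per-policy cases.
    println!("{d:?}");
}
```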
## Quick examples
### Deterministic multi-objective selection (Pareto + scalarization)
```rust
use muxer::select_mab; // plus the summary/config types; see docs.rs for exact names
use std::collections::BTreeMap;

let arms = vec!["a".to_string(), "b".to_string()];
let mut summaries = BTreeMap::new();
summaries.insert(/* "a": summary with a low junk rate, otherwise identical stats */);
summaries.insert(/* "b": summary with a higher junk rate */);
let sel = select_mab(/* &arms, &summaries, config */);
assert_eq!(sel, "a"); // lower junk when all else is equal
```
### Realistic “online routing loop” (Window ingestion)
This is closer to production usage: you maintain a `Window` per arm, push `Outcome`s as requests finish, and call `select_mab` each decision.

Note: this example simulates an environment and therefore requires `--features stochastic` if you disabled default features.
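In outline, the loop looks like this. The `Window`/`Outcome` types here are hand-rolled stand-ins (muxer's own carry cost and latency too), and the lowest-junk-rate pick stands in for `select_mab`:

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Clone, Copy)]
enum Outcome { Ok, RateLimited, Junk }

/// Fixed-capacity sliding window of recent outcomes for one arm.
struct Window { cap: usize, buf: VecDeque<Outcome> }

impl Window {
    fn new(cap: usize) -> Self { Self { cap, buf: VecDeque::new() } }
    fn push(&mut self, o: Outcome) {
        if self.buf.len() == self.cap { self.buf.pop_front(); }
        self.buf.push_back(o);
    }
    fn junk_rate(&self) -> f64 {
        if self.buf.is_empty() { return 0.0; }
        let junk = self.buf.iter().filter(|o| matches!(o, Outcome::Junk)).count();
        junk as f64 / self.buf.len() as f64
    }
}

fn main() {
    let mut windows: BTreeMap<&str, Window> = BTreeMap::new();
    windows.insert("a", Window::new(64));
    windows.insert("b", Window::new(64));
    for step in 0..100 {
        // Decision: lowest junk rate wins (stand-in for the real multi-objective pick).
        let chosen = windows.iter()
            .min_by(|x, y| x.1.junk_rate().partial_cmp(&y.1.junk_rate()).unwrap())
            .map(|(name, _)| *name)
            .unwrap();
        // Simulated environment: arm "b" produces junk on every 4th call.
        let outcome = if chosen == "b" && step % 4 == 0 { Outcome::Junk } else { Outcome::Ok };
        // Ingest the outcome as the request finishes.
        windows.get_mut(chosen).unwrap().push(outcome);
    }
    assert!(windows["a"].junk_rate() <= windows["b"].junk_rate());
}
```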
### Monitored selection (baseline vs recent drift + uncertainty-aware rates)
If you maintain a baseline and a recent window per arm for change monitoring, use `MonitoredWindow` plus the `select_mab_monitored_*` functions.
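The underlying check can be sketched as a baseline-vs-recent rate comparison with a sample-size-aware margin. The normal-approximation half-width below is a simplified assumption, not muxer's exact rule:

```rust
/// Half-width of an approximate 95% interval for a Bernoulli rate
/// (normal approximation; a stand-in for fancier interval choices).
fn half_width(rate: f64, n: usize) -> f64 {
    if n == 0 { return 1.0; }
    1.96 * (rate * (1.0 - rate) / n as f64).sqrt()
}

/// Flag drift only when the recent junk rate exceeds baseline by more than
/// the combined uncertainty of both estimates: (rate, sample count) pairs.
fn drifted(baseline: (f64, usize), recent: (f64, usize)) -> bool {
    let margin = half_width(baseline.0, baseline.1) + half_width(recent.0, recent.1);
    recent.0 - baseline.0 > margin
}

fn main() {
    // Too little recent data to call a 2% -> 4% move a regression:
    assert!(!drifted((0.02, 500), (0.04, 20)));
    // A clear junk-rate regression with plenty of samples:
    assert!(drifted((0.02, 500), (0.30, 200)));
}
```

The uncertainty-aware framing is what keeps small recent windows from triggering false alarms.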
### End-to-end router demo (Window + constraints + stickiness + delayed junk)
This combines multiple production patterns in one loop: window ingestion, constraints + weights, stickiness reasons, and delayed junk labeling.

Note: this example simulates an environment and therefore requires `--features stochastic` if you disabled default features.

The same scenario has a CI-checked regression test in `tests/e2e_metrics.rs`, and the demo now logs whether constraint fallback was used.
### Window ingestion with delayed junk labeling
If your “junk” classification is only known after downstream parsing/validation, you can update the most recent outcome after the fact.
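A sketch of the pattern with a hand-rolled window (the crate's own method names for this will differ):

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy)]
struct Outcome { ok: bool, junk: bool }

struct Window { buf: VecDeque<Outcome> }

impl Window {
    fn push(&mut self, o: Outcome) { self.buf.push_back(o); }
    /// Relabel the most recent outcome once downstream validation finishes.
    fn mark_last_junk(&mut self) {
        if let Some(last) = self.buf.back_mut() { last.junk = true; }
    }
}

fn main() {
    let mut w = Window { buf: VecDeque::new() };
    w.push(Outcome { ok: true, junk: false }); // request "succeeded" at the transport layer
    // ...later, the parser rejects the payload...
    w.mark_last_junk();
    assert!(w.buf.back().unwrap().junk);
}
```

Because junk rates feed back into selection, relabeling in place means the next decision already sees the corrected history.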
### Constraint + trade-off tuning for `select_mab`
The tuning follows “constraints first, then weights”: hard limits gate arms out entirely, and only the surviving arms are ranked by the weighted trade-off.
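The shape of that two-stage selection, sketched with illustrative thresholds and weights (not muxer's config struct):

```rust
#[derive(Clone)]
struct Arm { name: String, junk_rate: f64, p429: f64, cost: f64 }

/// "Constraints first, then weights": hard-drop violators, then scalarize survivors.
fn select(arms: &[Arm], max_junk: f64, max_429: f64) -> Option<String> {
    arms.iter()
        .filter(|a| a.junk_rate <= max_junk && a.p429 <= max_429)            // hard constraints
        .map(|a| (a.name.clone(), 4.0 * a.junk_rate + 2.0 * a.p429 + a.cost)) // soft trade-off
        .min_by(|x, y| x.1.partial_cmp(&y.1).unwrap().then(x.0.cmp(&y.0)))    // stable tie-break
        .map(|(name, _)| name)
}

fn main() {
    let arms = vec![
        Arm { name: "cheap".to_string(), junk_rate: 0.25, p429: 0.01, cost: 0.05 }, // violates junk cap
        Arm { name: "solid".to_string(), junk_rate: 0.02, p429: 0.02, cost: 1.0 },
    ];
    // "cheap" would win on the weighted score alone, but the junk constraint gates it out first.
    assert_eq!(select(&arms, 0.10, 0.05), Some("solid".to_string()));
}
```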
### EXP3-IX (adversarial bandit) with probabilities
```rust
use muxer::Exp3Ix;

let arms = vec!["a".to_string(), "b".to_string()];
let mut ex = Exp3Ix::new(/* arms + parameters; see docs.rs */);
let d = ex.decide().unwrap();
// ... run request with `d.chosen` ...
ex.update_reward(/* &d.chosen, observed reward */); // reward in [0, 1]
let probs = d.probs.unwrap();
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);
```
Runnable as a crate example; requires `--features stochastic` if you disabled default features.
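For intuition, the EXP3-IX update rule itself is small. A self-contained sketch over arm indices (the η and γ values are illustrative, and muxer's implementation differs in details such as decay and seeding):

```rust
/// EXP3-IX over k arms: exponential weights with implicit-exploration loss estimates.
struct Exp3IxSketch { weights: Vec<f64>, eta: f64, gamma: f64 }

impl Exp3IxSketch {
    fn new(k: usize, eta: f64, gamma: f64) -> Self {
        Self { weights: vec![1.0; k], eta, gamma }
    }
    fn probs(&self) -> Vec<f64> {
        let s: f64 = self.weights.iter().sum();
        self.weights.iter().map(|w| w / s).collect()
    }
    /// Feed back the observed reward in [0, 1] for the arm actually played.
    fn update(&mut self, arm: usize, reward: f64) {
        let p = self.probs()[arm];
        // Implicit exploration: the extra gamma in the denominator caps the
        // variance of the importance-weighted loss estimate.
        let loss_hat = (1.0 - reward) / (p + self.gamma);
        self.weights[arm] *= (-self.eta * loss_hat).exp();
    }
}

fn main() {
    let mut ex = Exp3IxSketch::new(2, 0.1, 0.05);
    for _ in 0..50 {
        ex.update(0, 1.0); // arm 0 keeps paying off
        ex.update(1, 0.0); // arm 1 keeps failing
    }
    let p = ex.probs();
    assert!((p[0] + p[1] - 1.0).abs() < 1e-12);
    assert!(p[0] > 0.9); // probability mass concentrates on the good arm
}
```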
### Thompson “traffic splitting” selector (mean-softmax allocation)
```rust
use muxer::ThompsonSampling;

let arms = vec!["a".to_string(), "b".to_string()];
let mut ts = ThompsonSampling::with_seed(/* arms, seed */);
let d = ts.decide_softmax_mean(/* ... */).unwrap();
ts.update_reward(/* &d.chosen, observed reward */);
let alloc = d.probs.unwrap();
let s: f64 = alloc.values().sum();
assert!((s - 1.0).abs() < 1e-9);
```
Runnable as a crate example; requires `--features stochastic` if you disabled default features.
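The idea behind a mean-softmax allocation is only a few lines: keep a Beta posterior per arm and split traffic by softmax over the posterior means. A sketch (the temperature is an illustrative knob, and muxer's `decide_softmax_mean` wraps this in the `Decision` envelope):

```rust
/// Per-arm Beta posterior parameters (alpha, beta), e.g. successes + 1, failures + 1.
/// Returns a traffic split: softmax over posterior means.
fn softmax_mean_alloc(posteriors: &[(f64, f64)], temp: f64) -> Vec<f64> {
    let means: Vec<f64> = posteriors.iter().map(|(a, b)| a / (a + b)).collect();
    let exps: Vec<f64> = means.iter().map(|m| (m / temp).exp()).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}

fn main() {
    // Arm 0: 90 successes / 10 failures.  Arm 1: 40 / 60.
    let alloc = softmax_mean_alloc(&[(91.0, 11.0), (41.0, 61.0)], 0.1);
    assert!((alloc.iter().sum::<f64>() - 1.0).abs() < 1e-12);
    assert!(alloc[0] > alloc[1]); // the better arm gets the larger share
}
```

Unlike sampled Thompson draws, the allocation is a deterministic function of the posterior counts, which is what makes it useful as a logged traffic split.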
### Contextual routing (LinUCB)
Runnable as a crate example; requires the `contextual` feature.
Notes:

- If you want a probability distribution over arms for this context (e.g. for traffic-splitting or logging approximate propensities), use `LinUcb::probabilities(...)` or `LinUcb::decide_softmax_ucb(...)`.
- Algorithm reference: LinUCB (Chu et al., “Contextual bandits with linear payoff functions”).
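To make the mechanics concrete, here is the LinUCB bookkeeping for one arm with two features, using an explicit 2x2 inverse. The ridge prior λ and exploration factor α are illustrative; this is the textbook recurrence, not muxer's code:

```rust
/// Per-arm LinUCB state for d = 2 features: A = λI + Σ x xᵀ, b = Σ r·x.
struct ArmState { a: [[f64; 2]; 2], b: [f64; 2] }

impl ArmState {
    fn new(lambda: f64) -> Self {
        Self { a: [[lambda, 0.0], [0.0, lambda]], b: [0.0, 0.0] }
    }
    fn inv(&self) -> [[f64; 2]; 2] {
        let [[p, q], [r, s]] = self.a;
        let det = p * s - q * r;
        [[s / det, -q / det], [-r / det, p / det]]
    }
    /// Upper confidence bound: θᵀx + α·√(xᵀ A⁻¹ x), with θ = A⁻¹ b.
    fn ucb(&self, x: [f64; 2], alpha: f64) -> f64 {
        let inv = self.inv();
        let ainv_x = [inv[0][0] * x[0] + inv[0][1] * x[1],
                      inv[1][0] * x[0] + inv[1][1] * x[1]];
        let theta = [inv[0][0] * self.b[0] + inv[0][1] * self.b[1],
                     inv[1][0] * self.b[0] + inv[1][1] * self.b[1]];
        theta[0] * x[0] + theta[1] * x[1]
            + alpha * (x[0] * ainv_x[0] + x[1] * ainv_x[1]).sqrt()
    }
    fn update(&mut self, x: [f64; 2], reward: f64) {
        for i in 0..2 { for j in 0..2 { self.a[i][j] += x[i] * x[j]; } }
        for i in 0..2 { self.b[i] += reward * x[i]; }
    }
}

fn main() {
    let mut seen = ArmState::new(1.0);
    let unseen = ArmState::new(1.0);
    for _ in 0..100 { seen.update([1.0, 0.0], 0.2); } // well-observed, mediocre reward
    // The unexplored arm keeps a wide confidence bonus, so its UCB stays higher:
    assert!(unseen.ucb([1.0, 0.0], 1.0) > seen.ucb([1.0, 0.0], 1.0));
}
```

The per-request feature vector `x` is what makes the policy contextual: each arm's bound is recomputed for the current context before choosing.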
There is also a contextual “propensity logging” example.
## Stickiness / switching-cost control
If you want to reduce “flapping” between arms, wrap deterministic selection with `StickyMab`.
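The core of a sticky wrapper is a hysteresis rule. One minimal version, with an illustrative margin and persistence count (muxer's `StickyMab` also records typed stickiness reasons):

```rust
/// Switch away from the incumbent only after a challenger has beaten it
/// by `margin` on `patience` consecutive decisions.
struct Sticky { current: String, streak: u32, margin: f64, patience: u32 }

impl Sticky {
    /// `best` / `best_score` come from the underlying selector; higher score is better.
    fn decide(&mut self, best: &str, best_score: f64, current_score: f64) -> String {
        if best == self.current || best_score < current_score + self.margin {
            self.streak = 0; // challenger not (convincingly) better; stay put
        } else {
            self.streak += 1;
            if self.streak >= self.patience {
                self.current = best.to_string(); // persistent, clear win: switch
                self.streak = 0;
            }
        }
        self.current.clone()
    }
}

fn main() {
    let mut s = Sticky { current: "a".to_string(), streak: 0, margin: 0.05, patience: 3 };
    assert_eq!(s.decide("b", 0.52, 0.50), "a"); // within margin: ignored
    assert_eq!(s.decide("b", 0.60, 0.50), "a"); // streak 1
    assert_eq!(s.decide("b", 0.60, 0.50), "a"); // streak 2
    assert_eq!(s.decide("b", 0.60, 0.50), "b"); // streak 3: switch
}
```

The two knobs trade regret for stability: a larger margin or patience tolerates more short-lived score noise before paying the switching cost.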
## Mini-experiments (bandits × monitoring × false alarms)
If you want runnable “research probes” that make trade-offs and failure modes explicit, see `muxer/examples/EXPERIMENTS.md`. Examples:

```sh
cargo run --example guardrail_semantics
cargo run --example coverage_autotune --features stochastic
cargo run --example free_lunch_investigation --features stochastic
cargo run --example detector_inertia --features stochastic
cargo run --example detector_calibration --features stochastic
cargo run --example bqcd_sampling --features stochastic
cargo run --release --example bqcd_calibrated --features stochastic
```
Reusable bits extracted from these experiments live in `muxer::monitor`, notably:

- `CusumCatBank`: “GLR-lite” robustification via a small bank of CUSUM alternatives.
- `calibrate_threshold_from_max_scores`: threshold calibration from null max-score samples (supports a Wilson-conservative mode).
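For orientation, the single one-sided CUSUM that a bank like `CusumCatBank` generalizes (one statistic per drift alternative) fits in a few lines. Drift and threshold values here are illustrative:

```rust
/// One-sided CUSUM for upward shifts in a rate: S ← max(0, S + (x − drift)).
struct Cusum { s: f64, drift: f64, threshold: f64 }

impl Cusum {
    /// Push one observation; returns true when the alarm fires.
    fn push(&mut self, x: f64) -> bool {
        self.s = (self.s + (x - self.drift)).max(0.0);
        self.s > self.threshold
    }
}

fn main() {
    // Tolerate a ~5% in-control junk rate; alarm once cumulative excess passes 2.0.
    let mut c = Cusum { s: 0.0, drift: 0.05, threshold: 2.0 };
    let mut alarm_at = None;
    for t in 0..100 {
        let junk = if t < 50 { 0.0 } else { 1.0 }; // regime change at t = 50
        if c.push(junk) && alarm_at.is_none() { alarm_at = Some(t); }
    }
    // Pre-change observations keep S pinned at 0; the alarm fires shortly after the change.
    assert_eq!(alarm_at, Some(52));
}
```

A bank runs several such statistics with different assumed drifts and takes the max score, which is where the threshold calibration from null max-score samples comes in.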
## Usage
```toml
[dependencies]
muxer = "0.1.2"
```
If you only want the deterministic `Window` + `select_mab*` core (no stochastic bandits), disable default features:
```toml
[dependencies]
muxer = { version = "0.1.2", default-features = false }
```
## Development
```sh
# If you are in a larger Cargo workspace, scope to this package:
cargo test -p muxer

# Microbenches (criterion):
cargo bench -p muxer

# (Optional) Match CI checks:
cargo fmt --check
cargo clippy --all-targets -- -D warnings
```