Crate muxer


muxer: deterministic, multi-objective routing primitives for piecewise-stationary multi-armed bandit problems.

Given K arms (model versions, inference endpoints, backends, or any discrete action set selected repeatedly), the agent observes vector-valued outcomes per call and selects the next arm. Reward distributions are piecewise-stationary: they may change at unknown times, and the agent must detect and adapt to these changes.

An Outcome carries four caller-defined quality fields:

  • ok: the call produced a usable result.
  • junk: quality was below your threshold (hard_junk=true implies junk=true).
  • hard_junk: the call failed entirely — a harsher subset of junk, penalized separately.
  • quality_score (Option<f64>): optional continuous quality signal in [0, 1] (higher = better). Supplements the binary flags without changing routing semantics.

Plus cost_units (caller-defined cost proxy) and elapsed_ms (wall-clock time).
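The documented fields can be sketched as a standalone struct. This mirrors the field names described above; the crate's actual `Outcome` may differ in construction details (derives, visibility, builder methods), so treat this as an illustration of the shape, not the real definition:

```rust
// Sketch of the documented Outcome fields. Field names follow the docs
// above; everything else (derives, construction) is illustrative.
#[derive(Debug, Clone)]
struct OutcomeSketch {
    ok: bool,                   // the call produced a usable result
    junk: bool,                 // quality below the caller's threshold
    hard_junk: bool,            // total failure; hard_junk implies junk
    quality_score: Option<f64>, // optional continuous signal in [0, 1]
    cost_units: f64,            // caller-defined cost proxy
    elapsed_ms: u64,            // wall-clock time
}

fn main() {
    // A degraded-but-usable call: ok, flagged junk, with a quality score.
    let o = OutcomeSketch {
        ok: true,
        junk: true,
        hard_junk: false,
        quality_score: Some(0.42),
        cost_units: 1.5,
        elapsed_ms: 180,
    };
    // The documented invariant: hard_junk = true implies junk = true.
    assert!(!o.hard_junk || o.junk);
    println!("{o:?}");
}
```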

Goals:

  • Deterministic by default: same stats + config → same choice.
  • Non-stationarity friendly: sliding-window summaries, not lifetime averages.
  • Multi-objective: Pareto frontier over ok/junk/cost/latency, then scalarization.
  • Small K: designed for 2–10 arms; not intended for K in the hundreds.

Selection policies:

  • select_mab / select_mab_explain / select_mab_monitored_explain: deterministic Pareto + scalarization. When MabConfig::quality_weight > 0, mean_quality_score is a separate Pareto dimension (6th/9th); scalarization adds quality_weight × mean_quality_score explicitly.
  • ThompsonSampling: seedable Thompson sampling for scalar rewards in [0, 1].
  • Exp3Ix: seedable EXP3-IX for adversarial / fast-shifting rewards.
  • BanditPolicy (feature stochastic): common decide/update_reward trait for ThompsonSampling and Exp3Ix — enables generic policy code.
  • softmax_map: stable score → probability helper for traffic splitting.
  • (feature contextual) LinUcb: linear contextual bandit.
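The `softmax_map` helper's role (stable score → probability for traffic splitting) can be sketched in a few lines. This is a standalone re-implementation of the idea, not the crate's actual signature: subtracting the max score before exponentiating avoids overflow, and a `BTreeMap` keeps iteration deterministic, in line with the crate's determinism goal:

```rust
use std::collections::BTreeMap;

// Illustrative stable softmax over arm scores, in the spirit of the crate's
// `softmax_map` helper (the real signature may differ). Max-subtraction
// prevents exp() overflow; BTreeMap gives deterministic iteration order.
fn softmax_sketch(scores: &BTreeMap<String, f64>) -> BTreeMap<String, f64> {
    let max = scores.values().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: BTreeMap<String, f64> = scores
        .iter()
        .map(|(k, &v)| (k.clone(), (v - max).exp()))
        .collect();
    let z: f64 = exps.values().sum();
    exps.into_iter().map(|(k, e)| (k, e / z)).collect()
}

fn main() {
    let mut scores = BTreeMap::new();
    scores.insert("arm_a".to_string(), 1.0);
    scores.insert("arm_b".to_string(), 2.0);
    let probs = softmax_sketch(&scores);
    let total: f64 = probs.values().sum();
    assert!((total - 1.0).abs() < 1e-12); // probabilities sum to 1
    assert!(probs["arm_b"] > probs["arm_a"]); // higher score, more traffic
}
```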

Operational primitives:

Non-goals:

  • Not a full bandit platform (no storage, OPE pipelines, dashboards).
  • contextual is intentionally a small, pragmatic policy module.

§The Three Objectives and the Objective Manifold

Every routing decision simultaneously serves three purposes:

  1. Exploitation (regret minimization): route to the best arm now.
  2. Estimation (learning): understand how each arm performs across conditions.
  3. Detection/triage (monitoring): notice when an arm breaks, then investigate.

These map to the modules of this crate:

  • Selection (select_mab, Exp3Ix, ThompsonSampling): objectives 1+2.
  • Monitoring (monitor): objective 3 – drift/catKL/CUSUM are the detection surface of the design measure.
  • Triage (worst_first): active investigation after detection fires – prioritize historically broken arms to characterize the change.

§The non-contextual collapse (static schedules)

For fixed (non-adaptive) allocation schedules with K arms and Gaussian rewards, estimation error (MSE ~ 1/n) and average detection delay (D_avg ~ C*T / (n * delta^2)) are both monotone-decreasing in the suboptimal-arm pull count n, with proportional gradients everywhere:

  D_avg = (2 * C * sigma^2 * ln(1/alpha) / delta^2) * MSE

This proportionality is structural, not approximate: both functionals care only about “how many observations at this cell”, and their sensitivity functions are scalar multiples of each other. The three-way Pareto surface collapses to a one-dimensional curve parameterized by n. This yields the product identity R_T * D_avg = C * Delta * T / delta^2 for the restricted class of uniform schedules.
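A quick numeric check of the collapse: both MSE (~ sigma^2 / n) and average detection delay are O(1/n), so their ratio is independent of n. The constants below (sigma, delta, alpha, C) are illustrative values, not anything exported by the crate:

```rust
// Sanity check: with MSE = sigma^2 / n and D_avg proportional to MSE as in
// the formula above, the ratio D_avg / MSE is the same for every n, so the
// three-way Pareto surface collapses to a curve parameterized by n.
fn mse(sigma2: f64, n: f64) -> f64 {
    sigma2 / n
}

fn d_avg(c: f64, sigma2: f64, alpha: f64, delta: f64, n: f64) -> f64 {
    (2.0 * c * sigma2 * (1.0 / alpha).ln() / (delta * delta)) * mse(sigma2, n)
}

fn main() {
    let (c, sigma2, alpha, delta) = (1.0, 0.25, 0.01, 0.1);
    let r_small = d_avg(c, sigma2, alpha, delta, 10.0) / mse(sigma2, 10.0);
    let r_large = d_avg(c, sigma2, alpha, delta, 1000.0) / mse(sigma2, 1000.0);
    // Same ratio at n = 10 and n = 1000: the proportionality is structural.
    assert!((r_small - r_large).abs() < 1e-9);
}
```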

Categorical note. The formula above uses Gaussian notation (sigma^2, mean shift delta). muxer operates on categorical outcomes (ok/junk/hard_junk), where detection delay scales as h / KL(p1 || p0) in sample time rather than 2*b*sigma^2 / delta^2. The proportionality structure — MSE and average detection delay both O(1/n_k) — holds identically; only the constants change. See pare::sensitivity for the general form.
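The categorical constant can be computed directly. Below is an illustrative KL divergence over a 4-category outcome distribution, with a hypothetical CUSUM threshold `h`; the crate's own monitor module is the real detection surface, and these numbers are made up:

```rust
// Categorical detection-delay constant: delay in sample time ~ h / KL(p1 || p0).
// kl_categorical computes the KL divergence between two categorical
// distributions; zero-probability categories in p1 contribute nothing.
fn kl_categorical(p1: &[f64], p0: &[f64]) -> f64 {
    p1.iter()
        .zip(p0)
        .filter(|&(&a, _)| a > 0.0)
        .map(|(&a, &b)| a * (a / b).ln())
        .sum()
}

fn main() {
    // Illustrative pre-change vs post-change outcome rates
    // (ok, junk-only, hard_junk, other).
    let p0 = [0.90, 0.05, 0.02, 0.03];
    let p1 = [0.70, 0.15, 0.10, 0.05];
    let h = 5.0; // illustrative CUSUM threshold
    let delay_samples = h / kl_categorical(&p1, &p0);
    assert!(delay_samples > 0.0 && delay_samples.is_finite());
    // No change, no divergence: KL(p || p) = 0.
    assert!(kl_categorical(&p0, &p0).abs() < 1e-12);
}
```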

Caveat: adaptive policies. This clean proportionality holds exactly only for static (non-adaptive) schedules. Adaptive policies (UCB, Thompson Sampling) break it in at least two ways: (1) they allocate the suboptimal arm in bursts during exploration phases rather than uniformly, worsening worst-case detection delay relative to uniform spacing without changing total n; and (2) data-dependent allocation introduces selection bias, so the effective sample size for estimation is no longer simply n. For adaptive policies, the product identity becomes a lower bound (with constants that absorb regularity conditions), not an equality.

Practically, muxer operates at small K (2-10 arms) and moderate T (hundreds to low thousands of windowed observations). At these scales, the asymptotic impossibility results may not bind: all three objectives can often be simultaneously satisfied at acceptable levels. The sliding-window design further blunts the static/adaptive distinction, since the effective horizon is the window size, not T.

§The contextual revival – and its subtlety

In the contextual regime (per-request feature vectors via LinUcb), the design measure gains spatial dimensions, but objectives do not automatically diverge. The mechanism controlling collapse vs. revival is average-case vs. worst-case aggregation, not “contextual vs. non-contextual” per se:

  • Average detection delay has sensitivity s(x) ~ -1/p_a(x)^2, which is proportional to nonparametric IMSE sensitivity everywhere. Average detection is structurally redundant with estimation even in contextual settings. Adding contexts does not break this proportionality.

  • Worst-case detection delay (D_max = max_j D_j) concentrates its sensitivity on the bottleneck cell – the (arm, covariate) pair with the fewest observations. This is a point mass, linearly independent from both the regret sensitivity (a ramp near decision boundaries) and the estimation sensitivity (D-optimal / extremal). Worst-case detection is genuinely independent from regret and estimation.

In the non-contextual case (one cell), average and worst-case are identical, so the distinction is moot. In the contextual case (many cells), they diverge: average detection remains redundant with estimation, but worst-case detection introduces a genuinely new objective axis.

Concretely, each objective wants a different sampling distribution:

  • Regret-optimal: concentrate near decision boundaries.
  • Estimation-optimal: spread to extremes (D-optimal experimental design).
  • Detection-optimal (worst-case): ensure no cell is starved (space-filling).

LinUcb exists to break the non-contextual collapse: it learns per-request routing without maintaining separate per-facet histories, at the cost of requiring explicit monitoring budget beyond what regret-optimal sampling provides.

§Saturation principle

The number of genuinely independent objectives is bounded by the effective dimension of the design space:

  dim(Pareto front) <= min(m - 1, D_eff)

where m is the number of named objectives and D_eff is the design dimension (K-1 for non-contextual, ~K*M for M covariate cells, infinite for continuous covariates). Adding objectives beyond D_eff + 1 cannot create new tradeoffs.
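The bound is simple enough to state as a one-liner. The inputs below are illustrative scenarios, not values computed by the crate:

```rust
// The saturation bound: dim(Pareto front) <= min(m - 1, D_eff), where m is
// the number of named objectives and D_eff the design dimension.
fn pareto_dim_bound(m: usize, d_eff: usize) -> usize {
    (m - 1).min(d_eff)
}

fn main() {
    // Non-contextual, K = 3 arms (D_eff = K - 1 = 2), 8 named objectives:
    // only 2 independent tradeoff directions can exist.
    assert_eq!(pareto_dim_bound(8, 2), 2);
    // Contextual, K = 3 arms, M = 9 cells (D_eff ~ K*M = 27): the objective
    // count m - 1 = 7 becomes the binding constraint instead.
    assert_eq!(pareto_dim_bound(8, 27), 7);
}
```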

The formal algebraic rank can overstate the practical number of tradeoffs. The effective Pareto dimension is better measured by the singular value spectrum of the Jacobian of objectives with respect to design variables: a K=3, M=9 setup with 8 named objectives achieves formal rank 8 but effective dimension ~3-4 (the top 2 singular values carry >99% of the Frobenius norm). See pare::sensitivity for computational tools.

§Design measure perspective

The fundamental object is not “which objectives matter” but the design measure: the joint distribution over (arm, covariate, time) that the policy induces. Given the design measure, every objective is computable.

Note: the design measure in an adaptive setting is a random object (it depends on the realized trajectory), not a fixed distribution. Reasoning about it requires either working with the expected design measure (which loses adaptivity) or a conditional analysis that respects the filtration (which is harder and may break clean proportionality results). See Hadad et al. (arXiv:1911.02768) for the observed vs. expected Fisher information distinction in adaptive experiments.

Non-stationary bandits and sliding windows. Garivier & Moulines (2008, arXiv:0805.3415) established the Sliding-Window UCB (SW-UCB) algorithm, which achieves O(sqrt(Υ_T * K * T * log T)) regret for piecewise-stationary environments with Υ_T changepoints. This is the theoretical foundation for muxer’s sliding-window approach. Optimal window size is O(sqrt(T / Υ_T)); in practice muxer uses a fixed caller-chosen cap.
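The optimal-window scaling above can be sketched as a small heuristic. The crate exposes a related `suggested_window_cap` helper; the standalone function below only illustrates the sqrt(T / Υ_T) scaling and is not that function's actual signature:

```rust
// Illustrative SW-UCB window heuristic: window ~ sqrt(T / Y_T) for horizon T
// and expected changepoint count Y_T. Shorter windows adapt faster to change
// but average over fewer samples.
fn suggested_window(horizon_t: f64, changepoints: f64) -> usize {
    (horizon_t / changepoints.max(1.0)).sqrt().round() as usize
}

fn main() {
    // 10_000 windowed observations, ~4 expected changepoints -> window ~ 50.
    assert_eq!(suggested_window(10_000.0, 4.0), 50);
    // More changepoints -> shorter windows (adapt faster, estimate less).
    assert!(suggested_window(10_000.0, 16.0) < suggested_window(10_000.0, 4.0));
}
```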

Non-stationary bandits with explicit detection. Besson, Kaufmann, Maillard & Seznec (2019, arXiv:1902.01575, JMLR 2023) introduced GLR-klUCB: a parameter-free algorithm combining kl-UCB with a Bernoulli Generalized Likelihood Ratio Test. It is the closest published analog to muxer’s monitored selection path. Key difference: GLR-klUCB restarts all arm statistics on detection; muxer instead switches to worst-first routing to investigate the flagged arm, preserving history. See TriageSession and WorstFirstConfig.

Bandit Quickest Changepoint Detection (BQCD). Gopalan, Saligrama & Lakshminarayanan (2021, arXiv:2107.10492) established the BQCD lower bound: any algorithm with mean-time-to-false-alarm m must suffer expected detection delay Ω(log(m) / D*), where D* is the maximum KL divergence between post-change and pre-change distributions across arms. Their ε-GCD algorithm achieves this with exploration rate ε = Θ(1/√T). CoverageConfig’s minimum sampling-rate floor is the practical lever for approaching this bound.

Sampling-constrained detection. Zhang & Mei (2020, arXiv:2009.11891) directly analyze changepoint detection under sampling-rate constraints, confirming that detection delay in wall time scales as h / (KL(p1 || p0) * rate_k) — the formal basis for the two-clocks approximation.
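The two-clocks approximation is a one-line conversion: delay in sample time divided by the arm's sampling rate gives delay in wall time. All numbers below are illustrative:

```rust
// Two clocks: delay_samples = h / KL(p1 || p0) counts observations of the
// monitored arm; dividing by the arm's sampling rate converts to wall time.
fn delay_samples(h: f64, kl: f64) -> f64 {
    h / kl
}

fn delay_wall(h: f64, kl: f64, rate_k: f64) -> f64 {
    delay_samples(h, kl) / rate_k
}

fn main() {
    let (h, kl) = (5.0, 0.1);
    // An arm receiving 10% of traffic detects 10x slower in wall time.
    assert!((delay_wall(h, kl, 0.1) - 10.0 * delay_samples(h, kl)).abs() < 1e-9);
    // Raising the sampling-rate floor (CoverageConfig) bounds this inflation.
    assert!(delay_wall(h, kl, 0.2) < delay_wall(h, kl, 0.1));
}
```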

Regret–BAI Pareto frontier. Zhong, Cheung & Tan (2021, arXiv:2110.08627) formally prove the Pareto tradeoff between regret minimization (RM) and best-arm identification (BAI): achieving O(log T) regret and O(log T) BAI error simultaneously is impossible. The product-identity formulation in muxer’s docs is the static-schedule special case.

Piecewise-stationary multi-objective bandits. Rezaei Balef & Maghsudi (2023, arXiv:2302.05257) study the combination of non-stationarity and multi-objective rewards directly. muxer operates in this space — without claiming worst-case-optimal bounds — by combining sliding-window summaries with Pareto scalarization and explicit monitoring.

Window-limited CUSUM. Xie, Moustakides & Xie (2022, arXiv:2206.06777) connect windowed observation to CUSUM optimality, providing theoretical grounding for MonitoredWindow’s split between baseline and recent windows.

Multi-objective bandit frameworks.

  • Constrained / safe bandits (BwK): muxer’s max_junk_rate, max_drift, etc. are BwK-style anytime constraints. (Badanidiyuru, Kleinberg & Slivkins 2013, FOCS; arXiv:1305.2545)
  • Pareto bandits (Drugan & Nowé, IJCNN 2013): muxer’s pare-based frontier is the selection-time analogue.
  • Information-Directed Sampling (Russo & Van Roy 2014, arXiv:1403.5556): scalarizes regret/information via the information ratio. The three-objective extension is non-trivial only in the contextual worst-case-detection regime.
  • Multi-objective RL: muxer’s deterministic post-Pareto scalarization is a pragmatic instance of this literature.
  • Adaptive experiment design (Hadad et al. 2021, arXiv:1911.02768): the observed vs. expected Fisher information distinction applies to muxer’s adaptive design-measure analysis.

Re-exports§

pub use monitor::calibrate_cusum_threshold;
pub use monitor::simulate_cusum_null_max_scores;
pub use monitor::calibrate_threshold_from_max_scores;
pub use monitor::DriftConfig;
pub use monitor::DriftMetric;
pub use monitor::MonitoredWindow;
pub use monitor::RateBoundMode;
pub use monitor::ThresholdCalibration;
pub use monitor::UncertaintyConfig;

Modules§

monitor
Monitoring primitives: drift detection + uncertainty summaries.

Structs§

BetaStats
Beta posterior state for one arm.
CandidateDebug
Debug row for one candidate arm.
CatKlGuardDecision
Output of applying a categorical KL guard to a candidate set.
ContextBinConfig
Configuration for quantising a feature vector into a discrete context bin.
ContextualCell
An (arm, context-bin) pair identifying a specific covariate cell for triage.
ContextualCoverageTracker
Per-(arm, context-bin) observation tracker.
ContextualPolicyFill
Result of a contextual policy fill (LinUCB-based selection with context features).
ControlConfig
Configuration for control arm reservation.
CoverageConfig
Coverage / maintenance sampling configuration.
CusumGuardDecision
Output of applying a categorical CUSUM guard to a candidate set.
Decision
A single policy decision in a unified envelope.
DriftGuardDecision
Output of applying a drift guard to a candidate set.
Exp3Ix
Seedable EXP3-IX bandit.
Exp3IxConfig
Configuration for EXP3-IX.
Exp3IxState
Serializable EXP3-IX state snapshot (for persistence).
LatencyGuardrailConfig
Latency guardrail configuration.
LinUcb
Seedable linear contextual bandit (LinUCB).
LinUcbArmState
Serializable per-arm state for LinUCB persistence.
LinUcbConfig
Configuration for linear UCB.
LinUcbState
Serializable LinUCB state snapshot (for persistence across process restarts).
MabConfig
Configuration knobs for deterministic MAB-style selection.
MabSelectionDecision
Additional metadata for a deterministic select_mab decision.
MonitoredMabConfig
Extended configuration for monitored MAB selection.
Outcome
A single observed outcome for an arm.
OutcomeIdx
Categorical outcome index constants for the 4-category muxer outcome space.
PolicyFill
Result of filling up to k picks using the shared policy pipeline.
PolicyPlan
Planned policy result for a single selection step.
Router
Stateful routing session.
RouterConfig
Full configuration for a Router.
RouterDecision
Output of a single Router::select call.
RouterSnapshot
A serializable snapshot of Router state.
Selection
Output of select_mab (chosen arm + debugging context).
StickyConfig
Stickiness configuration.
StickyMab
Stateful stickiness wrapper for deterministic select_mab.
Summary
Aggregate counts/sums over a window of outcomes.
ThompsonConfig
Configuration for Thompson sampling.
ThompsonSampling
Seedable Thompson-sampling bandit.
ThompsonState
Serializable Thompson-sampling state snapshot (for persistence across process restarts).
TriageSession
Combined detect-then-triage session.
TriageSessionConfig
Configuration for a TriageSession.
Window
Sliding-window statistics for an arm.
WorstFirstConfig
Configuration for worst-first scoring.

Enums§

DecisionNote
Audit-friendly notes attached to a decision.
DecisionPolicy
Which policy produced a decision.
PipelineOrder
Pipeline ordering: whether novelty/coverage pre-picks run before or after the guardrail.
RouterMode
Current operating mode of the Router.

Traits§

BanditPolicy
Common interface for stateful stochastic bandit policies.

Functions§

apply_prior_counts_to_summary
Apply a pseudo-count prior to a Summary until it reaches target_calls.
context_bin
Compute a stable context bin ID from a feature vector.
contextual_worst_first_pick_k
Pick up to k (arm, context-bin) cells for contextual worst-first triage (without replacement).
contextual_worst_first_pick_one
Pick one (arm, context-bin) cell for contextual worst-first regression triage.
coverage_pick_under_sampled
Deterministically pick up to k arms that are under-sampled relative to a target quota.
coverage_pick_under_sampled_idx
Deterministically pick up to k arms (by index) that are under-sampled relative to a target quota.
guardrail_filter_observed
Apply the latency guardrail using observed (non-smoothed) stats.
guardrail_filter_observed_elapsed
Like guardrail_filter_observed, but the observation function returns the raw elapsed-ms sum.
novelty_pick_unseen
Deterministically pick up to k arms that are “unseen” under the provided observed_calls.
pick_control_arms
Pick up to cfg.control_k arms uniformly at random (deterministic, seed-based).
policy_fill_generic
Generic policy fill: compute a plan via policy_plan_generic, then fill remaining picks using pick_rest.
policy_fill_k_contextual
Contextual variant of policy_fill_k_observed_with: uses LinUCB for the algorithmic selection step instead of a flat MAB/EXP3-IX policy.
policy_fill_k_observed_guardrail_first_with_coverage
Delegates to policy_fill_generic with PipelineOrder::GuardrailFirst and caller-provided coverage.
policy_fill_k_observed_with_coverage
Delegates to policy_fill_generic with PipelineOrder::NoveltyFirst and caller-provided coverage.
policy_plan_generic
Generic policy plan: compute pre-picks (novelty + optional coverage) and guardrail filtering, with ordering controlled by order.
select_k_without_replacement_by
Convenience wrapper over select_k_without_replacement_by_with_meta when no meta is needed.
select_mab
Deterministic selection (Pareto filtering + scalarization).
select_mab_decide
Unified decision envelope for deterministic MAB selection.
select_mab_explain
Like select_mab, but also returns metadata about constraints and explore-first behavior.
select_mab_monitored_decide
Unified decision envelope for monitored deterministic MAB selection.
select_mab_monitored_explain
Monitored deterministic selection (baseline vs recent drift + uncertainty-aware rates).
select_mab_monitored_explain_with_summaries
Like select_mab_monitored_explain, but uses caller-provided summaries for the base objectives (e.g. prior-smoothed rates), while still computing monitoring scores from monitored.
softmax_map
Compute a stable softmax distribution over a map of scores.
split_control_budget
Split a k-pick budget into control and MAB allocations.
stable_hash64
Deterministic (non-crypto) stable hash used for “random” sampling / tie-breaking.
suggested_window_cap
Suggest a Window capacity based on expected throughput and changepoint rate.
worst_first_pick_k
Pick up to k arms for worst-first regression hunting (without replacement).
worst_first_pick_one
Pick one arm for “worst-first” regression hunting.

Type Aliases§

LinUcbScore
Per-arm score tuple: (ucb, mean, bonus).