//! `muxer`: deterministic, multi-objective routing primitives for
//! piecewise-stationary multi-armed bandit problems.
//!
//! Given `K` arms (model versions, inference endpoints, backends, or any
//! discrete action set selected repeatedly), the agent observes
//! vector-valued outcomes per call and selects the next arm.  Reward
//! distributions are piecewise-stationary: they may change at unknown
//! times, and the agent must detect and adapt to these changes.
//!
//! An [`Outcome`] carries four caller-defined quality fields:
//! - `ok`: the call produced a usable result.
//! - `junk`: quality was below your threshold (`hard_junk=true` implies `junk=true`).
//! - `hard_junk`: the call failed entirely — a harsher subset of junk, penalized separately.
//! - `quality_score: Option<f64>`: optional continuous quality signal in `[0, 1]` (higher = better).
//!   Supplements the binary flags without changing routing semantics.
//!
//! Plus `cost_units` (caller-defined cost proxy) and `elapsed_ms` (wall-clock time).
//!
//! **Goals:**
//! - **Deterministic by default**: same stats + config → same choice.
//! - **Non-stationarity friendly**: sliding-window summaries, not lifetime averages.
//! - **Multi-objective**: Pareto frontier over configurable [`Objective`] dimensions,
//!   then scalarization.  [`default_objectives`] provides ok/junk/cost/latency/quality.
//! - **Small K**: designed for 2–10 arms; not intended for K in the hundreds.
//!
//! **Selection policies:**
//! - [`select_mab`] / [`select_mab_explain`] / [`select_mab_monitored_explain`]:
//!   deterministic Pareto + scalarization over [`MabConfig::objectives`].
//!   Each [`Objective`] defines an extraction, direction, and scalarization weight.
//! - [`ThompsonSampling`]: seedable Thompson sampling for scalar rewards in `[0, 1]`.
//! - [`Exp3Ix`]: seedable EXP3-IX for adversarial / fast-shifting rewards.
//! - [`BanditPolicy`] (feature `stochastic`): common `decide`/`update_reward` trait
//!   for `ThompsonSampling` and `Exp3Ix` — enables generic policy code.
//! - [`softmax_map`]: stable score → probability helper for traffic splitting.
//! - (feature `contextual`) `LinUcb`: linear contextual bandit.
//!
//! **Operational primitives:**
//! - [`TriageSession`]: detect-then-triage — per-arm CUSUM detection wired to
//!   per-cell (`arm × context-bin`) investigation.
//! - [`WorstFirstConfig`] / [`worst_first_pick_k`]: post-detection investigation routing.
//! - [`CoverageConfig`] / [`coverage_pick_under_sampled`]: maintenance sampling floor.
//! - [`LatencyGuardrailConfig`]: hard pre-filter by mean latency.
//! - [`PipelineOrder`] / [`policy_plan_generic`] / [`policy_fill_generic`]: harness glue.
//!
//! **Non-goals:**
//! - Not a full bandit platform (no storage, OPE pipelines, dashboards).
//! - `contextual` is intentionally a small, pragmatic policy module.
//!
//!
//! # The Three Objectives and the Objective Manifold
//!
//! Every routing decision simultaneously serves three purposes:
//!
//! 1. **Exploitation** (regret minimization): route to the best arm now.
//! 2. **Estimation** (learning): understand how each arm performs across conditions.
//! 3. **Detection/triage** (monitoring): notice when an arm breaks, then investigate.
//!
//! These map to the modules of this crate:
//!
//! - **Selection** (`select_mab`, `Exp3Ix`, `ThompsonSampling`): objectives 1+2.
//! - **Monitoring** (`monitor`): objective 3 -- drift/catKL/CUSUM are the detection
//!   surface of the design measure.
//! - **Triage** (`worst_first`): active investigation after detection fires --
//!   prioritize historically broken arms to characterize the change.
//!
//! ## The non-contextual collapse (static schedules)
//!
//! For **fixed (non-adaptive) allocation schedules** with K arms and Gaussian rewards,
//! estimation error (MSE ~ 1/n) and **average** detection delay
//! (D_avg ~ C*T / (n * delta^2)) are both monotone-decreasing in the suboptimal-arm
//! pull count n, with proportional gradients everywhere:
//!
//! ```text
//!   D_avg = (2 * C * sigma^2 * ln(1/alpha) / delta^2) * MSE
//! ```
//!
//! This proportionality is structural, not approximate: both functionals depend only
//! on how many observations a cell receives, and their sensitivity functions are
//! scalar multiples of each other.  The three-way Pareto surface collapses to a
//! **one-dimensional curve** parameterized by n.  Writing the regret of `n`
//! suboptimal pulls with gap `Delta` as `R_T = n * Delta`, this yields the product
//! identity `R_T * D_avg = C * Delta * T / delta^2` for the restricted class of
//! uniform schedules.
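//!
//! The identity can be checked numerically.  The sketch below is illustrative only and
//! is not part of this crate's API: the function and parameter names are hypothetical,
//! and it assumes `R_T = n * Delta` for `n` suboptimal pulls with gap `Delta`, together
//! with the `D_avg ~ C*T / (n * delta^2)` scaling stated above.
//!
//! ```rust
//! /// Regret of a static schedule: `n` suboptimal pulls, each costing the gap `gap`.
//! fn regret(n: f64, gap: f64) -> f64 {
//!     n * gap
//! }
//!
//! /// Average detection delay under the assumed scaling `C * T / (n * delta^2)`.
//! fn avg_detection_delay(n: f64, c: f64, horizon: f64, delta: f64) -> f64 {
//!     c * horizon / (n * delta * delta)
//! }
//!
//! /// Product `R_T * D_avg`; under the assumptions above it does not depend on `n`.
//! fn regret_delay_product(n: f64, gap: f64, c: f64, horizon: f64, delta: f64) -> f64 {
//!     regret(n, gap) * avg_detection_delay(n, c, horizon, delta)
//! }
//! ```
//!
//! With `C = 2`, `Delta = 0.1`, `T = 1000`, and `delta = 0.5`, the product is about 800
//! for any `n`: raising `n` increases regret and shortens the delay by the same factor.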
//!
//! **Categorical note.** The formula above uses Gaussian notation (`sigma^2`, mean
//! shift `delta`).  `muxer` operates on categorical outcomes (ok/junk/hard_junk),
//! where detection delay scales as `h / KL(p1 || p0)` in sample time rather than
//! `2*b*sigma^2 / delta^2`.  The proportionality structure — MSE and average
//! detection delay both `O(1/n_k)` — holds identically; only the constants change.
//! See [`pare::sensitivity`][pare] for the general form.
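//!
//! A minimal sketch of the categorical scaling, for illustration only.  The helpers
//! `kl_categorical` and `sample_delay` are hypothetical names, not crate API; the
//! sketch assumes both distributions are over the same outcome categories and that
//! `p0[i] > 0` wherever `p1[i] > 0`.
//!
//! ```rust
//! /// `KL(p1 || p0)` in nats for two categorical distributions over the same outcomes.
//! /// Terms with `p1[i] == 0` contribute zero by the usual convention.
//! fn kl_categorical(p1: &[f64], p0: &[f64]) -> f64 {
//!     assert_eq!(p1.len(), p0.len());
//!     p1.iter()
//!         .zip(p0)
//!         .filter(|(a, _)| **a > 0.0)
//!         .map(|(a, b)| a * (a / b).ln())
//!         .sum()
//! }
//!
//! /// First-order detection delay in samples for a detection threshold `h`.
//! fn sample_delay(h: f64, p1: &[f64], p0: &[f64]) -> f64 {
//!     h / kl_categorical(p1, p0)
//! }
//! ```
//!
//! For an (ok, soft junk, hard junk) triple, a larger shift between `p0` and `p1`
//! gives a larger KL term and therefore a shorter delay at the same threshold.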
//!
//! **Caveat: adaptive policies.** This clean proportionality holds exactly only for
//! static (non-adaptive) schedules.  Adaptive policies (UCB, Thompson Sampling) break
//! it in at least two ways: (1) they allocate the suboptimal arm in bursts during
//! exploration phases rather than uniformly, worsening worst-case detection delay
//! relative to uniform spacing without changing total n; and (2) data-dependent
//! allocation introduces selection bias, so the effective sample size for estimation
//! is no longer simply n.  For adaptive policies, the product identity becomes a
//! **lower bound** (with constants that absorb regularity conditions), not an equality.
//!
//! Practically, `muxer` operates at small K (2-10 arms) and moderate T (hundreds to
//! low thousands of windowed observations).  At these scales, the asymptotic
//! impossibility results may not bind: all three objectives can often be simultaneously
//! satisfied at acceptable levels.  The sliding-window design further blunts the
//! static/adaptive distinction, since the effective horizon is the window size, not T.
//!
//! ## The contextual revival -- and its subtlety
//!
//! *Note: this section synthesizes standard results from experimental design theory
//! and quickest change detection into a combined framework.  The individual components
//! (D-optimal sensitivity, minimax detection delay, regret sensitivity) are established;
//! the three-way independence argument and the average-vs-worst-case mechanism below
//! are original to this crate and are not results from the cited papers.*
//!
//! In the **contextual** regime (per-request feature vectors via `LinUcb`), the design
//! measure gains spatial dimensions, but objectives do **not** automatically diverge.
//! The mechanism controlling collapse vs. revival is **average-case vs. worst-case
//! aggregation**, not "contextual vs. non-contextual" per se:
//!
//! - **Average detection delay** has sensitivity `s(x) ~ -1/p_a(x)^2`, which is
//!   proportional to nonparametric IMSE sensitivity everywhere.  Average detection
//!   is **structurally redundant with estimation** even in contextual settings.
//!   Adding contexts does not break this proportionality.
//!
//! - **Worst-case detection delay** (`D_max = max_j D_j`) concentrates its sensitivity
//!   on the **bottleneck cell** -- the (arm, covariate) pair with the fewest observations.
//!   This is a point mass, linearly independent from both the regret sensitivity (a
//!   ramp near decision boundaries) and the estimation sensitivity (D-optimal / extremal).
//!   Worst-case detection is genuinely independent from regret and estimation.
//!
//! In the non-contextual case (one cell), average and worst-case are identical, so the
//! distinction is moot.  In the contextual case (many cells), they diverge: average
//! detection remains redundant with estimation, but worst-case detection introduces a
//! genuinely new objective axis.
//!
//! Concretely, each objective wants a different sampling distribution:
//!
//! - **Regret-optimal**: concentrate near decision boundaries.
//! - **Estimation-optimal**: spread to extremes (D-optimal experimental design).
//! - **Detection-optimal (worst-case)**: ensure no cell is starved (space-filling).
//!
//! `LinUcb` exists to break the non-contextual collapse: it learns per-request routing
//! without maintaining separate per-facet histories, at the cost of requiring explicit
//! monitoring budget beyond what regret-optimal sampling provides.
//!
//! ## Saturation principle
//!
//! The number of genuinely independent objectives is bounded by the effective dimension
//! of the design space (Ehrgott & Nickel 2002, via Helly's Theorem):
//!
//! ```text
//!   dim(Pareto front) <= min(m - 1, D_eff)
//! ```
//!
//! where `m` is the number of named objectives and `D_eff` is the design dimension
//! (K-1 for non-contextual, ~K*M for M covariate cells, infinite for continuous
//! covariates).  Adding objectives beyond `D_eff + 1` cannot create new tradeoffs
//! in the Pareto sense (though their weights still affect scalarization tiebreaking).
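//!
//! The bound is simple enough to state as code.  This is a sketch with hypothetical
//! helper names (`pareto_dim_bound`, `d_eff_non_contextual`), not crate API:
//!
//! ```rust
//! /// Upper bound `min(m - 1, D_eff)` on the dimension of the Pareto front.
//! fn pareto_dim_bound(num_objectives: usize, d_eff: usize) -> usize {
//!     num_objectives.saturating_sub(1).min(d_eff)
//! }
//!
//! /// Design dimension for `k` arms without context: `K - 1`.
//! fn d_eff_non_contextual(k: usize) -> usize {
//!     k.saturating_sub(1)
//! }
//! ```
//!
//! For `K = 3` arms without context, `D_eff = 2`, so five named objectives and six
//! named objectives give the same bound of 2.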
//!
//! The formal algebraic rank can overstate the practical number of tradeoffs.  The
//! **effective Pareto dimension** is better measured by the singular value spectrum of
//! the Jacobian of objectives with respect to design variables: a K=3, M=9 setup with
//! 8 named objectives achieves formal rank 8 but effective dimension ~3-4 (the top 2
//! singular values carry >99% of the Frobenius norm).  See `pare::sensitivity` for
//! computational tools.
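//!
//! One possible way to read an effective dimension off a singular value spectrum is
//! sketched below.  It is an assumption-laden illustration, not the `pare` API:
//! `effective_dim` is a hypothetical helper, the input is assumed to be sorted in
//! descending order, and "share of the Frobenius norm" is interpreted as the share of
//! the squared norm (the sum of squared singular values).
//!
//! ```rust
//! /// Smallest number of leading singular values whose squared sum reaches the
//! /// fraction `energy` (e.g. 0.99) of the total squared Frobenius norm.
//! fn effective_dim(singular_values: &[f64], energy: f64) -> usize {
//!     let total: f64 = singular_values.iter().map(|s| s * s).sum();
//!     if total == 0.0 {
//!         return 0;
//!     }
//!     let mut acc = 0.0;
//!     for (i, s) in singular_values.iter().enumerate() {
//!         acc += s * s;
//!         if acc / total >= energy {
//!             return i + 1;
//!         }
//!     }
//!     singular_values.len()
//! }
//! ```
//!
//! A spectrum dominated by a few values yields a small count even when the formal
//! rank (the number of non-zero singular values) is much larger.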
//!
//! ## Design measure perspective
//!
//! The fundamental object is not "which objectives matter" but the **design measure**:
//! the joint distribution over (arm, covariate, time) that the policy induces.  Given
//! the design measure, every objective is computable.
//!
//! Note: the design measure in an adaptive setting is a random object (it depends on
//! the realized trajectory), not a fixed distribution.  Reasoning about it requires
//! either working with the expected design measure (which loses adaptivity) or a
//! conditional analysis that respects the filtration (which is harder and may break
//! clean proportionality results).  See Hadad et al. (arXiv:1911.02768) for the
//! observed vs. expected Fisher information distinction in adaptive experiments.
//!
//! ## Related work
//!
//! **Non-stationary bandits and sliding windows.**
//! Garivier & Moulines (2008, arXiv:0805.3415) established the Sliding-Window UCB
//! (SW-UCB) algorithm, which achieves `O(sqrt(Υ_T * K * T * log T))` regret for
//! piecewise-stationary environments with `Υ_T` changepoints.  This is the theoretical
//! foundation for `muxer`'s sliding-window approach.  Optimal window size is
//! `O(sqrt(T / Υ_T))`; in practice `muxer` uses a fixed caller-chosen cap.
//!
//! **Non-stationary bandits with explicit detection.**
//! Besson, Kaufmann, Maillard & Seznec (2019, arXiv:1902.01575, JMLR 2023) introduced
//! GLR-klUCB: a parameter-free algorithm combining kl-UCB with a Bernoulli Generalized
//! Likelihood Ratio Test.  It is the closest published analog to `muxer`'s monitored
//! selection path.  Key difference: GLR-klUCB **restarts** all arm statistics on
//! detection; `muxer` instead switches to worst-first routing to investigate the flagged
//! arm, preserving history.  See `TriageSession` and `WorstFirstConfig`.
//!
//! **Bandit Quickest Changepoint Detection (BQCD).**
//! Gopalan, Saligrama & Lakshminarayanan (2021, arXiv:2107.10492) established the BQCD
//! lower bound: any algorithm with mean-time-to-false-alarm `m` must suffer expected
//! detection delay `Ω(log(m) / D*)`, where `D*` is the maximum KL divergence between
//! post-change and pre-change distributions across arms.  Their ε-GCD algorithm achieves
//! this with exploration rate `ε = Θ(1/√T)`.  `CoverageConfig`'s minimum sampling-rate
//! floor is the practical lever for approaching this bound.
//!
//! **Sampling-constrained detection.**
//! Zhang & Mei (2020, arXiv:2009.11891) directly analyze changepoint detection under
//! sampling-rate constraints, confirming that detection delay in wall time scales as
//! `h / (KL(p1 || p0) * rate_k)` — the formal basis for the two-clocks approximation.
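//!
//! A minimal sketch of this wall-time scaling, for illustration only.
//! `wall_time_delay` and its parameters are hypothetical names, not crate API, and
//! `kl` is assumed to be a precomputed `KL(p1 || p0)` in the same units as `h`.
//!
//! ```rust
//! /// Approximate detection delay in wall-clock steps for arm `k`:
//! /// `h / (kl * rate_k)`, where `rate_k` is the fraction of traffic sent to the arm.
//! /// Returns infinity when the arm is never sampled or the change is invisible.
//! fn wall_time_delay(h: f64, kl: f64, rate_k: f64) -> f64 {
//!     if kl <= 0.0 || rate_k <= 0.0 {
//!         return f64::INFINITY;
//!     }
//!     h / (kl * rate_k)
//! }
//! ```
//!
//! Halving `rate_k` doubles the delay, so a minimum sampling-rate floor bounds the
//! wall-time delay from above for every arm.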
//!
//! **Observation-cost vs detection delay.**
//! Banerjee & Veeravalli (2012, arXiv:1211.3729) formalize the minimax tradeoff between
//! observation cost and detection delay.  The product identity `R_T * D_avg = const`
//! in the non-contextual collapse is a special case of their framework, where
//! "observation cost" manifests as regret from pulling suboptimal arms.
//!
//! **Regret–BAI Pareto frontier.**
//! Zhong, Cheung & Tan (2021, arXiv:2110.08627) formally prove the Pareto tradeoff
//! between regret minimization (RM) and best-arm identification (BAI): achieving
//! `O(log T)` regret and `O(log T)` BAI error simultaneously is impossible.  The
//! product-identity formulation in `muxer`'s docs is the static-schedule special case.
//!
//! **Piecewise-stationary multi-objective bandits.**
//! Rezaei Balef & Maghsudi (2023, arXiv:2302.05257) study the combination of
//! non-stationarity and multi-objective rewards directly.  `muxer` operates in this
//! space — without claiming worst-case-optimal bounds — by combining sliding-window
//! summaries with Pareto scalarization and explicit monitoring.
//!
//! **Window-limited CUSUM.**
//! Xie, Moustakides & Xie (2022, arXiv:2206.06777) connect windowed observation to
//! CUSUM optimality, providing theoretical grounding for `MonitoredWindow`'s split
//! between baseline and recent windows.
//!
//! **Multi-objective bandit frameworks.**
//! - **Constrained / safe bandits** (BwK): `muxer`'s `max_junk_rate`, `max_drift`, etc.
//!   are BwK-style anytime constraints.  (Badanidiyuru, Kleinberg & Slivkins 2013,
//!   FOCS; arXiv:1305.2545)
//! - **Pareto bandits** (Drugan & Nowé, IJCNN 2013): `muxer`'s `pare`-based frontier
//!   is the selection-time analogue.
//! - **Information-Directed Sampling** (Russo & Van Roy 2014, arXiv:1403.5556):
//!   scalarizes regret/information via the information ratio.  The three-objective
//!   extension is non-trivial only in the contextual worst-case-detection regime.
//! - **Adaptive experiment design** (Hadad et al. 2021, arXiv:1911.02768): the
//!   observed vs. expected Fisher information distinction applies to `muxer`'s
//!   adaptive design-measure analysis.
//!
//! **Pareto dimension and objective redundancy.**
//! - **Ehrgott & Nickel (2002)**, "On the number of criteria needed to decide Pareto
//!   optimality" (Math. Meth. Oper. Res., 55:329–345): proves via Helly's Theorem that
//!   Pareto optimality in D design variables can be decided by at most D+1 objectives.
//!   The saturation principle above is a direct application.
//! - **Objective reduction** (Deb & Saxena 2005): identifies redundant objectives in
//!   many-objective optimization.  Relevant when callers configure more objectives than
//!   the design space supports.
//! - **Pareto front topology** (Kobayashi et al. 2018, arXiv:1812.05222): confirms
//!   empirically that M-objective problems have (M-1)-dimensional fronts, and provides
//!   Bézier simplex fitting for their structure.

#![forbid(unsafe_code)]
#![warn(missing_docs)]

use pare::{Direction, ParetoFrontier};
use std::collections::{BTreeMap, VecDeque};

/// Epsilon used for floating-point tie-breaking in selection scoring.
///
/// This avoids exact equality comparisons on f64 scores and provides a stable
/// threshold across all selection paths (Pareto scalarization, UCB, etc.).
const TIEBREAK_EPS: f64 = 1e-12;

mod decision;
pub use decision::{Decision, DecisionNote, DecisionPolicy};

mod policy;
#[cfg(feature = "stochastic")]
pub use policy::BanditPolicy;

mod alloc;
pub use alloc::softmax_map;

mod utils;
pub use utils::suggested_window_cap;

mod control;
pub use control::{pick_control_arms, split_control_budget, ControlConfig};

mod router;
pub use router::{Router, RouterConfig, RouterDecision, RouterMode, RouterSnapshot};

mod guardrail;
pub use guardrail::LatencyGuardrailConfig;

pub mod monitor;

mod coverage;
pub use coverage::{coverage_pick_under_sampled, coverage_pick_under_sampled_idx, CoverageConfig};

#[cfg(feature = "stochastic")]
mod exp3ix;
#[cfg(feature = "stochastic")]
pub use exp3ix::{Exp3Ix, Exp3IxConfig, Exp3IxState};

#[cfg(feature = "stochastic")]
mod thompson;
#[cfg(feature = "stochastic")]
pub use thompson::{BetaStats, ThompsonConfig, ThompsonSampling, ThompsonState};

#[cfg(feature = "boltzmann")]
mod boltzmann;
#[cfg(feature = "boltzmann")]
pub use boltzmann::{BoltzmannConfig, BoltzmannPolicy};

#[cfg(feature = "contextual")]
mod contextual;
#[cfg(feature = "contextual")]
pub use contextual::{LinUcb, LinUcbArmState, LinUcbConfig, LinUcbScore, LinUcbState};

mod sticky;
pub use sticky::{StickyConfig, StickyMab};

mod stable_hash;
pub use stable_hash::stable_hash64;
pub(crate) use stable_hash::stable_hash64_u64;

mod novelty;
pub use novelty::novelty_pick_unseen;
pub(crate) use novelty::pick_random_subset;

mod prior;
pub use prior::apply_prior_counts_to_summary;

mod worst_first;
pub use worst_first::{
    context_bin, contextual_worst_first_pick_k, contextual_worst_first_pick_one,
    worst_first_pick_k, worst_first_pick_one, ContextBinConfig, ContextualCell,
    ContextualCoverageTracker, WorstFirstConfig,
};

mod harness;
pub use harness::{
    guardrail_filter_observed, guardrail_filter_observed_elapsed, policy_fill_generic,
    policy_fill_k_observed_guardrail_first_with_coverage, policy_fill_k_observed_with_coverage,
    policy_plan_generic, select_k_without_replacement_by, PipelineOrder, PolicyFill, PolicyPlan,
};
#[cfg(feature = "contextual")]
pub use harness::{policy_fill_k_contextual, ContextualPolicyFill};

mod triage;
pub use triage::{OutcomeIdx, TriageSession, TriageSessionConfig};

#[cfg(feature = "stochastic")]
pub use monitor::{calibrate_cusum_threshold, simulate_cusum_null_max_scores};
pub use monitor::{
    calibrate_threshold_from_max_scores, DriftConfig, DriftMetric, MonitoredWindow, RateBoundMode,
    ThresholdCalibration, UncertaintyConfig,
};

/// A single observed outcome for an arm.
#[derive(Debug, Clone, Copy, Default)]
#[cfg_attr(feature = "serde", derive(serde::Serialize))]
#[non_exhaustive]
pub struct Outcome {
    /// Whether the request succeeded for this arm.
    pub ok: bool,
    /// Whether the downstream result was judged “low value” (e.g. blocked, empty extraction).
    pub junk: bool,
    /// Whether the junk was “hard” (e.g. JS/auth wall) vs “soft” (low-signal extraction).
    pub hard_junk: bool,
    /// Provider-specific cost units for this call (caller-defined).
    pub cost_units: u64,
    /// Total elapsed time for this call, in milliseconds.
    pub elapsed_ms: u64,
    /// Optional continuous quality signal in `[0.0, 1.0]` (higher = better).
    ///
    /// Supplements the binary `junk`/`hard_junk` flags with a gradient signal.
    /// A response scoring `0.58` on a `0.60` threshold is functionally different
    /// from a `0.0` response; this field preserves that distinction without
    /// changing the binary routing logic.
    ///
    /// `None` (the default) means "not measured". Set via
    /// [`Window::set_last_quality_score`] when scoring completes after the call.
    #[cfg_attr(feature = "serde", serde(skip_serializing_if = "Option::is_none"))]
    pub quality_score: Option<f64>,
}

impl Outcome {
    /// Create an outcome with the `hard_junk => junk` invariant enforced.
    ///
    /// If `hard_junk` is true, `junk` is forced to true regardless of the
    /// passed value.  This prevents the silent bug where `soft_junk_rate`
    /// (`junk_rate - hard_junk_rate`) saturates to zero.
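    ///
    /// A short usage sketch of the invariant, assuming the crate is imported under
    /// its published name `muxer`; the numeric values are arbitrary.
    ///
    /// ```rust
    /// use muxer::Outcome;
    ///
    /// // `junk` is passed as false but forced to true because `hard_junk` is true.
    /// let o = Outcome::new(false, false, true, 3, 120);
    /// assert!(o.junk && o.hard_junk);
    ///
    /// // Without `hard_junk`, `junk` is kept as passed and no score is attached.
    /// let o = Outcome::new(true, false, false, 3, 120);
    /// assert!(!o.junk);
    /// assert_eq!(o.quality_score, None);
    /// ```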
    pub fn new(ok: bool, junk: bool, hard_junk: bool, cost_units: u64, elapsed_ms: u64) -> Self {
        Self {
            ok,
            junk: junk || hard_junk,
            hard_junk,
            cost_units,
            elapsed_ms,
            quality_score: None,
        }
    }

    /// Create a successful outcome (ok=true, no junk).
    ///
    /// This is the most common case: the call succeeded with no quality issues.
    pub fn success(cost_units: u64, elapsed_ms: u64) -> Self {
        Self {
            ok: true,
            junk: false,
            hard_junk: false,
            cost_units,
            elapsed_ms,
            quality_score: None,
        }
    }

    /// Create a failed outcome (ok=false, hard_junk=true, junk=true).
    ///
    /// Use for complete failures: errors, timeouts, parse failures.
    pub fn failure(cost_units: u64, elapsed_ms: u64) -> Self {
        Self {
            ok: false,
            junk: true,
            hard_junk: true,
            cost_units,
            elapsed_ms,
            quality_score: None,
        }
    }

    /// Create a degraded-but-ok outcome (ok=true, junk=true, hard_junk=false).
    ///
    /// Use for soft quality failures: the call succeeded but the result
    /// was below the caller's quality threshold.
    pub fn degraded(cost_units: u64, elapsed_ms: u64) -> Self {
        Self {
            ok: true,
            junk: true,
            hard_junk: false,
            cost_units,
            elapsed_ms,
            quality_score: None,
        }
    }

    /// Create an outcome with a quality score, enforcing `hard_junk => junk`.
    pub fn with_quality(
        ok: bool,
        junk: bool,
        hard_junk: bool,
        cost_units: u64,
        elapsed_ms: u64,
        quality_score: f64,
    ) -> Self {
        Self {
            ok,
            junk: junk || hard_junk,
            hard_junk,
            cost_units,
            elapsed_ms,
            quality_score: Some(quality_score.clamp(0.0, 1.0)),
        }
    }
}

#[cfg(feature = "serde")]
impl<'de> serde::Deserialize<'de> for Outcome {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        #[derive(serde::Deserialize)]
        struct Raw {
            ok: bool,
            junk: bool,
            hard_junk: bool,
            cost_units: u64,
            elapsed_ms: u64,
            #[serde(default)]
            quality_score: Option<f64>,
        }
        let raw = Raw::deserialize(deserializer)?;
        Ok(Outcome {
            ok: raw.ok,
            junk: raw.junk || raw.hard_junk,
            hard_junk: raw.hard_junk,
            cost_units: raw.cost_units,
            elapsed_ms: raw.elapsed_ms,
            quality_score: raw.quality_score.map(|s| s.clamp(0.0, 1.0)),
        })
    }
}

/// Sliding-window statistics for an arm.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Window {
    cap: usize,
    buf: VecDeque<Outcome>,
}

impl Window {
    /// Create an empty window with capacity `cap` (minimum 1).
    pub fn new(cap: usize) -> Self {
        Self {
            cap: cap.max(1),
            buf: VecDeque::new(),
        }
    }

    /// Maximum number of outcomes retained.
    pub fn cap(&self) -> usize {
        self.cap
    }

    /// Number of outcomes currently retained.
    pub fn len(&self) -> usize {
        self.buf.len()
    }

    /// Whether the window has no outcomes.
    pub fn is_empty(&self) -> bool {
        self.buf.is_empty()
    }

    /// Iterate over outcomes in the window (oldest to newest).
    pub fn iter(&self) -> impl Iterator<Item = &Outcome> + '_ {
        self.buf.iter()
    }

    /// Push a new outcome, evicting the oldest if at capacity.
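    ///
    /// A short usage sketch of the eviction behaviour, assuming the crate is imported
    /// as `muxer`; the cost and latency values are arbitrary.
    ///
    /// ```rust
    /// use muxer::{Outcome, Window};
    ///
    /// let mut w = Window::new(2);
    /// w.push(Outcome::failure(1, 50));
    /// w.push(Outcome::success(1, 10));
    /// w.push(Outcome::success(1, 30));
    ///
    /// // The oldest outcome (the failure) was evicted by the third push.
    /// assert_eq!(w.len(), 2);
    /// let s = w.summary();
    /// assert_eq!((s.calls, s.ok, s.junk), (2, 2, 0));
    /// assert_eq!(s.mean_elapsed_ms(), 20.0);
    /// ```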
    pub fn push(&mut self, o: Outcome) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(o);
    }

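    /// Best-effort: set “junk” and whether it is “hard junk” for the most recent outcome.
    ///
    /// `hard_junk` only takes effect when `junk` is true, which preserves the
    /// `hard_junk => junk` invariant.  Does nothing if the window is empty.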
    pub fn set_last_junk_level(&mut self, junk: bool, hard_junk: bool) {
        if let Some(last) = self.buf.back_mut() {
            last.junk = junk;
            last.hard_junk = hard_junk && junk;
        }
    }

    /// Best-effort: set the continuous quality score for the most recent outcome.
    ///
    /// Call this after downstream scoring completes (same pattern as
    /// [`Window::set_last_junk_level`]).  The value is clamped to `[0.0, 1.0]`.
    pub fn set_last_quality_score(&mut self, score: f64) {
        if let Some(last) = self.buf.back_mut() {
            last.quality_score = Some(score.clamp(0.0, 1.0));
        }
    }

    /// Mean quality score over outcomes that have one, or `None` if none have been set.
    ///
    /// Provides a gradient signal complementary to the binary ok/junk rates.
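    ///
    /// A short usage sketch, assuming the crate is imported as `muxer`; the scores
    /// are arbitrary.
    ///
    /// ```rust
    /// use muxer::{Outcome, Window};
    ///
    /// let mut w = Window::new(8);
    /// assert_eq!(w.mean_quality_score(), None);
    ///
    /// w.push(Outcome::success(1, 10)); // never scored, so it is skipped
    /// w.push(Outcome::success(1, 10));
    /// w.set_last_quality_score(0.5);
    /// w.push(Outcome::success(1, 10));
    /// w.set_last_quality_score(1.5); // clamped to 1.0
    ///
    /// // Mean over the two scored outcomes only: (0.5 + 1.0) / 2.
    /// assert_eq!(w.mean_quality_score(), Some(0.75));
    /// ```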
    pub fn mean_quality_score(&self) -> Option<f64> {
        let mut sum = 0.0_f64;
        let mut count = 0u64;
        for o in &self.buf {
            if let Some(q) = o.quality_score {
                sum += q;
                count += 1;
            }
        }
        if count == 0 {
            None
        } else {
            Some(sum / count as f64)
        }
    }

    /// Summarize the current window as counts and sums.
    pub fn summary(&self) -> Summary {
        let n = self.buf.len() as u64;
        if n == 0 {
            return Summary::default();
        }
        let mut ok = 0u64;
        let mut junk = 0u64;
        let mut hard_junk = 0u64;
        let mut cost_units = 0u64;
        let mut elapsed_ms_sum = 0u64;
        for o in &self.buf {
            ok += o.ok as u64;
            junk += o.junk as u64;
            hard_junk += o.hard_junk as u64;
            cost_units = cost_units.saturating_add(o.cost_units);
            elapsed_ms_sum = elapsed_ms_sum.saturating_add(o.elapsed_ms);
        }
        // Mean quality score over outcomes that have it set (same logic as the accessor).
        let mean_quality_score = self.mean_quality_score();

        Summary {
            calls: n,
            ok,
            junk,
            hard_junk,
            cost_units,
            elapsed_ms_sum,
            mean_quality_score,
        }
    }
}

/// Aggregate counts/sums over a window of outcomes.
#[derive(Debug, Clone, Copy, Default)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Summary {
    /// Number of calls observed.
    pub calls: u64,
    /// Number of successful calls.
    pub ok: u64,
    /// Number of calls judged “junk”.
    pub junk: u64,
    /// Number of calls judged “hard junk”.
    pub hard_junk: u64,
    /// Sum of `cost_units` over calls.
    pub cost_units: u64,
    /// Sum of `elapsed_ms` over calls.
    pub elapsed_ms_sum: u64,
    /// Mean quality score over outcomes that had `quality_score` set, or `None`.
    ///
    /// Populated by [`Window::summary`].  When constructing `Summary` directly
    /// (not via `Window`), set this to reflect a pre-computed quality estimate.
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub mean_quality_score: Option<f64>,
}

impl Summary {
    /// Fraction of calls that succeeded.
    pub fn ok_rate(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            (self.ok as f64) / (self.calls as f64)
        }
    }

    /// Fraction of calls that were judged “junk”.
    pub fn junk_rate(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            (self.junk as f64) / (self.calls as f64)
        }
    }

    /// Mean `cost_units` per call.
    pub fn mean_cost_units(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            (self.cost_units as f64) / (self.calls as f64)
        }
    }

    /// Mean `elapsed_ms` per call.
    pub fn mean_elapsed_ms(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            (self.elapsed_ms_sum as f64) / (self.calls as f64)
        }
    }

    /// Fraction of calls that were judged “hard junk”.
    pub fn hard_junk_rate(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            (self.hard_junk as f64) / (self.calls as f64)
        }
    }

    /// Fraction of calls that were judged “soft junk” (junk but not hard junk).
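    ///
    /// A short usage sketch, assuming the crate is imported as `muxer`; the counts
    /// are arbitrary.
    ///
    /// ```rust
    /// use muxer::Summary;
    ///
    /// let s = Summary { calls: 4, junk: 3, hard_junk: 1, ..Default::default() };
    /// assert_eq!(s.soft_junk_rate(), 0.5);
    ///
    /// // A hand-built summary that violates `hard_junk <= junk` saturates to zero
    /// // instead of underflowing.
    /// let bad = Summary { calls: 4, junk: 1, hard_junk: 3, ..Default::default() };
    /// assert_eq!(bad.soft_junk_rate(), 0.0);
    /// ```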
    pub fn soft_junk_rate(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            let soft = self.junk.saturating_sub(self.hard_junk);
            (soft as f64) / (self.calls as f64)
        }
    }
}

/// How to extract an objective value from a [`Summary`].
///
/// Built-in extractors cover the standard `Outcome` fields.  For caller-defined
/// objectives, use [`Extract::Custom`] and set [`Objective::value`] per arm
/// before each selection call.
#[derive(Debug, Clone, Copy, PartialEq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[non_exhaustive]
pub enum Extract {
    /// `ok_rate + UCB exploration bonus` (requires `exploration_c` from config).
    OkRateUcb,
    /// `Summary::mean_cost_units()`.
    MeanCost,
    /// `Summary::mean_elapsed_ms()`.
    MeanLatency,
    /// `Summary::hard_junk_rate()`.
    HardJunkRate,
    /// `Summary::soft_junk_rate()` (`junk_rate - hard_junk_rate`).
    SoftJunkRate,
    /// `Summary::mean_quality_score` (0.0 when absent).
    MeanQuality,
    /// Caller-defined objective.  The value must be set via [`Objective::value`]
    /// before each selection call; extraction returns 0.0 as fallback.
    Custom,
}

impl Extract {
    /// Extract the raw value from a summary.
    ///
    /// `ucb` is the pre-computed UCB term; only used by `OkRateUcb`.
    /// For `Custom`, returns 0.0 (the caller should set `Objective::value` instead).
    pub fn apply(self, s: &Summary, ucb: f64) -> f64 {
        match self {
            Self::OkRateUcb => s.ok_rate() + ucb,
            Self::MeanCost => s.mean_cost_units(),
            Self::MeanLatency => s.mean_elapsed_ms(),
            Self::HardJunkRate => s.hard_junk_rate(),
            Self::SoftJunkRate => s.soft_junk_rate(),
            Self::MeanQuality => s.mean_quality_score.unwrap_or(0.0),
            Self::Custom => 0.0,
        }
    }
}

/// A single objective dimension for Pareto selection.
///
/// Each objective contributes one axis to the Pareto frontier and one
/// term to the scalarized tiebreaker.  `direction` controls the frontier;
/// `weight` controls scalarization (higher weight = more influence).
///
/// For custom objectives not derivable from [`Summary`], use [`Objective::custom`]
/// (which sets `extract` to [`Extract::Custom`]), or keep any `extract` variant and
/// override the value via [`Objective::with_value`] before passing to selection.
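///
/// A usage sketch with arbitrary numbers, assuming the crate is imported as `muxer`.
/// It shows how direction and weight affect the frontier value and the scalar term.
///
/// ```rust
/// use muxer::{Extract, Objective, Summary};
///
/// let s = Summary { calls: 10, ok: 5, cost_units: 30, ..Default::default() };
///
/// // Maximized objective: ok_rate (0.5) plus a pre-computed UCB bonus (0.25).
/// let ok = Objective::maximize(Extract::OkRateUcb, 1.0);
/// assert_eq!(ok.pareto_value(&s, 0.25), 0.75);
///
/// // Minimized objective: mean cost is 3.0, negated for the frontier and the scalar term.
/// let cost = Objective::minimize(Extract::MeanCost, 0.5);
/// assert_eq!(cost.resolve(&s, 0.25), 3.0);
/// assert_eq!(cost.pareto_value(&s, 0.25), -3.0);
/// assert_eq!(cost.scalar_contribution(&s, 0.25), -1.5);
///
/// // A value override takes precedence over extraction.
/// assert_eq!(cost.with_value(7.0).resolve(&s, 0.25), 7.0);
/// ```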
#[derive(Debug, Clone, Copy, PartialEq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Objective {
    /// How to extract this objective from a `Summary`.
    pub extract: Extract,
    /// Whether higher values are better (`Maximize`) or worse (`Minimize`).
    pub direction: Direction,
    /// Scalarization weight (0 disables this objective in the tiebreaker
    /// but it still participates in the Pareto frontier).
    pub weight: f64,
    /// Pre-computed value override.  When `Some`, this value is used
    /// instead of extracting from `Summary`.  Useful for custom
    /// objectives or monitored scores (drift, catKL, CUSUM).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub value: Option<f64>,
}

impl Objective {
    /// Create an objective that maximizes the extracted value.
    pub fn maximize(extract: Extract, weight: f64) -> Self {
        Self {
            extract,
            direction: Direction::Maximize,
            weight,
            value: None,
        }
    }

    /// Create an objective that minimizes the extracted value.
    pub fn minimize(extract: Extract, weight: f64) -> Self {
        Self {
            extract,
            direction: Direction::Minimize,
            weight,
            value: None,
        }
    }

    /// Create a caller-defined objective with a pre-set value.
    ///
    /// Use this for objectives not derivable from `Summary` (e.g., revenue,
    /// domain-specific scores).  Update the value per arm before each
    /// selection call via [`Objective::with_value`].
    pub fn custom(direction: Direction, weight: f64, value: f64) -> Self {
        Self {
            extract: Extract::Custom,
            direction,
            weight,
            value: Some(value),
        }
    }

    /// Override the extracted value with a pre-computed one.
    pub fn with_value(mut self, v: f64) -> Self {
        self.value = Some(v);
        self
    }

    /// Resolve the value: use the override if present, otherwise extract from summary.
    pub fn resolve(&self, s: &Summary, ucb: f64) -> f64 {
        self.value.unwrap_or_else(|| self.extract.apply(s, ucb))
    }

    /// Value oriented for Pareto (always maximize): negates for `Minimize` objectives.
    pub fn pareto_value(&self, s: &Summary, ucb: f64) -> f64 {
        let v = self.resolve(s, ucb);
        match self.direction {
            Direction::Maximize => v,
            Direction::Minimize => -v,
        }
    }

    /// Signed scalarization contribution (higher is always better).
    pub fn scalar_contribution(&self, s: &Summary, ucb: f64) -> f64 {
        let v = self.resolve(s, ucb);
        match self.direction {
            Direction::Maximize => self.weight * v,
            Direction::Minimize => -(self.weight * v),
        }
    }
}

/// The default objective set for deterministic MAB selection.
///
/// Reproduces the pre-0.5 hardcoded behavior:
/// - Maximize `ok_rate + UCB` (weight 1.0)
/// - Minimize mean cost (weight 0.0: no effect on the scalarized score by default, but still a Pareto axis)
/// - Minimize mean latency (weight 0.0)
/// - Minimize hard junk rate (weight 0.0)
/// - Minimize soft junk rate (weight 0.0)
/// - Maximize mean quality (weight 0.0)
pub fn default_objectives() -> Vec<Objective> {
    vec![
        Objective::maximize(Extract::OkRateUcb, 1.0),
        Objective::minimize(Extract::MeanCost, 0.0),
        Objective::minimize(Extract::MeanLatency, 0.0),
        Objective::minimize(Extract::HardJunkRate, 0.0),
        Objective::minimize(Extract::SoftJunkRate, 0.0),
        Objective::maximize(Extract::MeanQuality, 0.0),
    ]
}

/// Configuration knobs for deterministic MAB-style selection.
///
/// Contains the objective list, exploration coefficient, and hard constraints
/// used by all selection paths ([`select_mab`], [`select_mab_explain`],
/// [`select_mab_decide`]).
///
/// For monitored selection APIs (`select_mab_monitored_*`), use [`MonitoredMabConfig`]
/// which wraps this with additional monitoring-specific guards.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct MabConfig {
    /// UCB exploration coefficient.
    pub exploration_c: f64,
    /// Objective dimensions for the Pareto frontier and scalarization.
    ///
    /// Each objective defines an axis (extract + direction) and a
    /// scalarization weight.  The default set ([`default_objectives`])
    /// reproduces pre-0.5 behavior.
    ///
    /// The saturation principle (Ehrgott & Nickel 2002) bounds the effective
    /// Pareto dimension at `min(objectives.len() - 1, K - 1)` for K arms.
    /// Objectives beyond K cannot create new Pareto tradeoffs, though their
    /// weights still affect scalarization tiebreaking.
    pub objectives: Vec<Objective>,
    /// Optional constraint: discard arms whose windowed junk_rate exceeds this.
    pub max_junk_rate: Option<f64>,
    /// Optional constraint: discard arms whose windowed hard_junk_rate exceeds this.
    pub max_hard_junk_rate: Option<f64>,
    /// Optional constraint: discard arms whose windowed mean_cost_units exceeds this.
    pub max_mean_cost_units: Option<f64>,
}

impl Default for MabConfig {
    fn default() -> Self {
        Self {
            exploration_c: 0.7,
            objectives: default_objectives(),
            max_junk_rate: None,
            max_hard_junk_rate: None,
            max_mean_cost_units: None,
        }
    }
}

impl MabConfig {
    /// Set the weight of the first objective whose extractor matches `extract`.
    /// If no objective matches, this is a no-op: no objective is added.
    pub fn set_weight(&mut self, extract: Extract, weight: f64) {
        if let Some(obj) = self.objectives.iter_mut().find(|o| o.extract == extract) {
            obj.weight = weight;
        }
    }

    /// Builder: set cost weight.
    pub fn with_cost_weight(mut self, w: f64) -> Self {
        self.set_weight(Extract::MeanCost, w);
        self
    }

    /// Builder: set latency weight.
    pub fn with_latency_weight(mut self, w: f64) -> Self {
        self.set_weight(Extract::MeanLatency, w);
        self
    }

    /// Builder: set soft junk weight.
    pub fn with_junk_weight(mut self, w: f64) -> Self {
        self.set_weight(Extract::SoftJunkRate, w);
        self
    }

    /// Builder: set hard junk weight.
    pub fn with_hard_junk_weight(mut self, w: f64) -> Self {
        self.set_weight(Extract::HardJunkRate, w);
        self
    }

    /// Builder: set quality weight.
    pub fn with_quality_weight(mut self, w: f64) -> Self {
        self.set_weight(Extract::MeanQuality, w);
        self
    }

    /// Builder: set objectives directly (replaces default set).
    pub fn with_objectives(mut self, objectives: Vec<Objective>) -> Self {
        self.objectives = objectives;
        self
    }
}

/// Extended configuration for monitored MAB selection.
///
/// Includes all base [`MabConfig`] fields plus monitoring-specific guards
/// (drift, categorical KL, CUSUM) that only apply in `select_mab_monitored_*` APIs.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct MonitoredMabConfig {
    /// Base selection configuration.
    pub base: MabConfig,
    /// Optional drift guard: discard arms whose drift score exceeds this.
    pub max_drift: Option<f64>,
    /// Drift metric used when applying `max_drift`.
    pub drift_metric: DriftMetric,
    /// Penalty weight for drift (0 disables).
    pub drift_weight: f64,
    /// Rate uncertainty configuration for Wilson bounds.
    pub uncertainty: UncertaintyConfig,
    /// Optional categorical KL guard threshold.
    pub max_catkl: Option<f64>,
    /// Dirichlet smoothing pseudo-count for categorical KL
    /// (non-finite or non-positive values fall back to `1e-3`).
    pub catkl_alpha: f64,
    /// Minimum baseline samples for categorical KL.
    pub catkl_min_baseline: u64,
    /// Minimum recent samples for categorical KL.
    pub catkl_min_recent: u64,
    /// Penalty weight for categorical KL score (0 disables).
    pub catkl_weight: f64,
    /// Optional categorical CUSUM guard threshold.
    pub max_cusum: Option<f64>,
    /// Dirichlet smoothing pseudo-count for CUSUM
    /// (non-finite or non-positive values fall back to `1e-3`).
    pub cusum_alpha: f64,
    /// Minimum baseline samples for CUSUM.
    pub cusum_min_baseline: u64,
    /// Minimum recent samples for CUSUM.
    pub cusum_min_recent: u64,
    /// Alternative distribution for CUSUM.  When `None`, monitored selection
    /// uses `[0.05, 0.05, 0.45, 0.45]`.
    pub cusum_alt_p: Option<[f64; 4]>,
    /// Penalty weight for CUSUM score (0 disables).
    pub cusum_weight: f64,
}

impl Default for MonitoredMabConfig {
    fn default() -> Self {
        Self {
            base: MabConfig::default(),
            max_drift: None,
            drift_metric: DriftMetric::default(),
            drift_weight: 0.0,
            uncertainty: UncertaintyConfig::default(),
            max_catkl: None,
            catkl_alpha: 1e-3,
            catkl_min_baseline: 40,
            catkl_min_recent: 20,
            catkl_weight: 0.0,
            max_cusum: None,
            cusum_alpha: 1e-3,
            cusum_min_baseline: 40,
            cusum_min_recent: 20,
            cusum_alt_p: None,
            cusum_weight: 0.0,
        }
    }
}

impl From<MabConfig> for MonitoredMabConfig {
    fn from(base: MabConfig) -> Self {
        Self {
            base,
            ..Self::default()
        }
    }
}

/// A resolved objective value for one candidate arm.
#[derive(Debug, Clone, Copy)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct ObjectiveValue {
    /// The extractor that produced this value.
    pub extract: Extract,
    /// The resolved value (after applying overrides / uncertainty bounds).
    pub value: f64,
    /// The Pareto-oriented value (negated for `Minimize` objectives).
    pub pareto_value: f64,
    /// Scalarization contribution (signed, higher is better).
    pub scalar_contribution: f64,
}

/// Debug row for one candidate arm.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct CandidateDebug {
    /// Arm name.
    pub name: String,
    /// Summary snapshot for this arm.
    pub summary: Summary,
    /// UCB exploration term.
    pub ucb: f64,
    /// Per-objective resolved values (in the same order as `MabConfig::objectives`).
    pub objective_values: Vec<ObjectiveValue>,
    /// Total scalarized score (sum of all `scalar_contribution` values).
    pub score: f64,

    /// Optional drift score for this arm (present for monitored selection).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub drift_score: Option<f64>,

    /// Optional categorical KL score for this arm (present for monitored selection).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub catkl_score: Option<f64>,

    /// Optional categorical CUSUM score for this arm (present for monitored selection).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub cusum_score: Option<f64>,

    /// Wilson half-width for ok_rate (present when uncertainty bounds are enabled).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub ok_half_width: Option<f64>,

    /// Wilson half-width for junk_rate (present when uncertainty bounds are enabled).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub junk_half_width: Option<f64>,

    /// Wilson half-width for hard_junk_rate (present when uncertainty bounds are enabled).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub hard_junk_half_width: Option<f64>,
}

/// Output of `select_mab` (chosen arm + debugging context).
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Selection {
    /// The selected arm.
    pub chosen: String,
    /// Arm names on the Pareto frontier (in input arm order).
    pub frontier: Vec<String>,
    /// Candidate debug rows for the arms that were scored (in input arm order).
    pub candidates: Vec<CandidateDebug>,
    /// The config used to compute this selection.
    pub config: MabConfig,
}

/// Additional metadata for a deterministic `select_mab` decision.
///
/// This exists because production routers typically need more than "which arm":
/// they also need to know whether constraints eliminated all arms (fallback) and
/// whether the decision was explore-first.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct MabSelectionDecision {
    /// The base deterministic selection output (chosen arm + debug rows).
    pub selection: Selection,
    /// Arms that were eligible after applying constraints.
    ///
    /// If constraints eliminated all arms, this is equal to the original `arms_in_order`.
    pub eligible_arms: Vec<String>,
    /// True if constraints eliminated all arms and the selector fell back to the original set.
    pub constraints_fallback_used: bool,
    /// True if the selector chose an arm due to explore-first (some eligible arm had `calls == 0`).
    pub explore_first: bool,

    /// Drift guard outcome (only present for monitored selection).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub drift_guard: Option<DriftGuardDecision>,

    /// Categorical KL guard outcome (only present for monitored selection).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub catkl_guard: Option<CatKlGuardDecision>,

    /// Categorical CUSUM guard outcome (only present for monitored selection).
    #[cfg_attr(
        feature = "serde",
        serde(default, skip_serializing_if = "Option::is_none")
    )]
    pub cusum_guard: Option<CusumGuardDecision>,
}

/// Output of applying a drift guard to a candidate set.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct DriftGuardDecision {
    /// Arms eligible after applying drift guard.
    pub eligible_arms: Vec<String>,
    /// Whether we fell back to the guard's full input set because the drift guard would have eliminated all arms.
    pub fallback_used: bool,
    /// Drift metric used.
    pub metric: DriftMetric,
    /// The max drift threshold used.
    pub max_drift: f64,
}

/// Output of applying a categorical KL guard to a candidate set.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct CatKlGuardDecision {
    /// Arms eligible after applying catKL guard.
    pub eligible_arms: Vec<String>,
    /// Whether we fell back to the full input set because the guard would have eliminated all arms.
    pub fallback_used: bool,
    /// The threshold on `n_recent * KL(q_recent || p0_baseline)`.
    pub max_catkl: f64,
    /// Dirichlet smoothing pseudo-count.
    pub alpha: f64,
    /// Minimum baseline samples required.
    pub min_baseline: u64,
    /// Minimum recent samples required.
    pub min_recent: u64,
}

/// Output of applying a categorical CUSUM guard to a candidate set.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct CusumGuardDecision {
    /// Arms eligible after applying CUSUM guard.
    pub eligible_arms: Vec<String>,
    /// Whether we fell back to the full input set because the guard would have eliminated all arms.
    pub fallback_used: bool,
    /// CUSUM threshold used.
    pub max_cusum: f64,
    /// Dirichlet smoothing pseudo-count.
    pub alpha: f64,
    /// Minimum baseline samples required.
    pub min_baseline: u64,
    /// Minimum recent samples required.
    pub min_recent: u64,
    /// Alternative distribution used for CUSUM.
    pub alt_p: [f64; 4],
}

fn apply_base_constraints(
    arms_in_order: &[String],
    summaries: &BTreeMap<String, Summary>,
    cfg: &MabConfig,
) -> (Vec<String>, bool) {
    // Apply hard constraints (BwK-ish “anytime” gating).
    // If constraints filter everything, fall back to the original arm set (never return empty).
    let mut eligible: Vec<String> = Vec::new();
    for a in arms_in_order {
        let s = summaries.get(a).copied().unwrap_or_default();
        let ok = cfg
            .max_junk_rate
            .map(|thr| s.junk_rate() <= thr)
            .unwrap_or(true)
            && cfg
                .max_hard_junk_rate
                .map(|thr| s.hard_junk_rate() <= thr)
                .unwrap_or(true)
            && cfg
                .max_mean_cost_units
                .map(|thr| s.mean_cost_units() <= thr)
                .unwrap_or(true);
        if ok {
            eligible.push(a.clone());
        }
    }
    let constraints_fallback_used = eligible.is_empty();
    let eligible_arms: Vec<String> = if constraints_fallback_used {
        arms_in_order.to_vec()
    } else {
        eligible
    };
    (eligible_arms, constraints_fallback_used)
}
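
/*
The gating in `apply_base_constraints` can be sketched in isolation as below.
This is a simplified illustration with a single threshold, not the crate's
implementation: `gate_with_fallback` and its `(name, junk_rate)` input shape
are hypothetical.

```rust
// Keep the arms whose junk rate is within `max_junk_rate`. If no arm passes,
// return the original list and report that the fallback was used.
fn gate_with_fallback(
    arms: &[(&str, f64)], // (arm name, junk rate): an assumed input shape
    max_junk_rate: Option<f64>,
) -> (Vec<String>, bool) {
    let eligible: Vec<String> = arms
        .iter()
        // A missing threshold (`None`) accepts every arm.
        .filter(|(_, rate)| max_junk_rate.map(|thr| *rate <= thr).unwrap_or(true))
        .map(|(name, _)| name.to_string())
        .collect();
    if eligible.is_empty() {
        // Never return an empty set: fall back to all input arms.
        (arms.iter().map(|(name, _)| name.to_string()).collect(), true)
    } else {
        (eligible, false)
    }
}
```

With arms `[("a", 0.1), ("b", 0.4)]`, a threshold of `0.2` keeps only `a`,
while a threshold of `0.05` rejects both and so returns both with the
fallback flag set.
*/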

fn explore_first_decision(
    chosen: String,
    eligible_arms: Vec<String>,
    constraints_fallback_used: bool,
    cfg: MabConfig,
) -> MabSelectionDecision {
    let zero_summary = Summary::default();
    let obj_values: Vec<ObjectiveValue> = cfg
        .objectives
        .iter()
        .map(|obj| ObjectiveValue {
            extract: obj.extract,
            value: 0.0,
            pareto_value: 0.0,
            scalar_contribution: 0.0,
        })
        .collect();
    let sel = Selection {
        chosen: chosen.clone(),
        frontier: vec![chosen.clone()],
        candidates: vec![CandidateDebug {
            name: chosen,
            summary: zero_summary,
            ucb: 0.0,
            objective_values: obj_values,
            score: 0.0,
            drift_score: None,
            catkl_score: None,
            cusum_score: None,
            ok_half_width: None,
            junk_half_width: None,
            hard_junk_half_width: None,
        }],
        config: cfg,
    };
    MabSelectionDecision {
        selection: sel,
        eligible_arms,
        constraints_fallback_used,
        explore_first: true,
        drift_guard: None,
        catkl_guard: None,
        cusum_guard: None,
    }
}

fn choose_from_frontier(
    candidates: &[CandidateDebug],
    frontier_names_in_order: &[String],
    fallback_first: Option<&String>,
) -> (String, Vec<String>) {
    // Build Pareto frontier from the candidates' pareto_value vectors.
    // All axes are maximize-oriented (Minimize objectives are pre-negated).
    let dims = candidates
        .first()
        .map(|c| c.objective_values.len())
        .unwrap_or(0);
    let mut frontier = ParetoFrontier::new(vec![Direction::Maximize; dims]);
    for (i, c) in candidates.iter().enumerate() {
        let pt: Vec<f64> = c.objective_values.iter().map(|o| o.pareto_value).collect();
        frontier.push(pt, i);
    }
    let mut frontier_indices: Vec<usize> = if frontier.is_empty() {
        (0..candidates.len()).collect()
    } else {
        frontier.points().iter().map(|p| p.data).collect()
    };
    // Ensure stable ordering regardless of frontier internals.
    frontier_indices.sort_unstable();

    let frontier_names: Vec<String> = frontier_indices
        .iter()
        .filter_map(|&i| frontier_names_in_order.get(i).cloned())
        .collect();

    let mut best_name = frontier_names
        .first()
        .cloned()
        .unwrap_or_else(|| fallback_first.cloned().unwrap_or_default());
    let mut best_score = f64::NEG_INFINITY;
    for &idx in &frontier_indices {
        let Some(c) = candidates.get(idx) else {
            continue;
        };
        if c.score > best_score
            || ((c.score - best_score).abs() <= TIEBREAK_EPS && c.name < best_name)
        {
            best_score = c.score;
            best_name = c.name.clone();
        }
    }

    (best_name, frontier_names)
}
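
/*
A standalone sketch of the two-stage choice made in `choose_from_frontier`:
filter to non-dominated points, then take the best scalarized score with a
name-based tie-break. It uses a naive O(n^2) dominance check rather than the
`ParetoFrontier` type, so details such as duplicate handling may differ.
`dominates`, `pick`, and the tuple layout are hypothetical, and the `eps`
value is an assumption standing in for `TIEBREAK_EPS`.

```rust
// True if `a` is at least as good as `b` on every axis and strictly better
// on at least one. All axes are maximize-oriented.
fn dominates(a: &[f64], b: &[f64]) -> bool {
    a.iter().zip(b).all(|(x, y)| x >= y) && a.iter().zip(b).any(|(x, y)| x > y)
}

// Candidates are (name, maximize-oriented point, scalarized score).
fn pick(candidates: &[(&str, Vec<f64>, f64)], eps: f64) -> Option<String> {
    // Stage 1: keep only candidates that no other candidate dominates.
    let frontier: Vec<&(&str, Vec<f64>, f64)> = candidates
        .iter()
        .filter(|c| !candidates.iter().any(|o| dominates(&o.1, &c.1)))
        .collect();

    // Stage 2: highest score wins; near-equal scores go to the smaller name.
    let mut best: Option<&(&str, Vec<f64>, f64)> = None;
    for c in frontier {
        let better = match best {
            None => true,
            Some(b) => c.2 > b.2 || ((c.2 - b.2).abs() <= eps && c.0 < b.0),
        };
        if better {
            best = Some(c);
        }
    }
    best.map(|c| c.0.to_string())
}
```

Two arms with identical points and scores resolve to the lexicographically
smaller name, which is what keeps the choice deterministic.
*/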

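/// Deterministic selection:
/// - Apply the hard constraints in `cfg`; if they eliminate every arm, fall back to all arms.
/// - Explore each eligible arm at least once (in stable input order).
/// - Then:
///   - build a Pareto frontier over `cfg.objectives`; with [`default_objectives`] that is:
///     - maximize success (`ok_rate` plus the UCB term)
///     - minimize mean `cost_units`
///     - minimize mean latency
///     - minimize hard and soft junk rates
///     - maximize mean quality score
///   - pick the frontier arm with the best scalarized score (ties broken by arm name)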
///
/// # Example
///
/// ```rust
/// use muxer::{select_mab, MabConfig, Summary};
/// use std::collections::BTreeMap;
///
/// let arms = vec!["a".to_string(), "b".to_string()];
/// let mut summaries = BTreeMap::new();
/// summaries.insert(
///     "a".to_string(),
///     Summary { calls: 10, ok: 9, junk: 0, hard_junk: 0, cost_units: 10, elapsed_ms_sum: 900, mean_quality_score: None }
/// );
/// summaries.insert(
///     "b".to_string(),
///     Summary { calls: 10, ok: 9, junk: 2, hard_junk: 0, cost_units: 10, elapsed_ms_sum: 900, mean_quality_score: None }
/// );
///
/// let sel = select_mab(&arms, &summaries, MabConfig::default());
/// assert_eq!(sel.chosen, "a");
/// ```
pub fn select_mab(
    arms_in_order: &[String],
    summaries: &BTreeMap<String, Summary>,
    cfg: MabConfig,
) -> Selection {
    select_mab_explain(arms_in_order, summaries, cfg).selection
}

/// Like `select_mab`, but also returns metadata about constraints and explore-first behavior.
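/**
The UCB exploration term that this function adds to `ok_rate` can be
sketched on its own as below. `ucb_bonus` is a hypothetical helper written
for illustration; it mirrors the clamping of both counts to at least 1.

```rust
// Exploration bonus: c * sqrt(ln(total_calls) / arm_calls). Both counts are
// clamped to at least 1, so the result is finite and non-negative.
fn ucb_bonus(exploration_c: f64, total_calls: u64, arm_calls: u64) -> f64 {
    let total = (total_calls as f64).max(1.0);
    let n = (arm_calls as f64).max(1.0);
    exploration_c * (total.ln() / n).sqrt()
}
```

The bonus shrinks as an arm accumulates calls, so rarely-called arms get a
larger boost to their success objective than frequently-called ones.
*/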
pub fn select_mab_explain(
    arms_in_order: &[String],
    summaries: &BTreeMap<String, Summary>,
    cfg: MabConfig,
) -> MabSelectionDecision {
    let (eligible_arms, constraints_fallback_used) =
        apply_base_constraints(arms_in_order, summaries, &cfg);
    let arms_in_order: &[String] = &eligible_arms;

    // Explore first.
    let explore_choice: Option<String> = arms_in_order
        .iter()
        .find(|a| summaries.get(*a).copied().unwrap_or_default().calls == 0)
        .cloned();
    if let Some(chosen) = explore_choice {
        return explore_first_decision(chosen, eligible_arms, constraints_fallback_used, cfg);
    }

    let total_calls: f64 = arms_in_order
        .iter()
        .map(|a| summaries.get(a).copied().unwrap_or_default().calls as f64)
        .sum::<f64>()
        .max(1.0);

    let mut frontier_names_in_order: Vec<String> = Vec::new();
    let mut candidates = Vec::new();

    for a in arms_in_order {
        let s = summaries.get(a).copied().unwrap_or_default();
        let n = (s.calls as f64).max(1.0);
        let ucb = cfg.exploration_c * ((total_calls.ln() / n).sqrt());

        let obj_values: Vec<ObjectiveValue> = cfg
            .objectives
            .iter()
            .map(|obj| {
                let value = obj.resolve(&s, ucb);
                ObjectiveValue {
                    extract: obj.extract,
                    value,
                    pareto_value: obj.pareto_value(&s, ucb),
                    scalar_contribution: obj.scalar_contribution(&s, ucb),
                }
            })
            .collect();

        let score: f64 = obj_values.iter().map(|o| o.scalar_contribution).sum();

        candidates.push(CandidateDebug {
            name: a.clone(),
            summary: s,
            ucb,
            objective_values: obj_values,
            score,
            drift_score: None,
            catkl_score: None,
            cusum_score: None,
            ok_half_width: None,
            junk_half_width: None,
            hard_junk_half_width: None,
        });
        frontier_names_in_order.push(a.clone());
    }

    let (best_name, frontier_names) =
        choose_from_frontier(&candidates, &frontier_names_in_order, arms_in_order.first());

    let sel = Selection {
        chosen: best_name,
        frontier: frontier_names,
        candidates,
        config: cfg,
    };
    MabSelectionDecision {
        selection: sel,
        eligible_arms,
        constraints_fallback_used,
        explore_first: false,
        drift_guard: None,
        catkl_guard: None,
        cusum_guard: None,
    }
}

/// Monitored deterministic selection (baseline vs recent drift + uncertainty-aware rates).
///
/// This is intended for production routers that already maintain `MonitoredWindow`s per arm.
///
/// Semantics:
/// - Uses `monitored[*].recent_summary()` for the base stats (ok/junk/cost/latency).
/// - Optionally applies a drift guardrail (`cfg.max_drift`) against baseline-vs-recent drift.
/// - Optionally penalizes drift (`cfg.drift_weight`) as an additional objective.
/// - Optionally applies categorical KL monitoring as an objective/guard (`cfg.max_catkl`, `cfg.catkl_weight`).
/// - Optionally applies categorical CUSUM monitoring as an objective/guard (`cfg.max_cusum`, `cfg.cusum_weight`).
/// - Optionally adjusts rates using Wilson bounds (`cfg.uncertainty`).
///
/// Like `select_mab_explain`, this never returns an empty choice set: if a guard (drift,
/// catKL, or CUSUM) would eliminate all arms, it falls back to the set that was eligible
/// before that guard.
///
/// This is a convenience wrapper: it builds summaries from `monitored[*].recent_summary()` and
/// delegates to [`select_mab_monitored_explain_with_summaries`].
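/**
The guard chain (drift, then catKL, then CUSUM) follows one pattern per
stage, sketched standalone below. `apply_guard` and `score_of` are
hypothetical names for this illustration; the real guards compute their
scores from the baseline and recent windows.

```rust
// Apply one guard: drop arms whose score exceeds `threshold`. Arms with no
// score (`None`) are kept. If nothing survives, fall back to the guard's
// own input and report that the fallback was used.
fn apply_guard(
    input: &[String],
    score_of: impl Fn(&str) -> Option<f64>,
    threshold: f64,
) -> (Vec<String>, bool) {
    let kept: Vec<String> = input
        .iter()
        .filter(|a| !score_of(a.as_str()).map(|s| s > threshold).unwrap_or(false))
        .cloned()
        .collect();
    if kept.is_empty() {
        (input.to_vec(), true)
    } else {
        (kept, false)
    }
}
```

Chaining is then a matter of feeding each stage's output into the next, so
a later guard can only fall back to what the earlier guards left eligible.
*/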
pub fn select_mab_monitored_explain(
    arms_in_order: &[String],
    monitored: &BTreeMap<String, MonitoredWindow>,
    drift_cfg: DriftConfig,
    cfg: MonitoredMabConfig,
) -> MabSelectionDecision {
    let summaries: BTreeMap<String, Summary> = monitored
        .iter()
        .map(|(k, w)| (k.clone(), w.recent_summary()))
        .collect();
    select_mab_monitored_explain_with_summaries(
        arms_in_order,
        &summaries,
        monitored,
        drift_cfg,
        cfg,
    )
}

/// Like `select_mab_monitored_explain`, but uses caller-provided summaries for the base objectives
/// (e.g. prior-smoothed rates), while still computing monitoring scores from `monitored`.
///
/// This is useful for harnesses that want:
/// - selection on a smoothed/aggregated summary, but
/// - drift/catKL/CUSUM computed from raw baseline/recent windows.
pub fn select_mab_monitored_explain_with_summaries(
    arms_in_order: &[String],
    summaries: &BTreeMap<String, Summary>,
    monitored: &BTreeMap<String, MonitoredWindow>,
    drift_cfg: DriftConfig,
    cfg: MonitoredMabConfig,
) -> MabSelectionDecision {
    let base = &cfg.base;

    // Apply base hard constraints first (same semantics as `select_mab_explain`).
    let (eligible_arms, constraints_fallback_used) =
        apply_base_constraints(arms_in_order, summaries, base);
    let arms_in_order: &[String] = &eligible_arms;

    // Explore first (stable order).
    let explore_choice: Option<String> = arms_in_order
        .iter()
        .find(|a| summaries.get(*a).copied().unwrap_or_default().calls == 0)
        .cloned();
    if let Some(chosen) = explore_choice {
        return explore_first_decision(
            chosen,
            eligible_arms,
            constraints_fallback_used,
            base.clone(),
        );
    }

    // Apply drift guard (optional) over the constraint-eligible set.
    let max_drift = cfg
        .max_drift
        .and_then(|x| (x.is_finite() && x >= 0.0).then_some(x));
    let mut eligible_after_drift = arms_in_order.to_vec();
    let mut drift_guard: Option<DriftGuardDecision> = None;
    if let Some(thr) = max_drift {
        let mut kept: Vec<String> = Vec::new();
        for a in arms_in_order {
            let Some(w) = monitored.get(a) else {
                kept.push(a.clone());
                continue;
            };
            let d = monitor::drift_between_windows(
                w.baseline(),
                w.recent(),
                DriftConfig {
                    metric: cfg.drift_metric,
                    ..drift_cfg
                },
            );
            let violates = d.as_ref().map(|x| x.score > thr).unwrap_or(false);
            if !violates {
                kept.push(a.clone());
            }
        }
        let fallback_used = kept.is_empty();
        let eligible_arms = if fallback_used {
            arms_in_order.to_vec()
        } else {
            kept
        };
        drift_guard = Some(DriftGuardDecision {
            eligible_arms: eligible_arms.clone(),
            fallback_used,
            metric: cfg.drift_metric,
            max_drift: thr,
        });
        eligible_after_drift = eligible_arms;
    }

    // Apply categorical KL guard (optional) over the drift-eligible set.
    let max_catkl = cfg
        .max_catkl
        .and_then(|x| (x.is_finite() && x >= 0.0).then_some(x));
    let catkl_alpha = if cfg.catkl_alpha.is_finite() && cfg.catkl_alpha > 0.0 {
        cfg.catkl_alpha
    } else {
        1e-3
    };
    let mut eligible_after_catkl = eligible_after_drift.clone();
    let mut catkl_guard: Option<CatKlGuardDecision> = None;
    if let Some(thr) = max_catkl {
        let mut kept: Vec<String> = Vec::new();
        for a in &eligible_after_drift {
            let Some(w) = monitored.get(a) else {
                kept.push(a.clone());
                continue;
            };
            let s = monitor::catkl_score_between_windows(
                w.baseline(),
                w.recent(),
                catkl_alpha,
                drift_cfg.tol,
                cfg.catkl_min_baseline,
                cfg.catkl_min_recent,
            );
            let violates = s.map(|x| x > thr).unwrap_or(false);
            if !violates {
                kept.push(a.clone());
            }
        }
        let fallback_used = kept.is_empty();
        let eligible_arms = if fallback_used {
            eligible_after_drift.clone()
        } else {
            kept
        };
        catkl_guard = Some(CatKlGuardDecision {
            eligible_arms: eligible_arms.clone(),
            fallback_used,
            max_catkl: thr,
            alpha: catkl_alpha,
            min_baseline: cfg.catkl_min_baseline,
            min_recent: cfg.catkl_min_recent,
        });
        eligible_after_catkl = eligible_arms;
    }

    // Apply categorical CUSUM guard (optional) over the catKL-eligible set.
    let max_cusum = cfg
        .max_cusum
        .and_then(|x| (x.is_finite() && x >= 0.0).then_some(x));
    let cusum_alpha = if cfg.cusum_alpha.is_finite() && cfg.cusum_alpha > 0.0 {
        cfg.cusum_alpha
    } else {
        1e-3
    };
    let cusum_alt_p = cfg.cusum_alt_p.unwrap_or([0.05, 0.05, 0.45, 0.45]);
    let mut eligible_after_cusum = eligible_after_catkl.clone();
    let mut cusum_guard: Option<CusumGuardDecision> = None;
    if let Some(thr) = max_cusum {
        let mut kept: Vec<String> = Vec::new();
        for a in &eligible_after_catkl {
            let Some(w) = monitored.get(a) else {
                kept.push(a.clone());
                continue;
            };
            let s = monitor::cusum_score_between_windows(
                w.baseline(),
                w.recent(),
                cusum_alpha,
                drift_cfg.tol,
                cfg.cusum_min_baseline,
                cfg.cusum_min_recent,
                Some(cusum_alt_p),
            );
            let violates = s.map(|x| x > thr).unwrap_or(false);
            if !violates {
                kept.push(a.clone());
            }
        }
        let fallback_used = kept.is_empty();
        let eligible_arms = if fallback_used {
            eligible_after_catkl.clone()
        } else {
            kept
        };
        cusum_guard = Some(CusumGuardDecision {
            eligible_arms: eligible_arms.clone(),
            fallback_used,
            max_cusum: thr,
            alpha: cusum_alpha,
            min_baseline: cfg.cusum_min_baseline,
            min_recent: cfg.cusum_min_recent,
            alt_p: cusum_alt_p,
        });
        eligible_after_cusum = eligible_arms;
    }

    // Only count calls for the arms actually being considered.
    let total_calls: f64 = eligible_after_cusum
        .iter()
        .map(|a| summaries.get(a).copied().unwrap_or_default().calls as f64)
        .sum::<f64>()
        .max(1.0);

    // Monitored Pareto frontier: base objectives (with Wilson-bounded overrides)
    // plus monitoring objectives (drift, catKL, CUSUM).
    let mut frontier_names_in_order: Vec<String> = Vec::new();
    let mut candidates: Vec<CandidateDebug> = Vec::new();

    // Build monitoring objectives to append to the base set.
    let monitoring_objectives = [
        Objective::minimize(Extract::MeanCost, cfg.drift_weight.max(0.0)), // placeholder extract, value overridden
        Objective::minimize(Extract::MeanCost, cfg.catkl_weight.max(0.0)),
        Objective::minimize(Extract::MeanCost, cfg.cusum_weight.max(0.0)),
    ];

    for a in &eligible_after_cusum {
        let s = summaries.get(a).copied().unwrap_or_default();
        let n = (s.calls as f64).max(1.0);

        // Uncertainty-aware rates (Wilson).
        let z = cfg.uncertainty.z;
        let soft = s.junk.saturating_sub(s.hard_junk);
        let (ok_rate_used, ok_half) =
            monitor::apply_rate_bound(s.ok, s.calls, z, cfg.uncertainty.ok_mode);
        let (hard_used, hard_half) =
            monitor::apply_rate_bound(s.hard_junk, s.calls, z, cfg.uncertainty.hard_junk_mode);
        let (soft_used, soft_half) =
            monitor::apply_rate_bound(soft, s.calls, z, cfg.uncertainty.junk_mode);

        // Monitoring scores from raw windows.
        let drift_score = monitored.get(a).and_then(|w| {
            monitor::drift_between_windows(
                w.baseline(),
                w.recent(),
                DriftConfig {
                    metric: cfg.drift_metric,
                    ..drift_cfg
                },
            )
            .map(|x| x.score)
        });

        let catkl_score = monitored.get(a).and_then(|w| {
            monitor::catkl_score_between_windows(
                w.baseline(),
                w.recent(),
                catkl_alpha,
                drift_cfg.tol,
                cfg.catkl_min_baseline,
                cfg.catkl_min_recent,
            )
        });

        let cusum_score = monitored.get(a).and_then(|w| {
            monitor::cusum_score_between_windows(
                w.baseline(),
                w.recent(),
                cusum_alpha,
                drift_cfg.tol,
                cfg.cusum_min_baseline,
                cfg.cusum_min_recent,
                Some(cusum_alt_p),
            )
        });

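        // UCB exploration bonus: c * sqrt(ln(total_calls) / n). Both counts are clamped
        // to >= 1.0 above, so the logarithm is non-negative and the division is defined.
        let ucb = base.exploration_c * (total_calls.ln() / n).sqrt();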

        // Build base objective values with Wilson-bounded overrides.
        let mut obj_values: Vec<ObjectiveValue> = base
            .objectives
            .iter()
            .map(|obj| {
                // Override rate-based extractors with Wilson-bounded values.
                let value = match obj.extract {
                    Extract::OkRateUcb => ok_rate_used + ucb,
                    Extract::HardJunkRate => hard_used,
                    Extract::SoftJunkRate => soft_used,
                    _ => obj.resolve(&s, ucb),
                };
                // Negate minimized objectives so that "larger is better" holds for both
                // the Pareto comparison value and the scalarized contribution.
                let (pv, sc) = match obj.direction {
                    Direction::Maximize => (value, obj.weight * value),
                    Direction::Minimize => (-value, -(obj.weight * value)),
                };
                ObjectiveValue {
                    extract: obj.extract,
                    value,
                    pareto_value: pv,
                    scalar_contribution: sc,
                }
            })
            .collect();

        // Append monitoring objectives with pre-computed values.
        let mon_values = [
            drift_score.unwrap_or(0.0),
            catkl_score.unwrap_or(0.0),
            cusum_score.unwrap_or(0.0),
        ];
        for (mon_obj, &mon_val) in monitoring_objectives.iter().zip(mon_values.iter()) {
            obj_values.push(ObjectiveValue {
                extract: mon_obj.extract,
                value: mon_val,
                pareto_value: -mon_val, // minimize
                scalar_contribution: -(mon_obj.weight * mon_val),
            });
        }

        let score: f64 = obj_values.iter().map(|o| o.scalar_contribution).sum();

        candidates.push(CandidateDebug {
            name: a.clone(),
            summary: s,
            ucb,
            objective_values: obj_values,
            score,
            drift_score,
            catkl_score,
            cusum_score,
            ok_half_width: Some(ok_half),
            junk_half_width: Some(soft_half),
            hard_junk_half_width: Some(hard_half),
        });
        frontier_names_in_order.push(a.clone());
    }

    let (best_name, frontier_names) = choose_from_frontier(
        &candidates,
        &frontier_names_in_order,
        eligible_after_cusum.first(),
    );

    let sel = Selection {
        chosen: best_name,
        frontier: frontier_names,
        candidates,
        config: base.clone(),
    };

    MabSelectionDecision {
        selection: sel,
        eligible_arms,
        constraints_fallback_used,
        explore_first: false,
        drift_guard,
        catkl_guard,
        cusum_guard,
    }
}
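
// Illustrative sketch (not part of the muxer API): the per-arm scoring arithmetic used in
// the monitored selection loop above, reduced to plain `f64`s. `sketch_ucb_bonus` and
// `sketch_signed_terms` are hypothetical helpers written only for this example; they
// assume the same clamps and sign conventions as the loop and nothing else.
#[allow(dead_code)]
fn sketch_ucb_bonus(exploration_c: f64, total_calls: f64, arm_calls: f64) -> f64 {
    // Floor both counts at 1.0, as the loop does, so ln(total) >= 0 and n > 0.
    let total = total_calls.max(1.0);
    let n = arm_calls.max(1.0);
    exploration_c * (total.ln() / n).sqrt()
}

// Returns `(pareto_value, scalar_contribution)` for one objective.
#[allow(dead_code)]
fn sketch_signed_terms(maximize: bool, weight: f64, value: f64) -> (f64, f64) {
    if maximize {
        (value, weight * value)
    } else {
        // Minimized objectives are negated so larger is uniformly better.
        (-value, -(weight * value))
    }
}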

/// Unified decision envelope for deterministic MAB selection.
///
/// Convenience wrapper around [`select_mab_explain`] that returns a [`Decision`]
/// suitable for consistent logging and replay across policies.
pub fn select_mab_decide(
    arms_in_order: &[String],
    summaries: &BTreeMap<String, Summary>,
    cfg: MabConfig,
) -> Decision {
    let d = select_mab_explain(arms_in_order, summaries, cfg);
    let mut notes = vec![DecisionNote::Constraints {
        eligible_arms: d.eligible_arms.clone(),
        fallback_used: d.constraints_fallback_used,
    }];
    if d.explore_first {
        notes.push(DecisionNote::ExploreFirst);
    } else {
        notes.push(DecisionNote::DeterministicChoice);
    }
    Decision {
        policy: DecisionPolicy::Mab,
        chosen: d.selection.chosen.clone(),
        probs: None,
        notes,
    }
}

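/// Unified decision envelope for monitored deterministic MAB selection.
///
/// Convenience wrapper around [`select_mab_monitored_explain`] that returns a [`Decision`].
/// The notes record the constraint result, every drift, catKL, or CUSUM guard decision
/// that is present, and the chosen arm's monitoring diagnostics.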
pub fn select_mab_monitored_decide(
    arms_in_order: &[String],
    monitored: &BTreeMap<String, MonitoredWindow>,
    drift_cfg: DriftConfig,
    cfg: MonitoredMabConfig,
) -> Decision {
    let d = select_mab_monitored_explain(arms_in_order, monitored, drift_cfg, cfg);

    let mut notes = vec![DecisionNote::Constraints {
        eligible_arms: d.eligible_arms.clone(),
        fallback_used: d.constraints_fallback_used,
    }];
    if let Some(ref dg) = d.drift_guard {
        notes.push(DecisionNote::DriftGuard {
            eligible_arms: dg.eligible_arms.clone(),
            fallback_used: dg.fallback_used,
            metric: dg.metric,
            max_drift: dg.max_drift,
        });
    }
    if let Some(ref cg) = d.catkl_guard {
        notes.push(DecisionNote::CatKlGuard {
            eligible_arms: cg.eligible_arms.clone(),
            fallback_used: cg.fallback_used,
            max_catkl: cg.max_catkl,
            alpha: cg.alpha,
            min_baseline: cg.min_baseline,
            min_recent: cg.min_recent,
        });
    }
    if let Some(ref ug) = d.cusum_guard {
        notes.push(DecisionNote::CusumGuard {
            eligible_arms: ug.eligible_arms.clone(),
            fallback_used: ug.fallback_used,
            max_cusum: ug.max_cusum,
            alpha: ug.alpha,
            min_baseline: ug.min_baseline,
            min_recent: ug.min_recent,
            alt_p: ug.alt_p,
        });
    }
    if d.explore_first {
        notes.push(DecisionNote::ExploreFirst);
    } else {
        notes.push(DecisionNote::DeterministicChoice);
    }

    // Attach chosen-arm diagnostics (if present).
    let chosen_row = d
        .selection
        .candidates
        .iter()
        .find(|c| c.name == d.selection.chosen);
    if let Some(c) = chosen_row {
        notes.push(DecisionNote::Diagnostics {
            drift_score: c.drift_score,
            catkl_score: c.catkl_score,
            cusum_score: c.cusum_score,
            ok_half_width: c.ok_half_width,
            junk_half_width: c.junk_half_width,
            hard_junk_half_width: c.hard_junk_half_width,
            mean_quality_score: c.summary.mean_quality_score,
        });
    }

    Decision {
        policy: DecisionPolicy::Mab,
        chosen: d.selection.chosen.clone(),
        probs: None,
        notes,
    }
}
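
// Illustrative sketch (not the crate's implementation): the `*_half_width` diagnostics
// attached above are half-widths of Wilson-bounded rates. Assuming the standard Wilson
// score interval, the center and half-width can be computed as below. `sketch_wilson` is
// a hypothetical helper; `monitor::apply_rate_bound` may treat bound modes and edge
// cases differently.
#[allow(dead_code)]
fn sketch_wilson(successes: u64, trials: u64, z: f64) -> Option<(f64, f64)> {
    if trials == 0 {
        // No observations: this sketch leaves the interval undefined.
        return None;
    }
    let n = trials as f64;
    let p = (successes.min(trials) as f64) / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    // Center shrinks the raw rate toward 0.5; the half-width narrows as n grows.
    let center = (p + z2 / (2.0 * n)) / denom;
    let half = (z / denom) * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt();
    Some((center, half))
}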

#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    fn mk_test_candidate(name: &str, score: f64) -> CandidateDebug {
        CandidateDebug {
            name: name.to_string(),
            summary: Summary::default(),
            ucb: 0.0,
            objective_values: vec![],
            score,
            drift_score: None,
            catkl_score: None,
            cusum_score: None,
            ok_half_width: None,
            junk_half_width: None,
            hard_junk_half_width: None,
        }
    }

    fn mk_test_candidate_with_calls(name: &str, calls: u64, score: f64) -> CandidateDebug {
        CandidateDebug {
            name: name.to_string(),
            summary: Summary {
                calls,
                ..Summary::default()
            },
            ucb: 0.0,
            objective_values: vec![],
            score,
            drift_score: None,
            catkl_score: None,
            cusum_score: None,
            ok_half_width: None,
            junk_half_width: None,
            hard_junk_half_width: None,
        }
    }

    fn s(
        calls: u64,
        ok: u64,
        junk: u64,
        hard_junk: u64,
        cost_units: u64,
        elapsed_ms_sum: u64,
    ) -> Summary {
        Summary {
            calls,
            ok,
            junk,
            hard_junk,
            cost_units,
            elapsed_ms_sum,
            mean_quality_score: None,
        }
    }

    #[test]
    fn select_mab_is_deterministic_and_prefers_lower_junk_all_else_equal() {
        let arms = vec!["a".to_string(), "b".to_string()];
        let mut m = BTreeMap::new();
        // Same ok/cost/lat, but different junk.
        m.insert("a".to_string(), s(10, 9, 5, 0, 10, 1000));
        m.insert("b".to_string(), s(10, 9, 0, 0, 10, 1000));

        let sel1 = select_mab(&arms, &m, MabConfig::default());
        let sel2 = select_mab(&arms, &m, MabConfig::default());
        assert_eq!(sel1.chosen, "b");
        assert_eq!(sel1.chosen, sel2.chosen);
    }

    #[test]
    fn constraints_filter_arms_but_never_return_empty() {
        let arms = vec!["a".to_string(), "b".to_string()];
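        let mut m = BTreeMap::new();
        // Both arms have junk rate 0.9 and violate the 0.1 cap, so the filter would
        // leave nothing; selection must still return one of the arms.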
        m.insert("a".to_string(), s(10, 9, 9, 0, 10, 1000));
        m.insert("b".to_string(), s(10, 9, 9, 0, 10, 1000));

        let cfg = MabConfig {
            max_junk_rate: Some(0.1),
            ..MabConfig::default()
        };
        let sel = select_mab(&arms, &m, cfg);
        assert!(!sel.chosen.is_empty());
        assert!(sel.frontier.iter().any(|x| x == &sel.chosen));
    }

    #[test]
    fn constraints_can_exclude_high_hard_junk_arm() {
        let arms = vec!["a".to_string(), "b".to_string()];
        let mut m = BTreeMap::new();
        m.insert("a".to_string(), s(10, 9, 1, 1, 10, 1000));
        m.insert("b".to_string(), s(10, 9, 1, 0, 10, 1000));

        let cfg = MabConfig {
            max_hard_junk_rate: Some(0.05),
            ..MabConfig::default()
        };
        let sel = select_mab(&arms, &m, cfg);
        assert_eq!(sel.chosen, "b");
    }

    proptest! {
        #[test]
        fn select_mab_never_panics_and_returns_member_of_arms(
            // Keep this intentionally small/bounded to avoid slow tests.
            calls_a in 0u64..50,
            calls_b in 0u64..50,
            ok_a in 0u64..50,
            ok_b in 0u64..50,
            junk_a in 0u64..50,
            junk_b in 0u64..50,
            hard_a in 0u64..50,
            hard_b in 0u64..50,
            cost_a in 0u64..500,
            cost_b in 0u64..500,
            lat_a in 0u64..50_000,
            lat_b in 0u64..50_000,
        ) {
            let arms = vec!["a".to_string(), "b".to_string()];
            let mut m = BTreeMap::new();

            // Clamp counts so ok and junk never exceed calls, and hard_junk never exceeds junk.
            let sa = s(
                calls_a,
                ok_a.min(calls_a),
                junk_a.min(calls_a),
                hard_a.min(junk_a.min(calls_a)),
                cost_a,
                lat_a,
            );
            let sb = s(
                calls_b,
                ok_b.min(calls_b),
                junk_b.min(calls_b),
                hard_b.min(junk_b.min(calls_b)),
                cost_b,
                lat_b,
            );
            m.insert("a".to_string(), sa);
            m.insert("b".to_string(), sb);

            let cfg = MabConfig {
                exploration_c: 0.7,
                ..MabConfig::default()
            };

            let sel = select_mab(&arms, &m, cfg.clone());
            prop_assert!(sel.chosen == "a" || sel.chosen == "b");
            prop_assert!(sel.frontier.iter().any(|x| x == &sel.chosen));

            // Determinism: same input -> same output.
            let sel2 = select_mab(&arms, &m, cfg.clone());
            prop_assert_eq!(sel.chosen, sel2.chosen);
        }

        #[test]
        fn select_mab_ignores_summaries_for_unknown_arms(
            calls_a in 1u64..50,
            calls_b in 1u64..50,
            ok_a in 0u64..50,
            ok_b in 0u64..50,
            junk_a in 0u64..50,
            junk_b in 0u64..50,
            hard_a in 0u64..50,
            hard_b in 0u64..50,
            cost_a in 0u64..500,
            cost_b in 0u64..500,
            lat_a in 0u64..50_000,
            lat_b in 0u64..50_000,
            extra_calls in 0u64..50,
            extra_ok in 0u64..50,
            extra_junk in 0u64..50,
            extra_hard in 0u64..50,
            extra_cost in 0u64..500,
            extra_lat in 0u64..50_000,
        ) {
            let arms = vec!["a".to_string(), "b".to_string()];
            let mut m = BTreeMap::new();

            let sa = s(
                calls_a,
                ok_a.min(calls_a),
                junk_a.min(calls_a),
                hard_a.min(junk_a.min(calls_a)),
                cost_a,
                lat_a,
            );
            let sb = s(
                calls_b,
                ok_b.min(calls_b),
                junk_b.min(calls_b),
                hard_b.min(junk_b.min(calls_b)),
                cost_b,
                lat_b,
            );
            m.insert("a".to_string(), sa);
            m.insert("b".to_string(), sb);

            let cfg = MabConfig::default();
            let sel1 = select_mab(&arms, &m, cfg.clone());

            // Add an irrelevant arm to the summaries map.
            let sx = s(
                extra_calls,
                extra_ok.min(extra_calls),
                extra_junk.min(extra_calls),
                extra_hard.min(extra_junk.min(extra_calls)),
                extra_cost,
                extra_lat,
            );
            m.insert("zzz-extra".to_string(), sx);

            let sel2 = select_mab(&arms, &m, cfg);
            prop_assert_eq!(sel1.chosen, sel2.chosen);
        }

        #[test]
        fn select_mab_explores_first_zero_call_arm(
            // Small ranges make a zero-call arm likely; cases without one are skipped below.
            calls_a in 0u64..10,
            calls_b in 0u64..10,
            calls_c in 0u64..10,
        ) {
            let arms = vec!["a".to_string(), "b".to_string(), "c".to_string()];
            let mut m = BTreeMap::new();
            m.insert("a".to_string(), s(calls_a, 0, 0, 0, 0, 0));
            m.insert("b".to_string(), s(calls_b, 0, 0, 0, 0, 0));
            m.insert("c".to_string(), s(calls_c, 0, 0, 0, 0, 0));

            // The expected pick is the first arm, in `arms` order, with calls == 0.
            let expected = if calls_a == 0 {
                "a"
            } else if calls_b == 0 {
                "b"
            } else if calls_c == 0 {
                "c"
            } else {
                // No zero-call arms -> skip (property vacuously holds).
                return Ok(());
            };

            let sel = select_mab(&arms, &m, MabConfig::default());
            prop_assert_eq!(sel.chosen, expected);
        }
    }

    #[test]
    fn sticky_mab_respects_min_dwell() {
        let arms = vec!["a".to_string(), "b".to_string()];
        let cfg = MabConfig::default();

        let mut sticky = StickyMab::new(StickyConfig {
            min_dwell: 3,
            min_switch_margin: 0.0,
        });

        // First: pick "a".
        let mut m1 = BTreeMap::new();
        m1.insert("a".to_string(), s(10, 10, 0, 0, 0, 0));
        m1.insert("b".to_string(), s(10, 5, 0, 0, 0, 0));
        let e1 = sticky.apply_mab(select_mab_explain(&arms, &m1, cfg.clone()));
        assert_eq!(e1.chosen, "a");
        assert_eq!(sticky.dwell(), 1);

        // Now "b" is better, but dwell gate should keep "a" for 2 more decisions.
        let mut m2 = BTreeMap::new();
        m2.insert("a".to_string(), s(10, 5, 0, 0, 0, 0));
        m2.insert("b".to_string(), s(10, 10, 0, 0, 0, 0));

        let e2 = sticky.apply_mab(select_mab_explain(&arms, &m2, cfg.clone()));
        assert_eq!(e2.chosen, "a");
        let e3 = sticky.apply_mab(select_mab_explain(&arms, &m2, cfg.clone()));
        assert_eq!(e3.chosen, "a");

        // Next decision: allowed to switch.
        let e4 = sticky.apply_mab(select_mab_explain(&arms, &m2, cfg));
        assert_eq!(e4.chosen, "b");
        assert_eq!(sticky.dwell(), 1);
    }

    #[test]
    fn sticky_mab_respects_min_switch_margin() {
        // Construct the base `MabSelectionDecision` directly so we can control scalar scores precisely.
        let cfg = MabConfig::default();
        let mut sticky = StickyMab::new(StickyConfig {
            min_dwell: 0,
            min_switch_margin: 0.5,
        });

        let mk = |chosen: &str, a_score: f64, b_score: f64| -> MabSelectionDecision {
            MabSelectionDecision {
                selection: Selection {
                    chosen: chosen.to_string(),
                    frontier: vec!["a".to_string(), "b".to_string()],
                    candidates: vec![
                        mk_test_candidate_with_calls("a", 10, a_score),
                        mk_test_candidate_with_calls("b", 10, b_score),
                    ],
                    config: cfg.clone(),
                },
                eligible_arms: vec!["a".to_string(), "b".to_string()],
                constraints_fallback_used: false,
                explore_first: false,
                drift_guard: None,
                catkl_guard: None,
                cusum_guard: None,
            }
        };

        // Start on "a".
        let e1 = sticky.apply_mab(mk("a", 1.0, 1.0));
        assert_eq!(e1.chosen, "a");
        assert_eq!(sticky.previous(), Some("a"));

        // Candidate "b" is only slightly better: margin < 0.5 => keep "a".
        let e2 = sticky.apply_mab(mk("b", 1.0, 1.4));
        assert_eq!(e2.chosen, "a");

        // Candidate "b" is much better: margin >= 0.5 => switch to "b".
        let e3 = sticky.apply_mab(mk("b", 1.0, 1.7));
        assert_eq!(e3.chosen, "b");
        assert_eq!(sticky.previous(), Some("b"));
    }

    #[test]
    fn sticky_mab_follows_base_choice_if_previous_missing_from_candidates() {
        let cfg = MabConfig::default();
        let mut sticky = StickyMab::new(StickyConfig {
            min_dwell: 10,
            min_switch_margin: 100.0,
        });

        // Set a previous arm that won't appear among the next decision's candidates.
        sticky.apply_mab(MabSelectionDecision {
            selection: Selection {
                chosen: "old".to_string(),
                frontier: vec!["old".to_string()],
                candidates: vec![mk_test_candidate("old", 0.0)],
                config: cfg.clone(),
            },
            eligible_arms: vec!["old".to_string()],
            constraints_fallback_used: false,
            explore_first: true,
            drift_guard: None,
            catkl_guard: None,
            cusum_guard: None,
        });
        assert_eq!(sticky.previous(), Some("old"));

        // Now candidates don't include "old" => stickiness must not force an unavailable arm.
        let base = Selection {
            chosen: "a".to_string(),
            frontier: vec!["a".to_string()],
            candidates: vec![mk_test_candidate_with_calls("a", 10, 0.0)],
            config: cfg,
        };
        let e = sticky.apply_mab(MabSelectionDecision {
            selection: base,
            eligible_arms: vec!["a".to_string()],
            constraints_fallback_used: false,
            explore_first: false,
            drift_guard: None,
            catkl_guard: None,
            cusum_guard: None,
        });
        assert_eq!(e.chosen, "a");
        assert_eq!(sticky.previous(), Some("a"));
    }
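
    // Illustrative model only: a minimal dwell/margin gate that is consistent with the
    // three sticky tests above. `SketchSticky` is a hypothetical type written for this
    // example; the real `StickyMab` may differ in details those tests do not exercise.
    #[allow(dead_code)]
    struct SketchSticky {
        min_dwell: u64,
        min_switch_margin: f64,
        previous: Option<String>,
        dwell: u64,
    }

    #[allow(dead_code)]
    impl SketchSticky {
        fn new(min_dwell: u64, min_switch_margin: f64) -> Self {
            Self { min_dwell, min_switch_margin, previous: None, dwell: 0 }
        }

        // `proposed` is the base policy's pick; `scores` holds (arm, scalar score) rows.
        fn apply(&mut self, proposed: &str, scores: &[(&str, f64)]) -> String {
            let score_of =
                |name: &str| scores.iter().find(|(n, _)| *n == name).map(|(_, s)| *s);
            if let Some(prev) = self.previous.clone() {
                if prev == proposed {
                    // Base policy agrees with the current arm: keep dwelling.
                    self.dwell += 1;
                    return prev;
                }
                // Stickiness applies only while the previous arm is still a candidate.
                if let (Some(prev_score), Some(new_score)) =
                    (score_of(prev.as_str()), score_of(proposed))
                {
                    let dwell_blocks = self.dwell < self.min_dwell;
                    let margin_blocks = new_score - prev_score < self.min_switch_margin;
                    if dwell_blocks || margin_blocks {
                        self.dwell += 1;
                        return prev;
                    }
                }
            }
            // First decision, allowed switch, or previous arm unavailable: follow the base pick.
            self.previous = Some(proposed.to_string());
            self.dwell = 1;
            proposed.to_string()
        }
    }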

    #[test]
    fn select_mab_chosen_satisfies_constraints_when_eligible_exists() {
        // If at least one arm passes constraints, we should never return a violating arm.
        let arms = vec!["a".to_string(), "b".to_string()];
        let mut m = BTreeMap::new();

        // Arm "a" violates junk constraint, arm "b" is fine.
        m.insert("a".to_string(), s(100, 90, 80, 0, 10, 1000));
        m.insert("b".to_string(), s(100, 90, 0, 0, 10, 1000));

        let cfg = MabConfig {
            max_junk_rate: Some(0.1),
            ..MabConfig::default()
        };

        let sel = select_mab(&arms, &m, cfg);
        assert_eq!(sel.chosen, "b");

        // Sanity: chosen meets constraints.
        let chosen_summary = m.get(&sel.chosen).copied().unwrap_or_default();
        assert!(chosen_summary.junk_rate() <= 0.1);
    }
}