# muxer

Deterministic, multi-objective routing primitives for “provider selection” problems.

## What problem this solves

You have a small set of arms (providers/models/backends) and make repeated calls to them. Each call produces an outcome (success, HTTP 429, or junk) along with a cost and a latency. You want an **online policy** that:

- **explores** new or recently-changed arms
- **avoids regressions** (junk/429 spikes)
- is **deterministic by default** (same stats/config → same choice), so it’s easy to debug

## What it is

The core idea is:

- maintain a small sliding window of recent outcomes per provider (ok/429/junk, cost, latency)
- compute a Pareto frontier over the objectives
- pick a single provider deterministically via scalarization + stable tie-break
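
The last two steps can be sketched in plain Rust. This is an illustrative sketch only, not the crate's implementation: `ArmScore`, `pareto_frontier`, `weighted_cost`, and `pick` are hypothetical names, and every objective is assumed to be "lower is better".

```rust
/// Hypothetical per-arm objectives; every field is "lower is better".
#[derive(Debug, Clone, PartialEq)]
struct ArmScore {
    name: String,
    fail_rate: f64,
    junk_rate: f64,
    mean_cost: f64,
    mean_latency_ms: f64,
}

fn objectives(a: &ArmScore) -> [f64; 4] {
    [a.fail_rate, a.junk_rate, a.mean_cost, a.mean_latency_ms]
}

/// `a` dominates `b` if it is no worse on every objective and strictly better on one.
fn dominates(a: &ArmScore, b: &ArmScore) -> bool {
    let (oa, ob) = (objectives(a), objectives(b));
    oa.iter().zip(ob.iter()).all(|(x, y)| x <= y)
        && oa.iter().zip(ob.iter()).any(|(x, y)| x < y)
}

/// Arms that no other arm dominates.
fn pareto_frontier(arms: &[ArmScore]) -> Vec<&ArmScore> {
    arms.iter()
        .filter(|a| !arms.iter().any(|b| dominates(b, a)))
        .collect()
}

/// Weighted-sum scalarization of the objectives.
fn weighted_cost(a: &ArmScore, weights: &[f64; 4]) -> f64 {
    let o = objectives(a);
    o.iter().zip(weights.iter()).map(|(x, w)| x * w).sum()
}

/// Lowest weighted cost on the frontier; ties are broken by arm name,
/// so the same inputs always give the same choice.
fn pick<'a>(arms: &'a [ArmScore], weights: &[f64; 4]) -> Option<&'a ArmScore> {
    pareto_frontier(arms).into_iter().min_by(|a, b| {
        weighted_cost(a, weights)
            .total_cmp(&weighted_cost(b, weights))
            .then_with(|| a.name.cmp(&b.name))
    })
}
```

Because the tie-break falls back to the arm name, two arms with identical statistics always resolve the same way, which is what makes the choice reproducible.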

This crate also includes:

- a **seedable Thompson-sampling** policy (`ThompsonSampling`) for cases where you can provide a scalar reward in `[0, 1]` per call
- a **seedable EXP3-IX** policy (`Exp3Ix`) for more adversarial / fast-shifting reward settings
- (feature `contextual`) a **linear contextual bandit** policy (`LinUcb`) for per-request routing with feature vectors

## Which policy should I use?

- **`select_mab` (Window + Pareto + scalarization)**: when you care about **multiple objectives** at once (success, 429, junk, cost, latency) and you want deterministic selection with hard constraints.
- **`ThompsonSampling`**: when you can provide a **single reward** per call (in `[0, 1]`) and want a classic explore/exploit policy (seedable, optionally decayed).
- **`Exp3Ix`**: when reward is **non-stationary / adversarial-ish** and you still want a probabilistic policy (seedable, optionally decayed).
- **`LinUcb` (feature `contextual`)**: when you have a per-request feature vector (e.g. cheap “difficulty” features, embeddings, metadata) and want a contextual policy.

## Unified decision records (recommended for logging/replay)

Most production routers want a single “decision object” shape regardless of policy so logging, auditing, and replay don’t depend on per-policy conventions. `muxer` provides a unified `Decision` envelope with:

- `chosen`: the arm name
- `probs`: optional probability distribution (when a policy has one)
- `notes`: typed audit notes (explore-first, constraint gating, numerical fallback, etc.)

Each policy exposes a `*_decide` or `decide_*` method that returns this envelope.
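
To show why a single envelope helps with logging, here is a rough sketch of a record with the three fields listed above, rendered as a stable log line. `DecisionRecord`, `Note`, and `log_line` are hypothetical stand-ins; the crate's real `Decision` and note types may be shaped differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical audit note variants; the crate's real note type may differ.
#[derive(Debug, Clone, PartialEq)]
enum Note {
    ExploreFirst,
    ConstraintFallback,
    NumericalFallback,
}

/// Hypothetical envelope mirroring the `chosen` / `probs` / `notes` fields.
#[derive(Debug, Clone, PartialEq)]
struct DecisionRecord {
    chosen: String,
    probs: Option<BTreeMap<String, f64>>,
    notes: Vec<Note>,
}

/// Render one record as a single log line.
/// `BTreeMap` iterates in key order, so the output is stable across runs.
fn log_line(d: &DecisionRecord) -> String {
    let probs = match &d.probs {
        Some(p) => p
            .iter()
            .map(|(arm, pr)| format!("{arm}={pr:.3}"))
            .collect::<Vec<_>>()
            .join(","),
        None => "-".to_string(),
    };
    format!("chosen={} probs={} notes={:?}", d.chosen, probs, d.notes)
}
```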

## Quick examples

### Deterministic multi-objective selection (Pareto + scalarization)

```rust
use muxer::{select_mab, MabConfig, Summary};
use std::collections::BTreeMap;

// Two arms with identical stats, except that "b" has two junk responses.
let arms = vec!["a".to_string(), "b".to_string()];
let mut summaries = BTreeMap::new();
summaries.insert("a".to_string(), Summary { calls: 10, ok: 9, junk: 0, hard_junk: 0, cost_units: 10, elapsed_ms_sum: 900 });
summaries.insert("b".to_string(), Summary { calls: 10, ok: 9, junk: 2, hard_junk: 0, cost_units: 10, elapsed_ms_sum: 900 });

let sel = select_mab(&arms, &summaries, MabConfig::default());
assert_eq!(sel.chosen, "a"); // lower junk when all else is equal
```

### Realistic “online routing loop” (Window ingestion)

This is closer to production usage: you maintain a `Window` per arm, push `Outcome`s as requests finish, and call `select_mab` each decision.

```bash
cargo run --example deterministic_router
```

Note: this example simulates an environment and therefore requires `--features stochastic` if you disabled default features.
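
For intuition, the per-arm bookkeeping can be pictured as a fixed-capacity ring of recent observations that is summed on demand. This is a std-only sketch under that assumption; `Obs`, `Totals`, and `SlidingWindow` are hypothetical names and are not the crate's `Outcome`, `Summary`, or `Window` types.

```rust
use std::collections::VecDeque;

/// Hypothetical per-call observation.
#[derive(Debug, Clone, Copy)]
struct Obs {
    ok: bool,
    junk: bool,
    cost_units: u64,
    elapsed_ms: u64,
}

/// Hypothetical aggregate over the current window contents.
#[derive(Debug, Default, PartialEq)]
struct Totals {
    calls: u64,
    ok: u64,
    junk: u64,
    cost_units: u64,
    elapsed_ms_sum: u64,
}

struct SlidingWindow {
    cap: usize,
    buf: VecDeque<Obs>,
}

impl SlidingWindow {
    fn new(cap: usize) -> Self {
        // A capacity of zero would make the window useless, so clamp to 1.
        Self { cap: cap.max(1), buf: VecDeque::new() }
    }

    /// Append an observation, evicting the oldest one when full.
    fn push(&mut self, o: Obs) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(o);
    }

    /// Sum the observations currently in the window.
    fn totals(&self) -> Totals {
        let mut t = Totals::default();
        for o in &self.buf {
            t.calls += 1;
            t.ok += u64::from(o.ok);
            t.junk += u64::from(o.junk);
            t.cost_units += o.cost_units;
            t.elapsed_ms_sum += o.elapsed_ms;
        }
        t
    }
}
```

Old observations fall out of the totals as new ones arrive, so an arm that has recovered (or degraded) is judged on its recent behavior only.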

### Monitored selection (baseline vs recent drift + uncertainty-aware rates)

If you maintain both a baseline window and a recent window per arm for change monitoring, use `MonitoredWindow` plus `select_mab_monitored_*`:

```bash
cargo run --example monitored_router --features stochastic
```
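
One common way to make rates uncertainty-aware is a Wilson score interval: an arm is flagged only when the lower bound of its recent bad-event rate exceeds the upper bound of its baseline rate. This is a sketch of that idea under stated assumptions, not the logic inside `select_mab_monitored_*`; `wilson_interval` and `has_regressed` are hypothetical helpers.

```rust
/// Wilson score interval for `bad` events out of `n` trials at normal quantile `z`
/// (for example `z = 1.96` for roughly 95% coverage).
fn wilson_interval(bad: u64, n: u64, z: f64) -> (f64, f64) {
    if n == 0 {
        // No data: the rate could be anything.
        return (0.0, 1.0);
    }
    let p = bad.min(n) as f64 / n as f64;
    let n = n as f64;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let center = (p + z2 / (2.0 * n)) / denom;
    let half = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    ((center - half).max(0.0), (center + half).min(1.0))
}

/// Flag a regression only when the intervals are separated:
/// recent lower bound above baseline upper bound.
/// Each argument is `(bad_events, total_calls)`.
fn has_regressed(baseline: (u64, u64), recent: (u64, u64), z: f64) -> bool {
    let (_, base_hi) = wilson_interval(baseline.0, baseline.1, z);
    let (recent_lo, _) = wilson_interval(recent.0, recent.1, z);
    recent_lo > base_hi
}
```

With few recent calls the interval is wide, so a single bad response does not trigger a flag; the flag fires once the evidence is strong enough to separate the two intervals.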

### End-to-end router demo (Window + constraints + stickiness + delayed junk)

This combines multiple production patterns in one loop: window ingestion, constraints+weights, stickiness reasons, and delayed junk labeling.

```bash
cargo run --example end_to_end_router
```

Note: this example simulates an environment and therefore requires `--features stochastic` if you disabled default features.

The same scenario has a CI-checked regression test in `tests/e2e_metrics.rs` and logs whether constraint fallback was used.

### Window ingestion with delayed junk labeling

If your “junk” classification is only known after downstream parsing/validation, you can update the most recent outcome:

```bash
cargo run --example window_delayed_junk_label
```
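
The underlying pattern is small: record the outcome when the call returns, then flip the junk flag on the newest entry once validation finishes. This is a minimal std-only sketch of that pattern; `Obs`, `label_last_as_junk`, and `junk_rate` are hypothetical and do not reflect the crate's `Window` API.

```rust
use std::collections::VecDeque;

/// Hypothetical per-call observation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Obs {
    ok: bool,
    junk: bool,
}

/// Flag the most recently pushed observation as junk.
/// Returns `false` if the window is empty and nothing was changed.
fn label_last_as_junk(window: &mut VecDeque<Obs>) -> bool {
    match window.back_mut() {
        Some(last) => {
            last.junk = true;
            true
        }
        None => false,
    }
}

/// Fraction of observations in the window flagged as junk (0.0 when empty).
fn junk_rate(window: &VecDeque<Obs>) -> f64 {
    if window.is_empty() {
        return 0.0;
    }
    window.iter().filter(|o| o.junk).count() as f64 / window.len() as f64
}
```

This only works when labels arrive before the next outcome is pushed for the same arm; with concurrent requests you would need to key the update by request rather than by position.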

### Constraint + trade-off tuning for `select_mab`

Example showing “constraints first, then weights”:

```bash
cargo run --example mab_constraints_tuning
```
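
"Constraints first, then weights" can be read as a two-stage filter: drop arms that violate hard limits, then rank the survivors by a weighted cost. This sketch assumes that when no arm satisfies the limits, selection falls back to all arms and reports that it did so. `ArmRates`, `Limits`, and `choose_with_constraints` are hypothetical names, and the fallback rule is an assumption rather than the crate's documented behavior.

```rust
/// Hypothetical per-arm rates derived from a window.
#[derive(Debug, Clone, PartialEq)]
struct ArmRates {
    name: String,
    junk_rate: f64,
    mean_latency_ms: f64,
    mean_cost: f64,
}

/// Hypothetical hard limits.
struct Limits {
    max_junk_rate: f64,
    max_latency_ms: f64,
}

fn weighted_cost(a: &ArmRates, w_latency: f64, w_cost: f64) -> f64 {
    w_latency * a.mean_latency_ms + w_cost * a.mean_cost
}

/// Returns the chosen arm name and whether the constraint fallback was used.
fn choose_with_constraints(
    arms: &[ArmRates],
    limits: &Limits,
    w_latency: f64,
    w_cost: f64,
) -> Option<(String, bool)> {
    // Stage 1: hard constraints.
    let eligible: Vec<&ArmRates> = arms
        .iter()
        .filter(|a| a.junk_rate <= limits.max_junk_rate && a.mean_latency_ms <= limits.max_latency_ms)
        .collect();
    // Assumed fallback: if nothing is eligible, consider every arm.
    let used_fallback = eligible.is_empty();
    let pool: Vec<&ArmRates> = if used_fallback { arms.iter().collect() } else { eligible };
    // Stage 2: weighted trade-off, with the arm name as a stable tie-break.
    pool.into_iter()
        .min_by(|a, b| {
            weighted_cost(a, w_latency, w_cost)
                .total_cmp(&weighted_cost(b, w_latency, w_cost))
                .then_with(|| a.name.cmp(&b.name))
        })
        .map(|a| (a.name.clone(), used_fallback))
}
```

Tightening a limit changes which arms are eligible; changing a weight only reorders arms that already passed the limits.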

### EXP3-IX (adversarial bandit) with probabilities

```rust
use muxer::{Exp3Ix, Exp3IxConfig};

let arms = vec!["a".to_string(), "b".to_string(), "c".to_string()];
// Seeded, so repeated runs with the same updates make the same choices.
let mut ex = Exp3Ix::new(Exp3IxConfig { seed: 123, decay: 0.98, ..Exp3IxConfig::default() });

let d = ex.decide(&arms).unwrap();
// ... run request with `d.chosen` ...
ex.update_reward(&d.chosen, 0.7); // reward in [0, 1]

// The decision carries the probability distribution over arms; it sums to 1.
let probs = d.probs.unwrap();
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);
```

Runnable:

```bash
cargo run --example exp3ix_router
```

Note: this example requires `--features stochastic` if you disabled default features.
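
For readers who want to see where the probabilities come from, here is a compact sketch of the textbook EXP3-IX update: exponential weights over cumulative loss estimates, with the implicit-exploration term `gamma` added to the denominator of the importance weight. `Exp3IxSketch` is a hypothetical name. Applying `decay` by shrinking past loss estimates before each update is an assumption about how decay might work, not a description of `Exp3Ix`.

```rust
/// Hypothetical textbook-style EXP3-IX state (arms addressed by index).
struct Exp3IxSketch {
    eta: f64,   // learning rate
    gamma: f64, // implicit-exploration parameter
    decay: f64, // assumed: multiplicative forgetting of past loss estimates
    cum_loss: Vec<f64>,
}

impl Exp3IxSketch {
    fn new(k: usize, eta: f64, gamma: f64, decay: f64) -> Self {
        Self { eta, gamma, decay, cum_loss: vec![0.0; k] }
    }

    /// p_i is proportional to exp(-eta * L_i). Subtracting the minimum loss
    /// keeps the exponentials in a safe numeric range without changing p.
    fn probabilities(&self) -> Vec<f64> {
        let min = self.cum_loss.iter().cloned().fold(f64::INFINITY, f64::min);
        let w: Vec<f64> = self
            .cum_loss
            .iter()
            .map(|l| (-self.eta * (l - min)).exp())
            .collect();
        let z: f64 = w.iter().sum();
        w.iter().map(|x| x / z).collect()
    }

    /// Update after playing `arm` and observing `reward` in [0, 1].
    fn update(&mut self, arm: usize, reward: f64) {
        let p = self.probabilities()[arm];
        let decay = self.decay;
        for l in self.cum_loss.iter_mut() {
            *l *= decay;
        }
        let loss = 1.0 - reward.clamp(0.0, 1.0);
        // IX estimator: dividing by (p + gamma) instead of p caps the variance.
        self.cum_loss[arm] += loss / (p + self.gamma);
    }
}
```

Only the played arm's loss estimate grows, and it grows faster when that arm was unlikely to be chosen, which is what lets the policy recover quickly when rewards shift.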

### Thompson “traffic splitting” selector (mean-softmax allocation)

```rust
use muxer::{ThompsonConfig, ThompsonSampling};

let arms = vec!["a".to_string(), "b".to_string()];
let mut ts = ThompsonSampling::with_seed(
    ThompsonConfig {
        decay: 0.99,
        ..ThompsonConfig::default()
    },
    0,
);
let d = ts.decide_softmax_mean(&arms, 0.3).unwrap();
ts.update_reward(&d.chosen, 1.0);

let alloc = d.probs.unwrap();
let s: f64 = alloc.values().sum();
assert!((s - 1.0).abs() < 1e-9);
```

Runnable:

```bash
cargo run --example thompson_router
```

Note: this example requires `--features stochastic` if you disabled default features.
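
A "mean-softmax" allocation can be understood as a softmax over each arm's posterior mean reward. This sketch assumes Beta posteriors and treats the softmax scale as a temperature. Whether the second argument of `decide_softmax_mean` plays exactly that role is an assumption, and `beta_mean` and `softmax_allocation` are hypothetical helpers.

```rust
/// Posterior mean of a Beta(alpha, beta) distribution.
fn beta_mean(alpha: f64, beta: f64) -> f64 {
    alpha / (alpha + beta)
}

/// Softmax over means. Lower `temperature` concentrates traffic on the best arm;
/// higher `temperature` spreads it more evenly.
fn softmax_allocation(means: &[f64], temperature: f64) -> Vec<f64> {
    let t = temperature.max(1e-9); // guard against division by zero
    // Subtract the maximum before exponentiating for numerical stability.
    let max = means.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let w: Vec<f64> = means.iter().map(|m| ((m - max) / t).exp()).collect();
    let z: f64 = w.iter().sum();
    w.iter().map(|x| x / z).collect()
}
```

Unlike sampling from the posterior, this allocation is a deterministic function of the current statistics, which makes it convenient for traffic splitting and for logging propensities.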

### Contextual routing (LinUCB)

Runnable:

```bash
cargo run --example contextual_router --features contextual
```

Notes:

- If you want a probability distribution over arms for this context (e.g. for traffic-splitting or logging approximate propensities), use `LinUcb::probabilities(...)` or `LinUcb::decide_softmax_ucb(...)`.
- Algorithm reference: LinUCB (Chu et al., “Contextual bandits with linear payoff functions”).
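
For reference, the core of disjoint LinUCB fits in a few dozen lines. This sketch keeps the inverse design matrix per arm and updates it with the Sherman-Morrison formula. It assumes a ridge parameter of 1 and breaks ties by lowest arm index. `LinUcbArm` and `choose_arm` are hypothetical names; this is not the crate's `LinUcb` type.

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn mat_vec(m: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    m.iter().map(|row| dot(row, x)).collect()
}

/// Hypothetical per-arm state: A^{-1} (starting at the identity) and b.
struct LinUcbArm {
    a_inv: Vec<Vec<f64>>,
    b: Vec<f64>,
}

impl LinUcbArm {
    fn new(d: usize) -> Self {
        let mut a_inv = vec![vec![0.0; d]; d];
        for i in 0..d {
            a_inv[i][i] = 1.0;
        }
        Self { a_inv, b: vec![0.0; d] }
    }

    /// theta^T x + alpha * sqrt(x^T A^{-1} x), with theta = A^{-1} b.
    fn ucb(&self, x: &[f64], alpha: f64) -> f64 {
        let theta = mat_vec(&self.a_inv, &self.b);
        let a_inv_x = mat_vec(&self.a_inv, x);
        dot(&theta, x) + alpha * dot(x, &a_inv_x).max(0.0).sqrt()
    }

    /// Rank-one update A <- A + x x^T, applied directly to A^{-1}
    /// via Sherman-Morrison, plus b <- b + reward * x.
    fn update(&mut self, x: &[f64], reward: f64) {
        let u = mat_vec(&self.a_inv, x);
        let denom = 1.0 + dot(x, &u);
        for i in 0..u.len() {
            for j in 0..u.len() {
                self.a_inv[i][j] -= u[i] * u[j] / denom;
            }
        }
        for (bi, xi) in self.b.iter_mut().zip(x) {
            *bi += reward * xi;
        }
    }
}

/// Arm with the highest UCB for this context; the lowest index wins ties.
fn choose_arm(arms: &[LinUcbArm], x: &[f64], alpha: f64) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, arm) in arms.iter().enumerate() {
        let s = arm.ucb(x, alpha);
        if best.map_or(true, |(_, bs)| s > bs) {
            best = Some((i, s));
        }
    }
    best.map(|(i, _)| i)
}
```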

Contextual “propensity logging” example:

```bash
cargo run --example contextual_propensity_logging --features contextual
```

### Stickiness / switching-cost control

If you want to reduce “flapping” between arms, wrap deterministic selection with `StickyMab`:

```bash
cargo run --example sticky_mab_router
```
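
The general idea behind stickiness is a switching margin: stay on the current arm unless a challenger beats it by more than some threshold. This is a minimal sketch of that rule, assuming a single "higher is better" score per arm. `sticky_choice` is a hypothetical helper and does not describe how `StickyMab` is configured.

```rust
/// Keep `current` unless the best-scoring arm beats it by more than `margin`.
/// `scores` holds `(arm name, score)` pairs where higher is better.
fn sticky_choice<'a>(
    current: Option<&str>,
    scores: &'a [(String, f64)],
    margin: f64,
) -> Option<&'a str> {
    // Best score; on equal scores the lexicographically smallest name wins.
    let best = scores
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(&a.0)))?;
    // Stay put if the current arm is still a candidate and is close enough.
    if let Some(cur) = current.and_then(|c| scores.iter().find(|e| e.0.as_str() == c)) {
        if best.1 - cur.1 <= margin {
            return Some(cur.0.as_str());
        }
    }
    Some(best.0.as_str())
}
```

A larger margin trades some short-term optimality for fewer switches, which matters when switching has its own cost (cold caches, session affinity, and so on).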

### Mini-experiments (bandits × monitoring × false alarms)

If you want runnable “research probes” that make tradeoffs/failure modes explicit, see:

- `muxer/examples/EXPERIMENTS.md`
- Examples:
  - `cargo run --example guardrail_semantics`
  - `cargo run --example coverage_autotune --features stochastic`
  - `cargo run --example free_lunch_investigation --features stochastic`
  - `cargo run --example detector_inertia --features stochastic`
  - `cargo run --example detector_calibration --features stochastic`
  - `cargo run --example bqcd_sampling --features stochastic`
  - `cargo run --release --example bqcd_calibrated --features stochastic`

Reusable bits extracted from these experiments live in `muxer::monitor`, notably:

- `CusumCatBank`: “GLR-lite” robustification via a small bank of CUSUM alternatives.
- `calibrate_threshold_from_max_scores`: threshold calibration from null max-score samples (supports Wilson-conservative mode).
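
As background for the detectors, here is the simplest member of the CUSUM family: a one-sided Bernoulli CUSUM that tests a single alternative rate `p1` against a baseline rate `p0`. It is deliberately much simpler than `CusumCatBank`, which by its description runs a bank of alternatives. `BernoulliCusum` is a hypothetical name, and the threshold here is picked by hand rather than calibrated.

```rust
/// Hypothetical one-sided Bernoulli CUSUM for an increase in the bad-event rate.
struct BernoulliCusum {
    llr_bad: f64,  // log-likelihood ratio contributed by a bad event
    llr_good: f64, // log-likelihood ratio contributed by a good event
    threshold: f64,
    score: f64,
}

impl BernoulliCusum {
    /// `p0` is the baseline bad-event rate and `p1 > p0` the alternative,
    /// with both strictly between 0 and 1.
    fn new(p0: f64, p1: f64, threshold: f64) -> Self {
        Self {
            llr_bad: (p1 / p0).ln(),
            llr_good: ((1.0 - p1) / (1.0 - p0)).ln(),
            threshold,
            score: 0.0,
        }
    }

    /// Feed one observation; returns `true` once the score reaches the threshold.
    fn observe(&mut self, bad: bool) -> bool {
        let step = if bad { self.llr_bad } else { self.llr_good };
        // Clamping at zero means long healthy stretches cannot build up "credit".
        self.score = (self.score + step).max(0.0);
        self.score >= self.threshold
    }
}
```

A higher threshold lowers the false-alarm rate but delays detection, which is the trade-off that threshold calibration is meant to tune.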

## Usage

```toml
[dependencies]
muxer = "0.1.2"
```

If you only want the deterministic `Window` + `select_mab*` core (no stochastic bandits), disable default features:

```toml
[dependencies]
muxer = { version = "0.1.2", default-features = false }
```

## Development

```bash
# If you are in a larger Cargo workspace, scope to this package:
cargo test -p muxer

# Microbenches (criterion):
cargo bench -p muxer --bench coverage
cargo bench -p muxer --bench monitor

# (Optional) Match CI checks:
cargo fmt -p muxer --check
cargo clippy -p muxer --all-targets -- -D warnings
```