pub struct Exp3Ix { /* private fields */ }
Seedable EXP3-IX bandit.
Implementations
impl Exp3Ix
pub fn new(cfg: Exp3IxConfig) -> Self
Create a new EXP3-IX instance with deterministic defaults.
pub fn with_seed(cfg: Exp3IxConfig, seed: u64) -> Self
Create with an explicit seed.
pub fn probabilities(
    &mut self,
    arms_in_order: &[String],
) -> BTreeMap<String, f64>
Current selection probabilities (aligned to arms_in_order).
pub fn effective_sample_size(&self) -> f64
Effective sample size (Kish’s ESS) of the current probability distribution.
ESS = 1 / sum(p_i^2), bounded in [1, K]. When ESS approaches 1, one arm
dominates the policy and reward estimates for other arms are unreliable
(high importance-weight variance). When ESS equals K, the policy is uniform.
This is the primary uncertainty diagnostic for adversarial bandits. EXP3-IX makes no distributional assumptions, so Bayesian posteriors don’t apply. ESS measures how much effective information the importance-weighted estimator has, which is the right notion of uncertainty for this policy class.
Reference: Kish (1965), “Survey Sampling”; applied to bandit IPW by Waudby-Smith et al. (2022, arXiv:2210.10768).
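As a self-contained sketch of the formula above (plain Rust, not the crate's internals), Kish's ESS over a probability vector is simply the reciprocal of the sum of squared probabilities:

```rust
// Kish's effective sample size for a probability distribution:
// ESS = 1 / sum(p_i^2), bounded in [1, K].
fn effective_sample_size(probs: &[f64]) -> f64 {
    let sum_sq: f64 = probs.iter().map(|p| p * p).sum();
    if sum_sq > 0.0 { 1.0 / sum_sq } else { 0.0 }
}

fn main() {
    // Uniform over 4 arms: ESS = K = 4 (maximum information).
    assert!((effective_sample_size(&[0.25; 4]) - 4.0).abs() < 1e-12);
    // Heavily concentrated: ESS approaches 1 (unreliable off-arm estimates).
    let ess = effective_sample_size(&[0.97, 0.01, 0.01, 0.01]);
    assert!(ess > 1.0 && ess < 1.1);
}
```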
pub fn weight_entropy(&self) -> f64
Shannon entropy of the current probability distribution (nats).
H(p) = -sum(p_i * ln(p_i)), bounded in [0, ln(K)]. Higher entropy means
the policy is more uncertain (closer to uniform). Low entropy means the policy
has converged toward one or a few arms.
Useful as a convergence monitor: entropy that plateaus below ln(K) indicates
the policy has learned a preference; entropy near ln(K) after many rounds
suggests the arms are indistinguishable or rewards are too noisy.
pub fn effective_arms(&self) -> f64
Effective number of arms: exp(entropy), bounded in [1, K].
A single scalar summarizing how “decided” the policy is:
- Near 1.0: effectively committed to one arm.
- Near K: effectively uniform (maximum uncertainty).
This is the exponential of Shannon entropy (the “perplexity” of the distribution).
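Both diagnostics can be sketched in plain Rust (illustrative only; the crate's internals may differ). Entropy sums `-p ln p` over nonzero entries, and the effective number of arms is its exponential:

```rust
// Shannon entropy in nats; zero-probability entries contribute nothing.
fn weight_entropy(probs: &[f64]) -> f64 {
    probs.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.ln()).sum()
}

// "Effective arms" = exp(entropy), i.e. the perplexity of the distribution.
fn effective_arms(probs: &[f64]) -> f64 {
    weight_entropy(probs).exp()
}

fn main() {
    let k = 4.0_f64;
    // Uniform: entropy = ln(K), effective arms = K (maximum uncertainty).
    assert!((weight_entropy(&[0.25; 4]) - k.ln()).abs() < 1e-12);
    assert!((effective_arms(&[0.25; 4]) - k).abs() < 1e-9);
    // Fully committed: entropy = 0, effective arms = 1.
    assert!(weight_entropy(&[1.0, 0.0, 0.0, 0.0]).abs() < 1e-12);
    assert!((effective_arms(&[1.0, 0.0, 0.0, 0.0]) - 1.0).abs() < 1e-12);
}
```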
pub fn snapshot(&self) -> Exp3IxState
Capture a persistence snapshot of the current EXP3-IX state.
Callers should prefer calling this after probabilities(...) or decide(...) so
arms is initialized and probs is up to date.
pub fn restore(&mut self, st: Exp3IxState)
Restore a previously snapshotted EXP3-IX state.
If the stored state is inconsistent (length mismatches), this resets to a fresh state.
pub fn decide_deterministic_filtered(
    &mut self,
    arms_in_order: &[String],
    eligible_in_order: &[String],
    decision_seed: u64,
) -> Option<Decision>
Deterministic decision from a filtered eligible set.
This is designed for callers that:
- keep persistent EXP3-IX state across process runs
- apply external hard constraints (e.g. latency guardrail) that shrink the eligible set
- want a deterministic decision given a seed, without persisting RNG state
The returned Decision.probs is over eligible_in_order (renormalized).
If you update using this decision, prefer update_reward_with_prob(...) with
prob_used := decision.probs[chosen].
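A minimal sketch of the filter-renormalize-sample step described above, in plain Rust. The `splitmix64` hash used here to derive a deterministic uniform from the seed is illustrative; the crate's actual seeding scheme may differ:

```rust
// Mix a seed into a well-distributed u64 (splitmix64 finalizer).
fn splitmix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

// Restrict a distribution to an eligible subset, renormalize, and pick an
// arm deterministically from the seed (no persistent RNG state).
fn decide_filtered(
    arms: &[(&str, f64)], // (arm, probability) over the full set
    eligible: &[&str],
    decision_seed: u64,
) -> Option<String> {
    let kept: Vec<(&str, f64)> = arms
        .iter()
        .filter(|&&(a, _)| eligible.contains(&a))
        .copied()
        .collect();
    let total: f64 = kept.iter().map(|(_, p)| p).sum();
    if kept.is_empty() || total <= 0.0 {
        return None;
    }
    // Deterministic uniform in [0, 1) derived from the seed.
    let u = (splitmix64(decision_seed) >> 11) as f64 / (1u64 << 53) as f64;
    let mut acc = 0.0;
    for (a, p) in &kept {
        acc += p / total; // renormalized probability mass
        if u < acc {
            return Some(a.to_string());
        }
    }
    kept.last().map(|(a, _)| a.to_string())
}

fn main() {
    let arms = [("a", 0.5), ("b", 0.3), ("c", 0.2)];
    // Same seed and eligible set => same decision (replayable).
    let d1 = decide_filtered(&arms, &["b", "c"], 42);
    let d2 = decide_filtered(&arms, &["b", "c"], 42);
    assert_eq!(d1, d2);
    // The chosen arm always comes from the eligible set.
    assert!(matches!(d1.as_deref(), Some("b") | Some("c")));
}
```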
pub fn update_reward_with_prob(
    &mut self,
    arm: &str,
    reward01: f64,
    prob_used: f64,
)
Update EXP3-IX with a bounded reward in [0, 1], using an explicit probability.
This is useful when the decision was made from a filtered/renormalized distribution (e.g. a latency guardrail) and you want the importance weighting to use the exact probability mass function that was actually sampled.
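For intuition, here is a sketch of the standard EXP3-IX update (Neu, 2015) with an explicit probability; the crate's actual parameterization and state layout may differ, and `eta`/`gamma` here are illustrative parameters:

```rust
// EXP3-IX loss update: the implicit-exploration (IX) estimator divides by
// (p + gamma) rather than p, biasing the loss estimate downward to control
// importance-weight variance. `eta` (learning rate) and `gamma` (IX
// parameter) are illustrative, not the crate's configuration.
fn update_reward_with_prob(
    weights: &mut [f64],
    arm_idx: usize,
    reward01: f64,
    prob_used: f64,
    eta: f64,
    gamma: f64,
) {
    let loss = 1.0 - reward01.clamp(0.0, 1.0);
    // IX importance-weighted loss estimate for the played arm only.
    let loss_hat = loss / (prob_used + gamma);
    weights[arm_idx] *= (-eta * loss_hat).exp();
}

fn main() {
    let mut w = vec![1.0; 3];
    // Maximum reward => zero loss => weight unchanged.
    update_reward_with_prob(&mut w, 0, 1.0, 0.5, 0.1, 0.05);
    assert!((w[0] - 1.0).abs() < 1e-12);
    // Zero reward shrinks the played arm's weight.
    update_reward_with_prob(&mut w, 1, 0.0, 0.5, 0.1, 0.05);
    assert!(w[1] < 1.0);
    // Unplayed arms are untouched.
    assert!((w[2] - 1.0).abs() < 1e-12);
}
```

Passing the renormalized `prob_used` from a filtered decision (rather than the full-set probability) keeps `loss_hat` consistent with the distribution that was actually sampled.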
pub fn select_with_probs<'a>(
    &mut self,
    arms_in_order: &'a [String],
) -> Option<(&'a String, BTreeMap<String, f64>)>
Select an arm and return the probabilities used for selection.
Example
use muxer::{Exp3Ix, Exp3IxConfig};
let arms = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let mut ex = Exp3Ix::new(Exp3IxConfig { seed: 123, decay: 0.98, ..Exp3IxConfig::default() });
let (chosen, probs) = ex.select_with_probs(&arms).unwrap();
ex.update_reward(chosen, 0.7);
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);

pub fn select<'a>(&mut self, arms_in_order: &'a [String]) -> Option<&'a String>
Select an arm.
Policy:
- Explore each arm once in stable order.
- Otherwise sample from the current EXP3-IX distribution (seeded RNG).
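The explore-first phase of this policy can be sketched as follows (illustrative only; `pulls` is a hypothetical per-arm play count, not a crate field):

```rust
// Explore-first: play each never-played arm once, in stable input order,
// before falling through to sampling from the EXP3-IX distribution.
fn explore_first<'a>(arms_in_order: &'a [String], pulls: &[u64]) -> Option<&'a String> {
    arms_in_order
        .iter()
        .zip(pulls)
        .find(|&(_, &n)| n == 0)
        .map(|(a, _)| a)
}

fn main() {
    let arms: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    // "b" has never been played => it is explored next, in stable order.
    assert_eq!(explore_first(&arms, &[1, 0, 0]), Some(&arms[1]));
    // Every arm played at least once => None, so sampling takes over.
    assert_eq!(explore_first(&arms, &[1, 2, 1]), None);
}
```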
pub fn decide(&mut self, arms_in_order: &[String]) -> Option<Decision>
Select an arm and return a unified Decision (recommended for logging/replay).
Notes:
- Always includes a probs distribution over arms as of this decision.
- Records whether explore-first occurred and whether numerical fallback was used.
pub fn update_reward(&mut self, arm: &str, reward01: f64)
Update EXP3-IX with a bounded reward in [0, 1].
Trait Implementations
impl BanditPolicy for Exp3Ix

Available on crate feature stochastic only.