Struct Exp3Ix 

pub struct Exp3Ix { /* private fields */ }

Seedable EXP3-IX bandit.

Implementations

impl Exp3Ix

pub fn new(cfg: Exp3IxConfig) -> Self

Create a new EXP3-IX instance with deterministic defaults.

pub fn with_seed(cfg: Exp3IxConfig, seed: u64) -> Self

Create a new EXP3-IX instance with an explicit seed.

pub fn probabilities( &mut self, arms_in_order: &[String], ) -> BTreeMap<String, f64>

Current selection probabilities for the arms in arms_in_order, returned as a map keyed by arm.

pub fn effective_sample_size(&self) -> f64

Effective sample size (Kish’s ESS) of the current probability distribution.

ESS = 1 / sum(p_i^2), bounded in [1, K], where K is the number of arms. When ESS approaches 1, one arm dominates the policy and reward estimates for the other arms are unreliable (high importance-weight variance). When ESS equals K, the policy is uniform.

This is the primary uncertainty diagnostic for adversarial bandits. EXP3-IX makes no distributional assumptions, so Bayesian posteriors don’t apply. ESS measures how much effective information the importance-weighted estimator has, which is the right notion of uncertainty for this policy class.

Reference: Kish (1965), “Survey Sampling”; applied to bandit IPW by Waudby-Smith et al. (2022, arXiv:2210.10768).
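
The formula above can be sketched as a standalone function. This is an illustration only, not the crate's implementation: the name kish_ess is hypothetical, and returning 0.0 for an empty or all-zero input is a convention chosen for this sketch.

```rust
/// Kish's effective sample size of a probability vector: 1 / sum(p_i^2).
/// Hypothetical helper, not part of the crate's API.
/// Returns 0.0 for an empty or all-zero input (a convention of this sketch).
fn kish_ess(probs: &[f64]) -> f64 {
    let sum_sq: f64 = probs.iter().map(|p| p * p).sum();
    if sum_sq > 0.0 {
        1.0 / sum_sq
    } else {
        0.0
    }
}
```

A uniform distribution over K arms gives K, and a point mass gives 1, matching the stated bounds.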

pub fn weight_entropy(&self) -> f64

Shannon entropy of the current probability distribution (nats).

H(p) = -sum(p_i * ln(p_i)), bounded in [0, ln(K)]. Higher entropy means the policy is more uncertain (closer to uniform). Low entropy means the policy has converged toward one or a few arms.

Useful as a convergence monitor: entropy that plateaus below ln(K) indicates the policy has learned a preference; entropy near ln(K) after many rounds suggests the arms are indistinguishable or rewards are too noisy.
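
A minimal sketch of the entropy formula, assuming the input is already a normalized probability vector. The function name is hypothetical and the crate may compute this differently; zero-probability entries are skipped, using the usual convention that 0 * ln(0) = 0.

```rust
/// Shannon entropy in nats: H(p) = -sum(p_i * ln(p_i)).
/// Hypothetical helper, not part of the crate's API.
/// Entries with p_i == 0 contribute nothing (0 * ln 0 is taken as 0).
fn shannon_entropy_nats(probs: &[f64]) -> f64 {
    probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.ln())
        .sum()
}
```

For a convergence monitor, one option is to log this value each round next to ln(K) and watch the gap.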

pub fn effective_arms(&self) -> f64

Effective number of arms: exp(entropy), bounded in [1, K].

A single scalar summarizing how “decided” the policy is:

  • Near 1.0: effectively committed to one arm.
  • Near K: effectively uniform (maximum uncertainty).

This is the exponential of Shannon entropy (the “perplexity” of the distribution).
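
As a rough illustration, the perplexity can be computed directly from a probability vector. The function below is a hypothetical standalone sketch, not the crate's code, and it assumes the input sums to 1.

```rust
/// Effective number of arms: exp(H(p)), where H is Shannon entropy in nats.
/// Hypothetical helper, not part of the crate's API.
fn effective_arms(probs: &[f64]) -> f64 {
    // Entropy, skipping zero-probability entries (0 * ln 0 is taken as 0).
    let entropy: f64 = probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.ln())
        .sum();
    entropy.exp()
}
```

For any distribution this value is at least as large as 1 / sum(p_i^2), so it reads as a slightly more optimistic count of "live" arms than Kish's ESS.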

pub fn snapshot(&self) -> Exp3IxState

Capture a persistence snapshot of the current EXP3-IX state.

Callers should prefer calling this after probabilities(...) or decide(...) so arms is initialized and probs is up to date.

pub fn restore(&mut self, st: Exp3IxState)

Restore a previously snapshotted EXP3-IX state.

If the stored state is inconsistent (length mismatches), this resets to a fresh state.
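
The reset-on-inconsistency behavior can be pictured with a toy state type. Everything here is an assumption made for illustration: SketchState and restore_or_reset are hypothetical, and the real Exp3IxState may carry different or additional fields beyond the arms and probs mentioned above.

```rust
/// Toy stand-in for a persisted state (hypothetical; not Exp3IxState).
#[derive(Clone, Debug, Default, PartialEq)]
struct SketchState {
    arms: Vec<String>,
    probs: Vec<f64>,
}

/// Accept the stored state only if its vectors line up; otherwise fall
/// back to a fresh (empty) state, mirroring the documented reset behavior.
fn restore_or_reset(st: SketchState) -> SketchState {
    if st.arms.len() == st.probs.len() {
        st
    } else {
        SketchState::default()
    }
}
```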

pub fn decide_deterministic_filtered( &mut self, arms_in_order: &[String], eligible_in_order: &[String], decision_seed: u64, ) -> Option<Decision>

Deterministic decision from a filtered eligible set.

This is designed for callers that:

  • keep persistent EXP3-IX state across process runs
  • apply external hard constraints (e.g. latency guardrail) that shrink the eligible set
  • want a deterministic decision given a seed, without persisting RNG state

The returned Decision.probs is over eligible_in_order (renormalized). If you update using this decision, prefer update_reward_with_prob(...) with prob_used := decision.probs[chosen].
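
To make the renormalization and seed-driven choice concrete, here is a self-contained sketch. It is not the crate's implementation: the helper names are hypothetical, and the SplitMix64 mixing step is an assumption chosen only because it turns a u64 seed into a repeatable uniform draw without any stored RNG state.

```rust
use std::collections::BTreeMap;

/// Restrict `probs` to the eligible arms and rescale so they sum to 1.
/// Returns None when no eligible arm carries positive mass.
fn renormalize_over_eligible(
    probs: &BTreeMap<String, f64>,
    eligible_in_order: &[String],
) -> Option<Vec<(String, f64)>> {
    let masses: Vec<f64> = eligible_in_order
        .iter()
        .map(|a| probs.get(a).copied().unwrap_or(0.0))
        .collect();
    let total: f64 = masses.iter().sum();
    if !(total > 0.0) {
        return None;
    }
    Some(
        eligible_in_order
            .iter()
            .cloned()
            .zip(masses.into_iter().map(|m| m / total))
            .collect(),
    )
}

/// One SplitMix64 step mapped to [0, 1). Same seed, same value.
fn unit_from_seed(seed: u64) -> f64 {
    let mut z = seed.wrapping_add(0x9E3779B97F4A7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^= z >> 31;
    // Keep the top 53 bits so the result is an exact f64 in [0, 1).
    (z >> 11) as f64 / (1u64 << 53) as f64
}

/// Inverse-CDF choice over the renormalized distribution.
fn pick(dist: &[(String, f64)], decision_seed: u64) -> Option<&str> {
    let u = unit_from_seed(decision_seed);
    let mut acc = 0.0_f64;
    for (arm, p) in dist {
        acc += *p;
        if u < acc {
            return Some(arm.as_str());
        }
    }
    // Guard against rounding leaving acc slightly below 1.
    dist.last().map(|(arm, _)| arm.as_str())
}
```

The probability attached to the picked arm in the renormalized list is the value that would play the role of prob_used in a later update.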

pub fn update_reward_with_prob( &mut self, arm: &str, reward01: f64, prob_used: f64, )

Update EXP3-IX with a bounded reward in [0, 1], using an explicit probability.

This is useful when the decision was made from a filtered/renormalized distribution (e.g. a latency guardrail) and you want the importance weighting to use the exact probability mass function that was actually sampled.
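
The role of prob_used is easiest to see in the textbook EXP3-IX loss estimate, sketched below. This follows the standard formulation (loss divided by probability plus an implicit-exploration term gamma) and is not taken from the crate's source: the function names, the eta and gamma parameters, and the log-weight representation are all assumptions, and the crate's decay setting is not modeled.

```rust
/// Implicit-exploration loss estimate: loss / (p + gamma).
/// The gamma term caps the estimate at 1 / gamma even when p is tiny.
fn ix_loss_estimate(reward01: f64, prob_used: f64, gamma: f64) -> f64 {
    let loss = 1.0 - reward01.clamp(0.0, 1.0);
    loss / (prob_used + gamma)
}

/// Apply one update to the chosen arm's log-weight (hypothetical layout:
/// one log-weight per arm, indexed by position).
fn ix_update(
    log_weights: &mut [f64],
    arm: usize,
    reward01: f64,
    prob_used: f64,
    eta: f64,
    gamma: f64,
) {
    log_weights[arm] -= eta * ix_loss_estimate(reward01, prob_used, gamma);
}
```

If the arm was drawn from a renormalized distribution but the update divided by the unfiltered probability, the estimate would be scaled incorrectly, which is why passing the probability actually used matters.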

pub fn select_with_probs<'a>( &mut self, arms_in_order: &'a [String], ) -> Option<(&'a String, BTreeMap<String, f64>)>

Select an arm and return the probabilities used for selection.

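Example
use muxer::{Exp3Ix, Exp3IxConfig};

// Arm names, in the order the policy should consider them.
let arms = vec!["a".to_string(), "b".to_string(), "c".to_string()];
// Set seed and decay explicitly; all other fields keep their defaults.
let mut ex = Exp3Ix::new(Exp3IxConfig { seed: 123, decay: 0.98, ..Exp3IxConfig::default() });
// Select an arm and get the probabilities used for that selection.
let (chosen, probs) = ex.select_with_probs(&arms).unwrap();
// Report a bounded reward in [0, 1] for the chosen arm.
ex.update_reward(chosen, 0.7);
// The returned probabilities sum to 1.
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);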

pub fn select<'a>(&mut self, arms_in_order: &'a [String]) -> Option<&'a String>

Select an arm.

Policy:

  • Explore each arm once in stable order.
  • Otherwise sample from the current EXP3-IX distribution (seeded RNG).
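
The two-step policy above can be sketched as follows. This is a simplified illustration under stated assumptions, not the crate's code: select_sketch, the pulls counter map, and passing the uniform draw u in as a parameter (instead of using the internal seeded RNG) are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Explore-first, then sample.
/// `pulls` counts how often each arm has been chosen so far;
/// `probs` is aligned with `arms_in_order`; `u` is a uniform draw in [0, 1).
fn select_sketch<'a>(
    arms_in_order: &'a [String],
    pulls: &BTreeMap<String, u64>,
    probs: &[f64],
    u: f64,
) -> Option<&'a String> {
    // Step 1: the first arm (in stable order) that has never been tried.
    if let Some(arm) = arms_in_order
        .iter()
        .find(|a| pulls.get(*a).copied().unwrap_or(0) == 0)
    {
        return Some(arm);
    }
    // Step 2: inverse-CDF sampling from the current distribution.
    let mut acc = 0.0_f64;
    for (arm, p) in arms_in_order.iter().zip(probs) {
        acc += *p;
        if u < acc {
            return Some(arm);
        }
    }
    // Rounding guard; also yields None when there are no arms.
    arms_in_order.last()
}
```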

pub fn decide(&mut self, arms_in_order: &[String]) -> Option<Decision>

Select an arm and return a unified Decision (recommended for logging/replay).

Notes:

  • Always includes a probs distribution over arms as of this decision.
  • Records whether explore-first occurred and whether numerical fallback was used.

pub fn update_reward(&mut self, arm: &str, reward01: f64)

Update EXP3-IX with a bounded reward in [0, 1].

Trait Implementations

impl BanditPolicy for Exp3Ix

Available on crate feature stochastic only.

fn decide(&mut self, arms: &[String]) -> Option<Decision>

Select an arm, returning a Decision with the chosen arm and optional probability distribution.

fn update_reward(&mut self, arm: &str, reward: f64)

Update the policy with a scalar reward for arm.

impl Clone for Exp3Ix

fn clone(&self) -> Exp3Ix

Returns a duplicate of the value.

1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Exp3Ix

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Exp3Ix

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V