pub struct Exp3Ix { /* private fields */ }
Seedable EXP3-IX bandit.
Implementations
impl Exp3Ix
pub fn new(cfg: Exp3IxConfig) -> Self
Create a new EXP3-IX instance with deterministic defaults.
pub fn with_seed(cfg: Exp3IxConfig, seed: u64) -> Self
Create with an explicit seed.
pub fn probabilities(
    &mut self,
    arms_in_order: &[String],
) -> BTreeMap<String, f64>
Current selection probabilities (aligned to arms_in_order).
pub fn effective_sample_size(&self) -> f64
Effective sample size (Kish’s ESS) of the current probability distribution.
ESS = 1 / sum(p_i^2), bounded in [1, K]. When ESS approaches 1, one arm
dominates the policy and reward estimates for other arms are unreliable
(high importance-weight variance). When ESS equals K, the policy is uniform.
This is the primary uncertainty diagnostic for adversarial bandits. EXP3-IX makes no distributional assumptions, so Bayesian posteriors don’t apply. ESS measures how much effective information the importance-weighted estimator has, which is the right notion of uncertainty for this policy class.
Reference: Kish (1965), “Survey Sampling”; applied to bandit IPW by Waudby-Smith et al. (2022, arXiv:2210.10768).
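As a self-contained sketch of the formula above (plain Rust, not the crate's internals), Kish's ESS over a probability vector is simply the reciprocal of the sum of squared probabilities:

```rust
// Kish's effective sample size for a probability distribution:
// ESS = 1 / sum(p_i^2), bounded in [1, K].
fn effective_sample_size(probs: &[f64]) -> f64 {
    let sum_sq: f64 = probs.iter().map(|p| p * p).sum();
    if sum_sq > 0.0 { 1.0 / sum_sq } else { 0.0 }
}

fn main() {
    // Uniform over 4 arms: ESS = K = 4 (maximum information).
    assert!((effective_sample_size(&[0.25; 4]) - 4.0).abs() < 1e-12);
    // Heavily concentrated: ESS approaches 1 (unreliable off-arm estimates).
    let ess = effective_sample_size(&[0.97, 0.01, 0.01, 0.01]);
    assert!(ess > 1.0 && ess < 1.1);
}
```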
pub fn weight_entropy(&self) -> f64
Shannon entropy of the current probability distribution (nats).
H(p) = -sum(p_i * ln(p_i)), bounded in [0, ln(K)]. Higher entropy means
the policy is more uncertain (closer to uniform). Low entropy means the policy
has converged toward one or a few arms.
Useful as a convergence monitor: entropy that plateaus below ln(K) indicates
the policy has learned a preference; entropy near ln(K) after many rounds
suggests the arms are indistinguishable or rewards are too noisy.
pub fn effective_arms(&self) -> f64
Effective number of arms: exp(entropy), bounded in [1, K].
A single scalar summarizing how “decided” the policy is:
- Near 1.0: effectively committed to one arm.
- Near K: effectively uniform (maximum uncertainty).
This is the exponential of Shannon entropy (the “perplexity” of the distribution).
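Both diagnostics can be sketched in plain Rust (illustrative only; the crate's internals may differ). Entropy sums `-p ln p` over nonzero entries, and the effective number of arms is its exponential:

```rust
// Shannon entropy in nats; zero-probability entries contribute nothing.
fn weight_entropy(probs: &[f64]) -> f64 {
    probs.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.ln()).sum()
}

// "Effective arms" = exp(entropy), i.e. the perplexity of the distribution.
fn effective_arms(probs: &[f64]) -> f64 {
    weight_entropy(probs).exp()
}

fn main() {
    let k = 4.0_f64;
    // Uniform: entropy = ln(K), effective arms = K (maximum uncertainty).
    assert!((weight_entropy(&[0.25; 4]) - k.ln()).abs() < 1e-12);
    assert!((effective_arms(&[0.25; 4]) - k).abs() < 1e-9);
    // Fully committed: entropy = 0, effective arms = 1.
    assert!(weight_entropy(&[1.0, 0.0, 0.0, 0.0]).abs() < 1e-12);
    assert!((effective_arms(&[1.0, 0.0, 0.0, 0.0]) - 1.0).abs() < 1e-12);
}
```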
pub fn snapshot(&self) -> Exp3IxState
Capture a persistence snapshot of the current EXP3-IX state.
Callers should prefer calling this after probabilities(...) or decide(...) so
arms is initialized and probs is up to date.
pub fn restore(&mut self, st: Exp3IxState)
Restore a previously snapshotted EXP3-IX state.
If the stored state is inconsistent (length mismatches), this resets to a fresh state.
pub fn decide_deterministic_filtered(
    &mut self,
    arms_in_order: &[String],
    eligible_in_order: &[String],
    decision_seed: u64,
) -> Option<Decision>
Deterministic decision from a filtered eligible set.
This is designed for callers that:
- keep persistent EXP3-IX state across process runs
- apply external hard constraints (e.g. latency guardrail) that shrink the eligible set
- want a deterministic decision given a seed, without persisting RNG state
The returned Decision.probs is over eligible_in_order (renormalized).
If you update using this decision, prefer update_reward_with_prob(...) with
prob_used := decision.probs[chosen].
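A minimal sketch of the filter-renormalize-sample step described above, in plain Rust. The `splitmix64` hash used here to derive a deterministic uniform from the seed is illustrative; the crate's actual seeding scheme may differ:

```rust
// Mix a seed into a well-distributed u64 (splitmix64 finalizer).
fn splitmix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

// Restrict a distribution to an eligible subset, renormalize, and pick an
// arm deterministically from the seed (no persistent RNG state).
fn decide_filtered(
    arms: &[(&str, f64)], // (arm, probability) over the full set
    eligible: &[&str],
    decision_seed: u64,
) -> Option<String> {
    let kept: Vec<(&str, f64)> = arms
        .iter()
        .filter(|&&(a, _)| eligible.contains(&a))
        .copied()
        .collect();
    let total: f64 = kept.iter().map(|(_, p)| p).sum();
    if kept.is_empty() || total <= 0.0 {
        return None;
    }
    // Deterministic uniform in [0, 1) derived from the seed.
    let u = (splitmix64(decision_seed) >> 11) as f64 / (1u64 << 53) as f64;
    let mut acc = 0.0;
    for (a, p) in &kept {
        acc += p / total; // renormalized probability mass
        if u < acc {
            return Some(a.to_string());
        }
    }
    kept.last().map(|(a, _)| a.to_string())
}

fn main() {
    let arms = [("a", 0.5), ("b", 0.3), ("c", 0.2)];
    // Same seed and eligible set => same decision (replayable).
    let d1 = decide_filtered(&arms, &["b", "c"], 42);
    let d2 = decide_filtered(&arms, &["b", "c"], 42);
    assert_eq!(d1, d2);
    // The chosen arm always comes from the eligible set.
    assert!(matches!(d1.as_deref(), Some("b") | Some("c")));
}
```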
pub fn update_reward_with_prob(
    &mut self,
    arm: &str,
    reward01: f64,
    prob_used: f64,
)
Update EXP3-IX with a bounded reward in [0, 1], using an explicit probability.
This is useful when the decision was made from a filtered/renormalized distribution (e.g. a latency guardrail) and you want the importance weighting to use the exact probability mass function that was actually sampled.
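For intuition, here is a sketch of the standard EXP3-IX update (Neu, 2015) with an explicit probability; the crate's actual parameterization and state layout may differ, and `eta`/`gamma` here are illustrative parameters:

```rust
// EXP3-IX loss update: the implicit-exploration (IX) estimator divides by
// (p + gamma) rather than p, biasing the loss estimate downward to control
// importance-weight variance. `eta` (learning rate) and `gamma` (IX
// parameter) are illustrative, not the crate's configuration.
fn update_reward_with_prob(
    weights: &mut [f64],
    arm_idx: usize,
    reward01: f64,
    prob_used: f64,
    eta: f64,
    gamma: f64,
) {
    let loss = 1.0 - reward01.clamp(0.0, 1.0);
    // IX importance-weighted loss estimate for the played arm only.
    let loss_hat = loss / (prob_used + gamma);
    weights[arm_idx] *= (-eta * loss_hat).exp();
}

fn main() {
    let mut w = vec![1.0; 3];
    // Maximum reward => zero loss => weight unchanged.
    update_reward_with_prob(&mut w, 0, 1.0, 0.5, 0.1, 0.05);
    assert!((w[0] - 1.0).abs() < 1e-12);
    // Zero reward shrinks the played arm's weight.
    update_reward_with_prob(&mut w, 1, 0.0, 0.5, 0.1, 0.05);
    assert!(w[1] < 1.0);
    // Unplayed arms are untouched.
    assert!((w[2] - 1.0).abs() < 1e-12);
}
```

Passing the renormalized `prob_used` from a filtered decision (rather than the full-set probability) keeps `loss_hat` consistent with the distribution that was actually sampled.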
pub fn select_with_probs<'a>(
    &mut self,
    arms_in_order: &'a [String],
) -> Option<(&'a String, BTreeMap<String, f64>)>
Select an arm and return the probabilities used for selection.
Example
use muxer::{Exp3Ix, Exp3IxConfig};
let arms = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let mut ex = Exp3Ix::new(Exp3IxConfig { seed: 123, decay: 0.98, ..Exp3IxConfig::default() });
let (chosen, probs) = ex.select_with_probs(&arms).unwrap();
ex.update_reward(chosen, 0.7);
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);

pub fn select<'a>(&mut self, arms_in_order: &'a [String]) -> Option<&'a String>
Select an arm.
Policy:
- Explore each arm once in stable order.
- Otherwise sample from the current EXP3-IX distribution (seeded RNG).
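The explore-first phase of this policy can be sketched as follows (illustrative only; `pulls` is a hypothetical per-arm play count, not a crate field):

```rust
// Explore-first: play each never-played arm once, in stable input order,
// before falling through to sampling from the EXP3-IX distribution.
fn explore_first<'a>(arms_in_order: &'a [String], pulls: &[u64]) -> Option<&'a String> {
    arms_in_order
        .iter()
        .zip(pulls)
        .find(|&(_, &n)| n == 0)
        .map(|(a, _)| a)
}

fn main() {
    let arms: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    // "b" has never been played => it is explored next, in stable order.
    assert_eq!(explore_first(&arms, &[1, 0, 0]), Some(&arms[1]));
    // Every arm played at least once => None, so sampling takes over.
    assert_eq!(explore_first(&arms, &[1, 2, 1]), None);
}
```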
pub fn decide(&mut self, arms_in_order: &[String]) -> Option<Decision>
Select an arm and return a unified Decision (recommended for logging/replay).
Notes:
- Always includes a probs distribution over arms as of this decision.
- Records whether explore-first occurred and whether numerical fallback was used.
pub fn update_reward(&mut self, arm: &str, reward01: f64)
Update EXP3-IX with a bounded reward in [0, 1].
Trait Implementations
impl BanditPolicy for Exp3Ix

Available on crate feature stochastic only.