pub struct LinUcb { /* private fields */ }
Seedable linear contextual bandit (LinUCB).
Usage:
- call select_with_scores(arms, context) to get a choice + debug scores
- call update_reward(chosen_arm, context, reward01) after observing the reward
Implementations

impl LinUcb
pub fn new(cfg: LinUcbConfig) -> Self
Create a new LinUCB instance (deterministic by default).
pub fn scores(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
) -> BTreeMap<String, LinUcbScore>
Return per-arm (ucb, mean, bonus) scores for a given context.
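The (ucb, mean, bonus) decomposition follows the standard LinUCB formula: mean = theta^T x and bonus = alpha * sqrt(x^T A_inv x). A minimal, dependency-free sketch for a fixed 2-dim context; the function name and the `alpha` exploration knob are illustrative, not part of this crate's API:

```rust
/// Standard LinUCB score for one arm with 2-dim context (illustrative only).
/// Returns (ucb, mean, bonus) with mean = theta^T x,
/// bonus = alpha * sqrt(x^T A_inv x), and theta = A_inv @ b.
fn linucb_score(a_inv: &[[f64; 2]; 2], b: &[f64; 2], x: &[f64; 2], alpha: f64) -> (f64, f64, f64) {
    // theta = A_inv @ b (the arm's learned response vector)
    let theta = [
        a_inv[0][0] * b[0] + a_inv[0][1] * b[1],
        a_inv[1][0] * b[0] + a_inv[1][1] * b[1],
    ];
    let mean = theta[0] * x[0] + theta[1] * x[1];
    // ax = A_inv @ x, so x^T A_inv x = x . ax
    let ax = [
        a_inv[0][0] * x[0] + a_inv[0][1] * x[1],
        a_inv[1][0] * x[0] + a_inv[1][1] * x[1],
    ];
    let bonus = alpha * (x[0] * ax[0] + x[1] * ax[1]).sqrt();
    (mean + bonus, mean, bonus)
}
```

With A_inv at its identity initialization, the bonus term is largest, so unexplored directions of context space score high.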
pub fn select_with_scores<'a>(
    &mut self,
    arms_in_order: &'a [String],
    context: &[f64],
) -> Option<(&'a String, BTreeMap<String, LinUcbScore>)>
Select an arm for a given context, returning the chosen arm + per-arm scores.
Policy:
- Explore each arm once (stable order) before using scores.
- Otherwise choose argmax UCB; tie-break is stable, with seeded randomness as a last resort.
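The policy above can be sketched as a pure function over precomputed per-arm stats. This is a simplification: the tuple layout is hypothetical, and ties here always resolve to the earliest arm, whereas the real method additionally falls back to seeded randomness:

```rust
/// Explore-first, then argmax-UCB selection over (name, uses, ucb) tuples.
/// Simplified sketch: ties keep the earliest arm (no seeded-RNG fallback).
fn select<'a>(arms: &'a [(&'a str, u64, f64)]) -> Option<&'a str> {
    // Explore: return the first never-tried arm, in stable (input) order.
    if let Some(&(name, _, _)) = arms.iter().find(|&&(_, uses, _)| uses == 0) {
        return Some(name);
    }
    // Exploit: argmax UCB, keeping the earliest arm on exact ties.
    let mut best: Option<(&'a str, f64)> = None;
    for &(name, _, ucb) in arms {
        match best {
            Some((_, b)) if ucb <= b => {} // earlier arm wins ties
            _ => best = Some((name, ucb)),
        }
    }
    best.map(|(name, _)| name)
}
```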
pub fn decide(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
) -> Option<Decision>
Select (argmax UCB) and return a unified Decision.
Notes:
- Does not include probs (this method is deterministic argmax).
- Records explore-first vs deterministic choice.
Explore-first detection: select_with_scores already returns an unseen arm
(uses == 0) in stable order before using UCB scores. We re-read the state
here to tag the decision correctly; this is intentionally a read-after-select
(not a second decision), so the tag always agrees with the choice.
pub fn probabilities(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
    temperature: f64,
) -> BTreeMap<String, f64>
Softmax distribution over arms based on their current UCB scores for this context.
This is useful for:
- traffic-splitting (probabilistic routing)
- logging an approximate propensity distribution for offline evaluation
Example
use muxer::{LinUcb, LinUcbConfig};
let arms = vec!["a".to_string(), "b".to_string()];
let mut p = LinUcb::new(LinUcbConfig { dim: 2, ..LinUcbConfig::default() });
let probs = p.probabilities(&arms, &[0.2, 0.8], 0.3);
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);

pub fn select_softmax_ucb_with_probs<'a>(
    &mut self,
    arms_in_order: &'a [String],
    context: &[f64],
    temperature: f64,
) -> Option<(&'a String, BTreeMap<String, f64>)>
Select an arm by sampling from probabilities(...), returning the chosen arm and the probabilities used.
Policy:
- Explore each arm once in stable order (still returns a full probs map).
- Otherwise sample from a softmax over UCB scores (seeded RNG).
Example
use muxer::{LinUcb, LinUcbConfig};
let arms = vec!["a".to_string(), "b".to_string()];
let mut p = LinUcb::new(LinUcbConfig { dim: 2, seed: 0, ..LinUcbConfig::default() });
let (chosen, probs) = p.select_softmax_ucb_with_probs(&arms, &[0.2, 0.8], 0.3).unwrap();
p.update_reward(chosen, &[0.2, 0.8], 1.0);
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);

pub fn decide_softmax_ucb(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
    temperature: f64,
) -> Option<Decision>
Select via softmax over UCB scores and return a unified Decision.
Notes:
- Always includes probs (the softmax allocation over UCB for this context).
- Records explore-first vs sampling and numerical fallback.
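The probs allocation these sampling methods rely on is an ordinary temperature-scaled softmax. A self-contained sketch; the max-subtraction is the usual numerical-stability trick, and the temperature floor stands in for whatever numerical fallback the crate actually uses:

```rust
/// Temperature-scaled softmax over raw scores (illustrative sketch).
fn softmax(scores: &[f64], temperature: f64) -> Vec<f64> {
    let t = temperature.max(1e-12); // assumed floor to avoid division by zero
    // Subtract the max score before exponentiating for numerical stability.
    let m = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| ((s - m) / t).exp()).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}
```

Lower temperatures concentrate mass on the argmax arm; higher temperatures flatten the allocation toward uniform.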
pub fn update_reward(&mut self, arm: &str, context: &[f64], reward01: f64)
Update the model for arm given the same context used for selection and a reward in [0, 1].
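A common way to maintain these per-arm statistics incrementally is a rank-1 Sherman-Morrison update of A_inv together with b += reward * x. A fixed 2-dim sketch under that assumption; the crate may implement the update differently:

```rust
/// One LinUCB reward update for a single arm (2-dim sketch).
/// Applies Sherman-Morrison for A += x x^T directly on A_inv,
/// then accumulates b += reward * x.
fn update(a_inv: &mut [[f64; 2]; 2], b: &mut [f64; 2], x: &[f64; 2], reward: f64) {
    // v = A_inv @ x
    let v = [
        a_inv[0][0] * x[0] + a_inv[0][1] * x[1],
        a_inv[1][0] * x[0] + a_inv[1][1] * x[1],
    ];
    // denom = 1 + x^T A_inv x
    let denom = 1.0 + x[0] * v[0] + x[1] * v[1];
    // A_inv -= (v v^T) / denom
    for i in 0..2 {
        for j in 0..2 {
            a_inv[i][j] -= v[i] * v[j] / denom;
        }
    }
    b[0] += reward * x[0];
    b[1] += reward * x[1];
}
```

Keeping A_inv directly avoids a matrix inversion on every update, at O(dim^2) cost per observation.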
pub fn theta_vectors(
    &mut self,
    arms_in_order: &[String],
) -> BTreeMap<String, Vec<f64>>
Return per-arm theta vectors (A_inv @ b) for sensitivity analysis.
Each arm’s theta is its learned response function: E[reward | context x] = theta^T x.
The matrix of theta vectors (arms x dim) can be fed to pare::sensitivity::analyze_redundancy
to compute the gradient rank and identify which arms have genuinely different
context-dependent behavior.
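Given the sufficient statistics, each theta is just the matrix-vector product A_inv @ b, and the arm's predicted reward for a context is theta^T x. A 2-dim sketch (dimension fixed purely for illustration):

```rust
/// theta = A_inv @ b for one arm (2-dim illustrative sketch).
fn theta_2d(a_inv: &[[f64; 2]; 2], b: &[f64; 2]) -> [f64; 2] {
    [
        a_inv[0][0] * b[0] + a_inv[0][1] * b[1],
        a_inv[1][0] * b[0] + a_inv[1][1] * b[1],
    ]
}

/// Predicted reward E[reward | context x] = theta^T x.
fn predict(theta: &[f64; 2], x: &[f64; 2]) -> f64 {
    theta[0] * x[0] + theta[1] * x[1]
}
```

Stacking these per-arm vectors row-wise gives the arms x dim matrix mentioned above: arms whose rows are (near-)linear combinations of others add no new context-dependent behavior.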
pub fn snapshot(&self) -> LinUcbState
Capture a persistence snapshot of the current LinUCB state.
This includes per-arm sufficient statistics (A_inv, b, uses) so that a caller can serialize, store, and later restore the policy state across process restarts.
pub fn restore(&mut self, st: LinUcbState)
Restore a previously snapshotted LinUCB state.
Arms not present in the snapshot are initialized fresh. Arms in the snapshot but not in the current arm set are ignored. Dimension mismatches cause affected arms to be re-initialized.
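The merge rule above can be sketched as a pure function over a simplified per-arm state (here just a dimension plus a flat stats vector; the real LinUcbState carries A_inv, b, and uses per arm):

```rust
use std::collections::BTreeMap;

/// Restore-merge sketch: current arms keep snapshot state only when the
/// dimensions match; otherwise they get fresh (zeroed) state. Arms that
/// exist only in the snapshot are dropped.
fn restore_arms(
    current: &[&str],
    dim: usize,
    snapshot: &BTreeMap<String, (usize, Vec<f64>)>,
) -> BTreeMap<String, (usize, Vec<f64>)> {
    current
        .iter()
        .map(|&arm| {
            let state = match snapshot.get(arm) {
                Some((d, stats)) if *d == dim => (dim, stats.clone()),
                _ => (dim, vec![0.0; dim]), // fresh initialization
            };
            (arm.to_string(), state)
        })
        .collect()
}
```

Driving the merge from the current arm set (not the snapshot) is what makes renamed or retired arms harmless on restore.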