pub struct LinUcb { /* private fields */ }
Seedable linear contextual bandit (LinUCB).
Usage:
- call select_with_scores(arms, context) to get a choice + debug scores
- call update_reward(chosen_arm, context, reward01) after observing the reward
Implementations

impl LinUcb
pub fn new(cfg: LinUcbConfig) -> Self
Create a new LinUCB instance (deterministic by default).
pub fn scores(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
) -> BTreeMap<String, LinUcbScore>
Return per-arm (ucb, mean, bonus) scores for a given context.
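The (ucb, mean, bonus) decomposition follows the standard LinUCB formula: mean = theta^T x and bonus = alpha * sqrt(x^T A_inv x). A minimal, dependency-free sketch for a fixed 2-dim context; the function name and the `alpha` exploration knob are illustrative, not part of this crate's API:

```rust
/// Standard LinUCB score for one arm with 2-dim context (illustrative only).
/// Returns (ucb, mean, bonus) with mean = theta^T x,
/// bonus = alpha * sqrt(x^T A_inv x), and theta = A_inv @ b.
fn linucb_score(a_inv: &[[f64; 2]; 2], b: &[f64; 2], x: &[f64; 2], alpha: f64) -> (f64, f64, f64) {
    // theta = A_inv @ b (the arm's learned response vector)
    let theta = [
        a_inv[0][0] * b[0] + a_inv[0][1] * b[1],
        a_inv[1][0] * b[0] + a_inv[1][1] * b[1],
    ];
    let mean = theta[0] * x[0] + theta[1] * x[1];
    // ax = A_inv @ x, so x^T A_inv x = x . ax
    let ax = [
        a_inv[0][0] * x[0] + a_inv[0][1] * x[1],
        a_inv[1][0] * x[0] + a_inv[1][1] * x[1],
    ];
    let bonus = alpha * (x[0] * ax[0] + x[1] * ax[1]).sqrt();
    (mean + bonus, mean, bonus)
}
```

With A_inv at its identity initialization, the bonus term is largest, so unexplored directions of context space score high.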
pub fn select_with_scores<'a>(
    &mut self,
    arms_in_order: &'a [String],
    context: &[f64],
) -> Option<(&'a String, BTreeMap<String, LinUcbScore>)>
Select an arm for a given context, returning the chosen arm + per-arm scores.
Policy:
- Explore each arm once (stable order) before using scores.
- Otherwise choose argmax UCB; tie-break is stable, with seeded randomness as a last resort.
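The policy above can be sketched as a pure function over precomputed per-arm stats. This is a simplification: the tuple layout is hypothetical, and ties here always resolve to the earliest arm, whereas the real method additionally falls back to seeded randomness:

```rust
/// Explore-first, then argmax-UCB selection over (name, uses, ucb) tuples.
/// Simplified sketch: ties keep the earliest arm (no seeded-RNG fallback).
fn select<'a>(arms: &'a [(&'a str, u64, f64)]) -> Option<&'a str> {
    // Explore: return the first never-tried arm, in stable (input) order.
    if let Some(&(name, _, _)) = arms.iter().find(|&&(_, uses, _)| uses == 0) {
        return Some(name);
    }
    // Exploit: argmax UCB, keeping the earliest arm on exact ties.
    let mut best: Option<(&'a str, f64)> = None;
    for &(name, _, ucb) in arms {
        match best {
            Some((_, b)) if ucb <= b => {} // earlier arm wins ties
            _ => best = Some((name, ucb)),
        }
    }
    best.map(|(name, _)| name)
}
```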
pub fn decide(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
) -> Option<Decision>
Select (argmax UCB) and return a unified Decision.
Notes:
- Does not include probs (this method is deterministic argmax).
- Records explore-first vs deterministic choice.
Explore-first detection: select_with_scores already returns an unseen arm
(uses == 0) in stable order before using UCB scores. We re-read the state
here to tag the decision correctly; this is intentionally a read-after-select
(not a second decision), so the tag always agrees with the choice.
pub fn probabilities(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
    temperature: f64,
) -> BTreeMap<String, f64>
Softmax distribution over arms based on their current UCB scores for this context.
This is useful for:
- traffic-splitting (probabilistic routing)
- logging an approximate propensity distribution for offline evaluation
Example
use muxer::{LinUcb, LinUcbConfig};
let arms = vec!["a".to_string(), "b".to_string()];
let mut p = LinUcb::new(LinUcbConfig { dim: 2, ..LinUcbConfig::default() });
let probs = p.probabilities(&arms, &[0.2, 0.8], 0.3);
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);

pub fn select_softmax_ucb_with_probs<'a>(
    &mut self,
    arms_in_order: &'a [String],
    context: &[f64],
    temperature: f64,
) -> Option<(&'a String, BTreeMap<String, f64>)>
Select an arm by sampling from probabilities(...), returning the chosen arm and the probabilities used.
Policy:
- Explore each arm once in stable order (still returns a full probs map).
- Otherwise sample from a softmax over UCB scores (seeded RNG).
Example
use muxer::{LinUcb, LinUcbConfig};
let arms = vec!["a".to_string(), "b".to_string()];
let mut p = LinUcb::new(LinUcbConfig { dim: 2, seed: 0, ..LinUcbConfig::default() });
let (chosen, probs) = p.select_softmax_ucb_with_probs(&arms, &[0.2, 0.8], 0.3).unwrap();
p.update_reward(chosen, &[0.2, 0.8], 1.0);
let s: f64 = probs.values().sum();
assert!((s - 1.0).abs() < 1e-9);

pub fn decide_softmax_ucb(
    &mut self,
    arms_in_order: &[String],
    context: &[f64],
    temperature: f64,
) -> Option<Decision>
Select via softmax over UCB scores and return a unified Decision.
Notes:
- Always includes probs (the softmax allocation over UCB for this context).
- Records explore-first vs sampling and numerical fallback.
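The probs allocation these sampling methods rely on is an ordinary temperature-scaled softmax. A self-contained sketch; the max-subtraction is the usual numerical-stability trick, and the temperature floor stands in for whatever numerical fallback the crate actually uses:

```rust
/// Temperature-scaled softmax over raw scores (illustrative sketch).
fn softmax(scores: &[f64], temperature: f64) -> Vec<f64> {
    let t = temperature.max(1e-12); // assumed floor to avoid division by zero
    // Subtract the max score before exponentiating for numerical stability.
    let m = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| ((s - m) / t).exp()).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}
```

Lower temperatures concentrate mass on the argmax arm; higher temperatures flatten the allocation toward uniform.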
pub fn update_reward(&mut self, arm: &str, context: &[f64], reward01: f64)
Update the model for arm given the same context used for selection and a reward in [0, 1].
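A common way to maintain these per-arm statistics incrementally is a rank-1 Sherman-Morrison update of A_inv together with b += reward * x. A fixed 2-dim sketch under that assumption; the crate may implement the update differently:

```rust
/// One LinUCB reward update for a single arm (2-dim sketch).
/// Applies Sherman-Morrison for A += x x^T directly on A_inv,
/// then accumulates b += reward * x.
fn update(a_inv: &mut [[f64; 2]; 2], b: &mut [f64; 2], x: &[f64; 2], reward: f64) {
    // v = A_inv @ x
    let v = [
        a_inv[0][0] * x[0] + a_inv[0][1] * x[1],
        a_inv[1][0] * x[0] + a_inv[1][1] * x[1],
    ];
    // denom = 1 + x^T A_inv x
    let denom = 1.0 + x[0] * v[0] + x[1] * v[1];
    // A_inv -= (v v^T) / denom
    for i in 0..2 {
        for j in 0..2 {
            a_inv[i][j] -= v[i] * v[j] / denom;
        }
    }
    b[0] += reward * x[0];
    b[1] += reward * x[1];
}
```

Keeping A_inv directly avoids a matrix inversion on every update, at O(dim^2) cost per observation.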
pub fn theta_vectors(
    &mut self,
    arms_in_order: &[String],
) -> BTreeMap<String, Vec<f64>>
Return per-arm theta vectors (A_inv @ b) for sensitivity analysis.
Each arm’s theta is its learned response function: E[reward | context x] = theta^T x.
The matrix of theta vectors (arms x dim) can be fed to pare::sensitivity::analyze_redundancy
to compute the gradient rank and identify which arms have genuinely different
context-dependent behavior.
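Given the sufficient statistics, each theta is just the matrix-vector product A_inv @ b, and the arm's predicted reward for a context is theta^T x. A 2-dim sketch (dimension fixed purely for illustration):

```rust
/// theta = A_inv @ b for one arm (2-dim illustrative sketch).
fn theta_2d(a_inv: &[[f64; 2]; 2], b: &[f64; 2]) -> [f64; 2] {
    [
        a_inv[0][0] * b[0] + a_inv[0][1] * b[1],
        a_inv[1][0] * b[0] + a_inv[1][1] * b[1],
    ]
}

/// Predicted reward E[reward | context x] = theta^T x.
fn predict(theta: &[f64; 2], x: &[f64; 2]) -> f64 {
    theta[0] * x[0] + theta[1] * x[1]
}
```

Stacking these per-arm vectors row-wise gives the arms x dim matrix mentioned above: arms whose rows are (near-)linear combinations of others add no new context-dependent behavior.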
pub fn snapshot(&self) -> LinUcbState
Capture a persistence snapshot of the current LinUCB state.
This includes per-arm sufficient statistics (A_inv, b, uses) so that a caller can serialize, store, and later restore the policy state across process restarts.
pub fn restore(&mut self, st: LinUcbState)
Restore a previously snapshotted LinUCB state.
Arms not present in the snapshot are initialized fresh. Arms in the snapshot but not in the current arm set are ignored. Dimension mismatches cause affected arms to be re-initialized.
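The merge rule above can be sketched as a pure function over a simplified per-arm state (here just a dimension plus a flat stats vector; the real LinUcbState carries A_inv, b, and uses per arm):

```rust
use std::collections::BTreeMap;

/// Restore-merge sketch: current arms keep snapshot state only when the
/// dimensions match; otherwise they get fresh (zeroed) state. Arms that
/// exist only in the snapshot are dropped.
fn restore_arms(
    current: &[&str],
    dim: usize,
    snapshot: &BTreeMap<String, (usize, Vec<f64>)>,
) -> BTreeMap<String, (usize, Vec<f64>)> {
    current
        .iter()
        .map(|&arm| {
            let state = match snapshot.get(arm) {
                Some((d, stats)) if *d == dim => (dim, stats.clone()),
                _ => (dim, vec![0.0; dim]), // fresh initialization
            };
            (arm.to_string(), state)
        })
        .collect()
}
```

Driving the merge from the current arm set (not the snapshot) is what makes renamed or retired arms harmless on restore.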