# 🐙 octopus

## 🎯 Mission Statement
A generic, multi-threaded, and ergonomic Rust crate for Multi-Armed Bandit (MAB) algorithms—designed for engineers who need extensibility, custom reward modeling, and a clear path to contextual bandits.
## ✅ Goals
- Support key MAB algorithms:
  - MVP: `Epsilon-Greedy`, `UCB1`, `Thompson Sampling`
  - Future: `LinUCB`, etc.
- Built-in multi-threading for computational bottlenecks (e.g., regret calculation, reward evaluation).
- Clean, extensible abstraction for stateless and contextual bandits, using Rust traits and generics.
- User-defined reward types and flexible action/context typing via trait bounds.
- Target audience: ML engineers & backend engineers running real-time services in Rust.
## 🚫 Non-Goals
- Not a general-purpose RL framework (e.g., no DQN, A3C, etc.)
- Not a rich visualization tool
## 🧱 Core Abstractions
- `Action`: represents an arm/action in the bandit problem
- `Reward`: represents the reward signal
- `Context`: represents contextual information (for contextual bandits)
- `BanditPolicy`: core trait for all bandit algorithms
- `Environment`: simulated environment for running experiments
- Extensibility: Implement these traits for your own types to plug into the framework.
- Action/Reward/Context: Highly generic, must satisfy trait bounds above.
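The abstractions above might look something like the following trait sketch. The exact bounds and method signatures here are illustrative assumptions, not the crate's actual API:

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Illustrative marker traits; the real crate's bounds may differ.
pub trait Action: Clone + Eq + Hash + Debug {}
pub trait Reward: Into<f64> + Copy + Debug {}
pub trait Context: Clone + Debug {}

// Core policy trait: choose an arm, then learn from the observed reward.
pub trait BanditPolicy<A: Action, R: Reward, C: Context> {
    fn select_action(&mut self, context: &C) -> A;
    fn update(&mut self, action: &A, reward: R, context: &C);
}

// Simulated environment that yields a reward for a chosen action.
pub trait Environment<A: Action, R: Reward, C: Context> {
    fn context(&self) -> C;
    fn pull(&mut self, action: &A) -> R;
}
```

With this shape, a stateless bandit simply ignores its context parameter (e.g., `C = ()`), which is one way a single trait can cover both stateless and contextual policies.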
## 🧪 Implemented Algorithms
`epsilon_greedy::EpsilonGreedyPolicy`

- Parameters: `epsilon: f64`, initial actions
- Tracks average reward and count per action
- Generic over action, reward, and context types
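The mechanics are small enough to sketch end to end. This standalone version (names and structure are illustrative, not the crate's actual API; a tiny xorshift generator stands in for a real RNG such as `rand`) shows the explore/exploit rule and the incremental per-arm mean update:

```rust
// Standalone epsilon-greedy sketch: explore with probability epsilon,
// otherwise exploit the arm with the best running average reward.
struct EpsilonGreedy {
    epsilon: f64,
    counts: Vec<u64>, // pulls per arm
    means: Vec<f64>,  // running average reward per arm
    rng_state: u64,   // tiny xorshift state, stand-in for a real RNG
}

impl EpsilonGreedy {
    fn new(epsilon: f64, arms: usize, seed: u64) -> Self {
        Self {
            epsilon,
            counts: vec![0; arms],
            means: vec![0.0; arms],
            rng_state: seed.max(1),
        }
    }

    fn next_f64(&mut self) -> f64 {
        // xorshift64: fine for a sketch, not for production.
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }

    fn select(&mut self) -> usize {
        if self.next_f64() < self.epsilon {
            // Explore: uniformly random arm.
            (self.next_f64() * self.counts.len() as f64) as usize % self.counts.len()
        } else {
            // Exploit: arm with the highest mean observed so far.
            (0..self.means.len())
                .max_by(|&a, &b| self.means[a].partial_cmp(&self.means[b]).unwrap())
                .unwrap()
        }
    }

    fn update(&mut self, arm: usize, reward: f64) {
        self.counts[arm] += 1;
        // Incremental mean: m += (r - m) / n, so no reward history is stored.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm] as f64;
    }
}
```

The incremental mean is what "tracks average reward and count per action" amounts to in O(1) memory per arm.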
## 🏗️ Simulation Engine
- The `Simulator` struct orchestrates the interaction between a bandit policy and an environment.
- Collects cumulative rewards, regret, and per-step metrics via `SimulationResults`.
Example (the generic arguments and macro bodies were stripped in rendering; paths and argument values below are reconstructed and illustrative):

```rust
use octopus::EpsilonGreedyPolicy;
use octopus::Simulator;
use octopus::SimulationResults;

// Define your own Action, Reward, and Environment types implementing the required traits
// ...

let actions = vec![/* your actions here */];
let mut policy = EpsilonGreedyPolicy::new(0.1, actions).unwrap();
let environment = /* your environment here */;
let mut simulator = Simulator::new(policy, environment);
let results = simulator.run(1_000); // number of steps (illustrative)
println!("{:?}", results);
```
## 🔁 Concurrency Design
- Internal parallelism (e.g., for regret calculation) uses `rayon` and thread-safe primitives (`Mutex`).
- User-facing API is single-threaded for simplicity; internal operations are parallelized where beneficial.
- No explicit builder pattern or thread-safe wrappers in the current API.
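As an illustration of the pattern (using `std::thread` and a `Mutex` so the sketch stays dependency-free; the crate itself uses `rayon`, and the function name here is hypothetical), per-step regret can be accumulated across worker threads like this:

```rust
use std::sync::Mutex;
use std::thread;

// Regret per step is (optimal mean reward) - (reward actually received).
// Chunks of the reward history are summed in parallel and combined under a Mutex.
fn total_regret(rewards: &[f64], optimal_mean: f64, workers: usize) -> f64 {
    let total = Mutex::new(0.0f64);
    let workers = workers.max(1);
    let chunk = ((rewards.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let total = &total;
        for part in rewards.chunks(chunk) {
            // Scoped threads may borrow `rewards`; no Arc needed.
            s.spawn(move || {
                let local: f64 = part.iter().map(|r| optimal_mean - r).sum();
                *total.lock().unwrap() += local;
            });
        }
    });
    total.into_inner().unwrap()
}
```

Each worker sums its chunk locally and takes the lock once, so contention on the `Mutex` stays negligible even for long histories.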
## 📦 Integration & Ecosystem
- Uses `ndarray` for context features
- Uses `rayon` for parallelism
- Error handling via `thiserror`
- (Planned) Optional `serde` for serialization
## 🧩 Future Roadmap
- Add more algorithms: `UCB1`, `LinUCB`, etc.
- Benchmark suite comparing with Python implementations
- Async/streaming reward update support
- Optional logging/tracing integration
## 🧪 Test Strategy
- Unit tests for all algorithms and simulation logic
- Integration tests with simulated reward distributions
- Stress tests for multi-threaded scenarios
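An integration test against a simulated reward distribution might be sketched like this: a deterministic "environment" with known arm means makes the expected cumulative reward exact, so the assertion is stable across runs. (Names are illustrative, not the crate's actual test suite.)

```rust
// Deterministic stand-in for a stochastic reward distribution:
// arm i has mean 0.2 * i, and alternating noise cancels over even horizons.
fn simulated_reward(arm: usize, step: u64) -> f64 {
    let mean = 0.2 * arm as f64;
    let noise = if step % 2 == 0 { 0.05 } else { -0.05 };
    mean + noise
}

// Would carry `#[test]` inside the crate's tests module.
fn best_arm_dominates() {
    let steps = 100u64;
    let total: f64 = (0..steps).map(|t| simulated_reward(4, t)).sum();
    // Mean of arm 4 is 0.8; over 100 steps the noise cancels exactly.
    assert!((total - 80.0).abs() < 1e-9);
}
```

The same fixture generalizes to regret checks: since the optimal mean is known by construction, cumulative regret can be asserted against a closed-form value rather than a sampled estimate.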