Expand description
§mctrust — Production-grade Monte Carlo Tree Search
A generic, zero-domain-assumption MCTS engine for any sequential decision-making problem: game AI, portfolio optimization, security scan planning, hyperparameter tuning, molecular design, compiler optimization, robot path planning, or anything with a state space to explore.
§Design
This crate provides two complementary search interfaces:
-
Environment-based (
TreeSearch) — Full tree-search MCTS. You define anEnvironmentwith actions, state transitions, and terminal conditions. The engine handles selection, expansion, simulation, and backpropagation. Use for: optimization, planning, game AI, theorem proving, scheduling. -
Bandit-based (
BanditSearch) — Two-level MCTS for explore/exploit problems where you have a flat set of options grouped by category. The engine uses UCT+RAVE to decide which option to try next, and you feed back rewards. Use for: fuzzing, A/B testing, hyperparameter search, scan strategy optimization, any problem with pre-enumerated choices.
§Feature Flags
| Feature | Default | Description |
|---|---|---|
std | ✅ | Enables rand/std and StdRng — disable for #![no_std] targets |
parallel | ✅ | Root-parallel MCTS via Rayon — linear scaling across CPU cores |
dag | ✅ | Transposition Table (DAG) via hashbrown — compresses identical states |
toml | ✅ | Enables TOML config parsing (from_toml_str, from_toml_file) |
§Quick Start (Environment-based)
use mctrust::{Environment, Outcome, TreeSearch, SearchConfig, Reward};
#[derive(Clone)]
struct Optimizer { value: f64 }
#[derive(Clone, Debug, PartialEq)]
enum Action { Increase, Decrease }
impl Environment for Optimizer {
type Action = Action;
fn legal_actions(&self) -> Vec<Action> {
vec![Action::Increase, Action::Decrease]
}
fn apply(&mut self, action: &Action) {
match action {
Action::Increase => self.value += 0.1,
Action::Decrease => self.value -= 0.1,
}
}
fn evaluate(&self) -> Outcome {
if (self.value - 1.0).abs() < 0.01 {
Outcome::Terminal(Reward::WIN)
} else if self.value < -5.0 || self.value > 5.0 {
Outcome::Terminal(Reward::LOSS)
} else {
Outcome::Ongoing
}
}
}
let state = Optimizer { value: 0.0 };
let config = SearchConfig::default();
let mut search = TreeSearch::new(state, config);
let best_action = search.run();§Quick Start (Bandit-based)
use mctrust::{BanditSearch, BanditConfig};
let mut search = BanditSearch::new(BanditConfig::default());
// Register 100 options across 5 categories.
for category in 0..5u64 {
for option in 0..20u64 {
search.add_arm(category * 20 + option, category as u32);
}
}
// Explore and feed rewards.
while let Some(option_id) = search.next_arm() {
let reward = 0.5; // your evaluation
search.observe(option_id, reward);
}§Capabilities
- UCT selection with configurable exploration constant
- PUCT (AlphaZero-style) with pluggable action priors
- Gumbel
MuZero— hyperparameter-free Sequential Halving - Thompson Sampling for stochastic exploration
- RAVE for rapid cross-branch value estimation
- Pluggable
Evaluatortrait — replace random rollout with neural networks or domain heuristics - Multi-agent support — negamax-style backpropagation for adversarial environments
- Tree reuse — re-root the tree after each decision, preserving subtree statistics
- DAG transposition tables — merge identical states, compress the tree
- Root parallelism via Rayon — lock-free linear scaling across cores
- Node budget — bound memory usage with
with_max_nodes() run_until(predicate)— arbitrary stop conditions (reward threshold, convergence, cancellation)- Principal variation extraction — follow the best line through the tree (actions + states)
- Graphviz DOT export — visualize the search tree at any depth
- Checkpoint/restore — pause and resume searches across processes
- Progressive widening for continuous or massive action spaces
- Stepped iteration —
run_step()for fine-grained control, async interleaving - Time and iteration budgets — wall-clock deadline support for real-time systems
- Deterministic RNG via
ChaCha8Rng - Zero domain assumptions — works for any sequential decision problem
§References
- Kocsis & Szepesvári, “Bandit-based Monte Carlo Planning” (2006)
- Gelly & Silver, “Monte-Carlo tree search and rapid action value estimation in computer Go” (2011)
- Browne et al., “A Survey of Monte Carlo Tree Search Methods” (2012)
- Danihelka et al., “Policy Improvement by Planning with Gumbel” (2022)
Structs§
- Bandit
Config - Configuration for
crate::BanditSearch. - Bandit
Search - Arms are grouped by category. The engine maintains a two-level tree: root → group nodes → arm selection within groups.
- Bandit
Search Checkpoint - Serializable checkpoint for restoring bandit search progress.
- Group
Stats - Read-only statistics for a group in
crate::BanditSearch. - Heuristic
- Optional heuristic signals for a state.
- Node
Stats - Statistics exposed for diagnostics and visualization.
- Progressive
Widening Config - Progressive widening for large or continuous action spaces.
- Rave
Config - Configuration for AMAF/RAVE value blending.
- Reward
- Quantitative reward for an MCTS state or simulation outcome.
- Scalarizer
- Scalarizer for named signals supplied at observation time.
- Search
Config - Configuration for
crate::TreeSearch. - Tree
Search - Game-tree MCTS engine.
- Tree
Search Checkpoint - Serialisable game-search checkpoint used for mid-search persistence.
Enums§
- Outcome
- Terminal or non-terminal state of an environment.
- Search
Config Load Error - Error returned when loading config from disk.
- Tree
Policy - Tree policy used during child selection.
Traits§
- Environment
- A domain environment that the MCTS engine can explore.
- Evaluator
- Pluggable leaf evaluator for replacing random rollouts with domain-specific or neural network-based evaluation.