# mctrust — Technical Spec
## Overview
# mctrust — Production-grade Monte Carlo Tree Search A generic, zero-domain-assumption MCTS engine for any sequential decision-making problem: game AI, portfolio optimization, security scan planning, hyperparameter tuning, molecular design, compiler optimization, robot path planning, or anything with a state space to explore. ## Design This crate provides two complementary search interfaces: 1. **Environment-based** ([`TreeSearch`]) — Full tree-search MCTS. You define an [`Environment`] with actions, state transitions, and terminal conditions. The engine handles selection, expansion, simulation, and backpropagation. Use for: optimization, planning, game AI, theorem proving, scheduling. 2. **Bandit-based** ([`BanditSearch`]) — Two-level MCTS for explore/exploit problems where you have a flat set of options grouped by category. The engine uses UCT+RAVE to decide which option to try next, and you feed back rewards. Use for: fuzzing, A/B testing, hyperparameter search, scan strategy optimization, any problem with pre-enumerated choices. ## Feature Flags | Feature | Default | Description | |------------|---------|-------------| | `std` | ✅ | Enables `rand/std` and `StdRng` — disable for `#![no_std]` targets | | `parallel` | ✅ | Root-parallel MCTS via Rayon — linear scaling across CPU cores | | `dag` | ✅ | Transposition Table (DAG) via `hashbrown` — compresses identical states | | `toml` | ✅ | Enables TOML config parsing (`from_toml_str`, `from_toml_file`) | ## Quick Start (Environment-based) ```rust use mctrust::{Environment, Outcome, TreeSearch, SearchConfig, Reward}; #[derive(Clone)] struct Optimizer { value: f64 } #[derive(Clone, Debug, PartialEq)] enum Action { Increase, Decrease } impl Environment for Optimizer { type Action = Action; fn legal_actions(&self) -> Vec<Action> { vec![Action::Increase, Action::Decrease] } fn apply(&mut self, action: &Action) { match action { Action::Increase => self.value += 0.1, Action::Decrease => self.value -= 0.1, } } fn evaluate(&self) -> Outcome { if (self.value - 1.0).abs() < 0.01 { Outcome::Terminal(Reward::WIN) } else if self.value < -5.0 || self.value > 5.0 { Outcome::Terminal(Reward::LOSS) } else { Outcome::Ongoing } } } let state = Optimizer { value: 0.0 }; let config = SearchConfig::default(); let mut search = TreeSearch::new(state, config); let best_action = search.run(); ``` ## Quick Start (Bandit-based) ```rust use mctrust::{BanditSearch, BanditConfig}; let mut search = BanditSearch::new(BanditConfig::default()); // Register 100 options across 5 categories. for category in 0..5u64 { for option in 0..20u64 { search.add_arm(category * 20 + option, category as u32); } } // Explore and feed rewards. while let Some(option_id) = search.next_arm() { let reward = 0.5; // your evaluation search.observe(option_id, reward); } ``` ## Capabilities - **UCT selection** with configurable exploration constant - **PUCT** (AlphaZero-style) with pluggable action priors - **Gumbel `MuZero`** — hyperparameter-free Sequential Halving - **Thompson Sampling** for stochastic exploration - **RAVE** for rapid cross-branch value estimation - **Pluggable `Evaluator` trait** — replace random rollout with neural networks or domain heuristics - **Multi-agent support** — negamax-style backpropagation for adversarial environments - **Tree reuse** — re-root the tree after each decision, preserving subtree statistics - **DAG transposition tables** — merge identical states, compress the tree - **Root parallelism** via Rayon — lock-free linear scaling across cores - **Node budget** — bound memory usage with `with_max_nodes()` - **`run_until(predicate)`** — arbitrary stop conditions (reward threshold, convergence, cancellation) - **Principal variation extraction** — follow the best line through the tree (actions + states) - **Graphviz DOT export** — visualize the search tree at any depth - **Checkpoint/restore** — pause and resume searches across processes - **Progressive widening** for continuous or massive action spaces - **Stepped iteration** — `run_step()` for fine-grained control, async interleaving - **Time and iteration budgets** — wall-clock deadline support for real-time systems - **Deterministic RNG** via `ChaCha8Rng` - **Zero domain assumptions** — works for any sequential decision problem ## References - Kocsis & Szepesvári, "Bandit-based Monte Carlo Planning" (2006) - Gelly & Silver, "Monte-Carlo tree search and rapid action value estimation in computer Go" (2011) - Browne et al., "A Survey of Monte Carlo Tree Search Methods" (2012) - Danihelka et al., "Policy Improvement by Planning with Gumbel" (2022)
## Architecture
The crate is organized into the following public modules:
- (see source for module layout)
## Guarantees
- `#![forbid(unsafe_code)]` where applicable; see `src/lib.rs` for the exact lint preamble.
- All public types have doc comments.
- Error messages are actionable where applicable.
## Public API Summary
Key entry points are exported from `src/lib.rs` via `pub mod` and `pub use` re-exports.
Consult the module-level documentation in each source file for function signatures and usage examples.
## Error Handling
- Standard `Result` / error types.