1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
//! # mctrust — Production-grade Monte Carlo Tree Search
//!
//! A generic, zero-domain-assumption MCTS engine for any sequential
//! decision-making problem: game AI, portfolio optimization, security
//! scan planning, hyperparameter tuning, molecular design, compiler
//! optimization, robot path planning, or anything with a state space
//! to explore.
//!
//! ## Design
//!
//! This crate provides two complementary search interfaces:
//!
//! 1. **Environment-based** ([`TreeSearch`]) — Full tree-search MCTS. You define
//! an [`Environment`] with actions, state transitions, and terminal conditions.
//! The engine handles selection, expansion, simulation, and backpropagation.
//! Use for: optimization, planning, game AI, theorem proving, scheduling.
//!
//! 2. **Bandit-based** ([`BanditSearch`]) — Two-level MCTS for explore/exploit
//! problems where you have a flat set of options grouped by category.
//! The engine uses UCT+RAVE to decide which option to try next, and you
//! feed back rewards. Use for: fuzzing, A/B testing, hyperparameter search,
//! scan strategy optimization, any problem with pre-enumerated choices.
//!
//! ## Feature Flags
//!
//! | Feature | Default | Description |
//! |------------|---------|-------------|
//! | `std` | ✅ | Enables `rand/std` and `StdRng` — disable for `#![no_std]` targets |
//! | `parallel` | ✅ | Root-parallel MCTS via Rayon — linear scaling across CPU cores |
//! | `dag` | ✅ | Transposition Table (DAG) via `hashbrown` — compresses identical states |
//! | `toml` | ✅ | Enables TOML config parsing (`from_toml_str`, `from_toml_file`) |
//!
//! ## Quick Start (Environment-based)
//!
//! ```rust
//! use mctrust::{Environment, Outcome, TreeSearch, SearchConfig, Reward};
//!
//! #[derive(Clone)]
//! struct Optimizer { value: f64 }
//!
//! #[derive(Clone, Debug, PartialEq)]
//! enum Action { Increase, Decrease }
//!
//! impl Environment for Optimizer {
//! type Action = Action;
//!
//! fn legal_actions(&self) -> Vec<Action> {
//! vec![Action::Increase, Action::Decrease]
//! }
//!
//! fn apply(&mut self, action: &Action) {
//! match action {
//! Action::Increase => self.value += 0.1,
//! Action::Decrease => self.value -= 0.1,
//! }
//! }
//!
//! fn evaluate(&self) -> Outcome {
//! if (self.value - 1.0).abs() < 0.01 {
//! Outcome::Terminal(Reward::WIN)
//! } else if self.value < -5.0 || self.value > 5.0 {
//! Outcome::Terminal(Reward::LOSS)
//! } else {
//! Outcome::Ongoing
//! }
//! }
//! }
//!
//! let state = Optimizer { value: 0.0 };
//! let config = SearchConfig::default();
//! let mut search = TreeSearch::new(state, config);
//! let best_action = search.run();
//! ```
//!
//! ## Quick Start (Bandit-based)
//!
//! ```rust
//! use mctrust::{BanditSearch, BanditConfig};
//!
//! let mut search = BanditSearch::new(BanditConfig::default());
//!
//! // Register 100 options across 5 categories.
//! for category in 0..5u64 {
//! for option in 0..20u64 {
//! search.add_arm(category * 20 + option, category as u32);
//! }
//! }
//!
//! // Explore and feed rewards.
//! while let Some(option_id) = search.next_arm() {
//! let reward = 0.5; // your evaluation
//! search.observe(option_id, reward);
//! }
//! ```
//!
//! ## Capabilities
//!
//! - **UCT selection** with configurable exploration constant
//! - **PUCT** (AlphaZero-style) with pluggable action priors
//! - **Gumbel `MuZero`** — hyperparameter-free Sequential Halving
//! - **Thompson Sampling** for stochastic exploration
//! - **RAVE** for rapid cross-branch value estimation
//! - **Pluggable `Evaluator` trait** — replace random rollout with neural networks or domain heuristics
//! - **Multi-agent support** — negamax-style backpropagation for adversarial environments
//! - **Tree reuse** — re-root the tree after each decision, preserving subtree statistics
//! - **DAG transposition tables** — merge identical states, compress the tree
//! - **Root parallelism** via Rayon — lock-free linear scaling across cores
//! - **Node budget** — bound memory usage with `with_max_nodes()`
//! - **`run_until(predicate)`** — arbitrary stop conditions (reward threshold, convergence, cancellation)
//! - **Principal variation extraction** — follow the best line through the tree (actions + states)
//! - **Graphviz DOT export** — visualize the search tree at any depth
//! - **Checkpoint/restore** — pause and resume searches across processes
//! - **Progressive widening** for continuous or massive action spaces
//! - **Stepped iteration** — `run_step()` for fine-grained control, async interleaving
//! - **Time and iteration budgets** — wall-clock deadline support for real-time systems
//! - **Deterministic RNG** via `ChaCha8Rng`
//! - **Zero domain assumptions** — works for any sequential decision problem
//!
//! ## References
//!
//! - Kocsis & Szepesvári, "Bandit-based Monte Carlo Planning" (2006)
//! - Gelly & Silver, "Monte-Carlo tree search and rapid action value estimation
//! in computer Go" (2011)
//! - Browne et al., "A Survey of Monte Carlo Tree Search Methods" (2012)
//! - Danihelka et al., "Policy Improvement by Planning with Gumbel" (2022)
pub use ;
pub use SearchConfigLoadError;
pub use ;
pub use Evaluator;
pub use Heuristic;
pub use ;
pub use ;
pub use NodeStats;
pub use Reward;