Gumbel MuZero search: policy improvement by planning with Gumbel.
Implements the algorithm from Danihelka et al., “Policy improvement by planning with Gumbel” (ICLR 2022).
Key features:
- Gumbel-Top-k sampling at the root for action selection
- Sequential Halving to allocate the simulation budget among the sampled root actions
- PUCT selection for tree traversal below the root
- Improved policy output, which the paper proposes as a better training target than visit counts
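The following is a minimal, self-contained sketch of two of the root-level features listed above: Gumbel-Top-k sampling and the improved policy. It follows the formulas in the paper, not this crate's internals. The xorshift generator, the function names, and the constants `c_visit = 50.0` and `c_scale = 0.1` are illustrative assumptions and are not part of the `treant_gumbel` API.

```rust
/// Tiny xorshift64 generator (seed must be non-zero) so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform sample strictly inside (0, 1), so both logarithms below stay finite.
    fn open_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }

    /// Standard Gumbel(0, 1) sample by inverse CDF: g = -ln(-ln(u)).
    fn gumbel(&mut self) -> f64 {
        -(-self.open_unit().ln()).ln()
    }
}

/// Gumbel-Top-k: add independent Gumbel noise g(a) to each logit and keep the
/// m actions with the largest g(a) + logits(a). The g(a) values are returned too,
/// since the paper reuses them when ranking candidates during Sequential Halving.
fn gumbel_top_k(logits: &[f64], m: usize, rng: &mut XorShift64) -> Vec<(usize, f64)> {
    let mut perturbed: Vec<(usize, f64, f64)> = logits
        .iter()
        .enumerate()
        .map(|(a, &logit)| {
            let g = rng.gumbel();
            (a, g, g + logit)
        })
        .collect();
    // Highest perturbed logit first.
    perturbed.sort_by(|x, y| y.2.total_cmp(&x.2));
    perturbed.into_iter().take(m).map(|(a, g, _)| (a, g)).collect()
}

/// Improved policy: softmax(logits + sigma(q)) with the monotone transform
/// sigma(q) = (c_visit + max_visits) * c_scale * q. `completed_q` must hold a
/// value for every action (for example, a value estimate for unvisited ones).
fn improved_policy(
    logits: &[f64],
    completed_q: &[f64],
    max_visits: u32,
    c_visit: f64,
    c_scale: f64,
) -> Vec<f64> {
    let scale = (c_visit + f64::from(max_visits)) * c_scale;
    let z: Vec<f64> = logits
        .iter()
        .zip(completed_q)
        .map(|(l, q)| l + scale * q)
        .collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = z.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = z.iter().map(|v| (v - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

fn main() {
    let logits = [0.5, 2.0, -1.0, 1.0];
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    // Sample m = 2 root candidates.
    println!("candidates: {:?}", gumbel_top_k(&logits, 2, &mut rng));
    // Made-up completed Q-values in [0, 1] and a max child visit count of 10.
    let q = [0.4, 0.6, 0.1, 0.7];
    println!("improved policy: {:?}", improved_policy(&logits, &q, 10, 50.0, 0.1));
}
```

Because the noise is sampled once per search, the same g(a) can be carried from the top-m selection into the later halving rounds, which is why `gumbel_top_k` returns it alongside each action.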
## Design
Gumbel search differs from standard MCTS at the root: instead of UCT/PUCT selection, it samples Gumbel noise, keeps the top-m actions by perturbed logit, and then uses Sequential Halving to allocate simulations among them. Below the root, standard PUCT guides tree traversal. The paper shows that this root procedure guarantees a policy improvement when action values are evaluated correctly, even with a small simulation budget.
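The root allocation and the below-root selection rule can be sketched as follows. The schedule uses ceil(log2 m) phases and splits each phase's share of the budget evenly over the surviving candidates, in the spirit of the paper. The rounding, the `max(1)` floor, the AlphaZero-style PUCT formula, and the names `Phase`, `sequential_halving_schedule`, `keep_top_half`, and `select_child` are all assumptions for illustration; the crate's actual code may differ.

```rust
/// One Sequential Halving phase: surviving candidates and simulations per candidate.
#[derive(Debug, PartialEq)]
struct Phase {
    candidates: usize,
    visits_each: usize,
}

/// Split `n` simulations over `m` root candidates: ceil(log2 m) phases, each
/// spending about n / ceil(log2 m) simulations spread evenly (at least one each)
/// over the survivors, after which the better half (rounded up) is kept.
fn sequential_halving_schedule(n: usize, m: usize) -> Vec<Phase> {
    match m {
        0 => Vec::new(),
        // A single candidate needs no elimination: it gets the whole budget.
        1 => vec![Phase { candidates: 1, visits_each: n }],
        _ => {
            // ceil(log2 m) for m >= 2.
            let phases = (usize::BITS - (m - 1).leading_zeros()) as usize;
            let mut remaining = m;
            let mut schedule = Vec::with_capacity(phases);
            for _ in 0..phases {
                let visits_each = (n / (phases * remaining)).max(1);
                schedule.push(Phase { candidates: remaining, visits_each });
                remaining = (remaining + 1) / 2;
            }
            schedule
        }
    }
}

/// Halving step: keep the better half (rounded up) of (action, score) pairs, where
/// the score would be g(a) + logits(a) + sigma(q(a)) in Gumbel search.
fn keep_top_half(mut candidates: Vec<(usize, f64)>) -> Vec<(usize, f64)> {
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));
    candidates.truncate((candidates.len() + 1) / 2);
    candidates
}

/// Below the root: AlphaZero-style PUCT over children given as (q, prior, visits),
/// scoring q + c_puct * prior * sqrt(parent_visits) / (1 + visits).
fn select_child(children: &[(f64, f64, u32)], parent_visits: u32, c_puct: f64) -> Option<usize> {
    let score = |&(q, prior, visits): &(f64, f64, u32)| {
        q + c_puct * prior * f64::from(parent_visits).sqrt() / (1.0 + f64::from(visits))
    };
    children
        .iter()
        .enumerate()
        .max_by(|&(_, a), &(_, b)| score(a).total_cmp(&score(b)))
        .map(|(i, _)| i)
}

fn main() {
    for phase in sequential_halving_schedule(100, 16) {
        println!("{:?}", phase);
    }
    // Made-up scores: actions 5 and 0 survive the halving step.
    println!("{:?}", keep_top_half(vec![(0, 1.2), (3, 0.4), (5, 2.0), (7, -0.3)]));
    // An unvisited child with a strong prior beats a well-visited one here.
    println!("{:?}", select_child(&[(0.5, 0.1, 10), (0.2, 0.8, 0)], 10, 1.5));
}
```

With `n = 100` and `m = 16`, this schedule gives 1, 3, 6, and 12 simulations per candidate over 16, 8, 4, and 2 candidates, or 88 simulations in total. How to spend the leftover budget, and how to avoid the `max(1)` floor overspending very small budgets, are choices a real implementation has to make for itself.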
The crate reuses `treant::GameState`, so any game implemented for the core MCTS crate also works with Gumbel search.
## Example
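```rust
use treant::GameState;
use treant_gumbel::{GumbelSearch, GumbelConfig, GumbelEvaluator};

// `Eval` (implements `GumbelEvaluator`) and `MyGame` (implements `GameState`) are user types defined elsewhere.
let mut search = GumbelSearch::new(Eval, GumbelConfig::default());
// Search from the `MyGame` position with 100 as the simulation budget.
let result = search.search(&MyGame, 100);
println!("Best move: {:?}", result.best_move);
```

## Structs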
- `GumbelConfig` - Configuration for Gumbel search.
- `GumbelSearch` - Gumbel MCTS search engine.
- `MoveStats` - Per-move statistics from Gumbel search.
- `SearchResult` - Result of a Gumbel search.
## Traits

- `GumbelEvaluator` - Evaluator providing policy logits and value estimates.