Crate treant_gumbel

Gumbel MuZero search: policy improvement by planning with Gumbel.

Implements the algorithm from Danihelka et al., “Policy improvement by planning with Gumbel” (ICLR 2022).

Key features:

Gumbel-Top-k sampling at the root for action selection
Sequential Halving to allocate the simulation budget among the sampled root actions
PUCT selection for tree traversal below the root
Improved policy output — a better training target than visit counts
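
As a rough illustration of the improved policy output listed above, the paper defines it as softmax(logits + sigma(completed Q)), where sigma(q) = (c_visit + max visit count) * c_scale * q. The sketch below is not this crate's implementation. The names `sigma`, `softmax`, and `improved_policy` are hypothetical. The constants 50.0 and 1.0 are values reported in the paper and serve only as illustrative defaults. Filling unvisited actions with a single `value_fallback` is a simplification of the paper's mixed value estimate, and Q-value normalization is omitted.

```rust
/// Monotone transform from the paper: sigma(q) = (c_visit + max_visits) * c_scale * q.
fn sigma(q: f64, max_visits: u32, c_visit: f64, c_scale: f64) -> f64 {
    (c_visit + max_visits as f64) * c_scale * q
}

/// Numerically stable softmax (subtracts the maximum before exponentiating).
fn softmax(xs: &[f64]) -> Vec<f64> {
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Improved policy: softmax(logits + sigma(completed_q)).
/// `q[a]` is `Some(mean value)` for visited actions and `None` otherwise.
/// Unvisited actions fall back to `value_fallback` (an assumption of this sketch).
fn improved_policy(
    logits: &[f64],
    q: &[Option<f64>],
    visits: &[u32],
    value_fallback: f64,
) -> Vec<f64> {
    let max_visits = visits.iter().copied().max().unwrap_or(0);
    let scores: Vec<f64> = logits
        .iter()
        .zip(q.iter())
        .map(|(&logit, &qa)| {
            let completed = qa.unwrap_or(value_fallback);
            // Illustrative constants: c_visit = 50.0, c_scale = 1.0.
            logit + sigma(completed, max_visits, 50.0, 1.0)
        })
        .collect();
    softmax(&scores)
}
```

Calling `improved_policy(&[0.0, 0.0, 0.0], &[Some(0.5), Some(-0.5), None], &[3, 1, 0], 0.0)` shifts probability mass toward the first action, whose searched value is highest. Raw visit counts would instead only reflect how often each action was tried.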

§Design

Gumbel search differs from standard MCTS at the root: instead of applying UCT/PUCT selection, it samples Gumbel noise, adds it to the policy logits, keeps the top-m actions, and uses Sequential Halving to allocate simulations among them. Below the root, standard PUCT guides tree traversal. The paper shows that this root procedure yields a policy improvement in expectation when action values are correctly evaluated, so the search remains useful even with very small simulation budgets.
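
The root procedure described above can be sketched in plain Rust. This is a minimal illustration, not this crate's code. `XorShift`, `gumbel_top_m`, and `halving_schedule` are hypothetical names, and the small PRNG is there only so the sketch needs no external crates. The schedule splits the budget evenly across ceil(log2 m) phases and halves the candidate set after each phase, following the paper's description. A real search would re-rank the surviving actions by their searched values between phases.

```rust
/// Tiny deterministic PRNG (xorshift64*). The seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    /// Uniform sample in the open interval (0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        // Keep the top 52 bits so that adding 0.5 stays exact in f64.
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 12;
        (bits as f64 + 0.5) / (1u64 << 52) as f64
    }

    /// Standard Gumbel(0, 1) sample: g = -ln(-ln(u)).
    fn gumbel(&mut self) -> f64 {
        -(-self.next_f64().ln()).ln()
    }
}

/// Gumbel-Top-k: perturb each logit with Gumbel noise and keep the `m` largest.
/// Returns (action index, perturbed score) pairs, best first.
fn gumbel_top_m(logits: &[f64], m: usize, rng: &mut XorShift) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = logits
        .iter()
        .enumerate()
        .map(|(a, &l)| (a, l + rng.gumbel()))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(m);
    scored
}

/// Sequential Halving schedule: one (candidates, visits per candidate) pair per phase.
fn halving_schedule(m: usize, budget: usize) -> Vec<(usize, usize)> {
    if m == 0 {
        return Vec::new();
    }
    // ceil(log2(m)) for m >= 2, computed with integers; a single phase for m = 1.
    let phases = (usize::BITS - (m.max(2) - 1).leading_zeros()) as usize;
    let mut plan = Vec::with_capacity(phases);
    let mut remaining = m;
    for _ in 0..phases {
        // Each phase gets an equal share of the budget, split among the survivors.
        let visits = (budget / (phases * remaining)).max(1);
        plan.push((remaining, visits));
        remaining = (remaining / 2).max(1);
    }
    plan
}
```

For m = 16 and a budget of 200 simulations, `halving_schedule(16, 200)` returns `[(16, 3), (8, 6), (4, 12), (2, 25)]`, which spends 194 of the 200 simulations. Each phase costs roughly the same total, so the two finalists receive far more visits apiece than the initial sixteen candidates.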

The crate reuses treant::GameState so any game implemented for the core MCTS crate works with Gumbel search.

§Example

use treant::GameState;
use treant_gumbel::{GumbelSearch, GumbelConfig, GumbelEvaluator};

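// `Eval` (a `GumbelEvaluator` implementation) and `MyGame` (a `GameState`
// implementation) are user-defined types, omitted here.
let mut search = GumbelSearch::new(Eval, GumbelConfig::default());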
let result = search.search(&MyGame, 100);
println!("Best move: {:?}", result.best_move);

Structs§

GumbelConfig: Configuration for Gumbel search.
GumbelSearch: Gumbel MCTS search engine.
MoveStats: Per-move statistics from Gumbel search.
SearchResult: Result of a Gumbel search.

Traits§

GumbelEvaluator: Evaluator providing policy logits and value estimates.