Crate treant_gumbel

Gumbel MuZero search: policy improvement by planning with Gumbel.

Implements the algorithm from Danihelka et al., “Policy improvement by planning with Gumbel” (ICLR 2022).

Key features:

  • Gumbel-Top-k sampling at the root for action selection
  • Sequential Halving to allocate the simulation budget among the sampled root actions
  • PUCT selection for tree traversal below the root
  • Improved policy output, which the paper proposes as a better training target than visit counts
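The improved policy in the last bullet can be illustrated with a minimal sketch. This is not this crate's implementation: `improved_policy`, `softmax`, and the constants `C_VISIT` and `C_SCALE` are hypothetical names, and the constant values are assumed from the defaults reported in the paper. The paper fills in values for unvisited actions with a mixed value estimate; here a caller-supplied `fallback_value` stands in for it.

```rust
// Assumed constants for the monotone transform sigma(q); not taken from this crate.
const C_VISIT: f64 = 50.0;
const C_SCALE: f64 = 1.0;

/// Numerically stable softmax over a slice of scores.
fn softmax(xs: &[f64]) -> Vec<f64> {
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Improved policy: softmax(logits + sigma(completed_q)), where
/// sigma(q) = (C_VISIT + max visit count) * C_SCALE * q.
/// `q[a]` is `None` for actions that were never visited; those use `fallback_value`.
fn improved_policy(
    logits: &[f64],
    q: &[Option<f64>],
    visits: &[u32],
    fallback_value: f64,
) -> Vec<f64> {
    let max_visits = visits.iter().copied().max().unwrap_or(0) as f64;
    let scores: Vec<f64> = logits
        .iter()
        .zip(q)
        .map(|(&logit, &qa)| {
            let completed = qa.unwrap_or(fallback_value);
            logit + (C_VISIT + max_visits) * C_SCALE * completed
        })
        .collect();
    softmax(&scores)
}
```

Unlike a visit-count distribution, this target stays well defined when most actions have zero visits, because every action still receives a score from its logit and a completed value.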

§Design

Gumbel search differs from standard MCTS at the root: instead of UCT/PUCT selection, it samples Gumbel noise, selects the top m actions by perturbed logit, and then uses Sequential Halving to allocate simulations among them. Below the root, standard PUCT guides tree traversal. The paper shows that this root procedure guarantees a policy improvement when action values are evaluated correctly, even with few simulations.
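The root procedure described above can be sketched in a few dozen lines. This is an illustrative sketch, not the crate's code: `XorShift64`, `Candidate`, `gumbel_top_m`, and `sequential_halving` are hypothetical names, the `simulate` closure stands in for a full tree simulation, and the constants are assumed from the paper's reported defaults. The budget split is simplified and may slightly overshoot `budget` when it is small.

```rust
const C_VISIT: f64 = 50.0; // assumed, not taken from this crate
const C_SCALE: f64 = 1.0; // assumed, not taken from this crate

/// Tiny xorshift64* generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    /// Uniform sample in the open interval (0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 12; // 52 random bits
        (bits as f64 + 0.5) / (1u64 << 52) as f64
    }

    /// Standard Gumbel(0, 1) sample via inverse transform: -ln(-ln(u)).
    fn gumbel(&mut self) -> f64 {
        -(-self.next_f64().ln()).ln()
    }
}

struct Candidate {
    action: usize,
    base: f64, // Gumbel noise + logit, fixed for the whole search
    visits: u32,
    value_sum: f64,
}

impl Candidate {
    fn mean_q(&self) -> f64 {
        if self.visits == 0 { 0.0 } else { self.value_sum / self.visits as f64 }
    }
}

/// Gumbel-Top-k trick: perturb each logit with Gumbel noise, keep the m largest.
fn gumbel_top_m(logits: &[f64], m: usize, rng: &mut XorShift64) -> Vec<Candidate> {
    let mut scored: Vec<Candidate> = logits
        .iter()
        .enumerate()
        .map(|(action, &l)| Candidate { action, base: l + rng.gumbel(), visits: 0, value_sum: 0.0 })
        .collect();
    scored.sort_by(|x, y| y.base.total_cmp(&x.base));
    scored.truncate(m);
    scored
}

/// Sequential Halving: split the budget over ceil(log2(m)) phases, visit every
/// surviving candidate equally in each phase, then keep the better half.
fn sequential_halving(
    mut candidates: Vec<Candidate>,
    budget: usize,
    mut simulate: impl FnMut(usize) -> f64,
) -> usize {
    assert!(!candidates.is_empty(), "need at least one candidate action");
    let phases = candidates.len().next_power_of_two().trailing_zeros() as usize;
    while candidates.len() > 1 {
        // At least one visit per candidate, even if the budget is very small.
        let per_action = (budget / (phases * candidates.len())).max(1);
        for c in candidates.iter_mut() {
            for _ in 0..per_action {
                c.value_sum += simulate(c.action);
                c.visits += 1;
            }
        }
        let max_visits = candidates.iter().map(|c| c.visits).max().unwrap_or(0) as f64;
        // Rank by g + logit + sigma(q), with sigma(q) = (C_VISIT + max N) * C_SCALE * q.
        let score = |c: &Candidate| c.base + (C_VISIT + max_visits) * C_SCALE * c.mean_q();
        candidates.sort_by(|a, b| score(b).total_cmp(&score(a)));
        let keep = (candidates.len() + 1) / 2;
        candidates.truncate(keep);
    }
    candidates[0].action
}
```

A caller would build the candidates with `gumbel_top_m(&logits, m, &mut rng)` and pass them to `sequential_halving` together with a closure that runs one simulation for the given root action and returns its value. Because the Gumbel noise is sampled once and reused in every phase, the final choice stays consistent with the initial sample.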

The crate reuses treant::GameState so any game implemented for the core MCTS crate works with Gumbel search.

§Example

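use treant::GameState;
use treant_gumbel::{GumbelSearch, GumbelConfig, GumbelEvaluator};

// `MyGame` and `Eval` are user-defined and not shown here: `MyGame` is a game
// implementing `GameState`, and `Eval` is an evaluator implementing `GumbelEvaluator`.
let mut search = GumbelSearch::new(Eval, GumbelConfig::default());
// Search from the given state; the second argument is presumably the simulation budget.
let result = search.search(&MyGame, 100);
println!("Best move: {:?}", result.best_move);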

Structs§

GumbelConfig
Configuration for Gumbel search.
GumbelSearch
Gumbel MCTS search engine.
MoveStats
Per-move statistics from Gumbel search.
SearchResult
Result of a Gumbel search.

Traits§

GumbelEvaluator
Evaluator providing policy logits and value estimates.