treant-gumbel
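Gumbel MuZero search for Rust: Sequential Halving with Gumbel noise for Monte Carlo Tree Search. Produces a policy with a policy-improvement guarantee: per the paper, the searched policy is at least as good in expectation as the prior policy, provided action values are evaluated correctly, even at small simulation counts.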
Built on top of the treant crate, reusing its GameState trait so any game works with both standard MCTS and Gumbel search.
Based on Danihelka et al., "Policy improvement by planning with Gumbel" (ICLR 2022).
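To make the root-level idea concrete, here is a minimal, standalone sketch of Gumbel top-m sampling followed by Sequential Halving, loosely following the paper's description. It is not this crate's API: `Rng`, `sigma`, `gumbel_sequential_halving`, and the `simulate` callback are hypothetical names used only for illustration, and the constants `C_VISIT` and `C_SCALE` are assumed values. The sketch also ignores leftover budget and always gives each candidate at least one simulation per phase.

```rust
/// Tiny xorshift64* generator so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    /// Uniform sample in the open interval (0, 1). The seed must be nonzero.
    fn uniform(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 12; // top 52 bits
        (bits as f64 + 0.5) / (1u64 << 52) as f64
    }

    /// Standard Gumbel(0, 1) sample via the inverse transform.
    fn gumbel(&mut self) -> f64 {
        -(-self.uniform().ln()).ln()
    }
}

// Assumed constants for the value transform; tune for your own setup.
const C_VISIT: f64 = 50.0;
const C_SCALE: f64 = 1.0;

/// Monotonic transform of a mean action value, scaled by the visit count.
fn sigma(q: f64, max_visits: usize) -> f64 {
    (C_VISIT + max_visits as f64) * C_SCALE * q
}

/// Hypothetical helper: pick one root action from `logits` using `budget`
/// calls to `simulate`, which returns a value estimate for an action index.
fn gumbel_sequential_halving(
    logits: &[f64],
    m: usize,
    budget: usize,
    rng: &mut Rng,
    mut simulate: impl FnMut(usize) -> f64,
) -> usize {
    let k = logits.len();
    assert!(k > 0 && m > 0, "need at least one action and one candidate");

    // Perturb each logit with Gumbel noise, then keep the top-m actions.
    let scores: Vec<f64> = logits.iter().map(|&l| l + rng.gumbel()).collect();
    let mut candidates: Vec<usize> = (0..k).collect();
    candidates.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    candidates.truncate(m.min(k));

    // ceil(log2(m)) halving phases, at least one.
    let phases = (candidates.len().next_power_of_two().trailing_zeros() as usize).max(1);
    let mut q_sum = vec![0.0; k];
    let mut visits = vec![0usize; k];

    while candidates.len() > 1 {
        // Split this phase's share of the budget evenly among survivors.
        let per_action = (budget / (phases * candidates.len())).max(1);
        for &a in &candidates {
            for _ in 0..per_action {
                q_sum[a] += simulate(a);
                visits[a] += 1;
            }
        }
        // Rank by perturbed logit plus transformed mean value; keep the top half.
        let max_visits = visits.iter().copied().max().unwrap_or(0);
        let rank = |a: usize| scores[a] + sigma(q_sum[a] / visits[a] as f64, max_visits);
        candidates.sort_by(|&a, &b| rank(b).total_cmp(&rank(a)));
        let keep = (candidates.len() + 1) / 2;
        candidates.truncate(keep);
    }
    candidates[0]
}

fn main() {
    // Fixed per-action values stand in for real tree simulations.
    let q = [0.1, 0.3, 0.9, 0.2];
    let mut rng = Rng(42);
    let best = gumbel_sequential_halving(&[0.0; 4], 4, 16, &mut rng, |a| q[a]);
    println!("selected action: {best}");
}
```

With 4 candidates and a budget of 16, the sketch runs two phases: 2 simulations for each of 4 actions, then 4 simulations for each of the 2 survivors.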
When to use
- Self-play training with guaranteed policy improvement
- Distilling search into a neural network
- Low simulation budgets where PUCT degrades
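For the distillation use case, the paper trains the policy network toward an improved policy of the form softmax(logits + sigma(completed Q)), where unvisited actions fall back to a value estimate. The sketch below illustrates that computation in isolation. The function `improved_policy`, its parameters, and the constants are hypothetical and are not taken from this crate; the fallback here is a caller-supplied `value_estimate`, which simplifies the mixed value estimate described in the paper.

```rust
// Assumed constants for the value transform; tune for your own setup.
const C_VISIT: f64 = 50.0;
const C_SCALE: f64 = 1.0;

/// Hypothetical helper: build a policy training target from root statistics.
/// `q[a]` is `Some(mean value)` for visited actions and `None` otherwise.
fn improved_policy(
    logits: &[f64],
    q: &[Option<f64>],
    visits: &[u32],
    value_estimate: f64,
) -> Vec<f64> {
    let max_visits = visits.iter().copied().max().unwrap_or(0) as f64;
    let scale = (C_VISIT + max_visits) * C_SCALE;

    // Completed Q: use the searched value when available, else the fallback.
    let scores: Vec<f64> = logits
        .iter()
        .zip(q)
        .map(|(&l, &qa)| l + scale * qa.unwrap_or(value_estimate))
        .collect();

    // Numerically stable softmax over the combined scores.
    let max_score = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max_score).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

fn main() {
    // One visited action with a high value; the other two were never visited.
    let target = improved_policy(&[0.0; 3], &[Some(1.0), None, None], &[1, 0, 0], 0.5);
    println!("{target:?}");
}
```

Because every action receives a score, including unvisited ones, the target stays well defined even when the simulation budget is smaller than the number of legal moves.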
Example
use ;
// See the docs for a complete example.
License
MIT