Skip to main content

Crate mctrust

Crate mctrust 

Source
Expand description

A generic, zero-domain-assumption MCTS engine for any sequential decision-making problem: game AI, portfolio optimization, security scan planning, hyperparameter tuning, molecular design, compiler optimization, robot path planning, or anything with a state space to explore.

§Design

This crate provides two complementary search interfaces:

  1. Environment-based (TreeSearch) — Full tree-search MCTS. You define an Environment with actions, state transitions, and terminal conditions. The engine handles selection, expansion, simulation, and backpropagation. Use for: optimization, planning, game AI, theorem proving, scheduling.

  2. Bandit-based (BanditSearch) — Two-level MCTS for explore/exploit problems where you have a flat set of options grouped by category. The engine uses UCT+RAVE to decide which option to try next, and you feed back rewards. Use for: fuzzing, A/B testing, hyperparameter search, scan strategy optimization, any problem with pre-enumerated choices.

§Feature Flags

FeatureDefaultDescription
stdEnables rand/std and StdRng — disable for #![no_std] targets
parallelRoot-parallel MCTS via Rayon — linear scaling across CPU cores
dagTransposition Table (DAG) via hashbrown — compresses identical states
tomlEnables TOML config parsing (from_toml_str, from_toml_file)

§Quick Start (Environment-based)

use mctrust::{Environment, Outcome, TreeSearch, SearchConfig, Reward};

#[derive(Clone)]
struct Optimizer { value: f64 }

#[derive(Clone, Debug, PartialEq)]
enum Action { Increase, Decrease }

impl Environment for Optimizer {
    type Action = Action;

    fn legal_actions(&self) -> Vec<Action> {
        vec![Action::Increase, Action::Decrease]
    }

    fn apply(&mut self, action: &Action) {
        match action {
            Action::Increase => self.value += 0.1,
            Action::Decrease => self.value -= 0.1,
        }
    }

    fn evaluate(&self) -> Outcome {
        if (self.value - 1.0).abs() < 0.01 {
            Outcome::Terminal(Reward::WIN)
        } else if self.value < -5.0 || self.value > 5.0 {
            Outcome::Terminal(Reward::LOSS)
        } else {
            Outcome::Ongoing
        }
    }
}

let state = Optimizer { value: 0.0 };
let config = SearchConfig::default();
let mut search = TreeSearch::new(state, config);
let best_action = search.run();

§Quick Start (Bandit-based)

use mctrust::{BanditSearch, BanditConfig};

let mut search = BanditSearch::new(BanditConfig::default());

// Register 100 options across 5 categories.
for category in 0..5u64 {
    for option in 0..20u64 {
        search.add_arm(category * 20 + option, category as u32);
    }
}

// Explore and feed rewards.
while let Some(option_id) = search.next_arm() {
    let reward = 0.5; // your evaluation
    search.observe(option_id, reward);
}

§Capabilities

  • UCT selection with configurable exploration constant
  • PUCT (AlphaZero-style) with pluggable action priors
  • Gumbel MuZero — hyperparameter-free Sequential Halving
  • Thompson Sampling for stochastic exploration
  • RAVE for rapid cross-branch value estimation
  • Pluggable Evaluator trait — replace random rollout with neural networks or domain heuristics
  • Multi-agent support — negamax-style backpropagation for adversarial environments
  • Tree reuse — re-root the tree after each decision, preserving subtree statistics
  • DAG transposition tables — merge identical states, compress the tree
  • Root parallelism via Rayon — lock-free linear scaling across cores
  • Node budget — bound memory usage with with_max_nodes()
  • run_until(predicate) — arbitrary stop conditions (reward threshold, convergence, cancellation)
  • Principal variation extraction — follow the best line through the tree (actions + states)
  • Graphviz DOT export — visualize the search tree at any depth
  • Checkpoint/restore — pause and resume searches across processes
  • Progressive widening for continuous or massive action spaces
  • Stepped iterationrun_step() for fine-grained control, async interleaving
  • Time and iteration budgets — wall-clock deadline support for real-time systems
  • Deterministic RNG via ChaCha8Rng
  • Zero domain assumptions — works for any sequential decision problem

§References

  • Kocsis & Szepesvári, “Bandit-based Monte Carlo Planning” (2006)
  • Gelly & Silver, “Monte-Carlo tree search and rapid action value estimation in computer Go” (2011)
  • Browne et al., “A Survey of Monte Carlo Tree Search Methods” (2012)
  • Danihelka et al., “Policy Improvement by Planning with Gumbel” (2022)

Structs§

BanditConfig
Configuration for crate::BanditSearch.
BanditSearch
Arms are grouped by category. The engine maintains a two-level tree: root → group nodes → arm selection within groups.
BanditSearchCheckpoint
Serializable checkpoint for restoring bandit search progress.
GroupStats
Read-only statistics for a group in crate::BanditSearch.
Heuristic
Optional heuristic signals for a state.
NodeStats
Statistics exposed for diagnostics and visualization.
ProgressiveWideningConfig
Progressive widening for large or continuous action spaces.
RaveConfig
Configuration for AMAF/RAVE value blending.
Reward
Quantitative reward for an MCTS state or simulation outcome.
Scalarizer
Scalarizer for named signals supplied at observation time.
SearchConfig
Configuration for crate::TreeSearch.
TreeSearch
Game-tree MCTS engine.
TreeSearchCheckpoint
Serialisable game-search checkpoint used for mid-search persistence.

Enums§

Outcome
Terminal or non-terminal state of an environment.
SearchConfigLoadError
Error returned when loading config from disk.
TreePolicy
Tree policy used during child selection.

Traits§

Environment
A domain environment that the MCTS engine can explore.
Evaluator
Pluggable leaf evaluator for replacing random rollouts with domain-specific or neural network-based evaluation.