Crate mctrust

Expand description

§mctrust — Production-grade Monte Carlo Tree Search

A generic, zero-domain-assumption MCTS engine for any sequential decision-making problem: game AI, portfolio optimization, security scan planning, hyperparameter tuning, molecular design, compiler optimization, robot path planning, or anything with a state space to explore.

§Design

This crate provides two complementary search interfaces:

Environment-based (TreeSearch) — Full tree-search MCTS. You define an Environment with actions, state transitions, and terminal conditions. The engine handles selection, expansion, simulation, and backpropagation. Use for: optimization, planning, game AI, theorem proving, scheduling.
Bandit-based (BanditSearch) — Two-level MCTS for explore/exploit problems where you have a flat set of options grouped by category. The engine uses UCT+RAVE to decide which option to try next, and you feed back rewards. Use for: fuzzing, A/B testing, hyperparameter search, scan strategy optimization, any problem with pre-enumerated choices.

§Feature Flags

Feature	Default	Description
`std`	✅	Enables `rand/std` and `StdRng` — disable for `#![no_std]` targets
`parallel`	✅	Root-parallel MCTS via Rayon — linear scaling across CPU cores
`dag`	✅	Transposition Table (DAG) via `hashbrown` — compresses identical states
`toml`	✅	Enables TOML config parsing (`from_toml_str`, `from_toml_file`)

§Quick Start (Environment-based)

use mctrust::{Environment, Outcome, TreeSearch, SearchConfig, Reward};

#[derive(Clone)]
struct Optimizer { value: f64 }

#[derive(Clone, Debug, PartialEq)]
enum Action { Increase, Decrease }

impl Environment for Optimizer {
    type Action = Action;

    fn legal_actions(&self) -> Vec<Action> {
        vec![Action::Increase, Action::Decrease]
    }

    fn apply(&mut self, action: &Action) {
        match action {
            Action::Increase => self.value += 0.1,
            Action::Decrease => self.value -= 0.1,
        }
    }

    fn evaluate(&self) -> Outcome {
        if (self.value - 1.0).abs() < 0.01 {
            Outcome::Terminal(Reward::WIN)
        } else if self.value < -5.0 || self.value > 5.0 {
            Outcome::Terminal(Reward::LOSS)
        } else {
            Outcome::Ongoing
        }
    }
}

let state = Optimizer { value: 0.0 };
let config = SearchConfig::default();
let mut search = TreeSearch::new(state, config);
let best_action = search.run();

§Quick Start (Bandit-based)

use mctrust::{BanditSearch, BanditConfig};

let mut search = BanditSearch::new(BanditConfig::default());

// Register 100 options across 5 categories.
for category in 0..5u64 {
    for option in 0..20u64 {
        search.add_arm(category * 20 + option, category as u32);
    }
}

// Explore and feed rewards.
while let Some(option_id) = search.next_arm() {
    let reward = 0.5; // your evaluation
    search.observe(option_id, reward);
}

§Capabilities

UCT selection with configurable exploration constant
PUCT (AlphaZero-style) with pluggable action priors
Gumbel MuZero — hyperparameter-free Sequential Halving
Thompson Sampling for stochastic exploration
RAVE for rapid cross-branch value estimation
Pluggable Evaluator trait — replace random rollout with neural networks or domain heuristics
Multi-agent support — negamax-style backpropagation for adversarial environments
Tree reuse — re-root the tree after each decision, preserving subtree statistics
DAG transposition tables — merge identical states, compress the tree
Root parallelism via Rayon — lock-free linear scaling across cores
Node budget — bound memory usage with with_max_nodes()
run_until(predicate) — arbitrary stop conditions (reward threshold, convergence, cancellation)
Principal variation extraction — follow the best line through the tree (actions + states)
Graphviz DOT export — visualize the search tree at any depth
Checkpoint/restore — pause and resume searches across processes
Progressive widening for continuous or massive action spaces
Stepped iteration — run_step() for fine-grained control, async interleaving
Time and iteration budgets — wall-clock deadline support for real-time systems
Deterministic RNG via ChaCha8Rng
Zero domain assumptions — works for any sequential decision problem

§References

Kocsis & Szepesvári, “Bandit-based Monte Carlo Planning” (2006)
Gelly & Silver, “Monte-Carlo tree search and rapid action value estimation in computer Go” (2011)
Browne et al., “A Survey of Monte Carlo Tree Search Methods” (2012)
Danihelka et al., “Policy Improvement by Planning with Gumbel” (2022)

Structs§

BanditConfig: Configuration for crate::BanditSearch.
BanditSearch: Arms are grouped by category. The engine maintains a two-level tree: root → group nodes → arm selection within groups.
BanditSearchCheckpoint: Serializable checkpoint for restoring bandit search progress.
GroupStats: Read-only statistics for a group in crate::BanditSearch.
Heuristic: Optional heuristic signals for a state.
NodeStats: Statistics exposed for diagnostics and visualization.
ProgressiveWideningConfig: Progressive widening for large or continuous action spaces.
RaveConfig: Configuration for AMAF/RAVE value blending.
Reward: Quantitative reward for an MCTS state or simulation outcome.
Scalarizer: Scalarizer for named signals supplied at observation time.
SearchConfig: Configuration for crate::TreeSearch.
TreeSearch: Game-tree MCTS engine.
TreeSearchCheckpoint: Serialisable game-search checkpoint used for mid-search persistence.

Enums§

Outcome: Terminal or non-terminal state of an environment.
SearchConfigLoadError: Error returned when loading config from disk.
TreePolicy: Tree policy used during child selection.

Traits§

Environment: A domain environment that the MCTS engine can explore.
Evaluator: Pluggable leaf evaluator for replacing random rollouts with domain-specific or neural network-based evaluation.

Crate mctrust

Crate mctrust Copy item path

§mctrust — Production-grade Monte Carlo Tree Search

§Design

§Feature Flags

§Quick Start (Environment-based)

§Quick Start (Bandit-based)

§Capabilities

§References

Structs§

Enums§

Traits§

Crate mctrust