mctrust 0.4.0 - Docs.rs

# mctrust

Production-grade Monte Carlo Tree Search for any sequential decision problem. Portfolio optimization, security scan planning, hyperparameter tuning, game AI, molecular design, compiler optimization, robot path planning — anything with a state space to explore. Implement the `Environment` trait and the search handles the rest.

UCT, PUCT, Gumbel MuZero, Thompson Sampling, RAVE, pluggable evaluators, multi-agent negamax, tree reuse, DAG transpositions, root parallelism, progressive widening, node & time budgets, tree visualization, checkpoint/restore.

[![Crates.io](https://img.shields.io/crates/v/mctrust)](https://crates.io/crates/mctrust)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

## Quick Start

```rust
use mctrust::{Environment, Outcome, SearchConfig, TreeSearch, Reward};

#[derive(Clone)]
struct Optimizer { value: f64, target: f64 }

#[derive(Clone, Debug, PartialEq)]
enum Action { Increase, Decrease }

impl Environment for Optimizer {
    type Action = Action;

    fn legal_actions(&self) -> Vec<Action> {
        vec![Action::Increase, Action::Decrease]
    }

    fn apply(&mut self, action: &Action) {
        match action {
            Action::Increase => self.value += 0.1,
            Action::Decrease => self.value -= 0.1,
        }
    }

    fn evaluate(&self) -> Outcome {
        if (self.value - self.target).abs() < 0.05 {
            Outcome::Success(Reward::WIN)
        } else if self.value.abs() > 10.0 {
            Outcome::Failure
        } else {
            Outcome::Ongoing
        }
    }
}

let state = Optimizer { value: 0.0, target: 3.0 };
let config = SearchConfig::builder().iterations(10_000).build();
let mut search = TreeSearch::new(state, config);
let best = search.run(); // Some(Action::Increase)
```

## Two Search Engines

### Environment-based (`TreeSearch`)

Full tree-search MCTS. Define an `Environment` with actions, transitions, and terminal conditions. The engine handles selection, expansion, simulation, and backpropagation.

Use for: optimization, planning, scheduling, game AI, theorem proving.

### Bandit-based (`BanditSearch`)

Two-level MCTS for explore/exploit problems with a flat set of options grouped by category. Uses UCT+RAVE to decide which option to evaluate next.

Use for: fuzzing, A/B testing, hyperparameter search, scan strategy optimization.

```rust
use mctrust::{BanditSearch, BanditConfig};

let mut search = BanditSearch::new(BanditConfig::default());
for category in 0..5u64 {
    for option in 0..20u64 {
        search.add_arm(category * 20 + option, category as u32);
    }
}
while let Some(option_id) = search.next_arm() {
    let reward = evaluate(option_id);
    search.observe(option_id, reward);
}
```

## Tree Policies

| Policy | Use Case | Exploration Constant? |
|--------|----------|----------------------|
| `Uct` | General purpose | Yes |
| `Puct` | Neural-network-guided search (AlphaZero-style) | Yes |
| `Gumbel` | Hyperparameter-free search (MuZero-style) | **No** |
| `ThompsonSampling` | Stochastic exploration | Temperature only |

```rust
use mctrust::{SearchConfig, TreePolicy};

// Gumbel MuZero: same quality in ~16x fewer simulations, zero tuning
let config = SearchConfig::builder()
    .iterations(1_000)
    .tree_policy(TreePolicy::Gumbel {
        sampled_actions: 16,
        max_completions_coeff: 50.0,
    })
    .build();
```

## Pluggable Evaluator

Replace random rollouts with neural networks, domain heuristics, or any custom evaluation:

```rust
use mctrust::{Evaluator, Environment, Reward};

struct ValueNetwork { /* your model */ }

impl<E: Environment> Evaluator<E> for ValueNetwork {
    fn evaluate(&self, env: &E) -> Reward {
        // Pass state through your model, return value estimate
        Reward::new(0.7)
    }
}

// search.with_evaluator(Arc::new(ValueNetwork { ... }));
```

## Multi-Agent Environments

For adversarial or multi-agent environments, implement `current_player()` and `num_players()`:

```rust
fn current_player(&self) -> usize {
    self.turn % self.num_agents
}
fn num_players(&self) -> usize { 2 }
```

The engine automatically applies negamax-style reward flipping during backpropagation.

## Search Control

### Time Budgets

```rust
use std::time::Duration;
let config = SearchConfig::builder()
    .iterations(10_000_000)  // high ceiling
    .time_budget(Duration::from_millis(50))  // but stop after 50ms
    .build();
```

### Arbitrary Stop Conditions (`run_until`)

```rust
search.run_until(|s| {
    s.best_root_reward().unwrap_or(0.0) > 0.95
    || s.total_simulations() >= 50_000
});
```

### Node Budget (Memory Safety)

```rust
search.with_max_nodes(1_000_000); // never allocate more than 1M nodes
search.run();
```

### Stepped Iteration

```rust
for _ in 0..100 {
    search.run_step();
}
if search.best_root_reward().unwrap_or(0.0) > 0.9 {
    // Early exit — already confident
}
```

## Tree Reuse (Warm Start Between Decisions)

After choosing an action, re-root the tree at the chosen child's subtree instead of discarding everything. Preserves all accumulated statistics — the next search starts with a massive head start.

```rust
loop {
    if let Some(best) = search.run() {
        apply_action(best.clone());
        search.advance_to_action(&best); // preserves subtree
    }
}
```

No other Rust MCTS crate supports tree reuse.

## DAG Transposition Tables

When your environment has multiple paths to the same state, enable DAG mode to merge duplicate nodes:

```rust
search.enable_dag();
search.run();
```

Requires implementing `state_hash()` on your `Environment`.

## Root Parallelism

Lock-free linear scaling across CPU cores:

```rust
let best = search.run_parallel(num_cpus::get());
```

Each thread runs an independent search tree; results are merged at the root.

## Diagnostics & Visualization

```rust
// Best line of actions through the tree:
let pv = search.principal_variation(); // Vec<Action>

// Full intermediate state sequence along the PV:
let pv_states = search.principal_variation_states(); // Vec<Environment>

// Per-action visit counts and rewards at root:
let stats = search.root_stats(); // Vec<(Action, NodeStats)>

// Best action's average reward:
let reward = search.best_root_reward(); // Option<f64>

// Graphviz DOT export for visual inspection:
let dot = search.export_dot(5);
std::fs::write("tree.dot", dot)?;
// Then: dot -Tsvg tree.dot -o tree.svg
```

## TOML Configuration

```toml
iterations = 50000
exploration_constant = 1.41
max_depth = 100

[tree_policy]
kind = "gumbel"
sampled_actions = 16
max_completions_coeff = 50.0

[rave]
enabled = true
bias = 300.0
```

```rust
let config = SearchConfig::from_toml_str(toml_string)?;
let config = SearchConfig::from_toml_file("search.toml")?;
```

## Feature Flags

| Feature    | Default | Description |
|------------|---------|-------------|
| `std`      | ✅ | Enables `rand/std` — disable for `#![no_std]` targets |
| `parallel` | ✅ | Root-parallel MCTS via Rayon |
| `dag`      | ✅ | Transposition tables via `hashbrown` |
| `toml`     | ✅ | TOML config parsing |

## Use Cases

MCTS generalizes to any sequential decision problem:

- **Finance** — portfolio rebalancing, options hedging, order execution strategies
- **Security** — scan strategy optimization, fuzzing prioritization, attack path planning
- **Operations** — scheduling, resource allocation, supply chain optimization
- **Science** — molecular design, protein folding, experiment design
- **Engineering** — compiler optimization, chip placement, network routing
- **Games** — board games, real-time strategy, puzzle solving

The `Environment` trait is fully generic. If you can define states, actions, and outcomes, mctrust can search it.

## Contributing

Pull requests welcome. Open a PR if you find a bug, a better API, or a rough edge.

## License

MIT. Copyright 2026 CORUM COLLECTIVE LLC.