ai-tournament
A modular Rust crate system for evaluating AI agents via customizable tournaments, supporting sandboxed execution and flexible constraints.
Overview
This project provides tools to benchmark and evaluate AI agents in a controlled tournament setting. It supports custom games, custom tournament strategies, and execution isolation using Linux cgroups v2.
Key Features
- Pluggable Tournaments: Define your own tournament logic via the TournamentStrategy trait, or use built-in strategies like SwissTournament and SinglePlayerTournament.
- Custom Games: Any environment that implements the Game trait can be used.
- Sandboxed Agent Execution: Each agent runs in its own isolated process with:
  - Dedicated CPU cores (taskset)
  - Memory and CPU limits (via cgroups v2)
  - Action timeouts and total time budgets
- Configurable Constraints: Use the ConstraintsBuilder to define:
  - CPUs used per agent
  - Memory limits
  - Timeouts and think-time budgets
[!NOTE] Full CPU and RAM isolation requires Linux with cgroups v2 and the taskset command installed. If these are not available, the evaluator can optionally fall back to time-only constraints by setting allow_uncontained = true in the configuration.
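To make the isolation mechanisms concrete, here is a minimal sketch of how a process could be pinned with taskset and limited through cgroups v2. It is not the crate's implementation: AgentLimits, pinned_command, and cgroup_settings are hypothetical names introduced for illustration. Only the taskset -c flag and the cgroup v2 files memory.max and cpu.max are real interfaces.

```rust
use std::process::Command;

/// Hypothetical per-agent limits (illustrative only, not the crate's API).
#[derive(Debug, Clone)]
pub struct AgentLimits {
    /// Indices of the CPU cores the agent is pinned to.
    pub cpus: Vec<usize>,
    /// Hard memory cap in bytes.
    pub memory_bytes: u64,
    /// CPU bandwidth in percent of one core (100 = one full core).
    pub cpu_quota_percent: u32,
}

/// Formats core indices as a taskset CPU list, e.g. [0, 2] becomes "0,2".
pub fn cpu_list(cpus: &[usize]) -> String {
    cpus.iter()
        .map(|c| c.to_string())
        .collect::<Vec<_>>()
        .join(",")
}

/// Builds (without spawning) a command that runs `binary` pinned to the
/// configured cores via `taskset -c <list> <binary> <args...>`.
pub fn pinned_command(binary: &str, args: &[&str], limits: &AgentLimits) -> Command {
    let mut cmd = Command::new("taskset");
    cmd.arg("-c")
        .arg(cpu_list(&limits.cpus))
        .arg(binary)
        .args(args);
    cmd
}

/// Returns the (file name, contents) pairs one would write into the agent's
/// cgroup v2 directory. `cpu.max` takes "<quota> <period>" in microseconds.
pub fn cgroup_settings(limits: &AgentLimits) -> Vec<(&'static str, String)> {
    const PERIOD_US: u64 = 100_000;
    let quota_us = PERIOD_US * u64::from(limits.cpu_quota_percent) / 100;
    vec![
        ("memory.max", limits.memory_bytes.to_string()),
        ("cpu.max", format!("{} {}", quota_us, PERIOD_US)),
    ]
}
```

Writing these values requires a cgroup directory the evaluator is allowed to manage, which is one reason a time-only fallback is useful on systems without that setup.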
Evaluator Configuration
The Configuration struct controls how evaluation is performed. You can use Configuration::new() for defaults or customize it via builder methods.
You can also override its behavior using environment variables (EVAL_VERBOSE, EVAL_ALLOW_UNCONTAINED, etc.). See configuration.rs for details.
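As an illustration of environment-variable overrides, the sketch below layers EVAL_VERBOSE and EVAL_ALLOW_UNCONTAINED on top of default values. EvalSettings, parse_flag, and apply_overrides are hypothetical names, and the accepted boolean spellings are an assumption; configuration.rs remains the authority on what the crate actually accepts.

```rust
use std::env;

/// Hypothetical subset of the evaluator settings (not the crate's Configuration).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct EvalSettings {
    pub verbose: bool,
    pub allow_uncontained: bool,
}

/// Interprets a raw value as a boolean. The accepted spellings are an
/// assumption made for this sketch.
pub fn parse_flag(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Some(true),
        "0" | "false" | "no" | "off" => Some(false),
        _ => None,
    }
}

/// Overrides `base` with any parseable value returned by `lookup`.
/// Unset or unparseable variables leave the base value untouched.
pub fn apply_overrides<F>(base: EvalSettings, lookup: F) -> EvalSettings
where
    F: Fn(&str) -> Option<String>,
{
    let flag = |name: &str, default: bool| {
        lookup(name).and_then(|v| parse_flag(&v)).unwrap_or(default)
    };
    EvalSettings {
        verbose: flag("EVAL_VERBOSE", base.verbose),
        allow_uncontained: flag("EVAL_ALLOW_UNCONTAINED", base.allow_uncontained),
    }
}

/// Convenience wrapper that reads the real process environment.
pub fn from_env(base: EvalSettings) -> EvalSettings {
    apply_overrides(base, |name| env::var(name).ok())
}
```

Passing the lookup as a closure keeps the override logic testable without mutating the process environment.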
Usage Summary
- Implement the Game trait for your task or environment.
- Provide AI agents as Rust crates or compiled binaries in a specified directory.
- Define resource constraints with the ConstraintsBuilder.
- Choose or implement a TournamentStrategy.
- Run the evaluator to get per-agent scores, as defined by the tournament type.
Agent Directory Structure
There are two ways to organize agent directories depending on whether you're compiling them:
If compile_agents = true (default):
Each agent should be a Rust crate with a YAML config file at the root:
agent_directory/
├── Cargo.toml
├── src/
└── config.yaml
If compile_agents = false:
Each agent subdirectory should contain a precompiled binary and a YAML config:
agent_directory/
├── agent_binary
└── config.yaml
The config.yaml file specifies command-line arguments per named configuration:
eval: default
configs:
- default: "--mode standard" # config used for single config evaluation.
- aggressive: "--mode aggressive"
If test_all_configs = true, all listed configurations will be tested. Otherwise, only the one under eval is used.
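The selection rule can be sketched as follows, assuming config.yaml has already been parsed into an eval name and a list of (name, arguments) pairs. NamedConfig and select_configs are hypothetical names, and splitting the argument string on whitespace is a simplifying assumption (no quoting support).

```rust
/// One named entry from the `configs` list, with its arguments split up.
#[derive(Debug, Clone, PartialEq)]
pub struct NamedConfig {
    pub name: String,
    pub args: Vec<String>,
}

/// Picks the configurations to run: every entry when `test_all_configs`
/// is set, otherwise only the entry whose name matches `eval`.
pub fn select_configs(
    eval: &str,
    configs: &[(&str, &str)],
    test_all_configs: bool,
) -> Result<Vec<NamedConfig>, String> {
    // Assumption: arguments are separated by plain whitespace.
    let to_named = |entry: &(&str, &str)| NamedConfig {
        name: entry.0.to_string(),
        args: entry.1.split_whitespace().map(str::to_string).collect(),
    };

    if test_all_configs {
        return Ok(configs.iter().map(to_named).collect());
    }

    configs
        .iter()
        .find(|entry| entry.0 == eval)
        .map(|entry| vec![to_named(entry)])
        .ok_or_else(|| format!("no configuration named '{}' in config.yaml", eval))
}
```

With the example file above, eval: default selects only the "--mode standard" entry unless test_all_configs is enabled, in which case both entries are returned in file order.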
Repository Structure
.
├── Cargo.toml # Crate manifest
├── src/
│ ├── agent_collector/ # Agent compilation and loading logic
│ ├── constraints.rs # Resource limits enforcement
│ ├── configuration.rs # Evaluation configuration
│ ├── server.rs # Core evaluation logic
│ ├── tournament_strategy.rs # Tournament scheduling and formats
│ └── ... # Other internal modules
├── README.md
└── TODO.md
Usage Example
use anyhow;
use *;
use ;
// Your custom game implementing the Game + GameFactory traits
use crate::YourGame;
[!NOTE] Agents must be Rust crates or precompiled binaries located in the specified directory. Each agent of each match runs as a separate, isolated process.
Example Agent
Here’s a minimal example of an agent compatible with the evaluator system. The agent connects to the evaluator’s server via TCP, reads the game state, and responds with an action:
use ;
use anyhow;
Requirements
- YourGame::State and YourGame::Action must implement FromStr and ToString.
- The agent must connect to the provided TCP port and handle communication over the stream.
- The agent's select_action call must complete before the action timeout, or the agent will be forcefully terminated.
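The requirements above can be sketched as a generic agent loop built only on the standard library. The wire format is an assumption made for this sketch: one newline-terminated state per message, answered by one newline-terminated action, on 127.0.0.1. The run_agent function and the way the port is passed in are likewise hypothetical; check the crate's own agent example for the actual protocol.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::TcpStream;
use std::str::FromStr;

/// Connects to the evaluator on `port` and answers every received state
/// with the action chosen by `select_action`, until the stream closes.
pub fn run_agent<S, A, F>(port: u16, mut select_action: F) -> io::Result<()>
where
    S: FromStr,
    A: ToString,
    F: FnMut(&S) -> A,
{
    let stream = TcpStream::connect(("127.0.0.1", port))?;
    // Clone the handle so reading and writing can use separate values.
    let mut writer = stream.try_clone()?;
    let reader = BufReader::new(stream);

    for line in reader.lines() {
        let line = line?;
        // Assumption: each line holds exactly one serialized game state.
        let state: S = line.trim().parse().map_err(|_| {
            io::Error::new(io::ErrorKind::InvalidData, "could not parse game state")
        })?;

        // This call has to return before the evaluator's action timeout.
        let action = select_action(&state);

        writeln!(writer, "{}", action.to_string())?;
        writer.flush()?;
    }
    Ok(())
}
```

Because the loop only relies on FromStr and ToString, the same function works for any game whose State and Action types meet the first requirement. An agent that needs long computations should track its own deadline inside select_action rather than rely on the evaluator to stop it.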
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.