// SPDX-FileCopyrightText: 2026 Andrei G <bug-ops>
// SPDX-License-Identifier: MIT OR Apache-2.0
//! tau2-bench full-environment support.
//!
//! Implements loaders, in-memory environment executors, and an action-trace
//! evaluator for the [`sierra-research/tau2-bench`](https://github.com/sierra-research/tau2-bench)
//! benchmark.
//!
//! # Architecture
//!
//! ```text
//! Tau2BenchLoader (retail/airline)
//!   └─ loads tasks.json → Vec<Scenario>  (metadata carries EvaluationCriteria)
//!
//! Per scenario (in BenchRunner::run_dataset_with_env_factory):
//!   RetailEnv / AirlineEnv  ←──── db.json seed
//!       │  implements ToolExecutor
//!       │  records every tool call to ActionTrace (Arc<Mutex<Vec<RecordedToolCall>>>)
//!       └─▶ agent sees tool results, calls more tools
//!
//!   TauBenchEvaluator
//!       │  holds clone of the same ActionTrace
//!       └─▶ after run: scores gold_actions vs recorded_calls
//! ```
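//!
//! The shared-trace idea in the diagram can be illustrated with a minimal,
//! self-contained sketch. Everything in it (`RecordedCall`, `Trace`,
//! `SketchEnv`, `score`) is a hypothetical stand-in, not this crate's API, and
//! the scoring rule (fraction of gold actions found in the trace) is only one
//! plausible choice; the real evaluator may match and score differently.
//!
//! ```rust
//! use std::sync::{Arc, Mutex};
//!
//! // Hypothetical stand-in for a recorded tool call (assumption, not the real type).
//! #[derive(Clone, Debug, PartialEq)]
//! struct RecordedCall {
//!     name: String,
//!     args: String,
//! }
//!
//! // Shared, thread-safe trace: the environment writes, the evaluator reads.
//! type Trace = Arc<Mutex<Vec<RecordedCall>>>;
//!
//! // Hypothetical environment that records every tool call it executes.
//! struct SketchEnv {
//!     trace: Trace,
//! }
//!
//! impl SketchEnv {
//!     // Pretends to run a tool, appends the call to the trace, returns a result.
//!     fn call(&self, name: &str, args: &str) -> String {
//!         self.trace.lock().unwrap().push(RecordedCall {
//!             name: name.to_owned(),
//!             args: args.to_owned(),
//!         });
//!         format!("ok: {}", name)
//!     }
//! }
//!
//! // Assumed scoring rule: fraction of gold actions present in the trace.
//! fn score(gold: &[RecordedCall], trace: &Trace) -> f64 {
//!     if gold.is_empty() {
//!         return 1.0;
//!     }
//!     let recorded = trace.lock().unwrap();
//!     let hits = gold.iter().filter(|g| recorded.contains(g)).count();
//!     hits as f64 / gold.len() as f64
//! }
//!
//! fn main() {
//!     let trace: Trace = Arc::new(Mutex::new(Vec::new()));
//!     // The environment holds a clone of the same Arc the evaluator reads from.
//!     let env = SketchEnv { trace: Arc::clone(&trace) };
//!     env.call("get_order", "{\"id\":1}");
//!
//!     let gold = vec![
//!         RecordedCall { name: "get_order".into(), args: "{\"id\":1}".into() },
//!         RecordedCall { name: "cancel_order".into(), args: "{\"id\":1}".into() },
//!     ];
//!     // One of the two gold actions was performed.
//!     assert_eq!(score(&gold, &trace), 0.5);
//! }
//! ```
//!
//! Cloning the `Arc` rather than the `Vec` is what lets the scorer see calls
//! made after it was constructed.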
//!
//! # Supported domains
//!
//! | Dataset name | Domain | Loader constructor |
//! |---|---|---|
//! | `tau2-bench-retail` | Retail customer service | [`loader::Tau2BenchLoader::retail`] |
//! | `tau2-bench-airline` | Airline flight reservation | [`loader::Tau2BenchLoader::airline`] |
pub use Domain;
pub use ;
pub use ActionTrace;
pub use SnapshotableEnv;
pub use AirlineEnv;
pub use RetailEnv;
pub use ;
pub use ;