# Tool Orchestrator - Universal Programmatic Tool Calling
A model-agnostic implementation of Anthropic's [Programmatic Tool Calling](https://www.anthropic.com/engineering/advanced-tool-use) pattern. Instead of issuing sequential tool calls whose intermediate results pile up as context tokens, any LLM writes Rhai scripts that orchestrate multiple tools efficiently.
## Background: The Problem with Traditional Tool Calling
Traditional AI tool calling follows a request-response pattern:
```
LLM: "Call get_expenses(employee_id=1)"
  → Returns 100 expense items to context
LLM: "Call get_expenses(employee_id=2)"
  → Returns 100 more items to context
... (20 employees later)
  → 2,000+ line items polluting the context window
  → 110,000+ tokens just to produce a summary
```
Each intermediate result floods the model's context window, wasting tokens and degrading performance.
## Anthropic's Solution: Programmatic Tool Calling
In November 2025, Anthropic introduced [Programmatic Tool Calling (PTC)](https://www.anthropic.com/engineering/advanced-tool-use) as part of their advanced tool use features. The key insight:
> **LLMs excel at writing code.** Instead of reasoning through one tool call at a time, let them write code that orchestrates entire workflows.
Their approach:
1. Claude writes Python code that calls multiple tools
2. Code executes in Anthropic's managed sandbox
3. Only the final result returns to the context window
**Results:** 37-98% token reduction, lower latency, more reliable control flow.
### References
- [Introducing advanced tool use on the Claude Developer Platform](https://www.anthropic.com/engineering/advanced-tool-use) - Anthropic Engineering Blog
- [CodeAct: Executable Code Actions Elicit Better LLM Agents](https://arxiv.org/abs/2402.01030) - Academic research on code-based tool orchestration
## Why This Crate? Universal Access
Anthropic's implementation has constraints:
- **Claude-only**: Requires Claude 4.5 with the `advanced-tool-use-2025-11-20` beta header
- **Python-only**: Scripts must be Python
- **Anthropic-hosted**: Execution happens in their managed sandbox
- **API-dependent**: Requires their code execution tool to be enabled
**Tool Orchestrator** provides the same benefits for **any LLM provider**:
| | Anthropic's PTC | Tool Orchestrator |
| --- | --- | --- |
| **Model** | Claude 4.5 only | Any LLM that can write code |
| **Language** | Python | Rhai (Rust-like, easy for LLMs) |
| **Execution** | Anthropic's sandbox | Your local process |
| **Runtime** | Server-side (their servers) | Client-side (your control) |
| **Dependencies** | API call + beta header | Pure Rust, zero runtime deps |
| **Targets** | Python environments | Native Rust + WASM (browser/Node.js) |
### Supported LLM Providers
- Claude (all versions, not just 4.5)
- OpenAI (GPT-4, GPT-4o, o1, etc.)
- Google (Gemini Pro, etc.)
- Other providers (Mistral, Cohere, etc.)
- Local models (Ollama, llama.cpp, vLLM)
- Any future provider
## How It Works
```
┌────────────────────────────────────────────────────────────────┐
│                      TRADITIONAL APPROACH                      │
│                                                                │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│                   (tokens multiply rapidly)                    │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│                   PROGRAMMATIC TOOL CALLING                    │
│                                                                │
│  LLM writes script:                                            │
│  ┌──────────────────────────────────────┐                      │
│  │ let results = [];                    │                      │
│  │ for id in employee_ids {             │  Executes locally    │
│  │   let expenses = get_expenses(id);   │ ─────────────────→   │
│  │   let flagged = expenses.filter(...);│  Tools called        │
│  │   results.push(flagged);             │  in sandbox          │
│  │ }                                    │                      │
│  │ summarize(results) // Only this      │ ←─────────────────   │
│  └──────────────────────────────────────┘  returns to LLM      │
└────────────────────────────────────────────────────────────────┘
```
1. **Register tools** - Your actual tool implementations (file I/O, APIs, etc.)
2. **LLM writes script** - Any LLM generates a Rhai script orchestrating those tools
3. **Sandboxed execution** - Script runs locally with configurable safety limits
4. **Minimal context** - Only the final result enters the conversation
## Multi-Target Architecture
This crate produces **two outputs** from a single codebase:
| Output | Description | Use cases |
| --- | --- | --- |
| **Rust Library** | Native Rust crate with `Arc<Mutex>` thread safety | CLI tools, server-side apps, native integrations |
| **WASM Package** | Browser/Node.js module with `Rc<RefCell>` | Web apps, npm packages, browser-based AI |
## Benefits
- **37-98% token reduction** - Intermediate results stay in sandbox, only final output returns
- **Batch operations** - Process thousands of items in loops without context pollution
- **Conditional logic** - if/else based on tool results, handled in code not LLM reasoning
- **Data transformation** - Filter, aggregate, transform between tool calls
- **Explicit control flow** - Loops, error handling, retries are code, not implicit reasoning
- **Model agnostic** - Works with any LLM that can write Rhai/Rust-like code
- **Audit trail** - Every tool call is recorded with timing and results
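For instance, the audit trail is available on every execution result. A minimal sketch, assuming only that `tool_calls` is an iterable list whose entries implement `Debug` (as the `{:?}` output in the Usage section suggests):
```rust
// Run a script, then replay the recorded tool calls for auditing.
let result = orchestrator.execute(script, ExecutionLimits::default())?;
for (i, call) in result.tool_calls.iter().enumerate() {
    println!("call #{i}: {call:?}"); // tool name, inputs, timing, result
}
```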
## Installation & Building
### Rust Library (default)
```bash
# Add to Cargo.toml
cargo add tool-orchestrator
# Or build from source
cargo build
```
### WASM Package
```bash
# Build for web (browser)
wasm-pack build --target web --features wasm --no-default-features
# Build for Node.js
wasm-pack build --target nodejs --features wasm --no-default-features
# The package is generated in ./pkg/
```
## Usage
### Rust Library
```rust
use tool_orchestrator::{ToolOrchestrator, ExecutionLimits};
// Create orchestrator
let mut orchestrator = ToolOrchestrator::new();
// Register tools as executor functions
orchestrator.register_tool("read_file", |path: String| {
    std::fs::read_to_string(path).map_err(|e| e.to_string())
});
orchestrator.register_tool("list_directory", |path: String| -> Result<String, String> {
    let entries: Vec<String> = std::fs::read_dir(path)
        .map_err(|e| e.to_string())?
        .filter_map(|e| e.ok().map(|e| e.path().display().to_string()))
        .collect();
    Ok(entries.join("\n"))
});
// Execute a Rhai script (written by any LLM)
let script = r#"
let files = list_directory("src");
let rust_files = [];
for file in files.split("\n") {
    if file.ends_with(".rs") {
        rust_files.push(file);
    }
}
`Found ${rust_files.len()} Rust files: ${rust_files}`
"#;
let result = orchestrator.execute(script, ExecutionLimits::default())?;
println!("Output: {}", result.output); // Final result only
println!("Tool calls: {:?}", result.tool_calls); // Audit trail
```
### WASM (JavaScript/TypeScript)
```typescript
import init, { WasmOrchestrator, ExecutionLimits } from 'tool-orchestrator';
await init();
const orchestrator = new WasmOrchestrator();
// Register a JavaScript function as a tool
orchestrator.register_tool('get_weather', (inputJson: string) => {
  const input = JSON.parse(inputJson);
  // Your implementation here
  return JSON.stringify({ temp: 72, condition: 'sunny' });
});
// Execute a Rhai script
const limits = new ExecutionLimits();
const result = orchestrator.execute(`
  let weather = get_weather("San Francisco");
  \`Current weather: \${weather}\`
`, limits);
console.log(result);
// { success: true, output: "Current weather: ...", tool_calls: [...] }
```
## Safety & Sandboxing
The orchestrator includes built-in limits to prevent runaway scripts:
| Limit | Default | Purpose |
| --- | --- | --- |
| `max_operations` | 100,000 | Prevents infinite loops |
| `max_tool_calls` | 50 | Limits tool invocations |
| `timeout_ms` | 30,000 | Execution timeout |
| `max_string_size` | 10MB | Maximum string length |
| `max_array_size` | 10,000 | Maximum array elements |
```rust
// Preset profiles
let quick = ExecutionLimits::quick(); // 10k ops, 10 calls, 5s
let extended = ExecutionLimits::extended(); // 500k ops, 100 calls, 2m
// Custom limits
let limits = ExecutionLimits::default()
    .with_max_operations(50_000)
    .with_max_tool_calls(25)
    .with_timeout_ms(10_000);
```
## Security Considerations
### What the Sandbox Prevents
Rhai scripts executed by this crate **cannot**:
- Access the filesystem (no `std::fs`, no file I/O)
- Make network requests (no sockets, no HTTP)
- Execute shell commands (no `std::process`)
- Access environment variables
- Spawn threads or processes
- Access raw memory or use unsafe code
The only way scripts interact with the outside world is through **explicitly registered tools**.
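A minimal sketch of that guarantee, assuming a call to an unregistered function surfaces as an `Err` from `execute` (consistent with the runtime-error handling covered in the test suite):
```rust
// With no tools registered, a script has no escape hatch:
// `read_file` is an unknown function and execution fails.
let mut bare = ToolOrchestrator::new();
let outcome = bare.execute(r#"read_file("/etc/passwd")"#, ExecutionLimits::default());
assert!(outcome.is_err());
```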
### Tool Implementer Responsibility
**You are responsible for the security of tools you register.** If you register a tool that:
- Executes shell commands → scripts can run arbitrary commands
- Reads/writes files → scripts can access your filesystem
- Makes HTTP requests → scripts can exfiltrate data
Design your tools with the principle of least privilege.
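For example, a file-reading tool can be confined to one directory. A hedged sketch using the `register_tool` API from the Usage section (the tool name and `./docs` root are illustrative):
```rust
use std::path::Path;

// Least-privilege sketch: only files under ./docs are readable, and
// canonicalization defeats `../` traversal before any file is touched.
orchestrator.register_tool("read_doc", |name: String| -> Result<String, String> {
    let root = Path::new("./docs").canonicalize().map_err(|e| e.to_string())?;
    let requested = root.join(&name).canonicalize().map_err(|e| e.to_string())?;
    if !requested.starts_with(&root) {
        return Err("access outside ./docs denied".to_string());
    }
    std::fs::read_to_string(requested).map_err(|e| e.to_string())
});
```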
### Timeout Behavior
The `timeout_ms` limit uses Rhai's `on_progress` callback for **real-time enforcement**:
- Timeout is checked after every Rhai operation (not just at the end)
- CPU-intensive loops will be terminated mid-execution when timeout is exceeded
- **Note:** Timeout checks don't occur *during* a tool call - if a registered tool blocks for 10 seconds, that time isn't interruptible
- For tools that may block, implement your own timeouts within the tool executor
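One way to do that, sketched with a worker thread and `recv_timeout` (the tool name and `do_blocking_work` helper are hypothetical):
```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// The blocking work runs on a worker thread; the executor waits at most
// two seconds, then returns an error the script can observe. A worker that
// outlives the timeout finishes in the background and its result is dropped.
orchestrator.register_tool("fetch_slow", |input: String| -> Result<String, String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(do_blocking_work(&input)); // hypothetical blocking call
    });
    rx.recv_timeout(Duration::from_secs(2))
        .map_err(|_| "tool timed out after 2s".to_string())
});
```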
### Recommended Limits for Untrusted Input
```rust
let limits = ExecutionLimits::default()
    .with_max_operations(10_000)   // Tight loop protection
    .with_max_tool_calls(5)        // Limit external interactions
    .with_timeout_ms(5_000)        // 5 second max
    .with_max_string_size(100_000) // 100KB strings
    .with_max_array_size(1_000);   // 1K elements
```
## Why Rhai Instead of Python?
Anthropic uses Python because Claude is trained extensively on it. We chose [Rhai](https://rhai.rs/) for different reasons:
| | Python | Rhai |
| --- | --- | --- |
| **Safety** | Requires heavy sandboxing | Sandboxed by design, no filesystem/network access |
| **Embedding** | CPython runtime (large) | Pure Rust, compiles into your binary |
| **WASM** | Complex (Pyodide, etc.) | Native WASM support |
| **Syntax** | Python-specific | Rust-like (familiar to many LLMs) |
| **Performance** | Interpreter overhead | Optimized for embedding |
| **Dependencies** | Python ecosystem | Zero runtime dependencies |
LLMs have no trouble generating Rhai - it's syntactically similar to Rust/JavaScript:
```rhai
// Variables
let x = 42;
let name = "Claude";
// String interpolation (backticks)
let greeting = `Hello, ${name}!`;
// Arrays and loops
let items = [1, 2, 3, 4, 5];
let sum = 0;
for item in items {
    sum += item;
}
// Conditionals
if sum > 10 {
    "Large sum"
} else {
    "Small sum"
}
// Maps (objects)
let config = #{
    debug: true,
    limit: 100
};
// Tool calls (registered functions)
let content = read_file("README.md");
let files = list_directory("src");
```
## Feature Flags
| Flag | Default | Description |
| --- | --- | --- |
| `native` | Yes | Thread-safe with `Arc<Mutex>` (for native Rust) |
| `wasm` | No | Single-threaded with `Rc<RefCell>` (for browser/Node.js) |
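Selecting a flag from a consuming crate follows the usual Cargo conventions, mirroring the `--features wasm --no-default-features` flags in the wasm-pack commands above:
```bash
# Default (native) build
cargo add tool-orchestrator

# Browser/Node.js build: disable defaults, enable the `wasm` feature
cargo add tool-orchestrator --no-default-features --features wasm
```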
## Testing
### Native Tests
```bash
# Run all native tests
cargo test
# Run with verbose output
cargo test -- --nocapture
```
### WASM Tests
WASM tests require `wasm-pack`. Install it with:
```bash
cargo install wasm-pack
```
Run WASM tests:
```bash
# Test with Node.js (fastest)
wasm-pack test --node --features wasm --no-default-features
# Test with headless Chrome
wasm-pack test --headless --chrome --features wasm --no-default-features
# Test with headless Firefox
wasm-pack test --headless --firefox --features wasm --no-default-features
```
### Test Coverage
The test suite includes:
**Native tests (39 tests)**
- Orchestrator creation and configuration
- Tool registration and execution
- Script compilation and execution
- Error handling (compilation errors, tool errors, runtime errors)
- Execution limits (max operations, max tool calls, timeout)
- JSON type conversion
- Loop and conditional execution
- Timing and metrics recording
**WASM tests (25 tests)**
- ExecutionLimits constructors and setters
- WasmOrchestrator creation
- Script execution (simple, loops, conditionals, functions)
- Tool registration and execution
- JavaScript callback integration
- Error handling (compilation, runtime, tool errors)
- Max operations and tool call limits
- Complex data structures (arrays, maps, nested)
## Integration Example
The orchestrator integrates with AI agents via a tool definition:
```rust
// Register as "execute_script" tool for the LLM
Tool {
name: "execute_script",
description: "Execute a Rhai script for programmatic tool orchestration.
Write code that calls registered tools, processes results,
and returns only the final output. Use loops for batch
operations, conditionals for branching logic.",
input_schema: /* script parameter */,
requires_approval: false, // Scripts are sandboxed
}
```
When the LLM needs to perform multi-step operations, it writes a Rhai script instead of making sequential individual tool calls. The script executes locally, and only the final result enters the context window.
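For example, rather than three separate `get_expenses` round-trips, the model might submit a script like this (the `over_limit` flag is illustrative):
```rust
// One execute_script call replaces three sequential tool calls; only the
// final interpolated string returns to the model's context window.
let script = r#"
    let flagged = [];
    for id in [1, 2, 3] {
        let report = get_expenses(id);
        if report.contains("over_limit") {
            flagged.push(id);
        }
    }
    `Employees with flagged expenses: ${flagged}`
"#;
let result = orchestrator.execute(script, ExecutionLimits::default())?;
```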
## Instructing LLMs to Generate Rhai
To use programmatic tool calling, your LLM needs to know how to write Rhai scripts. Include something like this in your system prompt:
### System Prompt Template
```
You have access to a script execution tool that runs Rhai code. When you need to:
- Call multiple tools in sequence
- Process data from tool results
- Loop over items or aggregate results
- Apply conditional logic based on tool outputs
Write a Rhai script instead of making individual tool calls.
## Rhai Syntax Quick Reference
Variables and types:
let x = 42; // integer
let name = "hello"; // string
let items = [1, 2, 3]; // array
let config = #{ key: "value" }; // map (object)
String interpolation (use backticks):
let msg = `Hello, ${name}!`;
let result = `Found ${items.len()} items`;
Loops:
for item in items { /* body */ }
for i in 0..10 { /* 0 to 9 */ }
Conditionals:
if x > 5 { "big" } else { "small" }
String methods:
s.len(), s.contains("x"), s.starts_with("x"), s.ends_with("x")
s.split(","), s.trim(), s.to_upper(), s.to_lower()
s.sub_string(start, len), s.index_of("x")
Array methods:
arr.push(item), arr.len(), arr.pop()
Parsing:
"42".parse_int(), "3.14".parse_float()
Available tools (call as functions):
{TOOL_LIST}
## Important Rules
1. The LAST expression in your script is the return value
2. Use string interpolation with backticks for output: `Result: ${value}`
3. Process data locally - don't return intermediate results
4. Only return the final summary/answer
## Example
Task: Get total expenses for employees 1-3
Script:
let total = 0;
for id in [1, 2, 3] {
    let expenses = get_expenses(id); // Returns JSON array
    // Parse and sum (simplified)
    total += expenses.len() * 100; // Estimate
}
`Total across 3 employees: $${total}`
```
### Rhai Syntax Cheatsheet for LLMs
| Concept | Syntax | Notes |
| --- | --- | --- |
| **Variables** | `let x = 5;` | Mutable by default |
| **Reassignment** | `let x = 5; x = 10;` | No `mut` keyword needed |
| **Strings** | `"hello"` or `` `hello` `` | Backticks allow interpolation |
| **Interpolation** | `` `Value: ${x}` `` | Only in backtick strings |
| **Arrays** | `[1, 2, 3]` | Dynamic, mixed types OK |
| **Maps** | `#{ a: 1, b: 2 }` | Like JSON objects |
| **For loops** | `for x in arr { }` | Iterates over arrays |
| **Ranges** | `for i in 0..5 { }` | 0, 1, 2, 3, 4 |
| **If/else** | `if x > 5 { a } else { b }` | Expression-based |
| **Functions** | `fn add(a, b) { a + b }` | Last expr is return |
| **Tool calls** | `tool_name(arg)` | Registered tools are functions |
| **Comments** | `// comment` | Single line |
| **Unit (null)** | `()` | Like None/null |
## Related Projects
- **[open-ptc-agent](https://github.com/Chen-zexi/open-ptc-agent)** - Python implementation using Daytona sandbox
- **[LangChain DeepAgents](https://github.com/langchain-ai/deepagents)** - LangChain's agent framework with code execution
## Acknowledgements
This project implements patterns from:
- [Anthropic's Advanced Tool Use](https://www.anthropic.com/engineering/advanced-tool-use) - The original Programmatic Tool Calling concept
- [Rhai](https://rhai.rs/) - The embedded scripting engine that makes this possible
## License
MIT