Tool Orchestrator - Universal Programmatic Tool Calling
A model-agnostic implementation of Anthropic's Programmatic Tool Calling pattern. Instead of sequential tool calls consuming tokens, any LLM writes Rhai scripts that orchestrate multiple tools efficiently.
Background: The Problem with Traditional Tool Calling
Traditional AI tool calling follows a request-response pattern:
LLM: "Call get_expenses(employee_id=1)"
→ Returns 100 expense items to context
LLM: "Call get_expenses(employee_id=2)"
→ Returns 100 more items to context
... (20 employees later)
→ 2,000+ line items polluting the context window
→ 110,000+ tokens just to produce a summary
Each intermediate result floods the model's context window, wasting tokens and degrading performance.
Anthropic's Solution: Programmatic Tool Calling
In November 2025, Anthropic introduced Programmatic Tool Calling (PTC) as part of their advanced tool use features. The key insight:
LLMs excel at writing code. Instead of reasoning through one tool call at a time, let them write code that orchestrates entire workflows.
Their approach:
- Claude writes Python code that calls multiple tools
- Code executes in Anthropic's managed sandbox
- Only the final result returns to the context window
Results: 37-98% token reduction, lower latency, more reliable control flow.
References
- Introducing advanced tool use on the Claude Developer Platform - Anthropic Engineering Blog
- CodeAct: Executable Code Actions Elicit Better LLM Agents - Academic research on code-based tool orchestration
Why This Crate? Universal Access
Anthropic's implementation has constraints:
- Claude-only: Requires Claude 4.5 with the `advanced-tool-use-2025-11-20` beta header
- Python-only: Scripts must be Python
- Anthropic-hosted: Execution happens in their managed sandbox
- API-dependent: Requires their code execution tool to be enabled
Tool Orchestrator provides the same benefits for any LLM provider:
| Constraint | Anthropic's PTC | Tool Orchestrator |
|---|---|---|
| Model | Claude 4.5 only | Any LLM that can write code |
| Language | Python | Rhai (Rust-like, easy for LLMs) |
| Execution | Anthropic's sandbox | Your local process |
| Runtime | Server-side (their servers) | Client-side (your control) |
| Dependencies | API call + beta header | Pure Rust, zero runtime deps |
| Targets | Python environments | Native Rust + WASM (browser/Node.js) |
Supported LLM Providers
- Claude (all versions, not just 4.5)
- OpenAI (GPT-4, GPT-4o, o1, etc.)
- Google (Gemini Pro, etc.)
- Other providers (Mistral, Cohere, etc.)
- Local models (Ollama, llama.cpp, vLLM)
- Any future provider
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ TRADITIONAL APPROACH │
│ │
│ LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons │
│ LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons │
│ LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons │
│ (tokens multiply rapidly) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PROGRAMMATIC TOOL CALLING │
│ │
│ LLM writes script: │
│ ┌──────────────────────────────────────┐ │
│ │ let results = []; │ │
│ │ for id in employee_ids { │ Executes locally │
│ │ let expenses = get_expenses(id); │ ─────────────────→ │
│ │ let flagged = expenses.filter(...); │ Tools called │
│ │ results.push(flagged); │ in sandbox │
│ │ } │ │
│ │ summarize(results) // Only this │ ←───────────────── │
│ └──────────────────────────────────────┘ returns to LLM │
└─────────────────────────────────────────────────────────────────┘
1. Register tools - Your actual tool implementations (file I/O, APIs, etc.)
2. LLM writes script - Any LLM generates a Rhai script orchestrating those tools
3. Sandboxed execution - Script runs locally with configurable safety limits
4. Minimal context - Only the final result enters the conversation
Multi-Target Architecture
This crate produces two outputs from a single codebase:
| Target | Description | Use Case |
|---|---|---|
| Rust Library | Native Rust crate with `Arc<Mutex>` thread safety | CLI tools, server-side apps, native integrations |
| WASM Package | Browser/Node.js module with `Rc<RefCell>` | Web apps, npm packages, browser-based AI |
Benefits
- 37-98% token reduction - Intermediate results stay in sandbox, only final output returns
- Batch operations - Process thousands of items in loops without context pollution
- Conditional logic - if/else based on tool results, handled in code not LLM reasoning
- Data transformation - Filter, aggregate, transform between tool calls
- Explicit control flow - Loops, error handling, retries are code, not implicit reasoning
- Model agnostic - Works with any LLM that can write Rhai/Rust-like code
- Audit trail - Every tool call is recorded with timing and results
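As a small illustration of the audit trail, here is a hedged sketch; the `result.tool_calls` field name is an assumption based on the result shape shown in the WASM example below.

```rust
// `result` comes from orchestrator.execute(...) as in the Usage section below.
// Each recorded entry is expected to carry the tool name, inputs, timing, and output.
for call in &result.tool_calls {
    println!("{call:?}");
}
```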
Installation & Building
Rust Library (default)
```bash
# Add to Cargo.toml (crate name assumed from this README)
cargo add tool-orchestrator

# Or build from source
cargo build --release
```
WASM Package
```bash
# Build for web (browser)
wasm-pack build --target web

# Build for Node.js
wasm-pack build --target nodejs

# The package is generated in ./pkg/
# (you may need `-- --no-default-features --features wasm`; see Feature Flags below)
```
Usage
Rust Library
A minimal sketch of the native API; exact type and method names may differ, so check the crate docs.

```rust
use tool_orchestrator::{ExecutionLimits, ToolOrchestrator};

// Create orchestrator
let mut orchestrator = ToolOrchestrator::new();

// Register tools as executor functions: a tool name plus a closure that
// receives the tool's input and returns its output as a string.
orchestrator.register_executor("list_directory", |_input| {
    // Your real implementation here (file I/O, APIs, ...)
    "lib.rs\nmain.rs\nwasm.rs".to_string()
});
orchestrator.register_executor("read_file", |path| {
    std::fs::read_to_string(path).unwrap_or_default()
});

// Execute a Rhai script (written by any LLM)
let script = r#"
let files = list_directory("src");
let rust_files = [];
for file in files.split("\n") {
if file.ends_with(".rs") {
rust_files.push(file);
}
}
`Found ${rust_files.len()} Rust files: ${rust_files}`
"#;
let result = orchestrator.execute(script, ExecutionLimits::default())?;
println!("{}", result.output);        // Final result only
println!("{:#?}", result.tool_calls); // Audit trail
```
WASM (JavaScript/TypeScript)
import init, { WasmOrchestrator, ExecutionLimits } from 'tool-orchestrator';
await init();
const orchestrator = new WasmOrchestrator();
// Register a JavaScript function as a tool
orchestrator.register_tool('get_weather', (inputJson: string) => {
const input = JSON.parse(inputJson);
// Your implementation here
return JSON.stringify({ temp: 72, condition: 'sunny' });
});
// Execute a Rhai script
const limits = new ExecutionLimits();
const result = orchestrator.execute(`
let weather = get_weather("San Francisco");
\`Current weather: \${weather}\`
`, limits);
console.log(result);
// { success: true, output: "Current weather: ...", tool_calls: [...] }
Safety & Sandboxing
The orchestrator includes built-in limits to prevent runaway scripts:
| Limit | Default | Description |
|---|---|---|
| `max_operations` | 100,000 | Prevents infinite loops |
| `max_tool_calls` | 50 | Limits tool invocations |
| `timeout_ms` | 30,000 | Execution timeout |
| `max_string_size` | 10 MB | Maximum string length |
| `max_array_size` | 10,000 | Maximum array elements |
```rust
// Preset profiles
let quick = ExecutionLimits::quick();       // 10k ops, 10 calls, 5s
let extended = ExecutionLimits::extended(); // 500k ops, 100 calls, 2m

// Custom limits (example values)
let limits = ExecutionLimits::default()
    .with_max_operations(50_000)
    .with_max_tool_calls(25)
    .with_timeout_ms(10_000);
```
Security Considerations
What the Sandbox Prevents
Rhai scripts executed by this crate cannot:
- Access the filesystem (no `std::fs`, no file I/O)
- Make network requests (no sockets, no HTTP)
- Execute shell commands (no `std::process`)
- Access environment variables
- Spawn threads or processes
- Access raw memory or use unsafe code
The only way scripts interact with the outside world is through explicitly registered tools.
Tool Implementer Responsibility
You are responsible for the security of tools you register. If you register a tool that:
- Executes shell commands → scripts can run arbitrary commands
- Reads/writes files → scripts can access your filesystem
- Makes HTTP requests → scripts can exfiltrate data
Design your tools with the principle of least privilege.
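For example, a read-only file tool can be confined to a single directory. The sketch below is illustrative; the closure shape mirrors the (assumed) executor signature from the Usage section.

```rust
use std::path::Path;

// Illustrative least-privilege tool: read files only from inside one base directory.
fn make_read_file_tool(base_dir: &Path) -> impl Fn(&str) -> String {
    let base = base_dir.canonicalize().expect("base directory must exist");
    move |path_arg: &str| {
        let requested = base.join(path_arg.trim_matches('"'));
        match requested.canonicalize() {
            // Only serve paths that resolve inside the allowed directory.
            Ok(p) if p.starts_with(&base) => std::fs::read_to_string(&p)
                .unwrap_or_else(|e| format!("error: {e}")),
            _ => "error: path outside allowed directory".to_string(),
        }
    }
}
```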
Timeout Behavior
The timeout_ms limit uses Rhai's on_progress callback for real-time enforcement:
- Timeout is checked after every Rhai operation (not just at the end)
- CPU-intensive loops will be terminated mid-execution when timeout is exceeded
- Note: Timeout checks don't occur during a tool call - if a registered tool blocks for 10 seconds, that time isn't interruptible
- For tools that may block, implement your own timeouts within the tool executor
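As a sketch of that pattern (assuming your executor returns a String), a potentially blocking call can be run on a helper thread and abandoned after a deadline:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Wrap a potentially blocking tool body with its own timeout, since the
// script-level timeout_ms cannot interrupt a tool call that is already running.
fn with_tool_timeout<F>(work: F, timeout: Duration) -> String
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    match rx.recv_timeout(timeout) {
        Ok(result) => result,
        // Note: the worker thread is abandoned, not killed; the tool should
        // still avoid unbounded work where possible.
        Err(_) => "error: tool timed out".to_string(),
    }
}
```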
Recommended Limits for Untrusted Input
```rust
// Example values; tune for your workload.
let limits = ExecutionLimits::default()
    .with_max_operations(10_000)       // Tight loop protection
    .with_max_tool_calls(10)           // Limit external interactions
    .with_timeout_ms(5_000)            // 5 second max
    .with_max_string_size(100 * 1024)  // 100KB strings
    .with_max_array_size(1_000);       // 1K elements
```
Why Rhai Instead of Python?
Anthropic uses Python because Claude is trained extensively on it. We chose Rhai for different reasons:
| Factor | Python | Rhai |
|---|---|---|
| Safety | Requires heavy sandboxing | Sandboxed by design, no filesystem/network access |
| Embedding | CPython runtime (large) | Pure Rust, compiles into your binary |
| WASM | Complex (Pyodide, etc.) | Native WASM support |
| Syntax | Python-specific | Rust-like (familiar to many LLMs) |
| Performance | Interpreter overhead | Optimized for embedding |
| Dependencies | Python ecosystem | Zero runtime dependencies |
LLMs have no trouble generating Rhai - it's syntactically similar to Rust/JavaScript:
// Variables
let x = 42;
let name = "Claude";
// String interpolation (backticks)
let greeting = `Hello, ${name}!`;
// Arrays and loops
let items = [1, 2, 3, 4, 5];
let sum = 0;
for item in items {
sum += item;
}
// Conditionals
if sum > 10 {
"Large sum"
} else {
"Small sum"
}
// Maps (objects)
let config = #{
debug: true,
limit: 100
};
// Tool calls (registered functions)
let content = read_file("README.md");
let files = list_directory("src");
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `native` | Yes | Thread-safe with `Arc<Mutex>` (for native Rust) |
| `wasm` | No | Single-threaded with `Rc<RefCell>` (for browser/Node.js) |
Testing
Native Tests
```bash
# Run all native tests
cargo test

# Run with verbose output
cargo test -- --nocapture
```
WASM Tests
WASM tests require wasm-pack. Install it with:

```bash
cargo install wasm-pack
```
Run WASM tests:
```bash
# Test with Node.js (fastest)
wasm-pack test --node

# Test with headless Chrome
wasm-pack test --headless --chrome

# Test with headless Firefox
wasm-pack test --headless --firefox
```
Test Coverage
The test suite includes:
Native tests (39 tests)
- Orchestrator creation and configuration
- Tool registration and execution
- Script compilation and execution
- Error handling (compilation errors, tool errors, runtime errors)
- Execution limits (max operations, max tool calls, timeout)
- JSON type conversion
- Loop and conditional execution
- Timing and metrics recording
WASM tests (25 tests)
- ExecutionLimits constructors and setters
- WasmOrchestrator creation
- Script execution (simple, loops, conditionals, functions)
- Tool registration and execution
- JavaScript callback integration
- Error handling (compilation, runtime, tool errors)
- Max operations and tool call limits
- Complex data structures (arrays, maps, nested)
Integration Example
The orchestrator integrates with AI agents via a tool definition:
```rust
// Register an "execute_script" tool for the LLM. The shape is illustrative:
// use whatever tool-definition type your agent framework or LLM SDK expects.
Tool {
    name: "execute_script",
    description: "Run a Rhai script that can call the registered tools. \
                  Use this for multi-step operations instead of many individual tool calls.",
    input_schema: serde_json::json!({
        "type": "object",
        "properties": {
            "script": { "type": "string", "description": "Rhai script to execute" }
        },
        "required": ["script"]
    }),
}
```
When the LLM needs to perform multi-step operations, it writes a Rhai script instead of making sequential individual tool calls. The script executes locally, and only the final result enters the context window.
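A hedged sketch of handling that tool call on the client side; `ToolOrchestrator`, `ExecutionLimits`, and `result.output` mirror the illustrative Usage example above rather than a confirmed API.

```rust
// Hypothetical handler invoked when the LLM calls the "execute_script" tool.
// `tool_input` is the JSON arguments the model produced.
fn handle_execute_script(
    orchestrator: &mut ToolOrchestrator,
    tool_input: &serde_json::Value,
) -> Result<String, Box<dyn std::error::Error>> {
    let script = tool_input["script"].as_str().ok_or("missing script")?;
    let result = orchestrator.execute(script, ExecutionLimits::default())?;
    // Only this final output goes back into the conversation.
    Ok(result.output)
}
```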
Instructing LLMs to Generate Rhai
To use programmatic tool calling, your LLM needs to know how to write Rhai scripts. Include something like this in your system prompt:
System Prompt Template
You have access to a script execution tool that runs Rhai code. When you need to:
- Call multiple tools in sequence
- Process data from tool results
- Loop over items or aggregate results
- Apply conditional logic based on tool outputs
Write a Rhai script instead of making individual tool calls.
## Rhai Syntax Quick Reference
Variables and types:
let x = 42; // integer
let name = "hello"; // string
let items = [1, 2, 3]; // array
let config = #{ key: "value" }; // map (object)
String interpolation (use backticks):
let msg = `Hello, ${name}!`;
let result = `Found ${items.len()} items`;
Loops:
for item in items { /* body */ }
for i in 0..10 { /* 0 to 9 */ }
Conditionals:
if x > 5 { "big" } else { "small" }
String methods:
s.len(), s.contains("x"), s.starts_with("x"), s.ends_with("x")
s.split(","), s.trim(), s.to_upper(), s.to_lower()
s.sub_string(start, len), s.index_of("x")
Array methods:
arr.push(item), arr.len(), arr.pop()
arr.filter(|x| x > 5), arr.map(|x| x * 2)
Parsing:
"42".parse_int(), "3.14".parse_float()
Available tools (call as functions):
{TOOL_LIST}
## Important Rules
1. The LAST expression in your script is the return value
2. Use string interpolation with backticks for output: `Result: ${value}`
3. Process data locally - don't return intermediate results
4. Only return the final summary/answer
## Example
Task: Get total expenses for employees 1-3
Script:
let total = 0;
for id in [1, 2, 3] {
let expenses = get_expenses(id); // Returns JSON array
// Parse and sum (simplified)
total += expenses.len() * 100; // Estimate
}
`Total across 3 employees: $${total}`
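A hedged sketch of filling the `{TOOL_LIST}` placeholder when assembling the system prompt; the helper function and the tool descriptions are hypothetical, built from whatever you registered with the orchestrator.

```rust
// Build the system prompt from the template above (stored as a string with a
// {TOOL_LIST} placeholder) plus one line per registered tool.
fn build_system_prompt(prompt_template: &str, tools: &[(&str, &str)]) -> String {
    let tool_list = tools
        .iter()
        .map(|(name, signature)| format!("- {name}: {signature}"))
        .collect::<Vec<_>>()
        .join("\n");
    prompt_template.replace("{TOOL_LIST}", &tool_list)
}

// Example:
// build_system_prompt(&prompt, &[
//     ("get_expenses", "get_expenses(employee_id) -> JSON array of expense records"),
//     ("list_directory", "list_directory(path) -> newline-separated file names"),
// ]);
```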
Rhai Syntax Cheatsheet for LLMs
| Concept | Rhai Syntax | Notes |
|---|---|---|
| Variables | `let x = 5;` | Declared with `let`; use `const` for constants |
| Mutable | `let x = 5; x = 10;` | Variables can be reassigned |
| Strings | `"hello"` or `` `hello` `` | Backticks allow interpolation |
| Interpolation | `` `Value: ${x}` `` | Only in backtick strings |
| Arrays | `[1, 2, 3]` | Dynamic, mixed types OK |
| Maps | `#{ a: 1, b: 2 }` | Like JSON objects |
| For loops | `for x in arr { }` | Iterates over arrays |
| Ranges | `for i in 0..5 { }` | 0, 1, 2, 3, 4 |
| If/else | `if x > 5 { a } else { b }` | Expression-based |
| Functions | `fn add(a, b) { a + b }` | Last expr is return |
| Tool calls | `tool_name(arg)` | Registered tools are functions |
| Comments | `// comment` | Single line |
| Unit (null) | `()` | Like None/null |
Related Projects
- open-ptc-agent - Python implementation using Daytona sandbox
- LangChain DeepAgents - LangChain's agent framework with code execution
Acknowledgements
This project implements patterns from:
- Anthropic's Advanced Tool Use - The original Programmatic Tool Calling concept
- Rhai - The embedded scripting engine that makes this possible
License
MIT