# Tool Orchestrator - Universal Programmatic Tool Calling

[![Tests](https://img.shields.io/badge/tests-64%20passing-brightgreen)](https://github.com/anthropics/tool-orchestrator)
[![Coverage](https://img.shields.io/badge/coverage-92.59%25-brightgreen)](https://github.com/anthropics/tool-orchestrator)
[![Rust](https://img.shields.io/badge/rust-2024%20edition-orange)](https://www.rust-lang.org/)

A model-agnostic implementation of Anthropic's [Programmatic Tool Calling](https://www.anthropic.com/engineering/advanced-tool-use) pattern. Instead of making sequential tool calls whose intermediate results consume context tokens, any LLM can write Rhai scripts that orchestrate multiple tools efficiently.

## Background: The Problem with Traditional Tool Calling

Traditional AI tool calling follows a request-response pattern:

```
LLM: "Call get_expenses(employee_id=1)"
→ Returns 100 expense items to context
LLM: "Call get_expenses(employee_id=2)"
→ Returns 100 more items to context
... (20 employees later)
→ 2,000+ line items polluting the context window
→ 110,000+ tokens just to produce a summary
```

Each intermediate result floods the model's context window, wasting tokens and degrading performance.

## Anthropic's Solution: Programmatic Tool Calling

In November 2025, Anthropic introduced [Programmatic Tool Calling (PTC)](https://www.anthropic.com/engineering/advanced-tool-use) as part of their advanced tool use features. The key insight:

> **LLMs excel at writing code.** Instead of reasoning through one tool call at a time, let them write code that orchestrates entire workflows.

Their approach:
1. Claude writes Python code that calls multiple tools
2. Code executes in Anthropic's managed sandbox
3. Only the final result returns to the context window

**Results:** 37-98% token reduction, lower latency, more reliable control flow.

### References

- [Introducing advanced tool use on the Claude Developer Platform](https://www.anthropic.com/engineering/advanced-tool-use) - Anthropic Engineering Blog
- [CodeAct: Executable Code Actions Elicit Better LLM Agents](https://arxiv.org/abs/2402.01030) - Academic research on code-based tool orchestration

## Why This Crate? Universal Access

Anthropic's implementation has constraints:
- **Claude-only**: Requires Claude 4.5 with the `advanced-tool-use-2025-11-20` beta header
- **Python-only**: Scripts must be Python
- **Anthropic-hosted**: Execution happens in their managed sandbox
- **API-dependent**: Requires their code execution tool to be enabled

**Tool Orchestrator** provides the same benefits for **any LLM provider**:

| Constraint | Anthropic's PTC | Tool Orchestrator |
|------------|-----------------|-------------------|
| **Model** | Claude 4.5 only | Any LLM that can write code |
| **Language** | Python | Rhai (Rust-like, easy for LLMs) |
| **Execution** | Anthropic's sandbox | Your local process |
| **Runtime** | Server-side (their servers) | Client-side (your control) |
| **Dependencies** | API call + beta header | Pure Rust, zero runtime deps |
| **Targets** | Python environments | Native Rust + WASM (browser/Node.js) |

### Supported LLM Providers

- Claude (all versions, not just 4.5)
- OpenAI (GPT-4, GPT-4o, o1, etc.)
- Google (Gemini Pro, etc.)
- Other hosted providers (Mistral, Cohere, etc.)
- Local models (Ollama, llama.cpp, vLLM)
- Any future provider

## How It Works

```
┌─────────────────────────────────────────────────────────────────┐
│                     TRADITIONAL APPROACH                        │
│                                                                 │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│                     (tokens multiply rapidly)                   │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                 PROGRAMMATIC TOOL CALLING                       │
│                                                                 │
│  LLM writes script:                                             │
│  ┌──────────────────────────────────────┐                      │
│  │ let results = [];                     │                      │
│  │ for id in employee_ids {              │   Executes locally   │
│  │   let expenses = get_expenses(id);    │ ─────────────────→   │
│  │   let flagged = expenses.filter(...); │   Tools called       │
│  │   results.push(flagged);              │   in sandbox         │
│  │ }                                     │                      │
│  │ summarize(results)  // Only this      │ ←─────────────────   │
│  └──────────────────────────────────────┘   returns to LLM     │
└─────────────────────────────────────────────────────────────────┘
```

1. **Register tools** - Your actual tool implementations (file I/O, APIs, etc.)
2. **LLM writes script** - Any LLM generates a Rhai script orchestrating those tools
3. **Sandboxed execution** - Script runs locally with configurable safety limits
4. **Minimal context** - Only the final result enters the conversation
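
Putting the four steps together, here's a minimal sketch of the expense-summary scenario from the background section. The `get_expenses` executor and its data are hypothetical stand-ins, and argument handling follows the same registration pattern as the Usage section below:

```rust
use tool_orchestrator::{ToolOrchestrator, ExecutionLimits};

let mut orchestrator = ToolOrchestrator::new();

// Hypothetical stand-in for a real expenses API.
orchestrator.register_executor("get_expenses", |input| {
    let id = input.get("employee_id").and_then(|v| v.as_i64()).unwrap_or(0);
    Ok(format!("lunch,12\ntravel,340\nhotel,{}", 400 + id)) // fake data
});

// A script any LLM could write: loop over 20 employees, sum locally,
// and return a single summary line.
let script = r#"
    let total = 0;
    for id in 1..=20 {
        for line in get_expenses(id).split("\n") {
            total += line.split(",")[1].parse_int();
        }
    }
    `Total expenses across 20 employees: $${total}`
"#;

let result = orchestrator.execute(script, ExecutionLimits::default())?;
println!("{}", result.output); // the only text that re-enters the LLM's context
```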

## Multi-Target Architecture

This crate produces **two outputs** from a single codebase:

| Target | Description | Use Case |
|--------|-------------|----------|
| **Rust Library** | Native Rust crate with `Arc<Mutex>` thread safety | CLI tools, server-side apps, native integrations |
| **WASM Package** | Browser/Node.js module with `Rc<RefCell>` | Web apps, npm packages, browser-based AI |

## Benefits

- **37-98% token reduction** - Intermediate results stay in sandbox, only final output returns
- **Batch operations** - Process thousands of items in loops without context pollution
- **Conditional logic** - if/else based on tool results, handled in code not LLM reasoning
- **Data transformation** - Filter, aggregate, transform between tool calls
- **Explicit control flow** - Loops, error handling, retries are code, not implicit reasoning
- **Model agnostic** - Works with any LLM that can write Rhai/Rust-like code
- **Audit trail** - Every tool call is recorded with timing and results

## Installation & Building

### Rust Library (default)

```bash
# Add to Cargo.toml
cargo add tool-orchestrator

# Or build from source
cargo build
```

### WASM Package

```bash
# Build for web (browser)
wasm-pack build --target web --features wasm --no-default-features

# Build for Node.js
wasm-pack build --target nodejs --features wasm --no-default-features

# The package is generated in ./pkg/
```

## Usage

### Rust Library

```rust
use tool_orchestrator::{ToolOrchestrator, ExecutionLimits};

// Create orchestrator
let mut orchestrator = ToolOrchestrator::new();

// Register tools as executor functions
orchestrator.register_executor("read_file", |input| {
    let path = input.get("path").and_then(|v| v.as_str()).unwrap_or("");
    std::fs::read_to_string(path).map_err(|e| e.to_string())
});

orchestrator.register_executor("list_directory", |input| {
    let path = input.get("path").and_then(|v| v.as_str()).unwrap_or(".");
    let entries: Vec<String> = std::fs::read_dir(path)
        .map_err(|e| e.to_string())?
        .filter_map(|e| e.ok().map(|e| e.path().display().to_string()))
        .collect();
    Ok(entries.join("\n"))
});

// Execute a Rhai script (written by any LLM)
let script = r#"
    let files = list_directory("src");
    let rust_files = [];
    for file in files.split("\n") {
        if file.ends_with(".rs") {
            rust_files.push(file);
        }
    }
    `Found ${rust_files.len()} Rust files: ${rust_files}`
"#;

let result = orchestrator.execute(script, ExecutionLimits::default())?;
println!("Output: {}", result.output);           // Final result only
println!("Tool calls: {:?}", result.tool_calls); // Audit trail
```

### WASM (JavaScript/TypeScript)

```typescript
import init, { WasmOrchestrator, ExecutionLimits } from 'tool-orchestrator';

await init();

const orchestrator = new WasmOrchestrator();

// Register a JavaScript function as a tool
orchestrator.register_tool('get_weather', (inputJson: string) => {
  const input = JSON.parse(inputJson);
  // Your implementation here
  return JSON.stringify({ temp: 72, condition: 'sunny' });
});

// Execute a Rhai script
const limits = new ExecutionLimits();
const result = orchestrator.execute(`
  let weather = get_weather("San Francisco");
  \`Current weather: \${weather}\`
`, limits);

console.log(result);
// { success: true, output: "Current weather: ...", tool_calls: [...] }
```

## Safety & Sandboxing

The orchestrator includes built-in limits to prevent runaway scripts:

| Limit | Default | Description |
|-------|---------|-------------|
| `max_operations` | 100,000 | Prevents infinite loops |
| `max_tool_calls` | 50 | Limits tool invocations |
| `timeout_ms` | 30,000 | Execution timeout |
| `max_string_size` | 10MB | Maximum string length |
| `max_array_size` | 10,000 | Maximum array elements |

```rust
// Preset profiles
let quick = ExecutionLimits::quick();      // 10k ops, 10 calls, 5s
let extended = ExecutionLimits::extended(); // 500k ops, 100 calls, 2m

// Custom limits
let limits = ExecutionLimits::default()
    .with_max_operations(50_000)
    .with_max_tool_calls(25)
    .with_timeout_ms(10_000);
```

## Security Considerations

### What the Sandbox Prevents

Rhai scripts executed by this crate **cannot**:
- Access the filesystem (no `std::fs`, no file I/O)
- Make network requests (no sockets, no HTTP)
- Execute shell commands (no `std::process`)
- Access environment variables
- Spawn threads or processes
- Access raw memory or use unsafe code

The only way scripts interact with the outside world is through **explicitly registered tools**.
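
As a quick illustration (a behavior sketch - the exact error type and message are not guaranteed), a script that calls an unregistered function simply fails to resolve it rather than reaching the OS:

```rust
use tool_orchestrator::{ToolOrchestrator, ExecutionLimits};

// Nothing registered, so `read_file` doesn't exist inside the sandbox.
let mut orchestrator = ToolOrchestrator::new();
match orchestrator.execute(r#"read_file("/etc/passwd")"#, ExecutionLimits::default()) {
    Ok(r) => println!("output: {}", r.output),
    Err(e) => println!("rejected: {e}"), // expected: unknown-function error
}
```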

### Tool Implementer Responsibility

**You are responsible for the security of tools you register.** If you register a tool that:
- Executes shell commands → scripts can run arbitrary commands
- Reads/writes files → scripts can access your filesystem
- Makes HTTP requests → scripts can exfiltrate data

Design your tools with the principle of least privilege.
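
For example, a file-reading tool can confine itself to a single directory (a sketch - the `workspace` root and error wording are illustrative):

```rust
use std::path::Path;

// Only allow reads inside ./workspace; reject paths that escape it.
orchestrator.register_executor("read_file", |input| {
    let path = input.get("path").and_then(|v| v.as_str()).unwrap_or("");
    let root = Path::new("workspace").canonicalize().map_err(|e| e.to_string())?;
    // Canonicalization resolves `..` and symlinks before the containment check.
    let requested = root.join(path).canonicalize().map_err(|e| e.to_string())?;
    if !requested.starts_with(&root) {
        return Err(format!("access denied: {path} is outside the workspace"));
    }
    std::fs::read_to_string(&requested).map_err(|e| e.to_string())
});
```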

### Timeout Behavior

The `timeout_ms` limit uses Rhai's `on_progress` callback for **real-time enforcement**:
- Timeout is checked after every Rhai operation (not just at the end)
- CPU-intensive loops will be terminated mid-execution when timeout is exceeded
- **Note:** Timeout checks don't occur *during* a tool call - if a registered tool blocks for 10 seconds, that time isn't interruptible
- For tools that may block, implement your own timeouts within the tool executor, as sketched below
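
One way to bound a blocking tool is a worker thread plus a channel timeout (a sketch - `slow_lookup` and the 2-second budget are illustrative):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Give the blocking work 2 seconds, then return an error to the script.
// Note: the worker thread keeps running after a timeout; true cancellation
// requires support from the underlying operation.
orchestrator.register_executor("slow_lookup", |input| {
    let query = input.get("query").and_then(|v| v.as_str()).unwrap_or("").to_string();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Simulate a blocking operation; replace with your real call.
        thread::sleep(Duration::from_secs(10));
        let _ = tx.send(format!("result for {query}"));
    });
    rx.recv_timeout(Duration::from_secs(2))
        .map_err(|_| "slow_lookup timed out after 2s".to_string())
});
```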

### Recommended Limits for Untrusted Input

```rust
let limits = ExecutionLimits::default()
    .with_max_operations(10_000)      // Tight loop protection
    .with_max_tool_calls(5)           // Limit external interactions
    .with_timeout_ms(5_000)           // 5 second max
    .with_max_string_size(100_000)    // 100KB strings
    .with_max_array_size(1_000);      // 1K elements
```

## Why Rhai Instead of Python?

Anthropic uses Python because Claude is trained extensively on it. We chose [Rhai](https://rhai.rs/) for different reasons:

| Factor | Python | Rhai |
|--------|--------|------|
| **Safety** | Requires heavy sandboxing | Sandboxed by design, no filesystem/network access |
| **Embedding** | CPython runtime (large) | Pure Rust, compiles into your binary |
| **WASM** | Complex (Pyodide, etc.) | Native WASM support |
| **Syntax** | Python-specific | Rust-like (familiar to many LLMs) |
| **Performance** | Interpreter overhead | Optimized for embedding |
| **Dependencies** | Python ecosystem | Zero runtime dependencies |

LLMs have no trouble generating Rhai - it's syntactically similar to Rust/JavaScript:

```rhai
// Variables
let x = 42;
let name = "Claude";

// String interpolation (backticks)
let greeting = `Hello, ${name}!`;

// Arrays and loops
let items = [1, 2, 3, 4, 5];
let sum = 0;
for item in items {
    sum += item;
}

// Conditionals
if sum > 10 {
    "Large sum"
} else {
    "Small sum"
}

// Maps (objects)
let config = #{
    debug: true,
    limit: 100
};

// Tool calls (registered functions)
let content = read_file("README.md");
let files = list_directory("src");
```

## Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| `native` | Yes | Thread-safe with `Arc<Mutex>` (for native Rust) |
| `wasm` | No | Single-threaded with `Rc<RefCell>` (for browser/Node.js) |
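
In `Cargo.toml`, selecting a target looks like this (version numbers are illustrative):

```toml
[dependencies]
# Native (default): thread-safe internals via Arc<Mutex>
tool-orchestrator = "1.0"

# Or, for browser/Node.js builds (single-threaded Rc<RefCell> internals):
# tool-orchestrator = { version = "1.0", default-features = false, features = ["wasm"] }
```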

## Testing

### Native Tests

```bash
# Run all native tests
cargo test

# Run with verbose output
cargo test -- --nocapture
```

### WASM Tests

WASM tests require `wasm-pack`. Install it with:

```bash
cargo install wasm-pack
```

Run WASM tests:

```bash
# Test with Node.js (fastest)
wasm-pack test --node --features wasm --no-default-features

# Test with headless Chrome
wasm-pack test --headless --chrome --features wasm --no-default-features

# Test with headless Firefox
wasm-pack test --headless --firefox --features wasm --no-default-features
```

### Test Coverage

The test suite includes:

**Native tests (39 tests)**
- Orchestrator creation and configuration
- Tool registration and execution
- Script compilation and execution
- Error handling (compilation errors, tool errors, runtime errors)
- Execution limits (max operations, max tool calls, timeout)
- JSON type conversion
- Loop and conditional execution
- Timing and metrics recording

**WASM tests (25 tests)**
- ExecutionLimits constructors and setters
- WasmOrchestrator creation
- Script execution (simple, loops, conditionals, functions)
- Tool registration and execution
- JavaScript callback integration
- Error handling (compilation, runtime, tool errors)
- Max operations and tool call limits
- Complex data structures (arrays, maps, nested)

## Integration Example

The orchestrator integrates with AI agents via a tool definition:

```rust
// Register as "execute_script" tool for the LLM
Tool {
    name: "execute_script",
    description: "Execute a Rhai script for programmatic tool orchestration.
                  Write code that calls registered tools, processes results,
                  and returns only the final output. Use loops for batch
                  operations, conditionals for branching logic.",
    input_schema: /* script parameter */,
    requires_approval: false,  // Scripts are sandboxed
}
```

When the LLM needs to perform multi-step operations, it writes a Rhai script instead of making sequential individual tool calls. The script executes locally, and only the final result enters the context window.
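
A minimal handler for that tool might look like this (a sketch - the `serde_json` input shape and error formatting depend on your agent framework):

```rust
use tool_orchestrator::{ToolOrchestrator, ExecutionLimits};

// Invoked when the LLM calls `execute_script`. Only the final output
// (or a short error message) re-enters the conversation.
fn handle_execute_script(
    orchestrator: &mut ToolOrchestrator,
    input: &serde_json::Value,
) -> String {
    let script = input.get("script").and_then(|v| v.as_str()).unwrap_or("");
    match orchestrator.execute(script, ExecutionLimits::default()) {
        Ok(result) => result.output.to_string(),
        Err(e) => format!("script failed: {e}"),
    }
}
```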

## Instructing LLMs to Generate Rhai

To use programmatic tool calling, your LLM needs to know how to write Rhai scripts. Include something like this in your system prompt:

### System Prompt Template

```
You have access to a script execution tool that runs Rhai code. When you need to:
- Call multiple tools in sequence
- Process data from tool results
- Loop over items or aggregate results
- Apply conditional logic based on tool outputs

Write a Rhai script instead of making individual tool calls.

## Rhai Syntax Quick Reference

Variables and types:
  let x = 42;                    // integer
  let name = "hello";            // string
  let items = [1, 2, 3];         // array
  let config = #{ key: "value" }; // map (object)

String interpolation (use backticks):
  let msg = `Hello, ${name}!`;
  let result = `Found ${items.len()} items`;

Loops:
  for item in items { /* body */ }
  for i in 0..10 { /* 0 to 9 */ }

Conditionals:
  if x > 5 { "big" } else { "small" }

String methods:
  s.len(), s.contains("x"), s.starts_with("x"), s.ends_with("x")
  s.split(","), s.trim(), s.to_upper(), s.to_lower()
  s.sub_string(start, len), s.index_of("x")

Array methods:
  arr.push(item), arr.len(), arr.pop()
  arr.filter(|x| x > 5), arr.map(|x| x * 2)

Parsing:
  "42".parse_int(), "3.14".parse_float()

Available tools (call as functions):
  {TOOL_LIST}

## Important Rules

1. The LAST expression in your script is the return value
2. Use string interpolation with backticks for output: `Result: ${value}`
3. Process data locally - don't return intermediate results
4. Only return the final summary/answer

## Example

Task: Get total expenses for employees 1-3

Script:
let total = 0;
for id in [1, 2, 3] {
    let expenses = get_expenses(id);  // Returns JSON array
    // Parse and sum (simplified)
    total += expenses.len() * 100;    // Estimate
}
`Total across 3 employees: $${total}`
```
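
The `{TOOL_LIST}` placeholder can be filled from metadata you keep alongside your `register_executor` calls (a sketch - this assumes you track names and descriptions yourself rather than querying the orchestrator):

```rust
// Keep (name, description) pairs next to your register_executor calls.
let tools = [
    ("read_file", "Read a file's contents. Args: path (string)"),
    ("list_directory", "List entries in a directory. Args: path (string)"),
];

let tool_list = tools
    .iter()
    .map(|(name, desc)| format!("  {name}(...) - {desc}"))
    .collect::<Vec<_>>()
    .join("\n");

let prompt_template = "Available tools (call as functions):\n{TOOL_LIST}";
let system_prompt = prompt_template.replace("{TOOL_LIST}", &tool_list);
```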

### Rhai Syntax Cheatsheet for LLMs

| Concept | Rhai Syntax | Notes |
|---------|-------------|-------|
| **Variables** | `let x = 5;` | Mutable by default |
| **Reassignment** | `let x = 5; x = 10;` | Variables can be reassigned |
| **Strings** | `"hello"` or `` `hello` `` | Backticks allow interpolation |
| **Interpolation** | `` `Value: ${x}` `` | Only in backtick strings |
| **Arrays** | `[1, 2, 3]` | Dynamic, mixed types OK |
| **Maps** | `#{ a: 1, b: 2 }` | Like JSON objects |
| **For loops** | `for x in arr { }` | Iterates over arrays |
| **Ranges** | `for i in 0..5 { }` | 0, 1, 2, 3, 4 |
| **If/else** | `if x > 5 { a } else { b }` | Expression-based |
| **Functions** | `fn add(a, b) { a + b }` | Last expr is return |
| **Tool calls** | `tool_name(arg)` | Registered tools are functions |
| **Comments** | `// comment` | Single line |
| **Unit (null)** | `()` | Like None/null |

## Related Projects

- **[open-ptc-agent](https://github.com/Chen-zexi/open-ptc-agent)** - Python implementation using Daytona sandbox
- **[LangChain DeepAgents](https://github.com/langchain-ai/deepagents)** - LangChain's agent framework with code execution

## Acknowledgements

This project implements patterns from:

- [Anthropic's Advanced Tool Use](https://www.anthropic.com/engineering/advanced-tool-use) - The original Programmatic Tool Calling concept
- [Rhai](https://rhai.rs/) - The embedded scripting engine that makes this possible

## License

MIT