Tool Orchestrator - Universal Programmatic Tool Calling
A model-agnostic implementation of Anthropic's Programmatic Tool Calling pattern. Instead of sequential tool calls consuming tokens, any LLM writes Rhai scripts that orchestrate multiple tools efficiently.
Background: The Problem with Traditional Tool Calling
Traditional AI tool calling follows a request-response pattern:
LLM: "Call get_expenses(employee_id=1)"
→ Returns 100 expense items to context
LLM: "Call get_expenses(employee_id=2)"
→ Returns 100 more items to context
... (20 employees later)
→ 2,000+ line items polluting the context window
→ 110,000+ tokens just to produce a summary
Each intermediate result floods the model's context window, wasting tokens and degrading performance.
Anthropic's Solution: Programmatic Tool Calling
In November 2025, Anthropic introduced Programmatic Tool Calling (PTC) as part of their advanced tool use features. The key insight:
LLMs excel at writing code. Instead of reasoning through one tool call at a time, let them write code that orchestrates entire workflows.
Their approach:
- Claude writes Python code that calls multiple tools
- Code executes in Anthropic's managed sandbox
- Only the final result returns to the context window
Results: 37-98% token reduction, lower latency, more reliable control flow.
References
- Introducing advanced tool use on the Claude Developer Platform - Anthropic Engineering Blog
- CodeAct: Executable Code Actions Elicit Better LLM Agents - Academic research on code-based tool orchestration
Why This Crate? Universal Access
Anthropic's implementation has constraints:
- Claude-only: Requires Claude 4.5 with the `advanced-tool-use-2025-11-20` beta header
- Python-only: Scripts must be Python
- Anthropic-hosted: Execution happens in their managed sandbox
- API-dependent: Requires their code execution tool to be enabled
Tool Orchestrator provides the same benefits for any LLM provider:
| Constraint | Anthropic's PTC | Tool Orchestrator |
|---|---|---|
| Model | Claude 4.5 only | Any LLM that can write code |
| Language | Python | Rhai (Rust-like, easy for LLMs) |
| Execution | Anthropic's sandbox | Your local process |
| Runtime | Server-side (their servers) | Client-side (your control) |
| Dependencies | API call + beta header | Pure Rust, zero runtime deps |
| Targets | Python environments | Native Rust + WASM (browser/Node.js) |
Supported LLM Providers
- Claude (all versions, not just 4.5)
- OpenAI (GPT-4, GPT-4o, o1, etc.)
- Google (Gemini Pro, etc.)
- Other providers (Mistral, Cohere, etc.)
- Local models (Ollama, llama.cpp, vLLM)
- Any future provider
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ TRADITIONAL APPROACH │
│ │
│ LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons │
│ LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons │
│ LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons │
│ (tokens multiply rapidly) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PROGRAMMATIC TOOL CALLING │
│ │
│ LLM writes script: │
│ ┌──────────────────────────────────────┐ │
│ │ let results = []; │ │
│ │ for id in employee_ids { │ Executes locally │
│ │ let expenses = get_expenses(id); │ ─────────────────→ │
│ │ let flagged = expenses.filter(...); │ Tools called │
│ │ results.push(flagged); │ in sandbox │
│ │ } │ │
│ │ summarize(results) // Only this │ ←───────────────── │
│ └──────────────────────────────────────┘ returns to LLM │
└─────────────────────────────────────────────────────────────────┘
1. Register tools - Your actual tool implementations (file I/O, APIs, etc.)
2. LLM writes script - Any LLM generates a Rhai script orchestrating those tools
3. Sandboxed execution - Script runs locally with configurable safety limits
4. Minimal context - Only the final result enters the conversation
Multi-Target Architecture
This crate produces two outputs from a single codebase:
| Target | Description | Use Case |
|---|---|---|
| Rust Library | Native Rust crate with `Arc<Mutex>` thread safety | CLI tools, server-side apps, native integrations |
| WASM Package | Browser/Node.js module with `Rc<RefCell>` | Web apps, npm packages, browser-based AI |
Benefits
- 37-98% token reduction - Intermediate results stay in sandbox, only final output returns
- Batch operations - Process thousands of items in loops without context pollution
- Conditional logic - if/else based on tool results, handled in code not LLM reasoning
- Data transformation - Filter, aggregate, transform between tool calls
- Explicit control flow - Loops, error handling, retries are code, not implicit reasoning
- Model agnostic - Works with any LLM that can write Rhai/Rust-like code
- Audit trail - Every tool call is recorded with timing and results
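As a small illustration of the audit trail, here is a hedged sketch; the `result.tool_calls` field name is an assumption based on the result shape shown in the WASM example below.

```rust
// `result` comes from orchestrator.execute(...) as in the Usage section below.
// Each recorded entry is expected to carry the tool name, inputs, timing, and output.
for call in &result.tool_calls {
    println!("{call:?}");
}
```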
Installation & Building
Rust Library (default)
```bash
# Add to Cargo.toml (crate name assumed from this README)
cargo add tool-orchestrator

# Or build from source
cargo build --release
```
WASM Package
```bash
# Build for web (browser)
wasm-pack build --target web

# Build for Node.js
wasm-pack build --target nodejs

# The package is generated in ./pkg/
# (you may need `-- --no-default-features --features wasm`; see Feature Flags below)
```
Usage
Rust Library
A minimal sketch of the native API; exact type and method names may differ, so check the crate docs.

```rust
use tool_orchestrator::{ExecutionLimits, ToolOrchestrator};

// Create orchestrator
let mut orchestrator = ToolOrchestrator::new();

// Register tools as executor functions: a tool name plus a closure that
// receives the tool's input and returns its output as a string.
orchestrator.register_executor("list_directory", |_input| {
    // Your real implementation here (file I/O, APIs, ...)
    "lib.rs\nmain.rs\nwasm.rs".to_string()
});
orchestrator.register_executor("read_file", |path| {
    std::fs::read_to_string(path).unwrap_or_default()
});

// Execute a Rhai script (written by any LLM)
let script = r#"
let files = list_directory("src");
let rust_files = [];
for file in files.split("\n") {
if file.ends_with(".rs") {
rust_files.push(file);
}
}
`Found ${rust_files.len()} Rust files: ${rust_files}`
"#;
let result = orchestrator.execute(script, ExecutionLimits::default())?;
println!("{}", result.output);        // Final result only
println!("{:#?}", result.tool_calls); // Audit trail
```
WASM (JavaScript/TypeScript)
import init, { WasmOrchestrator, ExecutionLimits } from 'tool-orchestrator';
await init();
const orchestrator = new WasmOrchestrator();
// Register a JavaScript function as a tool
orchestrator.register_tool('get_weather', (inputJson: string) => {
const input = JSON.parse(inputJson);
// Your implementation here
return JSON.stringify({ temp: 72, condition: 'sunny' });
});
// Execute a Rhai script
const limits = new ExecutionLimits();
const result = orchestrator.execute(`
let weather = get_weather("San Francisco");
\`Current weather: \${weather}\`
`, limits);
console.log(result);
// { success: true, output: "Current weather: ...", tool_calls: [...] }
Safety & Sandboxing
The orchestrator includes built-in limits to prevent runaway scripts:
| Limit | Default | Description |
|---|---|---|
| `max_operations` | 100,000 | Prevents infinite loops |
| `max_tool_calls` | 50 | Limits tool invocations |
| `timeout_ms` | 30,000 | Execution timeout |
| `max_string_size` | 10 MB | Maximum string length |
| `max_array_size` | 10,000 | Maximum array elements |
```rust
// Preset profiles
let quick = ExecutionLimits::quick();       // 10k ops, 10 calls, 5s
let extended = ExecutionLimits::extended(); // 500k ops, 100 calls, 2m

// Custom limits (example values)
let limits = ExecutionLimits::default()
    .with_max_operations(50_000)
    .with_max_tool_calls(25)
    .with_timeout_ms(10_000);
```
Security Considerations
What the Sandbox Prevents
Rhai scripts executed by this crate cannot:
- Access the filesystem (no `std::fs`, no file I/O)
- Make network requests (no sockets, no HTTP)
- Execute shell commands (no `std::process`)
- Access environment variables
- Spawn threads or processes
- Access raw memory or use unsafe code
The only way scripts interact with the outside world is through explicitly registered tools.
Tool Implementer Responsibility
You are responsible for the security of tools you register. If you register a tool that:
- Executes shell commands → scripts can run arbitrary commands
- Reads/writes files → scripts can access your filesystem
- Makes HTTP requests → scripts can exfiltrate data
Design your tools with the principle of least privilege.
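For example, a read-only file tool can be confined to a single directory. The sketch below is illustrative; the closure shape mirrors the (assumed) executor signature from the Usage section.

```rust
use std::path::Path;

// Illustrative least-privilege tool: read files only from inside one base directory.
fn make_read_file_tool(base_dir: &Path) -> impl Fn(&str) -> String {
    let base = base_dir.canonicalize().expect("base directory must exist");
    move |path_arg: &str| {
        let requested = base.join(path_arg.trim_matches('"'));
        match requested.canonicalize() {
            // Only serve paths that resolve inside the allowed directory.
            Ok(p) if p.starts_with(&base) => std::fs::read_to_string(&p)
                .unwrap_or_else(|e| format!("error: {e}")),
            _ => "error: path outside allowed directory".to_string(),
        }
    }
}
```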
Timeout Behavior
The timeout_ms limit uses Rhai's on_progress callback for real-time enforcement:
- Timeout is checked after every Rhai operation (not just at the end)
- CPU-intensive loops will be terminated mid-execution when timeout is exceeded
- Note: Timeout checks don't occur during a tool call - if a registered tool blocks for 10 seconds, that time isn't interruptible
- For tools that may block, implement your own timeouts within the tool executor
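As a sketch of that pattern (assuming your executor returns a String), a potentially blocking call can be run on a helper thread and abandoned after a deadline:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Wrap a potentially blocking tool body with its own timeout, since the
// script-level timeout_ms cannot interrupt a tool call that is already running.
fn with_tool_timeout<F>(work: F, timeout: Duration) -> String
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    match rx.recv_timeout(timeout) {
        Ok(result) => result,
        // Note: the worker thread is abandoned, not killed; the tool should
        // still avoid unbounded work where possible.
        Err(_) => "error: tool timed out".to_string(),
    }
}
```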
Recommended Limits for Untrusted Input
```rust
// Example values; tune for your workload.
let limits = ExecutionLimits::default()
    .with_max_operations(10_000)       // Tight loop protection
    .with_max_tool_calls(10)           // Limit external interactions
    .with_timeout_ms(5_000)            // 5 second max
    .with_max_string_size(100 * 1024)  // 100KB strings
    .with_max_array_size(1_000);       // 1K elements
```
Why Rhai Instead of Python?
Anthropic uses Python because Claude is trained extensively on it. We chose Rhai for different reasons:
| Factor | Python | Rhai |
|---|---|---|
| Safety | Requires heavy sandboxing | Sandboxed by design, no filesystem/network access |
| Embedding | CPython runtime (large) | Pure Rust, compiles into your binary |
| WASM | Complex (Pyodide, etc.) | Native WASM support |
| Syntax | Python-specific | Rust-like (familiar to many LLMs) |
| Performance | Interpreter overhead | Optimized for embedding |
| Dependencies | Python ecosystem | Zero runtime dependencies |
LLMs have no trouble generating Rhai - it's syntactically similar to Rust/JavaScript:
// Variables
let x = 42;
let name = "Claude";
// String interpolation (backticks)
let greeting = `Hello, ${name}!`;
// Arrays and loops
let items = [1, 2, 3, 4, 5];
let sum = 0;
for item in items {
sum += item;
}
// Conditionals
if sum > 10 {
"Large sum"
} else {
"Small sum"
}
// Maps (objects)
let config = #{
debug: true,
limit: 100
};
// Tool calls (registered functions)
let content = read_file("README.md");
let files = list_directory("src");
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `native` | Yes | Thread-safe with `Arc<Mutex>` (for native Rust) |
| `wasm` | No | Single-threaded with `Rc<RefCell>` (for browser/Node.js) |
Testing
Native Tests
```bash
# Run all native tests
cargo test

# Run with verbose output
cargo test -- --nocapture
```
WASM Tests
WASM tests require wasm-pack. Install it with:

```bash
cargo install wasm-pack
```
Run WASM tests:
```bash
# Test with Node.js (fastest)
wasm-pack test --node

# Test with headless Chrome
wasm-pack test --headless --chrome

# Test with headless Firefox
wasm-pack test --headless --firefox
```
Test Coverage
The test suite includes:
Native tests (39 tests)
- Orchestrator creation and configuration
- Tool registration and execution
- Script compilation and execution
- Error handling (compilation errors, tool errors, runtime errors)
- Execution limits (max operations, max tool calls, timeout)
- JSON type conversion
- Loop and conditional execution
- Timing and metrics recording
WASM tests (25 tests)
- ExecutionLimits constructors and setters
- WasmOrchestrator creation
- Script execution (simple, loops, conditionals, functions)
- Tool registration and execution
- JavaScript callback integration
- Error handling (compilation, runtime, tool errors)
- Max operations and tool call limits
- Complex data structures (arrays, maps, nested)
Integration Example
The orchestrator integrates with AI agents via a tool definition:
```rust
// Register an "execute_script" tool for the LLM. The shape is illustrative:
// use whatever tool-definition type your agent framework or LLM SDK expects.
Tool {
    name: "execute_script",
    description: "Run a Rhai script that can call the registered tools. \
                  Use this for multi-step operations instead of many individual tool calls.",
    input_schema: serde_json::json!({
        "type": "object",
        "properties": {
            "script": { "type": "string", "description": "Rhai script to execute" }
        },
        "required": ["script"]
    }),
}
```
When the LLM needs to perform multi-step operations, it writes a Rhai script instead of making sequential individual tool calls. The script executes locally, and only the final result enters the context window.
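A hedged sketch of handling that tool call on the client side; `ToolOrchestrator`, `ExecutionLimits`, and `result.output` mirror the illustrative Usage example above rather than a confirmed API.

```rust
// Hypothetical handler invoked when the LLM calls the "execute_script" tool.
// `tool_input` is the JSON arguments the model produced.
fn handle_execute_script(
    orchestrator: &mut ToolOrchestrator,
    tool_input: &serde_json::Value,
) -> Result<String, Box<dyn std::error::Error>> {
    let script = tool_input["script"].as_str().ok_or("missing script")?;
    let result = orchestrator.execute(script, ExecutionLimits::default())?;
    // Only this final output goes back into the conversation.
    Ok(result.output)
}
```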
Instructing LLMs to Generate Rhai
To use programmatic tool calling, your LLM needs to know how to write Rhai scripts. Include something like this in your system prompt:
System Prompt Template
You have access to a script execution tool that runs Rhai code. When you need to:
- Call multiple tools in sequence
- Process data from tool results
- Loop over items or aggregate results
- Apply conditional logic based on tool outputs
Write a Rhai script instead of making individual tool calls.
## Rhai Syntax Quick Reference
Variables and types:
let x = 42; // integer
let name = "hello"; // string
let items = [1, 2, 3]; // array
let config = #{ key: "value" }; // map (object)
String interpolation (use backticks):
let msg = `Hello, ${name}!`;
let result = `Found ${items.len()} items`;
Loops:
for item in items { /* body */ }
for i in 0..10 { /* 0 to 9 */ }
Conditionals:
if x > 5 { "big" } else { "small" }
String methods:
s.len(), s.contains("x"), s.starts_with("x"), s.ends_with("x")
s.split(","), s.trim(), s.to_upper(), s.to_lower()
s.sub_string(start, len), s.index_of("x")
Array methods:
arr.push(item), arr.len(), arr.pop()
arr.filter(|x| x > 5), arr.map(|x| x * 2)
Parsing:
"42".parse_int(), "3.14".parse_float()
Available tools (call as functions):
{TOOL_LIST}
## Important Rules
1. The LAST expression in your script is the return value
2. Use string interpolation with backticks for output: `Result: ${value}`
3. Process data locally - don't return intermediate results
4. Only return the final summary/answer
## Example
Task: Get total expenses for employees 1-3
Script:
let total = 0;
for id in [1, 2, 3] {
let expenses = get_expenses(id); // Returns JSON array
// Parse and sum (simplified)
total += expenses.len() * 100; // Estimate
}
`Total across 3 employees: $${total}`
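A hedged sketch of filling the `{TOOL_LIST}` placeholder when assembling the system prompt; the helper function and the tool descriptions are hypothetical, built from whatever you registered with the orchestrator.

```rust
// Build the system prompt from the template above (stored as a string with a
// {TOOL_LIST} placeholder) plus one line per registered tool.
fn build_system_prompt(prompt_template: &str, tools: &[(&str, &str)]) -> String {
    let tool_list = tools
        .iter()
        .map(|(name, signature)| format!("- {name}: {signature}"))
        .collect::<Vec<_>>()
        .join("\n");
    prompt_template.replace("{TOOL_LIST}", &tool_list)
}

// Example:
// build_system_prompt(&prompt, &[
//     ("get_expenses", "get_expenses(employee_id) -> JSON array of expense records"),
//     ("list_directory", "list_directory(path) -> newline-separated file names"),
// ]);
```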
Rhai Syntax Cheatsheet for LLMs
| Concept | Rhai Syntax | Notes |
|---|---|---|
| Variables | `let x = 5;` | Declared with `let`; use `const` for constants |
| Mutable | `let x = 5; x = 10;` | Variables can be reassigned |
| Strings | `"hello"` or `` `hello` `` | Backticks allow interpolation |
| Interpolation | `` `Value: ${x}` `` | Only in backtick strings |
| Arrays | `[1, 2, 3]` | Dynamic, mixed types OK |
| Maps | `#{ a: 1, b: 2 }` | Like JSON objects |
| For loops | `for x in arr { }` | Iterates over arrays |
| Ranges | `for i in 0..5 { }` | 0, 1, 2, 3, 4 |
| If/else | `if x > 5 { a } else { b }` | Expression-based |
| Functions | `fn add(a, b) { a + b }` | Last expr is return |
| Tool calls | `tool_name(arg)` | Registered tools are functions |
| Comments | `// comment` | Single line |
| Unit (null) | `()` | Like None/null |
Related Projects
- open-ptc-agent - Python implementation using Daytona sandbox
- LangChain DeepAgents - LangChain's agent framework with code execution
Acknowledgements
This project implements patterns from:
- Anthropic's Advanced Tool Use - The original Programmatic Tool Calling concept
- Rhai - The embedded scripting engine that makes this possible
License
MIT