tool-orchestrator 1.0.0

Rhai-based tool orchestration for AI agents - implements Anthropic's programmatic tool calling pattern

Tool Orchestrator - Universal Programmatic Tool Calling

A model-agnostic implementation of Anthropic's Programmatic Tool Calling pattern. Instead of making sequential tool calls that each consume context tokens, any LLM can write Rhai scripts that orchestrate multiple tools and return only the final result.

Background: The Problem with Traditional Tool Calling

Traditional AI tool calling follows a request-response pattern:

LLM: "Call get_expenses(employee_id=1)"
→ Returns 100 expense items to context
LLM: "Call get_expenses(employee_id=2)"
→ Returns 100 more items to context
... (20 employees later)
→ 2,000+ line items polluting the context window
→ 110,000+ tokens just to produce a summary

Each intermediate result floods the model's context window, wasting tokens and degrading performance.

Anthropic's Solution: Programmatic Tool Calling

In November 2025, Anthropic introduced Programmatic Tool Calling (PTC) as part of their advanced tool use features. The key insight:

LLMs excel at writing code. Instead of reasoning through one tool call at a time, let them write code that orchestrates entire workflows.

Their approach:

  1. Claude writes Python code that calls multiple tools
  2. Code executes in Anthropic's managed sandbox
  3. Only the final result returns to the context window

Results: 37-98% token reduction, lower latency, more reliable control flow.

Why This Crate? Universal Access

Anthropic's implementation has constraints:

  • Claude-only: Requires Claude 4.5 with the advanced-tool-use-2025-11-20 beta header
  • Python-only: Scripts must be Python
  • Anthropic-hosted: Execution happens in their managed sandbox
  • API-dependent: Requires their code execution tool to be enabled

Tool Orchestrator provides the same benefits for any LLM provider:

| Constraint   | Anthropic's PTC              | Tool Orchestrator                     |
|--------------|------------------------------|---------------------------------------|
| Model        | Claude 4.5 only              | Any LLM that can write code           |
| Language     | Python                       | Rhai (Rust-like, easy for LLMs)       |
| Execution    | Anthropic's sandbox          | Your local process                    |
| Runtime      | Server-side (their servers)  | Client-side (your control)            |
| Dependencies | API call + beta header       | Pure Rust, zero runtime deps          |
| Targets      | Python environments          | Native Rust + WASM (browser/Node.js)  |

Supported LLM Providers

  • Claude (all versions, not just 4.5)
  • OpenAI (GPT-4, GPT-4o, o1, etc.)
  • Google (Gemini Pro, etc.)
  • Anthropic competitors (Mistral, Cohere, etc.)
  • Local models (Ollama, llama.cpp, vLLM)
  • Any future provider

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                     TRADITIONAL APPROACH                        │
│                                                                 │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│  LLM ─→ Tool Call ─→ Full Result to Context ─→ LLM reasons     │
│                     (tokens multiply rapidly)                   │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                 PROGRAMMATIC TOOL CALLING                       │
│                                                                 │
│  LLM writes script:                                             │
│  ┌──────────────────────────────────────┐                      │
│  │ let results = [];                     │                      │
│  │ for id in employee_ids {              │   Executes locally   │
│  │   let expenses = get_expenses(id);    │ ─────────────────→   │
│  │   let flagged = expenses.filter(...); │   Tools called       │
│  │   results.push(flagged);              │   in sandbox         │
│  │ }                                     │                      │
│  │ summarize(results)  // Only this      │ ←─────────────────   │
│  └──────────────────────────────────────┘   returns to LLM     │
└─────────────────────────────────────────────────────────────────┘
  1. Register tools - Your actual tool implementations (file I/O, APIs, etc.)
  2. LLM writes script - Any LLM generates a Rhai script orchestrating those tools
  3. Sandboxed execution - Script runs locally with configurable safety limits
  4. Minimal context - Only the final result enters the conversation

Multi-Target Architecture

This crate produces two outputs from a single codebase:

| Target       | Description                                      | Use Case                                          |
|--------------|--------------------------------------------------|---------------------------------------------------|
| Rust Library | Native Rust crate with Arc<Mutex> thread safety  | CLI tools, server-side apps, native integrations  |
| WASM Package | Browser/Node.js module with Rc<RefCell>          | Web apps, npm packages, browser-based AI          |
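
As a rough illustration of the pattern behind this (a sketch only, not the crate's actual internals), a feature-gated type alias lets one codebase swap its shared-state wrapper per target:

// Sketch: select the shared-state wrapper at compile time via feature flags.
// Names and structure here are illustrative, not taken from the crate's source.
#[cfg(feature = "native")]
type Shared<T> = std::sync::Arc<std::sync::Mutex<T>>;

#[cfg(feature = "wasm")]
type Shared<T> = std::rc::Rc<std::cell::RefCell<T>>;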

Benefits

  • 37-98% token reduction - Intermediate results stay in sandbox, only final output returns
  • Batch operations - Process thousands of items in loops without context pollution
  • Conditional logic - if/else based on tool results, handled in code not LLM reasoning
  • Data transformation - Filter, aggregate, transform between tool calls
  • Explicit control flow - Loops, error handling, retries are code, not implicit reasoning
  • Model agnostic - Works with any LLM that can write Rhai/Rust-like code
  • Audit trail - Every tool call is recorded with timing and results (see the sketch below)
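
A minimal sketch of consuming that audit trail after a run; send_to_llm is a hypothetical stand-in for however you return output to the model, and only the output and tool_calls fields shown in the Usage section below are assumed:

let result = orchestrator.execute(script, ExecutionLimits::default())?;

// The conversation only ever sees the final output...
send_to_llm(&result.output);                 // hypothetical helper

// ...while every intermediate tool call stays inspectable locally.
for call in &result.tool_calls {
    eprintln!("tool call: {call:?}");        // timing and result recorded per entry
}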

Installation & Building

Rust Library (default)

# Add to Cargo.toml
cargo add tool-orchestrator

# Or build from source
cargo build

WASM Package

# Build for web (browser)
wasm-pack build --target web --features wasm --no-default-features

# Build for Node.js
wasm-pack build --target nodejs --features wasm --no-default-features

# The package is generated in ./pkg/

Usage

Rust Library

use tool_orchestrator::{ToolOrchestrator, ExecutionLimits};

// Create orchestrator
let mut orchestrator = ToolOrchestrator::new();

// Register tools as executor functions
orchestrator.register_executor("read_file", |input| {
    let path = input.get("path").and_then(|v| v.as_str()).unwrap_or("");
    std::fs::read_to_string(path).map_err(|e| e.to_string())
});

orchestrator.register_executor("list_directory", |input| {
    let path = input.get("path").and_then(|v| v.as_str()).unwrap_or(".");
    let entries: Vec<String> = std::fs::read_dir(path)
        .map_err(|e| e.to_string())?
        .filter_map(|e| e.ok().map(|e| e.path().display().to_string()))
        .collect();
    Ok(entries.join("\n"))
});

// Execute a Rhai script (written by any LLM)
let script = r#"
    let files = list_directory("src");
    let rust_files = [];
    for file in files.split("\n") {
        if file.ends_with(".rs") {
            rust_files.push(file);
        }
    }
    `Found ${rust_files.len()} Rust files: ${rust_files}`
"#;

let result = orchestrator.execute(script, ExecutionLimits::default())?;
println!("Output: {}", result.output);           // Final result only
println!("Tool calls: {:?}", result.tool_calls); // Audit trail

WASM (JavaScript/TypeScript)

import init, { WasmOrchestrator, ExecutionLimits } from 'tool-orchestrator';

await init();

const orchestrator = new WasmOrchestrator();

// Register a JavaScript function as a tool
orchestrator.register_tool('get_weather', (inputJson: string) => {
  const input = JSON.parse(inputJson);
  // Your implementation here
  return JSON.stringify({ temp: 72, condition: 'sunny' });
});

// Execute a Rhai script
const limits = new ExecutionLimits();
const result = orchestrator.execute(`
  let weather = get_weather("San Francisco");
  \`Current weather: \${weather}\`
`, limits);

console.log(result);
// { success: true, output: "Current weather: ...", tool_calls: [...] }

Safety & Sandboxing

The orchestrator includes built-in limits to prevent runaway scripts:

| Limit           | Default | Description                      |
|-----------------|---------|----------------------------------|
| max_operations  | 100,000 | Prevents infinite loops          |
| max_tool_calls  | 50      | Limits tool invocations          |
| timeout_ms      | 30,000  | Execution timeout (milliseconds) |
| max_string_size | 10MB    | Maximum string length            |
| max_array_size  | 10,000  | Maximum array elements           |

// Preset profiles
let quick = ExecutionLimits::quick();      // 10k ops, 10 calls, 5s
let extended = ExecutionLimits::extended(); // 500k ops, 100 calls, 2m

// Custom limits
let limits = ExecutionLimits::default()
    .with_max_operations(50_000)
    .with_max_tool_calls(25)
    .with_timeout_ms(10_000);

Security Considerations

What the Sandbox Prevents

Rhai scripts executed by this crate cannot:

  • Access the filesystem (no std::fs, no file I/O)
  • Make network requests (no sockets, no HTTP)
  • Execute shell commands (no std::process)
  • Access environment variables
  • Spawn threads or processes
  • Access raw memory or use unsafe code

The only way scripts interact with the outside world is through explicitly registered tools.

Tool Implementer Responsibility

You are responsible for the security of tools you register. If you register a tool that:

  • Executes shell commands → scripts can run arbitrary commands
  • Reads/writes files → scripts can access your filesystem
  • Makes HTTP requests → scripts can exfiltrate data

Design your tools with the principle of least privilege.
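
For example, a rough sketch of a read_file executor confined to one base directory; the BASE_DIR restriction and error message are choices made inside your own tool, not crate APIs:

use std::path::Path;

const BASE_DIR: &str = "./workspace";

orchestrator.register_executor("read_file", |input| {
    let requested = input.get("path").and_then(|v| v.as_str()).unwrap_or("");
    // Resolve symlinks and ".." components, then verify the path is still inside BASE_DIR.
    let base = Path::new(BASE_DIR).canonicalize().map_err(|e| e.to_string())?;
    let full = base.join(requested).canonicalize().map_err(|e| e.to_string())?;
    if !full.starts_with(&base) {
        return Err(format!("access denied: {requested} escapes {BASE_DIR}"));
    }
    std::fs::read_to_string(&full).map_err(|e| e.to_string())
});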

Timeout Behavior

The timeout_ms limit uses Rhai's on_progress callback for real-time enforcement:

  • Timeout is checked after every Rhai operation (not just at the end)
  • CPU-intensive loops will be terminated mid-execution when timeout is exceeded
  • Note: Timeout checks don't occur during a tool call - if a registered tool blocks for 10 seconds, that time isn't interruptible
  • For tools that may block, implement your own timeouts within the tool executor, as sketched below
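
A minimal sketch of that pattern, assuming a hypothetical fetch_report tool backed by a blocking slow_backend_call helper (neither is part of this crate): the blocking work runs on a worker thread, and the executor bounds how long it waits:

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

orchestrator.register_executor("fetch_report", |input| {
    let input = input.clone();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // slow_backend_call is an assumed blocking helper returning Result<String, String>.
        let _ = tx.send(slow_backend_call(&input));
    });
    // Bound the wait independently of the script-level timeout_ms.
    // Note: on timeout the worker thread is abandoned, not killed.
    match rx.recv_timeout(Duration::from_secs(5)) {
        Ok(result) => result,
        Err(_) => Err("fetch_report timed out after 5 seconds".to_string()),
    }
});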

Recommended Limits for Untrusted Input

let limits = ExecutionLimits::default()
    .with_max_operations(10_000)      // Tight loop protection
    .with_max_tool_calls(5)           // Limit external interactions
    .with_timeout_ms(5_000)           // 5 second max
    .with_max_string_size(100_000)    // 100KB strings
    .with_max_array_size(1_000);      // 1K elements

Why Rhai Instead of Python?

Anthropic uses Python because Claude is trained extensively on it. We chose Rhai for different reasons:

| Factor       | Python                    | Rhai                                               |
|--------------|---------------------------|----------------------------------------------------|
| Safety       | Requires heavy sandboxing | Sandboxed by design, no filesystem/network access  |
| Embedding    | CPython runtime (large)   | Pure Rust, compiles into your binary               |
| WASM         | Complex (Pyodide, etc.)   | Native WASM support                                |
| Syntax       | Python-specific           | Rust-like (familiar to many LLMs)                  |
| Performance  | Interpreter overhead      | Optimized for embedding                            |
| Dependencies | Python ecosystem          | Zero runtime dependencies                          |

LLMs have no trouble generating Rhai - it's syntactically similar to Rust/JavaScript:

// Variables
let x = 42;
let name = "Claude";

// String interpolation (backticks)
let greeting = `Hello, ${name}!`;

// Arrays and loops
let items = [1, 2, 3, 4, 5];
let sum = 0;
for item in items {
    sum += item;
}

// Conditionals
if sum > 10 {
    "Large sum"
} else {
    "Small sum"
}

// Maps (objects)
let config = #{
    debug: true,
    limit: 100
};

// Tool calls (registered functions)
let content = read_file("README.md");
let files = list_directory("src");

Feature Flags

| Feature | Default | Description                                             |
|---------|---------|---------------------------------------------------------|
| native  | Yes     | Thread-safe with Arc<Mutex> (for native Rust)           |
| wasm    | No      | Single-threaded with Rc<RefCell> (for browser/Node.js)  |

Testing

Native Tests

# Run all native tests
cargo test

# Run with verbose output
cargo test -- --nocapture

WASM Tests

WASM tests require wasm-pack. Install it with:

cargo install wasm-pack

Run WASM tests:

# Test with Node.js (fastest)
wasm-pack test --node --features wasm --no-default-features

# Test with headless Chrome
wasm-pack test --headless --chrome --features wasm --no-default-features

# Test with headless Firefox
wasm-pack test --headless --firefox --features wasm --no-default-features

Test Coverage

The test suite includes:

Native tests (39 tests)

  • Orchestrator creation and configuration
  • Tool registration and execution
  • Script compilation and execution
  • Error handling (compilation errors, tool errors, runtime errors)
  • Execution limits (max operations, max tool calls, timeout)
  • JSON type conversion
  • Loop and conditional execution
  • Timing and metrics recording

WASM tests (25 tests)

  • ExecutionLimits constructors and setters
  • WasmOrchestrator creation
  • Script execution (simple, loops, conditionals, functions)
  • Tool registration and execution
  • JavaScript callback integration
  • Error handling (compilation, runtime, tool errors)
  • Max operations and tool call limits
  • Complex data structures (arrays, maps, nested)

Integration Example

The orchestrator integrates with AI agents via a tool definition:

// Register as "execute_script" tool for the LLM
Tool {
    name: "execute_script",
    description: "Execute a Rhai script for programmatic tool orchestration.
                  Write code that calls registered tools, processes results,
                  and returns only the final output. Use loops for batch
                  operations, conditionals for branching logic.",
    input_schema: /* script parameter */,
    requires_approval: false,  // Scripts are sandboxed
}

When the LLM needs to perform multi-step operations, it writes a Rhai script instead of making sequential individual tool calls. The script executes locally, and only the final result enters the context window.
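
A rough sketch of the agent-side dispatch (the conversation object, push_tool_result call, and argument extraction are hypothetical, not part of this crate):

// Inside your agent's handler for an "execute_script" tool call:
let script = tool_call_arguments["script"].as_str().unwrap_or_default(); // hypothetical extraction
let result = orchestrator.execute(script, ExecutionLimits::default())?;

// Only the final output re-enters the conversation; the audit trail stays local.
conversation.push_tool_result(&result.output);       // hypothetical agent API
eprintln!("audit trail: {:?}", result.tool_calls);   // per-call timing and results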

Instructing LLMs to Generate Rhai

To use programmatic tool calling, your LLM needs to know how to write Rhai scripts. Include something like this in your system prompt:

System Prompt Template

You have access to a script execution tool that runs Rhai code. When you need to:
- Call multiple tools in sequence
- Process data from tool results
- Loop over items or aggregate results
- Apply conditional logic based on tool outputs

Write a Rhai script instead of making individual tool calls.

## Rhai Syntax Quick Reference

Variables and types:
  let x = 42;                    // integer
  let name = "hello";            // string
  let items = [1, 2, 3];         // array
  let config = #{ key: "value" }; // map (object)

String interpolation (use backticks):
  let msg = `Hello, ${name}!`;
  let result = `Found ${items.len()} items`;

Loops:
  for item in items { /* body */ }
  for i in 0..10 { /* 0 to 9 */ }

Conditionals:
  if x > 5 { "big" } else { "small" }

String methods:
  s.len(), s.contains("x"), s.starts_with("x"), s.ends_with("x")
  s.split(","), s.trim(), s.to_upper(), s.to_lower()
  s.sub_string(start, len), s.index_of("x")

Array methods:
  arr.push(item), arr.len(), arr.pop()
  arr.filter(|x| x > 5), arr.map(|x| x * 2)

Parsing:
  "42".parse_int(), "3.14".parse_float()

Available tools (call as functions):
  {TOOL_LIST}

## Important Rules

1. The LAST expression in your script is the return value
2. Use string interpolation with backticks for output: `Result: ${value}`
3. Process data locally - don't return intermediate results
4. Only return the final summary/answer

## Example

Task: Get total expenses for employees 1-3

Script:
let total = 0;
for id in [1, 2, 3] {
    let expenses = get_expenses(id);  // Returns JSON array
    // Parse and sum (simplified)
    total += expenses.len() * 100;    // Estimate
}
`Total across 3 employees: $${total}`

Rhai Syntax Cheatsheet for LLMs

| Concept       | Rhai Syntax                | Notes                                        |
|---------------|----------------------------|----------------------------------------------|
| Variables     | let x = 5;                 | Mutable by default; use const for constants  |
| Reassignment  | let x = 5; x = 10;         | Variables can be reassigned                  |
| Strings       | "hello" or `hello`         | Backticks allow interpolation                |
| Interpolation | `Value: ${x}`              | Only in backtick strings                     |
| Arrays        | [1, 2, 3]                  | Dynamic, mixed types OK                      |
| Maps          | #{ a: 1, b: 2 }            | Like JSON objects                            |
| For loops     | for x in arr { }           | Iterates over arrays                         |
| Ranges        | for i in 0..5 { }          | 0, 1, 2, 3, 4                                |
| If/else       | if x > 5 { a } else { b }  | Expression-based                             |
| Functions     | fn add(a, b) { a + b }     | Last expr is return value                    |
| Tool calls    | tool_name(arg)             | Registered tools are functions               |
| Comments      | // comment                 | Single line                                  |
| Unit (null)   | ()                         | Like None/null                               |

Acknowledgements

This project implements the Programmatic Tool Calling pattern introduced by Anthropic as part of their advanced tool use features.

License

MIT