# Embacle — LLM Runners
Standalone Rust library that wraps AI CLI tools and SDKs as pluggable LLM providers.
Instead of integrating with LLM APIs directly (which require API keys, SDKs, and managing auth), Embacle delegates to CLI tools that users already have installed and authenticated — getting model upgrades, auth management, and protocol handling for free. For GitHub Copilot, an optional headless mode communicates via the ACP (Agent Client Protocol) for SDK-managed tool calling.
## Tested With
Embacle has been tested with mirroir.dev, an MCP server for AI-powered iPhone automation.
## Supported Runners

### CLI Runners (subprocess-based)
| Runner | Binary | Features |
|---|---|---|
| Claude Code | `claude` | JSON output, streaming, system prompts, session resume |
| GitHub Copilot | `copilot` | Text parsing, streaming |
| Cursor Agent | `cursor-agent` | JSON output, streaming, MCP approval |
| OpenCode | `opencode` | JSON events, session management |
| Gemini CLI | `gemini` | JSON/stream-JSON output, streaming, session resume |
| Codex CLI | `codex` | JSONL output, streaming, sandboxed exec mode |
| Goose CLI | `goose` | JSON/stream-JSON output, streaming, no-session mode |
| Cline CLI | `cline` | NDJSON output, streaming, session resume via task IDs |
| Continue CLI | `cn` | JSON output, single-shot completions |
| Warp | `oz` | NDJSON output, conversation resume |
### ACP Runners (persistent connection)

| Runner | Feature Flag | Features |
|---|---|---|
| GitHub Copilot Headless | `copilot-headless` | NDJSON/JSON-RPC via `copilot --acp`, SDK-managed tool calling, streaming |
## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
embacle = "0.8"
```
Use a CLI runner (a sketch — module paths and constructor signatures are illustrative, so check the crate docs for exact names):

```rust
use std::path::PathBuf;

use embacle::runners::ClaudeCodeRunner;
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Auto-discovery via `which` is also available; an explicit path is shown here.
    let runner = ClaudeCodeRunner::new(PathBuf::from("/usr/local/bin/claude"));
    let request = ChatRequest::new(vec![ChatMessage::user("Hello!")]);
    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```
## Copilot Headless (feature flag)

Enable the `copilot-headless` feature for ACP-based communication with SDK-managed tool calling:

```toml
[dependencies]
embacle = { version = "0.8", features = ["copilot-headless"] }
```
A sketch of the call shape (module paths and signatures are illustrative):

```rust
use embacle::copilot_headless::CopilotHeadlessRunner;
use embacle::types::{ChatMessage, ChatRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads COPILOT_CLI_PATH, COPILOT_HEADLESS_MODEL, COPILOT_GITHUB_TOKEN.
    let runner = CopilotHeadlessRunner::from_env()?;
    let request = ChatRequest::new(vec![ChatMessage::user("Summarize this repo")]);
    // converse() returns a HeadlessToolResponse with observed tool calls.
    let response = runner.converse(&request).await?;
    println!("{}", response.text);
    Ok(())
}
```
The headless runner spawns `copilot --acp` per request and communicates via NDJSON-framed JSON-RPC. Configuration is via environment variables:

| Variable | Default | Description |
|---|---|---|
| `COPILOT_CLI_PATH` | auto-detect | Override path to the copilot binary |
| `COPILOT_HEADLESS_MODEL` | `claude-opus-4.6-fast` | Default model for completions |
| `COPILOT_GITHUB_TOKEN` | stored OAuth | GitHub auth token (falls back to `GH_TOKEN`, `GITHUB_TOKEN`) |
## MCP Server (embacle-mcp)
A standalone MCP server binary that exposes embacle runners via the Model Context Protocol. Connect any MCP-compatible client (Claude Desktop, editors, custom agents) to use all embacle providers.
### Usage

```sh
# Stdio transport (default — for editor/client integration)
embacle-mcp

# HTTP transport (for network-accessible deployments) — flag name illustrative
embacle-mcp --http
```
### MCP Tools

| Tool | Description |
|---|---|
| `get_provider` | Get active LLM provider and list available providers |
| `set_provider` | Switch the active provider (`claude_code`, `copilot`, `cursor_agent`, `opencode`, `gemini_cli`, `codex_cli`, `goose_cli`, `cline_cli`, `continue_cli`, `warp_cli`) |
| `get_model` | Get current model and list available models for the active provider |
| `set_model` | Set the model for subsequent requests (pass `null` to reset to default) |
| `get_multiplex_provider` | Get providers configured for multiplex dispatch |
| `set_multiplex_provider` | Configure providers for fan-out mode |
| `prompt` | Send chat messages to the active provider, or multiplex to all configured providers |
### Client Configuration

Add to your MCP client config (e.g. Claude Desktop's `claude_desktop_config.json`):
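A minimal entry, assuming `embacle-mcp` is on your `PATH` (the `"embacle"` server name is arbitrary):

```json
{
  "mcpServers": {
    "embacle": {
      "command": "embacle-mcp"
    }
  }
}
```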
## REST API Server (embacle-server)
An OpenAI-compatible HTTP server that proxies requests to embacle runners. Any client that speaks the OpenAI chat completions API can use it without modification.
### Usage

```sh
# Start with default provider (copilot) on localhost:3000
embacle-server

# Specify provider and port — flag names illustrative
embacle-server --provider claude_code --port 8080
```
### Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completion (streaming and non-streaming) |
| GET | `/v1/models` | List available providers and models |
| GET | `/health` | Per-provider readiness check |
### Model Routing

The `model` field determines which provider handles the request. Use a `provider:model` prefix to target a specific runner, or pass a bare model name to use the server's default provider. For example (model names are illustrative):

```sh
# Explicit provider
curl -s localhost:3000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemini_cli:gemini-2.5-pro", "messages": [{"role": "user", "content": "Hi"}]}'

# Default provider
curl -s localhost:3000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}'
```
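The routing rule reduces to splitting the model string on its first colon; a minimal sketch of the policy (not the server's actual code):

```rust
/// Split a `provider:model` spec; bare names fall through to the default provider.
fn route(model: &str, default_provider: &str) -> (String, String) {
    match model.split_once(':') {
        Some((provider, m)) => (provider.to_string(), m.to_string()),
        None => (default_provider.to_string(), model.to_string()),
    }
}

fn main() {
    assert_eq!(
        route("gemini_cli:gemini-2.5-pro", "copilot"),
        ("gemini_cli".to_string(), "gemini-2.5-pro".to_string())
    );
    assert_eq!(
        route("gpt-4o", "copilot"),
        ("copilot".to_string(), "gpt-4o".to_string())
    );
    println!("ok");
}
```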
### Multiplex
Pass an array of models to fan out the same prompt to multiple providers concurrently. Each provider runs in its own task; failures in one don't affect others.
The response uses `"object": "chat.completion.multiplex"` with per-provider results and timing.
Streaming is not supported for multiplex requests.
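A multiplex request body might look like the following (provider names as model entries; the exact accepted forms may vary):

```json
{
  "model": ["claude_code", "gemini_cli"],
  "messages": [{"role": "user", "content": "Summarize this design doc"}]
}
```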
### SSE Streaming

Set `"stream": true` for Server-Sent Events output in the OpenAI streaming format (`data: {json}\n\n` with a `data: [DONE]` terminator).
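The framing described above can be sketched as follows (not the server's actual code):

```rust
/// Frame a sequence of JSON chunk payloads in the OpenAI SSE format:
/// each chunk as `data: {json}\n\n`, then a `data: [DONE]` terminator.
fn frame_sse(chunks: &[&str]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        out.push_str(&format!("data: {chunk}\n\n"));
    }
    out.push_str("data: [DONE]\n\n");
    out
}

fn main() {
    let body = frame_sse(&[r#"{"choices":[{"delta":{"content":"Hi"}}]}"#]);
    assert!(body.ends_with("data: [DONE]\n\n"));
    print!("{body}");
}
```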
### Authentication

Optional. Set `EMBACLE_API_KEY` to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode). The env var is read per-request, so key rotation doesn't require a restart.

```sh
# Require bearer auth on all endpoints
EMBACLE_API_KEY=my-secret embacle-server
```
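The check itself amounts to comparing the `Authorization` header against the configured key; a minimal sketch of the policy (not the server's actual code):

```rust
/// Allow all requests when no key is configured (localhost dev mode),
/// otherwise require a matching `Bearer` token.
fn authorized(configured_key: Option<&str>, auth_header: Option<&str>) -> bool {
    match configured_key {
        None => true, // EMBACLE_API_KEY unset: allow everything
        Some(key) => auth_header.map_or(false, |h| h == format!("Bearer {key}")),
    }
}

fn main() {
    assert!(authorized(None, None));
    assert!(authorized(Some("my-secret"), Some("Bearer my-secret")));
    assert!(!authorized(Some("my-secret"), Some("Bearer wrong")));
    println!("ok");
}
```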
## Architecture

```
Your Application
└── embacle (this library)
    │
    ├── CLI Runners (subprocess per request)
    │   ├── ClaudeCodeRunner → spawns `claude -p "prompt" --output-format json`
    │   ├── CopilotRunner → spawns `copilot -p "prompt"`
    │   ├── CursorAgentRunner → spawns `cursor-agent -p "prompt" --output-format json`
    │   ├── OpenCodeRunner → spawns `opencode run "prompt" --format json`
    │   ├── GeminiCliRunner → spawns `gemini -p "prompt" -o json -y`
    │   ├── CodexCliRunner → spawns `codex exec "prompt" --json --full-auto`
    │   ├── GooseCliRunner → spawns `goose run --quiet --no-session`
    │   ├── ClineCliRunner → spawns `cline task --json --act --yolo`
    │   ├── ContinueCliRunner → spawns `cn -p --format json`
    │   └── WarpCliRunner → spawns `oz agent run --prompt "..." --output-format json`
    │
    ├── ACP Runners (persistent connection, behind feature flag)
    │   └── CopilotHeadlessRunner → NDJSON/JSON-RPC to `copilot --acp`
    │
    ├── Provider Decorators (composable wrappers)
    │   ├── FallbackProvider → ordered chain, first success wins
    │   ├── MetricsProvider → latency, token, and error tracking
    │   └── QualityGateProvider → response validation with retry
    │
    ├── Agent Loop
    │   └── AgentExecutor → multi-turn tool calling with configurable max turns
    │
    ├── Structured Output
    │   └── request_structured_output() → schema-validated JSON extraction with retry
    │
    ├── MCP Tool Bridge
    │   └── McpToolBridge → MCP tool definitions ↔ text-based tool loop
    │
    ├── MCP Server (separate binary crate)
    │   └── embacle-mcp → JSON-RPC 2.0 over stdio or HTTP/SSE
    │
    ├── REST API Server (separate binary crate)
    │   └── embacle-server → OpenAI-compatible HTTP, SSE streaming, multiplex
    │
    └── Tool Simulation (text-based tool calling for CLI runners)
        └── execute_with_text_tools() → catalog injection, XML parsing, tool loop
```
All runners implement the same `LlmProvider` trait:

- `complete()` — single-shot completion
- `complete_stream()` — streaming completion
- `health_check()` — verify the runner is available and authenticated
The `CopilotHeadlessRunner` (requires the `copilot-headless` feature) additionally provides:

- `converse()` — returns a `HeadlessToolResponse` with observed tool calls that copilot executed internally
## Text-Based Tool Calling (CLI runners)

CLI runners don't have native tool calling, so Embacle provides a text-based simulation layer. It injects a tool catalog into the system prompt and parses `<tool_call>` XML blocks from the LLM response, looping until the model stops calling tools.
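To illustrate the wire format, here is a standalone sketch of extracting `<tool_call>` blocks from a model response using only std string ops (the library's own parser may differ):

```rust
/// Extract the contents of each <tool_call>…</tool_call> block, in order.
fn parse_tool_call_blocks(text: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("<tool_call>") {
        let after = &rest[start + "<tool_call>".len()..];
        match after.find("</tool_call>") {
            Some(end) => {
                blocks.push(after[..end].trim().to_string());
                rest = &after[end + "</tool_call>".len()..];
            }
            None => break, // unterminated block: stop parsing
        }
    }
    blocks
}

fn main() {
    let response = r#"Let me check the weather.
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call>"#;
    let calls = parse_tool_call_blocks(response);
    assert_eq!(calls.len(), 1);
    assert!(calls[0].contains("get_weather"));
    println!("ok");
}
```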
A sketch of the call shape (module paths, constructor signatures, and result fields are illustrative — check the crate docs):

```rust
use std::sync::Arc;

use serde_json::json;

use embacle::tool_simulation::execute_with_text_tools;
use embacle::types::{ChatMessage, FunctionDeclaration};

let declarations = vec![FunctionDeclaration {
    name: "get_weather".into(),
    description: "Look up the current weather for a city".into(),
    parameters: json!({
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
    }),
}];
let handler = Arc::new(MyToolHandler::new()); // your tool-execution callback
let mut messages = vec![ChatMessage::user("What's the weather in Paris?")];

let result = execute_with_text_tools(&runner, &mut messages, &declarations, handler).await?;
println!("{}", result.text); // final response with tool blocks stripped
println!("{} tool calls executed", result.tool_calls.len());
```
The pure functions are also available individually for custom loop implementations:

| Function | Purpose |
|---|---|
| `generate_tool_catalog()` | Converts declarations into a markdown catalog |
| `inject_tool_catalog()` | Appends the catalog to the system message |
| `parse_tool_call_blocks()` | Parses `<tool_call>` XML blocks from response text |
| `strip_tool_call_blocks()` | Returns clean text with tool blocks removed |
| `format_tool_results_as_text()` | Formats results as `<tool_result>` XML blocks |
## Native Tool Calling Types

The core library provides typed tool calling that flows through `ChatRequest` and `ChatResponse`, so callers can use native tool definitions without relying on the text-based XML simulation.
A sketch (constructor signatures are illustrative):

```rust
use serde_json::json;

use embacle::types::{ChatRequest, ToolChoice, ToolDefinition};

let tools = vec![ToolDefinition::function(
    "get_weather",
    "Look up the current weather for a city",
    json!({
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
    }),
)];

let request = ChatRequest::new(messages)
    .with_tools(tools)
    .with_tool_choice(ToolChoice::Auto);

// Providers that support function calling will use native tool calls;
// the server falls back to XML text simulation for CLI runners.
```
Additional request fields: `top_p`, `stop` sequences, and `response_format` (text, JSON object, or JSON schema). The `capability_guard` module validates these against provider capabilities before dispatch.
## Agent Loop

`AgentExecutor` runs a multi-turn tool-calling loop: inject a tool catalog, send the prompt, parse `<tool_call>` blocks from the response, execute tools, feed results back, and repeat until the model stops calling tools or `max_turns` is reached.
A sketch (constructor signatures and result fields are illustrative):

```rust
use std::sync::Arc;

use serde_json::json;

use embacle::agent::AgentExecutor;
use embacle::types::{ChatMessage, ChatRequest, FunctionDeclaration};

let declarations = vec![FunctionDeclaration {
    name: "read_file".into(),
    description: "Read a file from the workspace".into(),
    parameters: json!({
        "type": "object",
        "properties": { "path": { "type": "string" } },
        "required": ["path"]
    }),
}];
let handler = Arc::new(MyToolHandler::new()); // your tool-execution callback

let agent = AgentExecutor::new(runner, declarations, handler)
    .with_max_turns(5);
let request = ChatRequest::new(vec![ChatMessage::user("Summarize src/main.rs")]);
let result = agent.execute(request).await?;
println!("{}", result.final_text);
```
## Fallback Chains

`FallbackProvider` wraps multiple providers and tries them in order. The first successful response wins; if all fail, the last error is returned.
A sketch (constructor shape is illustrative):

```rust
use embacle::fallback::FallbackProvider;

let provider = FallbackProvider::new(vec![primary_runner, backup_runner])?;

// Uses primary_runner; falls back to backup_runner on error
let response = provider.complete(&request).await?;
```
Health checks pass if any provider is healthy. Capabilities are the union of all inner providers.
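The failover policy itself is simple; a standalone sketch of first-success-wins over an ordered chain of results (not the library's actual code):

```rust
/// Return the first Ok from an ordered chain; if all fail, return the last
/// error. Panics on an empty chain — a real provider chain would reject
/// emptiness at construction time.
fn first_success<T, E>(results: impl IntoIterator<Item = Result<T, E>>) -> Result<T, E> {
    let mut last_err = None;
    for result in results {
        match result {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("empty provider chain"))
}

fn main() {
    let chain: Vec<Result<&str, &str>> = vec![Err("primary down"), Ok("backup answered")];
    assert_eq!(first_success(chain), Ok("backup answered"));
    println!("ok");
}
```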
## Metrics

`MetricsProvider` wraps any provider to track latency, token usage, call counts, and errors.
A sketch (constructor shape is illustrative):

```rust
use embacle::metrics::MetricsProvider;

let provider = MetricsProvider::new(inner_runner);
let response = provider.complete(&request).await?;

let report = provider.report();
println!("{report:?}"); // call counts, latency, token usage, errors
```
## Quality Gate

`QualityGateProvider` validates responses against a policy (minimum length, refusal detection) and retries with feedback if validation fails.
A sketch (policy field names are illustrative):

```rust
use embacle::quality_gate::{QualityGateProvider, QualityPolicy};

let policy = QualityPolicy {
    min_response_length: 20, // reject suspiciously short replies
    detect_refusals: true,   // retry with feedback on refusal
    max_retries: 2,
};
let provider = QualityGateProvider::new(inner_runner, policy);
let response = provider.complete(&request).await?;
```
## Structured Output

Forces any provider to return schema-valid JSON by injecting schema instructions and validating the response, retrying with error feedback on schema violations. Validation covers nested objects, array items, enum values, numeric bounds (`minimum`/`maximum`), and `additionalProperties: false`.
A sketch (argument order is illustrative):

```rust
use serde_json::{json, Value};

use embacle::structured_output::request_structured_output;

let schema = json!({
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer", "minimum": 0 }
    },
    "required": ["name", "age"],
    "additionalProperties": false
});

let result = request_structured_output(&runner, &request, &schema).await?;
let data: Value = serde_json::from_str(&result)?;
```
## MCP Tool Bridge

Bridges MCP tool definitions to embacle's text-based tool loop, so CLI runners can use tools from any MCP-compatible tool server.
A sketch (function location is illustrative):

```rust
use embacle::mcp_tool_bridge::to_declarations;

// `mcp_tools` would come from an MCP server's `tools/list` response.
let mcp_tools = vec![/* MCP tool definitions */];
let declarations = to_declarations(&mcp_tools);
// `declarations` can now feed the text-based tool loop or the agent executor.
```
## Features
- Zero API keys — uses CLI tools' own auth (OAuth, API keys managed by the tool)
- Auto-discovery — finds installed CLI binaries via `which`
- Auth readiness — non-blocking checks with env var probes and graceful degradation
- Capability detection — probes CLI version and supported features
- Capability guard — validates request fields against provider capabilities before dispatch
- Native tool calling types — `ToolDefinition`, `ToolCallRequest`, `ToolChoice`, `ResponseFormat` in core
- Container isolation — optional container-based execution for production
- Subprocess safety — timeout, output limits, environment sandboxing
- Agent loop — multi-turn tool calling with configurable max turns and turn callbacks
- Fallback chains — ordered provider failover with automatic retry
- Metrics — latency, token, and error tracking as a provider decorator
- Quality gate — response validation with retry on refusal or insufficient content
- Structured output — schema-validated JSON extraction with recursive validation (nested objects, arrays, enums, numeric bounds)
- MCP tool bridge — connect MCP tool servers to CLI runners via text-based tool loop
- Feature flags — SDK integrations are opt-in to keep the default dependency footprint minimal
## Modules
| Module | Feature | Purpose |
|---|---|---|
| `types` | default | Core types: `LlmProvider` trait, `ChatRequest`, `ChatResponse`, `RunnerError`, `ToolDefinition`, `ToolCallRequest`, `ToolChoice`, `ResponseFormat` |
| `config` | default | Runner types, execution modes, configuration |
| `discovery` | default | Auto-detect installed CLI binaries |
| `auth` | default | Readiness checking with env var probes (`ProviderReadiness`, `check_env_var_auth`) |
| `capability_guard` | default | Validates request fields against provider capabilities (tools, top_p, stop, response format) |
| `compat` | default | Version compatibility and capability detection |
| `process` | default | Subprocess spawning with timeout and output limits |
| `sandbox` | default | Environment variable whitelisting, working directory control |
| `container` | default | Container-based execution backend |
| `prompt` | default | Prompt building from chat messages |
| `tool_simulation` | default | Text-based tool calling for CLI runners (`<tool_call>` XML protocol) |
| `agent` | default | Multi-turn agent loop with tool calling and turn callbacks |
| `fallback` | default | Ordered provider chain with first-success-wins failover |
| `metrics` | default | Latency, token usage, and error tracking decorator |
| `quality_gate` | default | Response validation (refusal detection, length checks) with retry |
| `structured_output` | default | Schema-validated JSON extraction with recursive validation and retry |
| `mcp_tool_bridge` | default | MCP tool definitions ↔ text-based tool loop bridge |
| `copilot_headless` | `copilot-headless` | Copilot ACP runner (NDJSON/JSON-RPC via `copilot --acp`) |
| `copilot_headless_config` | `copilot-headless` | Copilot Headless configuration from environment (`PermissionPolicy`) |
## License
Licensed under the Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).