# Embacle — LLM Runners
Standalone Rust library that wraps AI CLI tools and SDKs as pluggable LLM providers.
Instead of integrating with LLM APIs directly (which require API keys, SDKs, and managing auth), Embacle delegates to CLI tools that users already have installed and authenticated — getting model upgrades, auth management, and protocol handling for free. For GitHub Copilot, an optional SDK mode maintains a persistent JSON-RPC connection for native tool calling.
## Supported Runners

### CLI Runners (subprocess-based)

| Runner | Binary | Features |
|---|---|---|
| Claude Code | `claude` | JSON output, streaming, system prompts, session resume |
| GitHub Copilot | `copilot` | Text parsing, streaming |
| Cursor Agent | `cursor-agent` | JSON output, streaming, MCP approval |
| OpenCode | `opencode` | JSON events, session management |
| Gemini CLI | `gemini` | JSON/stream-JSON output, streaming, session resume |
| Codex CLI | `codex` | JSONL output, streaming, sandboxed exec mode |
| Goose CLI | `goose` | JSON/stream-JSON output, streaming, no-session mode |
| Cline CLI | `cline` | NDJSON output, streaming, session resume via task IDs |
| Continue CLI | `cn` | JSON output, single-shot completions |
### SDK Runners (persistent connection)

| Runner | Feature Flag | Features |
|---|---|---|
| GitHub Copilot SDK | `copilot-sdk` | Persistent JSON-RPC via `copilot --headless`, native tool calling, streaming |
## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
embacle = "0.3"
```

Use a CLI runner (a minimal sketch; the import paths and constructor/request signatures shown are illustrative, based on the types described below):

```rust
use std::path::PathBuf;
// Import paths below are illustrative — adjust to the crate's actual layout.
use embacle::{ChatRequest, ClaudeCodeRunner};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point the runner at a specific binary, or rely on auto-discovery.
    // Constructor name is illustrative.
    let runner = ClaudeCodeRunner::with_binary(PathBuf::from("/usr/local/bin/claude"));

    let request = ChatRequest::user("Summarize this repository");
    let response = runner.complete(request).await?;
    println!("{}", response.content);
    Ok(())
}
```
## Copilot SDK (feature flag)

Enable the `copilot-sdk` feature for persistent JSON-RPC instead of per-request subprocesses:

```toml
[dependencies]
embacle = { version = "0.3", features = ["copilot-sdk"] }
```

A minimal sketch (import path and constructor name are illustrative):

```rust
use embacle::{ChatRequest, CopilotSdkRunner}; // illustrative import path

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Starts `copilot --headless` once; the connection is reused across requests.
    let runner = CopilotSdkRunner::start().await?; // constructor name is illustrative
    let response = runner.complete(ChatRequest::user("Hello")).await?;
    println!("{}", response.content);
    Ok(())
}
```
The SDK runner starts `copilot --headless` once and reuses the connection across requests. Configuration via environment variables:

| Variable | Default | Description |
|---|---|---|
| `COPILOT_CLI_PATH` | auto-detect | Override path to `copilot` binary |
| `COPILOT_SDK_MODEL` | `claude-sonnet-4.6` | Default model for completions |
| `COPILOT_SDK_TRANSPORT` | `stdio` | Transport mode: `stdio` or `tcp` |
| `COPILOT_GITHUB_TOKEN` | stored OAuth | GitHub auth token (falls back to `GH_TOKEN`, `GITHUB_TOKEN`) |
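For example, to pin the default model and transport before launching your application (values taken from the table above):

```shell
# Pin the Copilot SDK runner's model and transport via environment variables.
export COPILOT_SDK_MODEL=claude-sonnet-4.6
export COPILOT_SDK_TRANSPORT=stdio
echo "$COPILOT_SDK_MODEL $COPILOT_SDK_TRANSPORT"
```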
## MCP Server (embacle-mcp)
A standalone MCP server binary that exposes embacle runners via the Model Context Protocol. Connect any MCP-compatible client (Claude Desktop, editors, custom agents) to use all embacle providers.
### Usage

```sh
# Stdio transport (default — for editor/client integration)

# HTTP transport (for network-accessible deployments)
```
### MCP Tools

| Tool | Description |
|---|---|
| `get_provider` | Get active LLM provider and list available providers |
| `set_provider` | Switch the active provider (`claude_code`, `copilot`, `cursor_agent`, `opencode`, `gemini_cli`, `codex_cli`, `goose_cli`, `cline_cli`, `continue_cli`) |
| `get_model` | Get current model and list available models for the active provider |
| `set_model` | Set the model for subsequent requests (pass `null` to reset to default) |
| `get_multiplex_provider` | Get providers configured for multiplex dispatch |
| `set_multiplex_provider` | Configure providers for fan-out mode |
| `prompt` | Send chat messages to the active provider, or multiplex to all configured providers |
### Client Configuration

Add to your MCP client config (e.g. Claude Desktop `claude_desktop_config.json`):
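A minimal entry might look like this, assuming `embacle-mcp` is on your `PATH` (the server key name `embacle` is arbitrary):

```json
{
  "mcpServers": {
    "embacle": {
      "command": "embacle-mcp"
    }
  }
}
```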
## REST API Server (embacle-server)
An OpenAI-compatible HTTP server that proxies requests to embacle runners. Any client that speaks the OpenAI chat completions API can use it without modification.
### Usage

```sh
# Start with default provider (copilot) on localhost:3000

# Specify provider and port
```
### Endpoints

| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | Chat completion (streaming and non-streaming) |
| `GET` | `/v1/models` | List available providers and models |
| `GET` | `/health` | Per-provider readiness check |
### Model Routing

The `model` field determines which provider handles the request. Use a `provider:model` prefix to target a specific runner, or pass a bare model name to use the server's default provider.

```sh
# Explicit provider (the model name after the colon is illustrative)
curl http://localhost:3000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "claude_code:claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}]}'

# Default provider
curl http://localhost:3000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}]}'
```
### Multiplex

Pass an array of models to fan out the same prompt to multiple providers concurrently. Each provider runs in its own task; failures in one don't affect others.

The response uses `object: "chat.completion.multiplex"` with per-provider results and timing.

Streaming is not supported for multiplex requests.
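An illustrative request body (the provider names are those listed earlier; the model names after the colons are assumptions):

```json
{
  "model": ["claude_code:claude-sonnet-4.6", "gemini_cli:gemini-2.5-pro"],
  "messages": [{ "role": "user", "content": "Compare your answers to this question." }]
}
```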
### SSE Streaming

Set `"stream": true` for Server-Sent Events output in OpenAI streaming format (`data: {json}\n\n` with a `data: [DONE]` terminator).
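The framing itself is simple to reproduce; a minimal sketch of the event format described above (the JSON payload shape follows the OpenAI streaming schema and is illustrative here, not taken from embacle's source):

```rust
// Frame a JSON payload as a Server-Sent Events data line,
// per the `data: {json}\n\n` format described above.
fn sse_event(json_payload: &str) -> String {
    format!("data: {json_payload}\n\n")
}

// Terminator emitted after the final chunk.
const SSE_DONE: &str = "data: [DONE]\n\n";

fn main() {
    // Payload shape is illustrative (an OpenAI-style streaming delta).
    let chunk = sse_event(r#"{"choices":[{"delta":{"content":"Hi"}}]}"#);
    assert!(chunk.starts_with("data: {"));
    assert!(chunk.ends_with("\n\n"));
    print!("{chunk}{SSE_DONE}");
}
```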
### Authentication

Optional. Set `EMBACLE_API_KEY` to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode). The env var is read per-request, so key rotation doesn't require a restart.

```sh
EMBACLE_API_KEY=my-secret
```
## Architecture

```
Your Application
└── embacle (this library)
    │
    ├── CLI Runners (subprocess per request)
    │   ├── ClaudeCodeRunner   → spawns `claude -p "prompt" --output-format json`
    │   ├── CopilotRunner      → spawns `copilot -p "prompt"`
    │   ├── CursorAgentRunner  → spawns `cursor-agent -p "prompt" --output-format json`
    │   ├── OpenCodeRunner     → spawns `opencode run "prompt" --format json`
    │   ├── GeminiCliRunner    → spawns `gemini -p "prompt" -o json -y`
    │   ├── CodexCliRunner     → spawns `codex exec "prompt" --json --full-auto`
    │   ├── GooseCliRunner     → spawns `goose run --quiet --no-session`
    │   ├── ClineCliRunner     → spawns `cline task --json --act --yolo`
    │   └── ContinueCliRunner  → spawns `cn -p --format json`
    │
    ├── SDK Runners (persistent connection, behind feature flag)
    │   └── CopilotSdkRunner → JSON-RPC to `copilot --headless`
    │
    ├── Provider Decorators (composable wrappers)
    │   ├── FallbackProvider    → ordered chain, first success wins
    │   ├── MetricsProvider     → latency, token, and error tracking
    │   └── QualityGateProvider → response validation with retry
    │
    ├── Agent Loop
    │   └── AgentExecutor → multi-turn tool calling with configurable max turns
    │
    ├── Structured Output
    │   └── request_structured_output() → schema-validated JSON extraction with retry
    │
    ├── MCP Tool Bridge
    │   └── McpToolBridge → MCP tool definitions ↔ text-based tool loop
    │
    ├── MCP Server (separate binary crate)
    │   └── embacle-mcp → JSON-RPC 2.0 over stdio or HTTP/SSE
    │
    ├── REST API Server (separate binary crate)
    │   └── embacle-server → OpenAI-compatible HTTP, SSE streaming, multiplex
    │
    └── Tool Simulation (text-based tool calling for CLI runners)
        └── execute_with_text_tools() → catalog injection, XML parsing, tool loop
```
All runners implement the same `LlmProvider` trait:

- `complete()` — single-shot completion
- `complete_stream()` — streaming completion
- `health_check()` — verify the runner is available and authenticated

The `CopilotSdkRunner` additionally provides:

- `execute_with_tools()` — native tool calling via the SDK's session and tool handler infrastructure
## Text-Based Tool Calling (CLI runners)

CLI runners don't have native tool calling, so Embacle provides a text-based simulation layer. It injects a tool catalog into the system prompt and parses `<tool_call>` XML blocks from the LLM response, looping until the model stops calling tools.

A sketch of the entry point (field names, the handler type, and the exact signature are illustrative):

```rust
use std::sync::Arc;
// Import paths and signatures below are illustrative.
use embacle::tool_simulation::{execute_with_text_tools, FunctionDeclaration};
use serde_json::json;

let declarations = vec![FunctionDeclaration {
    name: "get_weather".into(), // hypothetical tool
    description: "Look up the current weather for a city".into(),
    parameters: json!({ "type": "object", "properties": { "city": { "type": "string" } } }),
}];
let handler = Arc::new(MyToolHandler::new()); // your tool-executing callback (hypothetical type)
let mut messages = vec![ChatMessage::user("What's the weather in Paris?")];

let result = execute_with_text_tools(&runner, &mut messages, &declarations, handler).await?;
println!("{}", result.text);  // final response with tool blocks stripped
println!("{}", result.turns); // number of tool-calling turns
```
The pure functions are also available individually for custom loop implementations:

| Function | Purpose |
|---|---|
| `generate_tool_catalog()` | Converts declarations into a markdown catalog |
| `inject_tool_catalog()` | Appends catalog to the system message |
| `parse_tool_call_blocks()` | Parses `<tool_call>` XML blocks from response text |
| `strip_tool_call_blocks()` | Returns clean text with tool blocks removed |
| `format_tool_results_as_text()` | Formats results as `<tool_result>` XML blocks |
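The parse and strip steps are pure string transformations. A simplified, self-contained sketch of the idea (not embacle's actual implementation; the real functions may handle nesting, JSON validation, and error reporting differently):

```rust
// Extract the contents of <tool_call>...</tool_call> blocks from response text.
// Simplified illustration of the text-based tool-call protocol described above.
fn parse_tool_call_blocks(text: &str) -> Vec<String> {
    let mut calls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("<tool_call>") {
        let after = &rest[start + "<tool_call>".len()..];
        match after.find("</tool_call>") {
            Some(end) => {
                calls.push(after[..end].trim().to_string());
                rest = &after[end + "</tool_call>".len()..];
            }
            None => break, // unterminated block: stop parsing
        }
    }
    calls
}

// Return the text with all <tool_call> blocks removed.
fn strip_tool_call_blocks(text: &str) -> String {
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find("<tool_call>") {
        out.push_str(&rest[..start]);
        match rest[start..].find("</tool_call>") {
            Some(end_rel) => rest = &rest[start + end_rel + "</tool_call>".len()..],
            None => {
                rest = ""; // drop an unterminated trailing block
                break;
            }
        }
    }
    out.push_str(rest);
    out.trim().to_string()
}

fn main() {
    let response = "Checking.\n<tool_call>{\"name\": \"get_weather\", \"arguments\": {\"city\": \"Paris\"}}</tool_call>\nDone.";
    let calls = parse_tool_call_blocks(response);
    assert_eq!(calls.len(), 1);
    assert!(calls[0].contains("get_weather"));
    assert_eq!(strip_tool_call_blocks(response), "Checking.\n\nDone.");
    println!("ok");
}
```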
## Agent Loop

`AgentExecutor` runs a multi-turn tool-calling loop: inject a tool catalog, send the prompt, parse `<tool_call>` blocks from the response, execute tools, feed results back, repeat until the model stops calling tools or `max_turns` is reached.

A sketch (import paths and signatures are illustrative):

```rust
use std::sync::Arc;
// Import paths and signatures below are illustrative.
use embacle::agent::AgentExecutor;
use embacle::tool_simulation::FunctionDeclaration;
use embacle::types::ChatRequest;
use serde_json::json;

let declarations = vec![/* FunctionDeclaration values for your tools */];
let handler = Arc::new(MyToolHandler::new()); // hypothetical tool handler type

let agent = AgentExecutor::new(runner, declarations, handler)
    .with_max_turns(8);

let request = ChatRequest::user("Find and summarize the failing tests");
let result = agent.execute(request).await?;
println!("{}", result.text);
```
## Fallback Chains

`FallbackProvider` wraps multiple providers and tries them in order. The first successful response wins; if all fail, the last error is returned.

```rust
use embacle::fallback::FallbackProvider;

// Constructor signature is illustrative.
let provider = FallbackProvider::new(vec![
    Box::new(primary_runner),
    Box::new(backup_runner),
])?;

// Uses primary_runner; falls back to backup_runner on error
let response = provider.complete(request).await?;
```
Health checks pass if any provider is healthy. Capabilities are the union of all inner providers.
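The chain semantics are easy to model on their own. A simplified synchronous sketch of first-success-wins with last-error propagation (not embacle's actual code, which operates over async `LlmProvider` values):

```rust
// Try each fallible call in order; return the first success,
// or the last error if every call fails.
fn first_success<T, E>(providers: Vec<Box<dyn Fn() -> Result<T, E>>>) -> Result<T, E> {
    let mut last_err = None;
    for provider in providers {
        match provider() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e), // remember the most recent failure
        }
    }
    Err(last_err.expect("fallback chain requires at least one provider"))
}

fn main() {
    let chain: Vec<Box<dyn Fn() -> Result<i32, String>>> = vec![
        Box::new(|| Err("primary down".to_string())),
        Box::new(|| Ok(42)),
    ];
    assert_eq!(first_success(chain), Ok(42));
    println!("ok");
}
```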
## Metrics

`MetricsProvider` wraps any provider to track latency, token usage, call counts, and errors.

```rust
use embacle::metrics::MetricsProvider;

// Wrapping constructor and report accessor are illustrative.
let provider = MetricsProvider::new(inner_runner);
let response = provider.complete(request).await?;

let report = provider.report();
println!("{report:?}"); // latency, token usage, call counts, errors
```
## Quality Gate

`QualityGateProvider` validates responses against a policy (minimum length, refusal detection) and retries with feedback if validation fails.

```rust
use embacle::quality_gate::{QualityGateProvider, QualityPolicy};

// Field names on QualityPolicy are illustrative.
let policy = QualityPolicy {
    min_length: 20,
    detect_refusals: true,
    max_retries: 2,
};
let provider = QualityGateProvider::with_policy(inner_runner, policy);
let response = provider.complete(request).await?;
```
## Structured Output

Forces any provider to return schema-valid JSON by injecting schema instructions and validating the response, retrying with error feedback on schema violations.

```rust
// The exact signature of request_structured_output is illustrative.
use embacle::structured_output::request_structured_output;
use serde_json::{json, Value};

let schema = json!({
    "type": "object",
    "properties": { "title": { "type": "string" }, "year": { "type": "number" } },
    "required": ["title", "year"]
});

let result = request_structured_output(&provider, request, &schema).await?;
let data: Value = serde_json::from_str(&result)?;
```
## MCP Tool Bridge

Bridges MCP tool definitions to embacle's text-based tool loop, so CLI runners can use tools from any MCP-compatible tool server.

```rust
use embacle::mcp_tool_bridge::McpToolBridge;

// Conversion entry point is illustrative.
let mcp_tools = vec![/* MCP tool definitions fetched from your tool server */];
let declarations = McpToolBridge::to_declarations(&mcp_tools);
```
## Features
- Zero API keys — uses CLI tools' own auth (OAuth, API keys managed by the tool)
- Auto-discovery — finds installed CLI binaries via `which`
- Auth readiness — non-blocking checks, graceful degradation
- Capability detection — probes CLI version and supported features
- Container isolation — optional container-based execution for production
- Subprocess safety — timeout, output limits, environment sandboxing
- Agent loop — multi-turn tool calling with configurable max turns and turn callbacks
- Fallback chains — ordered provider failover with automatic retry
- Metrics — latency, token, and error tracking as a provider decorator
- Quality gate — response validation with retry on refusal or insufficient content
- Structured output — schema-validated JSON extraction from any provider
- MCP tool bridge — connect MCP tool servers to CLI runners via text-based tool loop
- Feature flags — SDK integrations are opt-in to keep the default dependency footprint minimal
## Modules

| Module | Feature | Purpose |
|---|---|---|
| `types` | default | Core types: `LlmProvider` trait, `ChatRequest`, `ChatResponse`, `RunnerError` |
| `config` | default | Runner types, execution modes, configuration |
| `discovery` | default | Auto-detect installed CLI binaries |
| `auth` | default | Readiness checking (is the CLI authenticated?) |
| `compat` | default | Version compatibility and capability detection |
| `process` | default | Subprocess spawning with timeout and output limits |
| `sandbox` | default | Environment variable whitelisting, working directory control |
| `container` | default | Container-based execution backend |
| `prompt` | default | Prompt building from chat messages |
| `tool_simulation` | default | Text-based tool calling for CLI runners (`<tool_call>` XML protocol) |
| `agent` | default | Multi-turn agent loop with tool calling and turn callbacks |
| `fallback` | default | Ordered provider chain with first-success-wins failover |
| `metrics` | default | Latency, token usage, and error tracking decorator |
| `quality_gate` | default | Response validation (refusal detection, length checks) with retry |
| `structured_output` | default | Schema-validated JSON extraction with retry |
| `mcp_tool_bridge` | default | MCP tool definitions ↔ text-based tool loop bridge |
| `copilot_sdk_runner` | `copilot-sdk` | Copilot SDK runner (persistent JSON-RPC) |
| `copilot_sdk_config` | `copilot-sdk` | Copilot SDK configuration from environment |
| `tool_bridge` | `copilot-sdk` | Tool definition conversion for native tool calling |
## License
Licensed under the Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).