# Embacle — LLM Runners
[](https://crates.io/crates/embacle)
[](https://docs.rs/embacle)
[](https://github.com/dravr-ai/dravr-embacle/actions/workflows/ci.yml)
[](LICENSE.md)
Standalone Rust library that wraps AI CLI tools and SDKs as pluggable LLM providers.
Instead of integrating with LLM APIs directly (which means managing API keys, SDKs, and auth), **Embacle** delegates to CLI tools that users already have installed and authenticated — inheriting model upgrades, auth management, and protocol handling for free. For GitHub Copilot, an optional headless mode communicates over the ACP (Agent Client Protocol) for SDK-managed tool calling.
## Supported Runners
### CLI Runners (subprocess-based)
| Runner | Binary | Capabilities |
|--------|--------|--------------|
| Claude Code | `claude` | JSON output, streaming, system prompts, session resume |
| GitHub Copilot | `copilot` | Text parsing, streaming |
| Cursor Agent | `cursor-agent` | JSON output, streaming, MCP approval |
| OpenCode | `opencode` | JSON events, session management |
| Gemini CLI | `gemini` | JSON/stream-JSON output, streaming, session resume |
| Codex CLI | `codex` | JSONL output, streaming, sandboxed exec mode |
| Goose CLI | `goose` | JSON/stream-JSON output, streaming, no-session mode |
| Cline CLI | `cline` | NDJSON output, streaming, session resume via task IDs |
| Continue CLI | `cn` | JSON output, single-shot completions |
| Warp | `oz` | NDJSON output, conversation resume |
### HTTP API Runners (feature-flagged)
| Runner | Feature flag | Capabilities |
|--------|--------------|--------------|
| OpenAI API | `openai-api` | Any OpenAI-compatible endpoint (OpenAI, Groq, Gemini, Ollama, vLLM), streaming, tool calling, model discovery |
### ACP Runners (persistent connection)
| Runner | Feature flag | Capabilities |
|--------|--------------|--------------|
| GitHub Copilot Headless | `copilot-headless` | NDJSON/JSON-RPC via `copilot --acp`, SDK-managed tool calling, streaming |
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
embacle = "0.9"
```
Use a CLI runner:
```rust
use std::path::PathBuf;
use embacle::{ClaudeCodeRunner, RunnerConfig};
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};
#[tokio::main]
async fn main() -> Result<(), embacle::types::RunnerError> {
    let config = RunnerConfig::new(PathBuf::from("claude"));
    let runner = ClaudeCodeRunner::new(config);
    let request = ChatRequest::new(vec![
        ChatMessage::user("What is the capital of France?"),
    ]);
    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```
### OpenAI API (feature flag)
Enable the `openai-api` feature for HTTP-based communication with any OpenAI-compatible endpoint:
```toml
[dependencies]
embacle = { version = "0.9", features = ["openai-api"] }
```
```rust
use embacle::{OpenAiApiConfig, OpenAiApiRunner};
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};
#[tokio::main]
async fn main() -> Result<(), embacle::types::RunnerError> {
    // Reads OPENAI_API_BASE_URL, OPENAI_API_KEY, OPENAI_API_MODEL from env
    let config = OpenAiApiConfig::from_env();
    let runner = OpenAiApiRunner::new(config).await;
    let request = ChatRequest::new(vec![
        ChatMessage::user("What is the capital of France?"),
    ]);
    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```
Works with any OpenAI-compatible endpoint — OpenAI, Groq, Google Gemini, Ollama, vLLM, and more. To inject a shared HTTP client (e.g. from a connection pool), use `OpenAiApiRunner::with_client(config, client)`.
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_BASE_URL` | `https://api.openai.com/v1` | API base URL |
| `OPENAI_API_KEY` | *(none)* | Bearer token for authentication |
| `OPENAI_API_MODEL` | `gpt-5.4` | Default model for completions |
| `OPENAI_API_TIMEOUT_SECS` | `300` | HTTP request timeout |
### Copilot Headless (feature flag)
Enable the `copilot-headless` feature for ACP-based communication with SDK-managed tool calling:
```toml
[dependencies]
embacle = { version = "0.9", features = ["copilot-headless"] }
```
```rust
use embacle::{CopilotHeadlessRunner, CopilotHeadlessConfig};
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};
#[tokio::main]
async fn main() -> Result<(), embacle::types::RunnerError> {
    // Reads COPILOT_HEADLESS_MODEL, COPILOT_GITHUB_TOKEN, etc. from env
    let runner = CopilotHeadlessRunner::from_env().await;
    let request = ChatRequest::new(vec![
        ChatMessage::user("Explain Rust ownership"),
    ]);
    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```
The headless runner spawns `copilot --acp` per request and communicates via NDJSON-framed JSON-RPC. Configuration via environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `COPILOT_CLI_PATH` | auto-detect | Override path to copilot binary |
| `COPILOT_HEADLESS_MODEL` | `claude-opus-4.6-fast` | Default model for completions |
| `COPILOT_GITHUB_TOKEN` | stored OAuth | GitHub auth token (falls back to `GH_TOKEN`, `GITHUB_TOKEN`) |
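NDJSON framing means exactly one JSON-RPC message per newline-terminated line. The sketch below shows only the framing idea; the method name and params are illustrative placeholders, not Copilot's actual ACP schema:

```rust
// Frame a JSON-RPC 2.0 request as a single NDJSON line. The method name
// ("session/prompt") is a placeholder, not Copilot's real ACP method.
fn frame_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\",\"params\":{}}}\n",
        id, method, params_json
    )
}

fn main() {
    let line = frame_request(1, "session/prompt", "{\"text\":\"hi\"}");
    // One message per line; the trailing newline is the frame delimiter.
    assert!(line.ends_with('\n'));
    assert!(!line.trim_end().contains('\n'));
    println!("{}", line.trim_end());
}
```

Reading the peer's side is the mirror image: split stdin on newlines and parse each line as one JSON-RPC message.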
## MCP Server (`embacle-mcp`)
A standalone MCP server binary that exposes embacle runners via the [Model Context Protocol](https://modelcontextprotocol.io/). Connect any MCP-compatible client (Claude Desktop, editors, custom agents) to use all embacle providers.
### Usage
```bash
# Stdio transport (default — for editor/client integration)
embacle-mcp --provider copilot
# HTTP transport (for network-accessible deployments)
embacle-mcp --transport http --host 0.0.0.0 --port 3000 --provider claude_code
```
### MCP Tools
| Tool | Description |
|------|-------------|
| `get_provider` | Get active LLM provider and list available providers |
| `set_provider` | Switch the active provider (`claude_code`, `copilot`, `cursor_agent`, `opencode`, `gemini_cli`, `codex_cli`, `goose_cli`, `cline_cli`, `continue_cli`, `warp_cli`, `openai_api`) |
| `get_model` | Get current model and list available models for the active provider |
| `set_model` | Set the model for subsequent requests (pass null to reset to default) |
| `get_multiplex_provider` | Get providers configured for multiplex dispatch |
| `set_multiplex_provider` | Configure providers for fan-out mode |
| `prompt` | Send chat messages to the active provider, or multiplex to all configured providers |
### Client Configuration
Add to your MCP client config (e.g. Claude Desktop `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "embacle": {
      "command": "embacle-mcp",
      "args": ["--provider", "copilot"]
    }
  }
}
```
## REST API Server (`embacle-server`)
An OpenAI-compatible HTTP server that proxies requests to embacle runners. Any client that speaks the OpenAI chat completions API can use it without modification.
### Usage
```bash
# Start with default provider (copilot) on localhost:3000
embacle-server
# Specify provider and port
embacle-server --provider claude_code --port 8080 --host 0.0.0.0
```
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/chat/completions` | Chat completion (streaming and non-streaming) |
| `GET` | `/v1/models` | List available providers and models |
| `GET` | `/health` | Per-provider readiness check |
### Model Routing
The `model` field determines which provider handles the request. Use a `provider:model` prefix to target a specific runner, or pass a bare model name to use the server's default provider.
```bash
# Explicit provider
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude:opus", "messages": [{"role": "user", "content": "hello"}]}'
# Default provider
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.4", "messages": [{"role": "user", "content": "hello"}]}'
```
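The routing rule above amounts to splitting the model string on its first colon. A sketch of that dispatch step (not embacle's actual implementation):

```rust
// Split "provider:model" into (Some(provider), model); a bare model name
// yields (None, model) and falls through to the server's default provider.
fn route(model: &str) -> (Option<&str>, &str) {
    match model.split_once(':') {
        Some((provider, model)) => (Some(provider), model),
        None => (None, model),
    }
}

fn main() {
    assert_eq!(route("claude:opus"), (Some("claude"), "opus"));
    assert_eq!(route("gpt-5.4"), (None, "gpt-5.4"));
    println!("routing ok");
}
```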
### Multiplex
Pass an array of models to fan out the same prompt to multiple providers concurrently. Each provider runs in its own task; failures in one don't affect others.
```bash
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": ["copilot:gpt-4o", "claude:opus"], "messages": [{"role": "user", "content": "hello"}]}'
```
The response uses `object: "chat.completion.multiplex"` with per-provider results and timing.
Streaming is not supported for multiplex requests.
### SSE Streaming
Set `"stream": true` for Server-Sent Events output in OpenAI streaming format (`data: {json}\n\n` with `data: [DONE]` terminator).
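Each event arrives as a `data: …` line followed by a blank line, so a client just strips the prefix and stops at the `[DONE]` sentinel. A minimal consumer sketch, independent of embacle:

```rust
// Extract the JSON payloads from an OpenAI-style SSE body, stopping at the
// "data: [DONE]" terminator.
fn parse_sse(body: &str) -> Vec<&str> {
    body.lines()
        .filter_map(|line| line.strip_prefix("data: "))
        .take_while(|payload| *payload != "[DONE]")
        .collect()
}

fn main() {
    let body = "data: {\"a\":1}\n\ndata: {\"a\":2}\n\ndata: [DONE]\n\n";
    assert_eq!(parse_sse(body), vec!["{\"a\":1}", "{\"a\":2}"]);
    println!("parsed {} chunks", parse_sse(body).len());
}
```

A real client would parse each payload as an OpenAI streaming chunk and concatenate the deltas.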
### Authentication
Optional. Set `EMBACLE_API_KEY` to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode). The env var is read per-request, so key rotation doesn't require a restart.
```bash
EMBACLE_API_KEY=my-secret embacle-server
curl http://localhost:3000/v1/models -H "Authorization: Bearer my-secret"
```
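The check itself reduces to: read the configured key on every request and compare it against the `Authorization` header, allowing everything through when no key is set. An illustrative sketch (not the server's actual code; in the real server the configured key comes from `EMBACLE_API_KEY` at request time):

```rust
// Returns true if the request may proceed. `configured` stands in for the
// per-request read of EMBACLE_API_KEY, which is what makes rotation
// restart-free.
fn authorized(configured: Option<&str>, auth_header: Option<&str>) -> bool {
    match configured {
        // No key configured: open access (localhost development mode).
        None => true,
        Some(key) => {
            let expected = format!("Bearer {key}");
            auth_header == Some(expected.as_str())
        }
    }
}

fn main() {
    assert!(authorized(None, None));
    assert!(authorized(Some("my-secret"), Some("Bearer my-secret")));
    assert!(!authorized(Some("my-secret"), Some("Bearer wrong")));
    assert!(!authorized(Some("my-secret"), None));
    println!("auth checks pass");
}
```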
## Docker
Pull the image from GitHub Container Registry:
```bash
docker pull ghcr.io/dravr-ai/embacle:latest
```
The image includes `embacle-server` and `embacle-mcp` with Node.js pre-installed for adding CLI backends.
### Adding a CLI Backend
The base image doesn't include CLI tools. Install them in a derived image:
```dockerfile
FROM ghcr.io/dravr-ai/embacle
USER root
RUN npm install -g @anthropic-ai/claude-code
USER embacle
```
Build and run:
```bash
docker build -t my-embacle .
docker run -p 3000:3000 my-embacle --provider claude_code
```
### Auth and Configuration
CLI tools store auth tokens in their config directories. Mount them from the host, or set provider-specific env vars:
```bash
# Mount Claude Code auth from host
docker run -p 3000:3000 \
  -v ~/.claude:/home/embacle/.claude:ro \
  my-embacle --provider claude_code
# Or pass env vars if the CLI supports them
docker run -p 3000:3000 \
  -e GITHUB_TOKEN=ghp_... \
  -e EMBACLE_API_KEY=my-secret \
  my-embacle --provider copilot
```
### Running embacle-mcp
Override the entrypoint to run the MCP server instead:
```bash
docker run --entrypoint embacle-mcp ghcr.io/dravr-ai/embacle --provider copilot
```
## Architecture
```
Your Application
└── embacle (this library)
    │
    ├── CLI Runners (subprocess per request)
    │   ├── ClaudeCodeRunner → spawns `claude -p "prompt" --output-format json`
    │   ├── CopilotRunner → spawns `copilot -p "prompt"`
    │   ├── CursorAgentRunner → spawns `cursor-agent -p "prompt" --output-format json`
    │   ├── OpenCodeRunner → spawns `opencode run "prompt" --format json`
    │   ├── GeminiCliRunner → spawns `gemini -p "prompt" -o json -y`
    │   ├── CodexCliRunner → spawns `codex exec "prompt" --json --full-auto`
    │   ├── GooseCliRunner → spawns `goose run --quiet --no-session`
    │   ├── ClineCliRunner → spawns `cline task --json --act --yolo`
    │   ├── ContinueCliRunner → spawns `cn -p --format json`
    │   └── WarpCliRunner → spawns `oz agent run --prompt "..." --output-format json`
    │
    ├── HTTP API Runners (behind feature flag)
    │   └── OpenAiApiRunner → reqwest to any OpenAI-compatible endpoint
    │
    ├── ACP Runners (persistent connection, behind feature flag)
    │   └── CopilotHeadlessRunner → NDJSON/JSON-RPC to `copilot --acp`
    │
    ├── Provider Decorators (composable wrappers)
    │   ├── FallbackProvider → ordered chain, first success wins
    │   ├── MetricsProvider → latency, token, and error tracking
    │   └── QualityGateProvider → response validation with retry
    │
    ├── Agent Loop
    │   └── AgentExecutor → multi-turn tool calling with configurable max turns
    │
    ├── Structured Output
    │   └── request_structured_output() → schema-validated JSON extraction with retry
    │
    ├── MCP Tool Bridge
    │   └── McpToolBridge → MCP tool definitions ↔ text-based tool loop
    │
    ├── MCP Server (separate binary crate)
    │   └── embacle-mcp → JSON-RPC 2.0 over stdio or HTTP/SSE
    │
    ├── REST API Server (separate binary crate)
    │   └── embacle-server → OpenAI-compatible HTTP, SSE streaming, multiplex
    │
    └── Tool Simulation (text-based tool calling for CLI runners)
        └── execute_with_text_tools() → catalog injection, XML parsing, tool loop
```
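Tool simulation works by injecting a tool catalog into the prompt and parsing XML-style tool calls back out of the model's text. A toy version of the extraction step (the `<tool_call>` tag name and payload format are assumptions for illustration, not embacle's actual wire format):

```rust
// Pull the body of the first <tool_call>…</tool_call> element out of model
// output. The tag name is illustrative; embacle's real format may differ.
fn extract_tool_call(output: &str) -> Option<&str> {
    let start = output.find("<tool_call>")? + "<tool_call>".len();
    let end = output[start..].find("</tool_call>")? + start;
    Some(output[start..end].trim())
}

fn main() {
    let reply = "Sure. <tool_call>read_file {\"path\": \"src/main.rs\"}</tool_call>";
    assert_eq!(
        extract_tool_call(reply),
        Some("read_file {\"path\": \"src/main.rs\"}")
    );
    assert_eq!(extract_tool_call("plain text, no tool use"), None);
    println!("tool call extracted");
}
```

The surrounding loop would execute the extracted call, append the result to the conversation, and re-prompt until the model answers without a tool call or the turn limit is hit.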
All runners implement the same `LlmProvider` trait:
- **`complete()`** — single-shot completion
- **`complete_stream()`** — streaming completion
- **`health_check()`** — verify the runner is available and authenticated
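Because every runner implements the same trait, decorators such as `FallbackProvider` compose naturally: first-success-wins reduces to trying providers in order. A simplified synchronous sketch (the real trait is async and richer than a plain closure):

```rust
// Try each provider in order; return the first Ok result. A synchronous
// stand-in for embacle's async LlmProvider trait, for illustration only.
fn fallback<T, E>(
    providers: &[&dyn Fn(&str) -> Result<T, E>],
    prompt: &str,
) -> Option<T> {
    providers.iter().find_map(|p| p(prompt).ok())
}

fn main() {
    let failing = |_: &str| -> Result<String, String> { Err("down".into()) };
    let working = |p: &str| -> Result<String, String> { Ok(format!("echo: {p}")) };
    // The failing provider is skipped; the second one answers.
    let answer = fallback(&[&failing, &working], "hi");
    assert_eq!(answer, Some("echo: hi".to_string()));
    println!("{}", answer.unwrap());
}
```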
For detailed API docs — fallback chains, structured output, agent loop, metrics, quality gates, tool simulation, and more — see [docs.rs/embacle](https://docs.rs/embacle).
## Tested With
Embacle has been tested with [mirroir.dev](https://github.com/jfarcand/mirroir-mcp), an MCP server for AI-powered iPhone automation.
## License
Licensed under the Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>).