# Embacle — LLM Runners

[![crates.io](https://img.shields.io/crates/v/embacle.svg)](https://crates.io/crates/embacle)
[![docs.rs](https://docs.rs/embacle/badge.svg)](https://docs.rs/embacle)
[![CI](https://github.com/dravr-ai/dravr-embacle/actions/workflows/ci.yml/badge.svg)](https://github.com/dravr-ai/dravr-embacle/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE.md)

Standalone Rust library that wraps 12 AI CLI tools and SDKs as pluggable LLM providers.

Instead of integrating with LLM APIs directly (which requires API keys, SDKs, and credential management), **Embacle** delegates to CLI tools that users already have installed and authenticated — getting model upgrades, auth management, and protocol handling for free. For GitHub Copilot, an optional headless mode communicates via the ACP (Agent Client Protocol) for SDK-managed tool calling.

## Supported Runners

### CLI Runners (subprocess-based)

| Runner | Binary | Features |
|--------|--------|----------|
| Claude Code | `claude` | JSON output, streaming, system prompts, session resume |
| GitHub Copilot | `copilot` | Text parsing, streaming |
| Cursor Agent | `cursor-agent` | JSON output, streaming, MCP approval |
| OpenCode | `opencode` | JSON events, session management |
| Gemini CLI | `gemini` | JSON/stream-JSON output, streaming, session resume |
| Codex CLI | `codex` | JSONL output, streaming, sandboxed exec mode |
| Goose CLI | `goose` | JSON/stream-JSON output, streaming, no-session mode |
| Cline CLI | `cline` | NDJSON output, streaming, session resume via task IDs |
| Continue CLI | `cn` | JSON output, single-shot completions |
| Warp | `oz` | NDJSON output, conversation resume |
| Kiro CLI | `kiro-cli` | ANSI-stripped text output, auto model selection |
| Kilo Code | `kilo` | NDJSON output, streaming, token tracking, 500+ models via Kilo Gateway |

### HTTP API Runners (feature-flagged)

| Runner | Feature Flag | Features |
|--------|-------------|----------|
| OpenAI API | `openai-api` | Any OpenAI-compatible endpoint (OpenAI, Groq, Gemini, Ollama, vLLM), streaming, tool calling, model discovery |

### ACP Runners (persistent connection)

| Runner | Feature Flag | Features |
|--------|-------------|----------|
| GitHub Copilot Headless | `copilot-headless` | NDJSON/JSON-RPC via `copilot --acp`, SDK-managed tool calling, streaming |

## Install

### Homebrew (macOS / Linux)

```bash
brew tap dravr-ai/tap
brew install embacle
```

This installs both `embacle-server` (OpenAI-compatible API + MCP) and `embacle-mcp` (standalone MCP).

### Docker

```bash
docker pull ghcr.io/dravr-ai/embacle:latest
```

### Cargo (library)

```toml
[dependencies]
embacle = "0.11"
```

## Quick Start

Use a CLI runner:

```rust
use std::path::PathBuf;
use embacle::{ClaudeCodeRunner, RunnerConfig};
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};

#[tokio::main]
async fn main() -> Result<(), embacle::types::RunnerError> {
    let config = RunnerConfig::new(PathBuf::from("claude"));
    let runner = ClaudeCodeRunner::new(config);

    let request = ChatRequest::new(vec![
        ChatMessage::user("What is the capital of France?"),
    ]);

    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```

### OpenAI API (feature flag)

Enable the `openai-api` feature for HTTP-based communication with any OpenAI-compatible endpoint:

```toml
[dependencies]
embacle = { version = "0.11", features = ["openai-api"] }
```

```rust
use embacle::{OpenAiApiConfig, OpenAiApiRunner};
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};

#[tokio::main]
async fn main() -> Result<(), embacle::types::RunnerError> {
    // Reads OPENAI_API_BASE_URL, OPENAI_API_KEY, OPENAI_API_MODEL from env
    let config = OpenAiApiConfig::from_env();
    let runner = OpenAiApiRunner::new(config).await;

    let request = ChatRequest::new(vec![
        ChatMessage::user("What is the capital of France?"),
    ]);

    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```

Works with any OpenAI-compatible endpoint — OpenAI, Groq, Google Gemini, Ollama, vLLM, and more. To inject a shared HTTP client (e.g. from a connection pool), use `OpenAiApiRunner::with_client(config, client)`.
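
A minimal sketch of client injection. It assumes the injected client is a `reqwest::Client` (the architecture notes below mention reqwest) and that the constructor is async like `new`; verify the exact signature on docs.rs:

```rust
use embacle::{OpenAiApiConfig, OpenAiApiRunner};

// Sketch: reuse one pooled HTTP client across several runners.
// Assumed: the parameter type is reqwest::Client and the constructor
// is async, mirroring OpenAiApiRunner::new. Check docs.rs/embacle.
async fn pooled_runner() -> OpenAiApiRunner {
    let client = reqwest::Client::builder()
        .pool_max_idle_per_host(8)
        .build()
        .expect("failed to build HTTP client");
    OpenAiApiRunner::with_client(OpenAiApiConfig::from_env(), client).await
}
```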

| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_BASE_URL` | `https://api.openai.com/v1` | API base URL |
| `OPENAI_API_KEY` | *(none)* | Bearer token for authentication |
| `OPENAI_API_MODEL` | `gpt-5.4` | Default model for completions |
| `OPENAI_API_TIMEOUT_SECS` | `300` | HTTP request timeout |

### Copilot Headless (feature flag)

Enable the `copilot-headless` feature for ACP-based communication with SDK-managed tool calling:

```toml
[dependencies]
embacle = { version = "0.11", features = ["copilot-headless"] }
```

```rust
use embacle::{CopilotHeadlessRunner, CopilotHeadlessConfig};
use embacle::types::{ChatMessage, ChatRequest, LlmProvider};

#[tokio::main]
async fn main() -> Result<(), embacle::types::RunnerError> {
    // Reads COPILOT_HEADLESS_MODEL, COPILOT_GITHUB_TOKEN, etc. from env
    let runner = CopilotHeadlessRunner::from_env().await;

    let request = ChatRequest::new(vec![
        ChatMessage::user("Explain Rust ownership"),
    ]);

    let response = runner.complete(&request).await?;
    println!("{}", response.content);
    Ok(())
}
```

The headless runner spawns `copilot --acp` per request and communicates via NDJSON-framed JSON-RPC. Configuration via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `COPILOT_CLI_PATH` | auto-detect | Override path to copilot binary |
| `COPILOT_HEADLESS_MODEL` | `claude-opus-4.6-fast` | Default model for completions |
| `COPILOT_GITHUB_TOKEN` | stored OAuth | GitHub auth token (falls back to `GH_TOKEN`, `GITHUB_TOKEN`) |

## MCP Server (`embacle-mcp`)

A library and standalone binary that exposes embacle runners via the [Model Context Protocol](https://modelcontextprotocol.io/). Connect any MCP-compatible client (Claude Desktop, editors, custom agents) to use all embacle providers.

### Usage

```bash
# Stdio transport (default — for editor/client integration)
embacle-mcp --provider copilot

# HTTP transport (for network-accessible deployments)
embacle-mcp --transport http --host 0.0.0.0 --port 3000 --provider claude_code
```

### MCP Tools

| Tool | Description |
|------|-------------|
| `get_provider` | Get active LLM provider and list available providers |
| `set_provider` | Switch the active provider (`claude_code`, `copilot`, `cursor_agent`, `opencode`, `gemini_cli`, `codex_cli`, `goose_cli`, `cline_cli`, `continue_cli`, `warp_cli`, `kiro_cli`, `kilo_cli`) |
| `get_model` | Get current model and list available models for the active provider |
| `set_model` | Set the model for subsequent requests (pass null to reset to default) |
| `get_multiplex_provider` | Get providers configured for multiplex dispatch |
| `set_multiplex_provider` | Configure providers for fan-out mode |
| `prompt` | Send chat messages to the active provider, or multiplex to all configured providers |
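
Over stdio, these tools are invoked with ordinary JSON-RPC 2.0 `tools/call` requests, one per line. A sketch of switching providers and then prompting; the `provider` and `messages` argument names are assumptions based on the tool descriptions above, so check the server's `tools/list` response for the exact schemas:

```json
{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"set_provider","arguments":{"provider":"claude_code"}}}
{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"prompt","arguments":{"messages":[{"role":"user","content":"hello"}]}}}
```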

### Client Configuration

Add to your MCP client config (e.g. Claude Desktop `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "embacle": {
      "command": "embacle-mcp",
      "args": ["--provider", "copilot"]
    }
  }
}
```

## REST API Server (`embacle-server`)

A unified OpenAI-compatible HTTP server with built-in MCP support that proxies requests to embacle runners. Any client that speaks the OpenAI chat completions API or MCP protocol can use it without modification. Supports `--transport stdio` for MCP-only mode (editor integration).

### Usage

```bash
# Start with default provider (copilot) on localhost:3000
embacle-server

# Specify provider and port
embacle-server --provider claude_code --port 8080 --host 0.0.0.0

# MCP-only mode via stdio (for editor/client integration)
embacle-server --transport stdio --provider copilot
```

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/chat/completions` | Chat completion (streaming and non-streaming) |
| `GET` | `/v1/models` | List available providers and models |
| `GET` | `/health` | Per-provider readiness check |
| `POST` | `/mcp` | MCP Streamable HTTP (JSON-RPC 2.0) |
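
For example, a quick readiness probe and model listing:

```bash
curl http://localhost:3000/health
curl http://localhost:3000/v1/models
```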

### MCP Streamable HTTP

The server also speaks [MCP](https://modelcontextprotocol.io/) at `POST /mcp`, accepting JSON-RPC 2.0 requests. Any MCP-compatible client can connect over HTTP instead of stdio.

| Tool | Description |
|------|-------------|
| `prompt` | Send chat messages to an LLM provider, with optional `model` routing (e.g. `copilot:gpt-4o`) |
| `list_models` | List available providers and the server's default |

```bash
# MCP initialize handshake
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl"}}}'

# Call the prompt tool
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"prompt","arguments":{"messages":[{"role":"user","content":"hello"}]}}}'
```

Add `Accept: text/event-stream` to receive SSE-wrapped responses instead of plain JSON.
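
For example, the same `tools/call` request with SSE framing:

```bash
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"prompt","arguments":{"messages":[{"role":"user","content":"hello"}]}}}'
```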

### Model Routing

The `model` field determines which provider handles the request. Use a `provider:model` prefix to target a specific runner, or pass a bare model name to use the server's default provider.

```bash
# Explicit provider
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude:opus", "messages": [{"role": "user", "content": "hello"}]}'

# Default provider
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.4", "messages": [{"role": "user", "content": "hello"}]}'
```

### Multiplex

Pass an array of models to fan out the same prompt to multiple providers concurrently. Each provider runs in its own task; failures in one don't affect others.

```bash
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": ["copilot:gpt-4o", "claude:opus"], "messages": [{"role": "user", "content": "hello"}]}'
```

The response uses `object: "chat.completion.multiplex"` with per-provider results and timing.

Streaming is not supported for multiplex requests.

### SSE Streaming

Set `"stream": true` for Server-Sent Events output in OpenAI streaming format (`data: {json}\n\n` with `data: [DONE]` terminator).
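
For example (`-N` disables curl's output buffering so chunks print as they arrive):

```bash
curl -N http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.4", "stream": true, "messages": [{"role": "user", "content": "hello"}]}'
```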

### Authentication

Optional. Set `EMBACLE_API_KEY` to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode). The env var is read per-request, so key rotation doesn't require a restart.

```bash
EMBACLE_API_KEY=my-secret embacle-server
curl http://localhost:3000/v1/models -H "Authorization: Bearer my-secret"
```

## Docker

Pull the image from GitHub Container Registry:

```bash
docker pull ghcr.io/dravr-ai/embacle:latest
```

The image includes `embacle-server` and `embacle-mcp` with Node.js pre-installed for adding CLI backends.

### Adding a CLI Backend

The base image doesn't include CLI tools. Install them in a derived image:

```dockerfile
FROM ghcr.io/dravr-ai/embacle
USER root
RUN npm install -g @anthropic-ai/claude-code
USER embacle
```

Build and run:

```bash
docker build -t my-embacle .
docker run -p 3000:3000 my-embacle --provider claude_code
```

### Auth and Configuration

CLI tools store auth tokens in their config directories. Mount them from the host, or set provider-specific env vars:

```bash
# Mount Claude Code auth from host
docker run -p 3000:3000 \
  -v ~/.claude:/home/embacle/.claude:ro \
  my-embacle --provider claude_code

# Or pass env vars if the CLI supports them
docker run -p 3000:3000 \
  -e GITHUB_TOKEN=ghp_... \
  -e EMBACLE_API_KEY=my-secret \
  my-embacle --provider copilot
```

### Running embacle-mcp

Override the entrypoint to run the MCP server instead:

```bash
docker run --entrypoint embacle-mcp ghcr.io/dravr-ai/embacle --provider copilot
```

## Architecture

```
Your Application
    └── embacle (this library)
            ├── CLI Runners (subprocess per request)
            │   ├── ClaudeCodeRunner    → spawns `claude -p "prompt" --output-format json`
            │   ├── CopilotRunner       → spawns `copilot -p "prompt"`
            │   ├── CursorAgentRunner   → spawns `cursor-agent -p "prompt" --output-format json`
            │   ├── OpenCodeRunner      → spawns `opencode run "prompt" --format json`
            │   ├── GeminiCliRunner     → spawns `gemini -p "prompt" -o json -y`
            │   ├── CodexCliRunner      → spawns `codex exec "prompt" --json --full-auto`
            │   ├── GooseCliRunner      → spawns `goose run --quiet --no-session`
            │   ├── ClineCliRunner      → spawns `cline task --json --act --yolo`
            │   ├── ContinueCliRunner   → spawns `cn -p --format json`
            │   ├── WarpCliRunner       → spawns `oz agent run --prompt "..." --output-format json`
            │   ├── KiroCliRunner       → spawns `kiro-cli send "prompt"`
            │   └── KiloCliRunner       → spawns `kilo run --auto --format json`
            ├── HTTP API Runners (behind feature flag)
            │   └── OpenAiApiRunner       → reqwest to any OpenAI-compatible endpoint
            ├── ACP Runners (persistent connection, behind feature flag)
            │   └── CopilotHeadlessRunner → NDJSON/JSON-RPC to `copilot --acp`
            ├── Provider Decorators (composable wrappers)
            │   ├── FallbackProvider    → ordered chain with retry and exponential backoff
            │   ├── MetricsProvider     → latency, token, and cost tracking
            │   ├── QualityGateProvider → response validation with retry
            │   ├── GuardrailProvider   → pluggable pre/post request validation
            │   └── CacheProvider       → response caching with TTL and capacity
            ├── Agent Loop
            │   └── AgentExecutor       → multi-turn tool calling with configurable max turns
            ├── Structured Output
            │   └── request_structured_output()  → schema-validated JSON extraction with retry
            ├── MCP Tool Bridge
            │   └── McpToolBridge       → MCP tool definitions ↔ text-based tool loop
            ├── MCP Server (library + binary crate)
            │   └── embacle-mcp         → JSON-RPC 2.0 over stdio or HTTP/SSE
            ├── Unified REST API + MCP Server (binary crate)
            │   └── embacle-server      → OpenAI-compatible HTTP, MCP Streamable HTTP, SSE streaming, multiplex
            └── Tool Simulation (text-based tool calling for CLI runners)
                └── execute_with_text_tools()  → catalog injection, XML parsing, tool loop
```

All runners implement the same `LlmProvider` trait:
- **`complete()`** — single-shot completion
- **`complete_stream()`** — streaming completion
- **`health_check()`** — verify the runner is available and authenticated
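
A minimal streaming sketch, reusing the `runner` and `request` from the Quick Start example. The return shape is an assumption here (a fallible stream of text chunks); the exact types are on docs.rs:

```rust
use futures::StreamExt;

// Sketch only. Assumed: complete_stream() returns a fallible stream
// whose items are text chunks; check docs.rs/embacle for the real types.
let mut stream = runner.complete_stream(&request).await?;
while let Some(chunk) = stream.next().await {
    // Print each chunk as it arrives.
    print!("{}", chunk?);
}
```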

For detailed API docs — fallback chains, structured output, agent loop, metrics, quality gates, tool simulation, and more — see [docs.rs/embacle](https://docs.rs/embacle).

## Tested With

Embacle has been tested with [mirroir.dev](https://github.com/jfarcand/mirroir-mcp), an MCP server for AI-powered iPhone automation.


## License

Licensed under the Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>).