<div align="center">
<img src="asset/zeph_v8_github.png" alt="Zeph" width="800">
**The AI agent that respects your resources.**
Single binary. Minimal hardware. Maximum context efficiency.<br>
Every token counts — Zeph makes sure none are wasted.
[](https://crates.io/crates/zeph)
[](https://bug-ops.github.io/zeph/)
[](https://github.com/bug-ops/zeph/actions)
[](https://codecov.io/gh/bug-ops/zeph)
[](https://github.com/bug-ops/zeph/security)
[](https://www.rust-lang.org)
[](LICENSE)
</div>
---
## Why Zeph
Most AI agent frameworks dump every tool description, skill, and raw output into the context window — and bill you for it. Zeph takes the opposite approach: **automated context engineering**. Only relevant data enters the context. The result — lower costs, faster responses, and an agent that runs on hardware you already have.
- **Semantic skill selection** — embeds skills as vectors, retrieves only top-K relevant per query instead of injecting all
- **Smart output filtering** — command-aware filters strip 70-99% of noise before context injection; oversized responses offloaded to filesystem
- **Resilient context compaction** — reactive retry on context overflow, middle-out progressive tool response removal, 9-section structured compaction prompt, LLM-free metadata fallback
- **Tool-pair summarization** — when visible tool call/response pairs exceed a configurable cutoff, the oldest pair is summarized via LLM and originals hidden from context
- **Accurate token counting** — tiktoken-based cl100k_base tokenizer with DashMap cache replaces chars/4 heuristic
- **Proportional budget allocation** — context space distributed by purpose, not arrival order
- **Optimized agent loop hot-path** — compaction check is O(1) via cached token count; `EnvironmentContext` built once at bootstrap and partially refreshed on skill reload; doom-loop hashing done in-place with no intermediate allocations; token counting for tool output pruning reduced to a single call per part
## Installation
> [!TIP]
> ```bash
> curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
> ```
<details>
<summary>Other methods</summary>
```bash
cargo install zeph # crates.io
cargo install --git https://github.com/bug-ops/zeph # from source
docker pull ghcr.io/bug-ops/zeph:latest # Docker
```
Pre-built binaries: [GitHub Releases](https://github.com/bug-ops/zeph/releases/latest) · [Docker guide](https://bug-ops.github.io/zeph/guides/docker.html)
</details>
## Quick Start
```bash
zeph init # interactive setup wizard
zeph # run the agent
zeph --tui # run with TUI dashboard
```
[Full setup guide →](https://bug-ops.github.io/zeph/getting-started/installation.html) · [Configuration reference →](https://bug-ops.github.io/zeph/reference/configuration.html)
## Key Features
| | |
|---|---|
| **Hybrid inference** | Ollama, Claude, OpenAI, Candle (GGUF), any OpenAI-compatible API. Multi-model orchestrator with fallback chains. Response cache with blake3 hashing and TTL. Ollama native tool calling via `llm.ollama.tool_use = true`. EMA-based per-provider latency routing (`router_ema_enabled`) with configurable alpha and reorder interval |
| **Skills-first architecture** | YAML+Markdown skill files with semantic matching, self-learning evolution, 4-tier trust model, and compact prompt mode for small-context models. Wilson score Bayesian re-ranking with `posterior_weight`/`posterior_mean`. Auto-promote/demote via `check_trust_transition()`. BM25 hybrid search with RRF fusion (`hybrid_search = true`). Skill health XML attributes (`reliability="N%"`, `uses="N"`) injected into the system prompt. `/skill reject <name> <reason>` command for immediate user-driven feedback |
| **Semantic memory** | SQLite + Qdrant (or embedded SQLite vector search) with MMR re-ranking, temporal decay scoring, resilient compaction (reactive retry, middle-out tool response removal, 9-section structured prompt, LLM-free fallback), durable compaction with message visibility control, tool-pair summarization (LLM-based, configurable cutoff), credential scrubbing, cross-session recall, vector retrieval, autosave assistant responses, snapshot export/import, configurable SQLite pool, background response-cache cleanup, and native `memory_search`/`memory_save` tools the model can invoke explicitly. Implicit correction detection (`FeedbackDetector`) stores `UserCorrection` records in SQLite and Qdrant (`zeph_corrections` collection) for cross-session personalization |
| **Multi-channel I/O** | CLI, Telegram, Discord, Slack, TUI — all with streaming. Vision and speech-to-text input |
| **Protocols** | MCP client (stdio + HTTP), A2A agent-to-agent communication, ACP server for IDE integration (stdio + HTTP+SSE + WebSocket, multi-session with LRU eviction, persistence, idle reaper, permission persistence, multi-modal prompts, runtime model switching, session modes (ask/architect/code), MCP server management via `ext_method`, session export/import, tool call lifecycle notifications, terminal command timeout with kill support, `UserMessageChunk` echo, `ext_notification` passthrough, `list`/`fork`/`resume` sessions behind unstable flags), sub-agent orchestration with zero-trust secret delegation. MCP tools exposed as native `ToolDefinition`s — used via structured tool_use with Claude and OpenAI |
| **Document RAG** | `zeph ingest <path>` ingests `.txt`, `.md`, `.pdf` files into Qdrant `zeph_documents`. When `memory.documents.rag_enabled = true`, the agent automatically retrieves top-K relevant chunks on each turn and prepends them to the context window — no manual `/ingest` needed per session. Also available as `/ingest <path>` in TUI |
| **Anomaly detection** | `AnomalyDetector` monitors tool failure rates in a sliding window. When the failure fraction exceeds `tools.anomaly.failure_threshold`, a Critical alert is raised and the tool is auto-blocked via the trust system — no manual intervention required |
| **Webhook gateway** | `GatewayServer` starts automatically in daemon mode (`--features gateway --daemon`) — axum HTTP server with bearer auth, rate limiting, and `/health` endpoint for webhook ingestion |
| **Defense-in-depth** | Shell sandbox (blocklist + confirmation patterns for process substitution, here-strings, eval, subshell injection via `$(` and backtick), effective command extraction (strips `env`/`exec` prefixes), `ToolPermission` TOML config with per-binary pattern matching (`allow`/`deny`), secret redaction via `redact_json` on all ACP `raw_response` payloads, SSRF protection (HTTPS-only, DNS validation, address pinning, redirect chain re-validation), skill trust quarantine, audit logging. Secrets held in memory as `Zeroizing<String>` — wiped on drop |
| **TUI dashboard** | ratatui-based with syntax highlighting, live metrics, file picker, command palette, daemon mode. Status bar shows filter savings %; command palette includes `/ingest`, `/gateway status`, and `ViewFilters` |
| **Tools** | `FileExecutor`: `read`, `write`, `find_path` (renamed from `glob`), `list_directory`, `create_directory`, `delete_path`, `move_path`, `copy_path` — all paths sandbox-validated, symlink-safe. `WebScrapeExecutor`: `web_scrape` + `fetch` (plain URL-to-text, no selector required). `DiagnosticsExecutor`: runs `cargo check` or `cargo clippy --message-format=json`, returns structured diagnostics with file, line, col, severity, and message — output capped, graceful degradation if cargo absent |
| **Single binary** | ~15 MB, no runtime dependencies, ~50ms startup, ~20 MB idle memory |
[Architecture →](https://bug-ops.github.io/zeph/architecture/overview.html) · [Feature flags →](https://bug-ops.github.io/zeph/reference/feature-flags.html) · [Security model →](https://bug-ops.github.io/zeph/reference/security.html)
## IDE Integration (ACP)
Zeph implements the [Agent Client Protocol](https://agentclientprotocol.com/) — use it as an AI backend in Zed, Helix, VS Code, or any ACP-compatible editor via stdio, HTTP+SSE, or WebSocket transport.
```bash
zeph acp # stdio (editor spawns as subprocess)
zeph acp --http :8080 # HTTP+SSE (shared/remote)
zeph acp --ws :8080 # WebSocket
```
**ACP capabilities:**
- Session modes: `ask`, `code`, `architect` — switch at runtime via `set_session_mode`; editors receive `current_mode_update` notifications
- Tool call lifecycle: `InProgress` → `Completed` updates with `ToolCallContent::Terminal` for shell calls; terminal release deferred until after the `tool_call_update` notification so IDE can display tool output
- Terminal command timeout (default 120 s, configurable via `terminal_timeout_secs`) with `kill_terminal_command` support
- `UserMessageChunk` echo notification after each user prompt
- `ext_notification` passthrough to running sessions
- `AgentCapabilities` advertises `session_capabilities`: `list`, `fork`, `resume`
- MCP HTTP transport support in the MCP bridge
- Unsupported content blocks (Audio, ResourceLink) produce structured log warnings instead of silent drops
- **Usage reporting** — token counts (input, output, cache) streamed back to the IDE as `UsageUpdate` session notifications after each turn; IDEs that support `UsageUpdate` render this as a context percentage badge (`unstable-session-usage`, enabled by default)
- **Loaded rules reporting** — skill paths and system rule files discovered at session start are sent to the IDE via `_meta.projectRules` in `NewSessionResponse`; IDEs that implement this extension display an **N project rules** badge
- **IDE model picker** — `SetSessionModel` lets the editor switch the active model via a native dropdown without a custom `session/configure` call (`unstable-session-model`)
- **Auto session title** — `SessionInfoUpdate` notifies the IDE of an agent-generated session title after the first turn (`unstable-session-info-update`)
- **Plan updates** — `SessionUpdate::Plan` events emitted during orchestrator runs so the editor can display intermediate planning steps
- **Slash commands** — `AvailableCommandsUpdate` advertises built-in slash commands (`/help`, `/model`, `/mode`, `/clear`, `/compact`); user input starting with `/` is dispatched to the matching handler
- **LSP diagnostics injection** — `@diagnostics` mention in a Zed prompt triggers LSP diagnostic context injection, providing the agent with current editor diagnostics
- **Session history** — `GET /sessions` lists persisted sessions with title, timestamp, and message count; `GET /sessions/{id}/messages` returns the full event log; sending an existing `session_id` resumes the conversation from stored context; title auto-inferred from the first user message
- **Subagent nesting** — sub-agent output is nested under the parent tool call in the IDE via `_meta.claudeCode.parentToolUseId` carried on every `session_update`, so multi-agent runs appear as collapsible trees in Zed and VS Code ACP tool cards
- **Terminal streaming** — `AcpShellExecutor` streams bash output in real time via `_meta.terminal_output` chunks with a final `_meta.terminal_exit` event; IDEs display live output inside the tool card as commands execute
- **File following** — `ToolCall.location` carries `filePath` for file read/write operations; the IDE editor cursor tracks the agent across files automatically
- **Native file tools** — `AcpFileExecutor` exposes `list_directory` and `find_path` tools that run on the agent filesystem when the IDE advertises `fs.readTextFile` capability; paths are sandbox-validated, glob segments validated against `..` traversal, results capped at 1000 entries
- **ToolFilter** — suppresses local `FileExecutor` tools (`read`, `write`, `glob`) automatically when `AcpFileExecutor` provides IDE-proxied alternatives, preventing tool duplication in the model's context
- **Permission gate hardening** — subshell injection (`$(`, backticks) blocked before pattern matching; `effective_shell_command()` extracts inner command from `bash -c <cmd>` to prevent args-field bypass; `extract_command_binary()` strips transparent prefixes so "Allow always" for `git` does not auto-approve `rm`; `deny` patterns fast-path to `RejectAlways` without IDE round-trip
> [!NOTE]
> `list_sessions`, `fork_session`, and `resume_session` are gated behind the `unstable` feature flag. `UsageUpdate`, `SetSessionModel`, and `SessionInfoUpdate` are gated behind their respective `unstable-session-usage`, `unstable-session-model`, and `unstable-session-info-update` flags.
### WebSocket transport hardening
The WebSocket transport is hardened against a range of protocol and concurrency issues:
| Property | Value |
|---|---|
| Max concurrent sessions | Configurable; enforced with atomic slot reservation (eliminates TOCTOU race) |
| Keepalive | 30 s ping interval / 90 s pong timeout — idle connections are closed |
| Max message size | 1 MiB |
| Binary frames | Rejected with close code `1003 Unsupported Data` (text-only protocol) |
| Disconnect drain | Write task given 1 s to deliver the RFC 6455 close frame before the socket is dropped |
### Bearer token authentication
Protect the `/acp` (HTTP+SSE) and `/acp/ws` (WebSocket) endpoints with a static bearer token. The discovery endpoint is always exempt.
```toml
# config.toml
[acp]
auth_bearer_token = "your-secret-token"
```
| Method | Value |
|---|---|
| Config key | `acp.auth_bearer_token` |
| Environment variable | `ZEPH_ACP_AUTH_TOKEN` |
| CLI flag | `--acp-auth-token <token>` |
Requests to guarded routes without a valid `Authorization: Bearer <token>` header receive `401 Unauthorized`. When no token is configured, the server runs in open mode — no authentication is enforced. stdio transport is always unaffected.
> [!TIP]
> Always set `auth_bearer_token` when exposing the ACP server on a network interface. For local-only stdio or single-user setups no token is required.
> [!CAUTION]
> Open mode (no token configured) is suitable only for trusted local use. Any process on the same host can issue agent commands without authentication.
### Agent discovery
Zeph publishes a machine-readable agent manifest at `GET /.well-known/acp.json` that ACP-compatible clients use for capability discovery. The manifest includes the agent name, version, supported transports, and authentication type.
```toml
# config.toml
[acp]
discovery_enabled = true # default: true
```
| Method | Value |
|---|---|
| Config key | `acp.discovery_enabled` |
| Environment variable | `ZEPH_ACP_DISCOVERY_ENABLED=false` to disable |
> [!NOTE]
> The discovery endpoint is intentionally unauthenticated so clients can discover the agent and its auth requirements before presenting credentials. Do not include sensitive data in the manifest. Set `discovery_enabled = false` if the endpoint must not be publicly reachable.
[ACP setup guide →](https://bug-ops.github.io/zeph/advanced/acp.html)
## Sub-Agents
Zeph supports spawning sub-agents — isolated agent instances with their own LLM provider, filtered tool access, and injected skills. Sub-agents are defined as Markdown files with TOML frontmatter and loaded from `.zeph/agents/` (project scope) or `~/.config/zeph/agents/` (user scope).
### Definition format
```markdown
+++
name = "code-reviewer"
description = "Reviews code changes for correctness and style"
model = "claude-sonnet-4-20250514"
[tools]
allow = ["shell", "web_scrape"]
[permissions]
network = true
filesystem = "read"
secrets = ["GITHUB_TOKEN"]
ttl_secs = 120
[skills]
include = ["git-*", "rust-*"]
exclude = ["deploy-*"]
+++
You are a code reviewer. Report findings with severity.
```
### CLI commands
| Command | Description |
|---------|-------------|
| `/agent list` | List available sub-agent definitions |
| `/agent spawn <name> <prompt>` | Spawn a foreground sub-agent |
| `/agent bg <name> <prompt>` | Spawn a background sub-agent |
| `/agent status` | Show active sub-agents with state, turns, and elapsed time |
| `/agent cancel <id>` | Cancel a running sub-agent by ID prefix |
| `/agent approve <id>` | Approve a pending secret request |
| `/agent deny <id>` | Deny a pending secret request |
### Configuration
```toml
[agents]
enabled = true
max_concurrent = 4
extra_dirs = ["/path/to/shared/agents"]
```
> [!NOTE]
> Sub-agents are disabled by default. Set `agents.enabled = true` to activate. Each sub-agent receives only explicitly granted tools, skills, and secrets via zero-trust `PermissionGrants`.
## Self-Learning
Zeph continuously improves skill selection and routing based on real usage, without requiring explicit training or model retraining.
### How it works
| Phase | Mechanism | Effect |
|-------|-----------|--------|
| **Feedback capture** | `FailureKind` enum records rejection type; `/skill reject <name> <reason>` triggers immediate skill improvement | Negative signal persisted in `outcome_detail` column (migration 018) |
| **Implicit correction** | `FeedbackDetector` recognizes correction patterns ("actually", "that's wrong", "try again") in user messages | `UserCorrection` stored in SQLite + Qdrant `zeph_corrections` collection |
| **Cross-session recall** | Corrections are retrieved by embedding similarity at context-build time | Agent adapts to user preferences across sessions without user re-stating preferences |
| **Bayesian re-ranking** | Wilson score lower-bound updates `posterior_weight` and `posterior_mean` per skill | Proven skills surface higher; underperforming skills demoted automatically |
| **Trust transitions** | `check_trust_transition()` runs after each outcome | Skills auto-promoted to `Trusted` or demoted to `Quarantined` based on accumulated evidence |
| **Hybrid search** | BM25 keyword score fused with cosine similarity via Reciprocal Rank Fusion | Better recall for exact-match queries; `cosine_weight` controls blend ratio |
| **EMA routing** | `EmaTracker` tracks per-provider exponential moving average latency | Fastest reliable provider gets priority; reordering interval is configurable |
### Configuration
```toml
[skills]
cosine_weight = 0.7 # weight of cosine similarity vs BM25 in hybrid search (default: 0.7)
hybrid_search = true # enable BM25 + cosine RRF fusion (default: true)
[agent.learning]
correction_detection = true # implicit correction detection
correction_confidence_threshold = 0.7 # minimum confidence to record a correction
correction_recall_limit = 5 # max corrections injected into context per turn
correction_min_similarity = 0.75 # minimum vector similarity for correction recall
[llm]
router_ema_enabled = false # enable EMA-based provider routing (default: false)
router_ema_alpha = 0.1 # EMA smoothing factor (default: 0.1)
router_reorder_interval = 60 # seconds between provider reordering (default: 60)
```
### TUI confidence bars
When running with `--tui`, the skills panel shows live confidence bars derived from the Wilson score posterior:
```
web-search [████████░░] 82% (117 uses)
git-commit [███████░░░] 73% (42 uses)
code-review [████░░░░░░] 41% (8 uses)
```
### CLI commands
| Command | Description |
|---------|-------------|
| `/skill reject <name> <reason>` | Record a rejection and trigger immediate skill improvement cycle |
> [!TIP]
> Use `/skill reject` whenever a skill produces a wrong result. The rejection is stored with a typed `FailureKind` and immediately feeds the Bayesian re-ranker — no manual config required.
> [!NOTE]
> Implicit corrections are detected automatically from natural language ("actually, ...", "that's not right"). Explicit `/skill reject` provides stronger signal and is preferred when the failure is clear-cut.
## TUI Demo
<div align="center">
<img src="asset/zeph.gif" alt="Zeph TUI Dashboard" width="800">
</div>
[TUI guide →](https://bug-ops.github.io/zeph/advanced/tui.html)
## Documentation
**[bug-ops.github.io/zeph](https://bug-ops.github.io/zeph/)** — installation, configuration, guides, architecture, and API reference.
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for development workflow and guidelines.
## Security
The workspace enforces `unsafe_code = "deny"` at the Cargo workspace lint level — no unsafe Rust is permitted in any crate without an explicit override.
Found a vulnerability? Please use [GitHub Security Advisories](https://github.com/bug-ops/zeph/security/advisories/new) for responsible disclosure.
## License
[MIT](LICENSE)