zeph 0.14.1

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
zeph-0.14.1 is not a library.

The AI agent that respects your resources.

Single binary. Minimal hardware. Maximum context efficiency.

Crates.io docs CI codecov MSRV License: MIT


Zeph is a Rust AI agent built around one principle: every token in the context window must earn its place. Skills are retrieved semantically, tool output is filtered before injection, and the context compacts automatically under pressure — keeping costs low and responses fast on hardware you already own.

curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
zeph init   # interactive setup wizard
zeph        # start the agent

[!TIP] cargo install zeph also works. Pre-built binaries and Docker images are on the releases page.


What's inside

Feature Description
Hybrid inference Ollama, Claude, OpenAI, any OpenAI-compatible API, or fully local via Candle (GGUF). Multi-model orchestrator with fallback chains, EMA latency routing, and adaptive Thompson Sampling for exploration/exploitation-balanced model selection. → Providers
Skills-first architecture YAML+Markdown skill files with BM25+cosine hybrid retrieval. Bayesian re-ranking, 4-tier trust model, and self-learning evolution — skills improve from real usage. Agent-as-a-Judge feedback detection with adaptive regex/LLM hybrid analysis. The load_skill tool lets the LLM fetch the full body of any skill outside the active TOP-N set on demand. → Skills · → Self-learning
Context engineering Semantic skill selection, command-aware output filters, tool-pair summarization with deferred application (pre-computed eagerly, applied lazily to stabilize the Claude API prompt cache prefix), proactive context compression (reactive + proactive strategies), and reactive middle-out compaction keep the window efficient under any load. Three-tier compaction pipeline: deferred summary application at 70% context usage → pruning at 80% → LLM compaction on overflow. --debug-dump [PATH] writes every LLM request, response, and raw tool output to numbered files for context debugging. → Context · → Debug Dump
Semantic memory SQLite + Qdrant with MMR re-ranking, temporal decay, query-aware memory routing (keyword/semantic/hybrid), cross-session recall, implicit correction detection, and credential scrubbing. Optional graph memory adds entity-relationship tracking with FTS5-accelerated entity search, BFS traversal for multi-hop reasoning, temporal fact tracking, and embedding-based entity resolution for semantic deduplication. Background LLM extraction runs fire-and-forget on each turn; graph facts are injected into the context window alongside semantic recall. → Memory · → Graph Memory
IDE integration (ACP) Stdio, HTTP+SSE, or WebSocket transport. Session modes, live tool streaming, LSP diagnostics injection, file following, usage reporting. Works in Zed, Helix, VS Code. → ACP
Multi-channel I/O CLI, Telegram, TUI dashboard — all with streaming. Voice and vision input supported. → Channels
MCP & A2A MCP client with full tool exposure to the model. Configure mcpls as an MCP server for compiler-level code intelligence: hover, definition, references, diagnostics, call hierarchy, and safe rename via rust-analyzer, pyright, gopls, and 30+ other LSP servers. A2A agent-to-agent protocol for multi-agent orchestration. → MCP · → LSP · → A2A
Sub-agents Spawn isolated agents with scoped tools, skills, and zero-trust secret delegation — defined as Markdown files. 4-level resolution priority (CLI > project > user > config), permission_mode (default/accept_edits/dont_ask/bypass_permissions/plan), fine-grained tools.except denylists, background fire-and-forget execution, max_turns limits, persistent memory scopes (user/project/local) with MEMORY.md injection, persistent JSONL transcript storage with /agent resume for continuing completed sessions, and lifecycle hooks (SubagentStart/SubagentStop at config level, PreToolUse/PostToolUse per agent with pipe-separated matchers). Manage definitions with `zeph agents list
Instruction files Drop zeph.md (or CLAUDE.md / AGENTS.md) in your project root. Zeph auto-detects and injects them into every system prompt — project rules, conventions, and domain knowledge applied automatically. Changes are picked up live via filesystem watching (500 ms debounce) — no restart required. → Instruction Files
Defense-in-depth Shell sandbox, SSRF protection, skill trust quarantine, secret zeroization, audit logging, unsafe_code = "deny" workspace-wide. Untrusted content isolation: all tool results, web scrape output, MCP responses, A2A messages, and memory retrieval pass through a ContentSanitizer pipeline that truncates, strips control characters, detects 17 injection patterns, and wraps content in spotlighting XML delimiters before it enters the LLM context. Optional quarantined summarizer (Dual LLM pattern) routes high-risk sources through an isolated, tool-less LLM extraction call for defense-in-depth against indirect prompt injection. Exfiltration guards block markdown image pixel-tracking, validate tool call URLs against flagged untrusted sources, and suppress memory writes for injection-flagged content. TUI security panel with real-time event feed, SEC status bar indicator, and security:events command palette entry. → Security · → Untrusted Content Isolation
Document RAG zeph ingest <path> indexes .txt, .md, .pdf into Qdrant. Relevant chunks surface automatically on each turn. → Document loaders
Task orchestration DAG-based task graphs with dependency tracking, parallel execution, configurable failure strategies (abort/retry/skip/ask), timeout enforcement, and SQLite persistence. LLM-powered goal decomposition via Planner trait with structured output. Tick-based DagScheduler execution engine with command pattern, AgentRouter trait for task-to-agent routing, cross-task context injection with ContentSanitizer integration, and stale event guards. LLM-backed result aggregation (Aggregator trait, LlmAggregator) synthesizes completed task outputs into a single coherent response with per-task token budgeting and raw-concatenation fallback. /plan CLI commands: /plan <goal> (decompose + confirm + execute), /plan status, /plan list, /plan cancel, /plan confirm, /plan resume [id] (resume a paused Ask-strategy graph), /plan retry [id] (re-run failed tasks). TUI Plan View side panel (press p) shows live task status with per-row spinners and status colors; five plan:* command palette entries available. OrchestrationMetrics tracked in MetricsSnapshot. Enabled by the orchestration feature flag (included in full). → Orchestration
Daemon & scheduler HTTP webhook gateway with bearer auth. Cron-based periodic tasks and one-shot deferred tasks with SQLite persistence — add, update, or cancel tasks at runtime via natural language using the built-in scheduler skill. Background mode. → Daemon
Single binary ~15 MB, no runtime dependencies, ~50 ms startup, ~20 MB idle memory.
┌─ Skills (3/12) ────────────────────┐┌─ MCP Tools ─────────────────────────┐
│  web-search  [████████░░] 82% (117)││  - filesystem/read_file             │
│  git-commit  [███████░░░] 73%  (42)││  - filesystem/write_file            │
│  code-review [████░░░░░░] 41%   (8)││  - github/create_pr                 │
└────────────────────────────────────┘└─────────────────────────────────────┘

Documentation

Full documentation — installation, configuration, guides, and architecture reference — at bug-ops.github.io/zeph.

Contributing

See CONTRIBUTING.md. Found a vulnerability? Use GitHub Security Advisories.

License

MIT