zeph 0.21.2

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
---
aliases:
  - Zeph BRD
  - Zeph Business Requirements
tags:
  - brd
  - ai-agent
  - rust
  - status/draft
created: 2026-04-13
project: "Zeph"
status: draft
related:
  - "[[SRS]]"
  - "[[NFR]]"
  - "[[constitution]]"
  - "[[MOC-specs]]"
---

# Zeph: Business Requirements Document

> [!abstract]
> Zeph is a lightweight, open-source Rust AI agent with hybrid multi-provider
> inference, a skills-first architecture, and semantic memory. This BRD defines
> what Zeph is, why it exists, and what success looks like for its primary personas.
> It serves as the business-level input to [[SRS]] and [[NFR]].

## Executive Summary

Zeph is an open-source, self-hostable AI agent written in Rust that integrates
multiple LLM backends (Ollama, Claude, OpenAI, HuggingFace/candle) behind a
unified interface. It gives developers, power users, and teams a programmable,
privacy-respecting agent that runs locally or as a lightweight service, with
no cloud lock-in. Skills-based extensibility lets users teach the agent domain
knowledge without retraining a model. Semantic memory with SQLite and Qdrant
gives the agent long-term recall across sessions. All I/O is handled through
interchangeable channels (CLI, TUI, Telegram), and secrets are managed
exclusively via an age-encrypted vault. Zeph is in active pre-v1.0
development and is not yet production-hardened for multi-tenant deployments.

---

## Problem Statement

### What problem exists today?

Existing AI agents are predominantly:

1. **Cloud-only** — locked to a single provider's API, data leaves the machine.
2. **Monoglot** — written in Python or TypeScript, making them difficult to
   embed in Rust toolchains or operate with minimal runtime overhead.
3. **Opaque** — behaviour is not auditable; there is no structured way to
   extend the agent with domain knowledge beyond system-prompt hacking.
4. **Memory-poor** — most agents have no durable cross-session memory; every
   conversation starts cold.
5. **Single-channel** — built for one interface (web chat or CLI), not portable
   to Telegram, TUI dashboards, or programmatic APIs.

### Who experiences this problem?

- Rust developers who need an agent embedded in their workflow without a Python
  dependency.
- Privacy-aware power users who cannot send proprietary data to a cloud API.
- Small teams that want a lightweight, self-hosted agent without infrastructure
  complexity.

### What is the impact of not solving it?

Without Zeph, users must choose between heavyweight Python frameworks (LangChain,
AutoGPT), cloud-only products (ChatGPT Plugins, Claude Projects), or writing
agent glue code from scratch — all of which either leak data, require a Python
runtime, or provide no semantic memory or skill governance.

### Current workarounds

- Pasting context manually into each conversation.
- Shell scripts wrapping raw `curl` calls to OpenAI.
- Python wrappers (LangChain) with substantial runtime overhead.
- Local Ollama with no skill management or memory.

> [!warning] Assumptions
> - The primary deployment target is a developer's local machine or a small
>   VPS, not a multi-tenant SaaS platform.
> - Users are comfortable with TOML configuration and a terminal interface.
> - The age vault is the only accepted secret storage; `.env` files are out
>   of scope.

---

## Target Users

### Primary Users

| Persona | Description | Primary Goal | Key Pain Point |
|---------|-------------|-------------|----------------|
| **CLI Developer** | Rust/systems developer using Zeph as a daily coding assistant | Integrate an agent into shell workflows, pipe output, use slash commands | Existing agents require Python or cloud APIs |
| **Power User (TUI)** | Technical user running Zeph full-time in a terminal TUI | Monitor agent state, context, metrics, and memory in real time | No visibility into what the agent is doing |
| **Remote User (Telegram)** | Developer or team member accessing Zeph via a Telegram bot | Use the agent from mobile or when away from the terminal | CLI-only agents are inaccessible from mobile |

### Secondary Users

| Persona | Description | Primary Goal |
|---------|-------------|-------------|
| **Team Operator** | DevOps / infrastructure engineer deploying Zeph as a shared service | Expose Zeph via HTTP gateway, schedule tasks, monitor Prometheus metrics |
| **Skill Author** | Developer writing SKILL.md files to extend agent behaviour | Teach the agent domain knowledge without touching Rust code |
| **Benchmark Researcher** | ML engineer running Zeph against standard agent benchmarks | Compare memory, skill, and reasoning quality across providers |

### Stakeholders

- **Open-source contributors** — pull-request quality, project momentum, and community trust.
- **Anthropic / OpenAI / Ollama ecosystems** — compatibility with their APIs
  is a dependency, not a requirement to control.

---

## Functional Requirements

> [!tip] Priority Legend
> - **Must** — without this the system is pointless (MVP)
> - **Should** — important but can ship without it
> - **Could** — nice to have

### Agent Core

- **FR-001**: As a CLI developer, I need the agent to receive messages, call
  LLMs, execute tools, and return responses in a single coherent turn loop,
  so that I can have a productive conversation with the agent.
  - *Acceptance criteria*: a user message results in an LLM response within the
    same terminal session; tool calls are dispatched and results appended before
    the final reply.
  - *Priority*: Must

- **FR-002**: As any user, I need the agent to support slash commands
  (`/help`, `/clear`, `/compact`, `/plan`, `/exit`), so that I can control
  agent behaviour without leaving the current channel.
  - *Acceptance criteria*: typing `/help` prints available commands; `/clear`
    resets conversation; `/compact` triggers context compaction.
  - *Priority*: Must
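
The slash-command handling in FR-002 amounts to intercepting input before it reaches the LLM. A minimal sketch of such a parser follows; the `SlashCommand` enum and `parse_slash_command` function are hypothetical names, not Zeph's actual API, and the free-text argument on `/plan` is an assumption.

```rust
/// Hypothetical representation of the FR-002 slash commands.
#[derive(Debug, PartialEq)]
pub enum SlashCommand {
    Help,
    Clear,
    Compact,
    Plan(String), // assumed: `/plan <goal>` keeps the remaining text as the goal
    Exit,
    Unknown(String),
}

/// Returns `None` for ordinary chat messages (no leading `/`),
/// otherwise the parsed command.
pub fn parse_slash_command(input: &str) -> Option<SlashCommand> {
    let rest = input.trim().strip_prefix('/')?;
    // Split the command name from its (optional) argument text.
    let (name, arg) = match rest.split_once(char::is_whitespace) {
        Some((n, a)) => (n, a.trim()),
        None => (rest, ""),
    };
    Some(match name {
        "help" => SlashCommand::Help,
        "clear" => SlashCommand::Clear,
        "compact" => SlashCommand::Compact,
        "plan" => SlashCommand::Plan(arg.to_string()),
        "exit" => SlashCommand::Exit,
        other => SlashCommand::Unknown(other.to_string()),
    })
}
```

Returning `None` for ordinary messages keeps the normal chat path untouched, and `Unknown` lets the channel answer with the `/help` listing instead of forwarding a mistyped command to the model.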

- **FR-003**: As a power user, I need the agent to swap the active LLM provider
  at runtime without restarting, so that I can switch between local and cloud
  models mid-session.
  - *Acceptance criteria*: provider swap via config hot-reload completes without
    dropping the current conversation history.
  - *Priority*: Should

### Multi-Provider LLM Inference

- **FR-010**: As a developer, I need Zeph to support Ollama, Claude (Anthropic),
  OpenAI, OpenAI-compatible, and HuggingFace/candle providers behind a unified
  interface, so that I am not locked to a single vendor.
  - *Acceptance criteria*: each provider passes the standard chat, streaming,
    and tool-call test suite.
  - *Priority*: Must

- **FR-011**: As a team operator, I need a provider pool with routing strategies
  (cascade, cost-weighted, bandit, complexity-tiered), so that expensive models
  are used only for complex tasks.
  - *Acceptance criteria*: a configured `TriageRouter` sends simple queries to
    the fast provider and complex queries to the quality provider, as observable
    in debug logs.
  - *Priority*: Should
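
One way to picture the complexity-tiered strategy in FR-011 is a scoring function plus a threshold. The sketch below uses a deliberately crude heuristic (prompt length and a few keyword markers); the `Tier` enum, `complexity_score`, `route`, and the marker list are all hypothetical and are not necessarily how `TriageRouter` classifies queries.

```rust
/// Hypothetical tiers for a complexity-tiered routing strategy.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Tier {
    Fast,
    Quality,
}

/// Toy complexity score in [0, 1]: longer prompts and prompts containing
/// reasoning-heavy keywords score higher. A real router would more likely
/// use a classifier or a cheap LLM triage call.
pub fn complexity_score(prompt: &str) -> f64 {
    const MARKERS: [&str; 4] = ["refactor", "design", "prove", "architecture"];
    let words = prompt.split_whitespace().count() as f64;
    let length_part = (words / 200.0).min(1.0);
    let lower = prompt.to_lowercase();
    let hits = MARKERS.iter().filter(|m| lower.contains(**m)).count() as f64;
    let marker_part = (hits / 2.0).min(1.0);
    0.5 * length_part + 0.5 * marker_part
}

/// Sends the prompt to the quality tier once its score reaches `threshold`.
pub fn route(prompt: &str, threshold: f64) -> Tier {
    if complexity_score(prompt) >= threshold {
        Tier::Quality
    } else {
        Tier::Fast
    }
}
```

Logging the score next to the chosen tier would make the acceptance check ("measurable via debug logs") straightforward.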

- **FR-012**: As any user, I need the agent to support prompt caching where
  the provider supports it, so that repeated system prompts do not incur
  unnecessary token costs.
  - *Priority*: Could

### Skills System

- **FR-020**: As a skill author, I need to define agent skills in plain
  `SKILL.md` files that are loaded and hot-reloaded without restarting the agent,
  so that domain knowledge can be added or updated at runtime.
  - *Acceptance criteria*: adding or editing a SKILL.md file is reflected in the
    active registry within 500ms (the file-watcher debounce interval).
  - *Priority*: Must
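
The 500ms debounce in FR-020 collapses a burst of file-system events (a single save can emit several) into one registry reload. A minimal sketch with an injected clock, assuming a file watcher calls `on_event` and a loop polls `should_reload`; the `Debouncer` type is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical debouncer: collapses bursts of SKILL.md change events into
/// a single reload once the directory has been quiet for `window`.
pub struct Debouncer {
    window: Duration,
    last_event: Option<Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_event: None }
    }

    /// Record a file-system change (e.g. from a watcher callback).
    pub fn on_event(&mut self, at: Instant) {
        self.last_event = Some(at);
    }

    /// Returns true once per burst, after `window` has elapsed since the
    /// most recent event, signalling that the skill registry should reload.
    pub fn should_reload(&mut self, now: Instant) -> bool {
        match self.last_event {
            Some(t) if now.duration_since(t) >= self.window => {
                self.last_event = None;
                true
            }
            _ => false,
        }
    }
}
```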

- **FR-021**: As a developer, I need skills to be matched to user messages via
  hybrid BM25 + embedding scoring with a configurable disambiguation threshold,
  so that irrelevant skills are not injected.
  - *Acceptance criteria*: with the threshold set to 0.7, a message that scores
    below 0.7 against every skill results in no skill injection.
  - *Priority*: Must
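
A sketch of the FR-021 matching rule, assuming raw BM25 scores and embedding cosine similarities have already been computed per skill. Scaling BM25 by the batch maximum and blending linearly with weight `alpha` are assumptions chosen for illustration; Zeph's actual fusion method may differ. The property that matters is the final filter: if no skill clears `threshold`, nothing is injected.

```rust
/// Cosine similarity between two embedding vectors (0.0 if either is zero).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical hybrid matcher. `bm25` holds raw BM25 scores per skill and
/// `cosine_sims` the embedding similarities in the same order. BM25 is
/// divided by the batch maximum so both signals share a comparable range,
/// then blended with weight `alpha`. Returns the best skill index only if
/// its blended score reaches `threshold`.
pub fn best_skill(bm25: &[f32], cosine_sims: &[f32], alpha: f32, threshold: f32) -> Option<usize> {
    let max_bm25 = bm25.iter().cloned().fold(0.0_f32, f32::max);
    bm25.iter()
        .zip(cosine_sims)
        .map(|(b, c)| {
            let b_norm = if max_bm25 > 0.0 { b / max_bm25 } else { 0.0 };
            alpha * b_norm + (1.0 - alpha) * c
        })
        .enumerate()
        .filter(|(_, score)| *score >= threshold)
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(i, _)| i)
}
```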

- **FR-022**: As a skill author, I need a self-learning loop that adjusts skill
  trust scores based on positive and negative feedback signals, so that
  well-performing skills are preferred automatically.
  - *Priority*: Should
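
FR-022 does not specify the trust formula. As one hypothetical option, a Laplace-smoothed success rate (the mean of a Beta(1, 1) posterior over positive and negative signals) gives an unseen skill a neutral score of 0.5 and moves toward the observed ratio as feedback accumulates. The `Trust` type is illustrative only.

```rust
/// Hypothetical per-skill trust record updated from feedback signals.
#[derive(Debug, Default, Clone, Copy)]
pub struct Trust {
    positive: u32,
    negative: u32,
}

impl Trust {
    /// Record one feedback signal for the skill.
    pub fn record(&mut self, success: bool) {
        if success {
            self.positive += 1;
        } else {
            self.negative += 1;
        }
    }

    /// Laplace-smoothed success rate: (positive + 1) / (total + 2).
    pub fn score(&self) -> f64 {
        (self.positive as f64 + 1.0) / ((self.positive + self.negative) as f64 + 2.0)
    }
}
```

Because a few early signals cannot push the score to 0 or 1, a skill is not permanently favoured or buried by its first interactions.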

### Semantic Memory

- **FR-030**: As any user, I need the agent to persist conversation history in
  SQLite and retrieve semantically similar memories from Qdrant across sessions,
  so that the agent remembers relevant context from past interactions.
  - *Acceptance criteria*: querying the agent about a topic discussed in a
    previous session returns a contextually relevant recalled memory.
  - *Priority*: Must

- **FR-031**: As a developer, I need the agent to detect rising context pressure
  and compact conversation history (soft threshold at ~60%, hard at ~90%),
  so that long sessions do not exhaust the model's context window.
  - *Acceptance criteria*: at 90% context utilisation, compaction fires and
    context length drops below 60%.
  - *Priority*: Must
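
The two thresholds in FR-031 can be expressed as a classification of the ratio between used tokens and the context window. In this sketch the `Pressure` enum and `classify` function are hypothetical, and the action taken at the soft threshold (for example, scheduling compaction early) is an assumption, since the requirement only fixes the hard-threshold behaviour.

```rust
/// Hypothetical context-pressure levels for FR-031.
#[derive(Debug, PartialEq)]
pub enum Pressure {
    Normal,
    Soft, // at or above the soft threshold (~60%): assumed to schedule compaction
    Hard, // at or above the hard threshold (~90%): compact before the next LLM call
}

/// Classifies usage as a fraction of the model's context window.
pub fn classify(used_tokens: usize, window_tokens: usize, soft: f64, hard: f64) -> Pressure {
    // `max(1)` guards against division by zero for a misconfigured window.
    let ratio = used_tokens as f64 / window_tokens.max(1) as f64;
    if ratio >= hard {
        Pressure::Hard
    } else if ratio >= soft {
        Pressure::Soft
    } else {
        Pressure::Normal
    }
}
```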

- **FR-032**: As any user, I need the agent to maintain an entity graph
  (MAGMA typed edges) for BFS-based graph recall, so that factual relationships
  extracted from conversations are reused in future turns.
  - *Priority*: Should
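
BFS-based graph recall can be sketched as a depth-limited breadth-first traversal that follows only selected edge types. The `Graph` alias, string edge labels, and `bfs_recall` function are hypothetical; the actual MAGMA edge types and storage are not shown here.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical entity graph: entity -> list of (edge_type, neighbour).
pub type Graph = HashMap<String, Vec<(String, String)>>;

/// Returns entities reachable from `start` within `max_depth` hops,
/// following only edges whose type is in `allowed`. Results are ordered
/// by hop distance, so closer facts come first.
pub fn bfs_recall(graph: &Graph, start: &str, max_depth: usize, allowed: &[&str]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    let mut out = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the hop limit
        }
        for (edge_type, next) in graph.get(node).into_iter().flatten() {
            // `insert` returns false for already-visited entities.
            if allowed.contains(&edge_type.as_str()) && seen.insert(next.as_str()) {
                out.push(next.clone());
                queue.push_back((next.as_str(), depth + 1));
            }
        }
    }
    out
}
```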

### Multi-Channel I/O

- **FR-040**: As a CLI developer, I need a text-based CLI channel that reads
  from stdin and writes to stdout, so that Zeph can be used in shell pipelines.
  - *Priority*: Must

- **FR-041**: As a power user, I need a ratatui-based TUI channel with real-time
  metrics, context pressure gauge, memory panel, and spinner indicators for all
  background operations, so that I have full situational awareness during a session.
  - *Priority*: Should

- **FR-042**: As a remote user, I need a Telegram channel with streaming output
  support, so that I can interact with Zeph from mobile without a terminal.
  - *Priority*: Should

### Tool Execution

- **FR-050**: As a developer, I need the agent to run shell commands, scrape web
  pages, and perform file operations via a composable tool executor, with a
  blocklist check and an optional user-approval gate, so that I control what the
  agent can do autonomously.
  - *Acceptance criteria*: a blocklisted command is rejected before the
    permission policy is consulted; a non-blocklisted command in the "ask first"
    set prompts the user for confirmation.
  - *Priority*: Must
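
The ordering in FR-050 (blocklist first, permission policy second) can be sketched as below. The `Decision` enum, `gate` function, and the naive substring matching are hypothetical; a production blocklist would also need to normalise whitespace, quoting, and shell aliases.

```rust
/// Hypothetical outcome of the FR-050 execution gate.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Rejected,
    AskUser,
    Allowed,
}

/// The blocklist is checked first and unconditionally; the permission policy
/// (here, a list of "ask first" program names) is consulted only afterwards.
pub fn gate(command: &str, blocklist: &[&str], ask_first: &[&str]) -> Decision {
    if blocklist.iter().any(|blocked| command.contains(*blocked)) {
        return Decision::Rejected;
    }
    let program = command.split_whitespace().next().unwrap_or("");
    if ask_first.contains(&program) {
        Decision::AskUser
    } else {
        Decision::Allowed
    }
}
```

Returning early on a blocklist hit guarantees that no permission setting, however permissive, can re-enable a blocked command.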

- **FR-051**: As a team operator, I need tool execution audit logging with
  `claim_source` attribution, so that every tool call is traceable to its
  origin.
  - *Priority*: Should

### MCP Integration

- **FR-060**: As a developer, I need Zeph to act as an MCP client connecting
  to one or more MCP servers, discovering their tools semantically, and invoking
  them transparently alongside native tools, so that I can extend Zeph's
  capabilities via the MCP ecosystem.
  - *Priority*: Should

- **FR-061**: As a team operator, I need per-server tool quotas and structured
  error codes from MCP tool calls, so that runaway MCP servers cannot consume
  unlimited resources.
  - *Priority*: Could
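
Per-server quotas with structured errors (FR-061) could be tracked with a simple counter map checked before each MCP call is dispatched. The `QuotaTracker` type, the `McpError::QuotaExceeded` variant, and the per-session budget semantics are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical structured error returned when an MCP call is refused.
#[derive(Debug, PartialEq)]
pub enum McpError {
    QuotaExceeded { server: String, limit: u32 },
}

/// Per-server call budget; servers without an explicit limit use `default_limit`.
pub struct QuotaTracker {
    default_limit: u32,
    limits: HashMap<String, u32>,
    used: HashMap<String, u32>,
}

impl QuotaTracker {
    pub fn new(default_limit: u32) -> Self {
        Self { default_limit, limits: HashMap::new(), used: HashMap::new() }
    }

    pub fn set_limit(&mut self, server: &str, limit: u32) {
        self.limits.insert(server.to_string(), limit);
    }

    /// Reserves one call against `server`'s quota before dispatching it.
    pub fn try_acquire(&mut self, server: &str) -> Result<(), McpError> {
        let limit = *self.limits.get(server).unwrap_or(&self.default_limit);
        let used = self.used.entry(server.to_string()).or_insert(0);
        if *used >= limit {
            return Err(McpError::QuotaExceeded { server: server.to_string(), limit });
        }
        *used += 1;
        Ok(())
    }
}
```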

### A2A and ACP Protocols

- **FR-070**: As a developer, I need Zeph to implement the A2A (Agent-to-Agent)
  JSON-RPC 2.0 protocol for agent discovery and invocation, so that Zeph can
  participate in multi-agent networks.
  - *Priority*: Could

- **FR-071**: As a developer, I need ACP (Agent Control Protocol) transport
  support with session management and capability advertisement, so that Zeph
  can be controlled or forked by another agent.
  - *Priority*: Could

### Vault and Secrets

- **FR-080**: As any user, I need all secrets (API keys, tokens) to be stored
  exclusively in an age-encrypted vault, never in environment variables or TOML
  config files, so that secrets are not leaked in configuration or logs.
  - *Acceptance criteria*: setting `ZEPH_OPENAI_API_KEY` as an environment
    variable has no effect; the key is resolved only from the age vault.
  - *Priority*: Must
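
Keeping secrets out of logs (FR-080) is commonly enforced at the type level: a wrapper whose `Debug` and `Display` never print the value. The sketch below assumes the vault has already been decrypted into a map; the `Secret` type and `resolve` function are hypothetical and do not represent the `zeph-vault` API.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical wrapper that keeps secret values out of logs.
pub struct Secret(String);

impl Secret {
    /// Explicit, greppable access to the plaintext at the point of use.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

impl fmt::Display for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Resolves a key only from the decrypted vault contents; environment
/// variables are deliberately never consulted.
pub fn resolve(vault: &HashMap<String, String>, key: &str) -> Option<Secret> {
    vault.get(key).map(|v| Secret(v.clone()))
}
```

With this shape, an accidental `{:?}` in a tracing call prints a placeholder rather than the key.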

### Gateway and Scheduler

- **FR-090**: As a team operator, I need an HTTP gateway with bearer-token
  authentication for webhook ingestion, so that external systems can send
  events to Zeph without direct terminal access.
  - *Priority*: Could
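
Bearer-token authentication for the gateway reduces to parsing the `Authorization` header and comparing the token without an early exit. This is a std-only sketch with hypothetical function names; a production gateway would more likely rely on a vetted constant-time comparison such as the `subtle` crate.

```rust
/// Extracts the token from an `Authorization` value of the form
/// `Bearer <token>`; HTTP authentication schemes are case-insensitive.
pub fn bearer_token(header: &str) -> Option<&str> {
    let (scheme, token) = header.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}

/// Compares every byte (no early exit on the first mismatch), so timing does
/// not reveal how much of the token matched. Length is still revealed.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Accepts a request only if it carries the expected bearer token.
pub fn authorize(header: Option<&str>, expected: &str) -> bool {
    header
        .and_then(bearer_token)
        .map_or(false, |t| constant_time_eq(t.as_bytes(), expected.as_bytes()))
}
```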

- **FR-091**: As a team operator, I need a cron-based task scheduler with
  SQLite persistence and CLI management (`schedule list/add/remove`), so that
  periodic agent tasks run unattended.
  - *Priority*: Could

### Code Indexing

- **FR-100**: As a CLI developer, I need AST-based code indexing with semantic
  retrieval and repo-map generation, so that the agent can answer questions
  about the current codebase without manual context injection.
  - *Priority*: Could

### Subagent Lifecycle

- **FR-110**: As a developer, I need to spawn named subagents with scoped tool
  permissions, TTL-based grants, and JSONL transcript persistence via `/agent spawn`,
  so that complex tasks can be delegated to isolated agent instances.
  - *Priority*: Could
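
TTL-based grants for subagents can be modelled as a list of (tool, expiry) pairs that is pruned on every permission check. The `Grant` and `SubagentPermissions` types are hypothetical, and the clock is passed in explicitly so the behaviour is testable.

```rust
use std::time::{Duration, Instant};

/// Hypothetical time-limited permission for a subagent to use one tool.
pub struct Grant {
    tool: String,
    expires_at: Instant,
}

/// Hypothetical scoped permission set for a spawned subagent.
pub struct SubagentPermissions {
    grants: Vec<Grant>,
}

impl SubagentPermissions {
    pub fn new() -> Self {
        Self { grants: Vec::new() }
    }

    /// Grants access to `tool` for `ttl`, starting at `now`.
    pub fn grant(&mut self, tool: &str, ttl: Duration, now: Instant) {
        self.grants.push(Grant { tool: tool.to_string(), expires_at: now + ttl });
    }

    /// A tool is usable only if an unexpired grant names it; expired grants
    /// are dropped on each check.
    pub fn is_allowed(&mut self, tool: &str, now: Instant) -> bool {
        self.grants.retain(|g| g.expires_at > now);
        self.grants.iter().any(|g| g.tool == tool)
    }
}
```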

---

## Non-Functional Requirements

Detailed targets are in [[NFR]]. High-level constraints for business context:

### Performance

- The CLI channel round-trip (user message → LLM → response displayed) must be
  dominated by the LLM provider's own latency; Zeph's routing and memory pipeline
  must add no significant overhead.
- The release binary must stay under 15 MiB (a constraint set by the [[constitution]]).

### Security & Privacy

- All secrets managed via age vault; no plaintext credentials anywhere in the
  system.
- Shell command execution protected by a blocklist that runs unconditionally
  before permission policy.
- SSRF protection: the web tool rejects requests to private IP ranges.
- PII detection and redaction in the sanitizer pipeline.
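
The private-range check behind SSRF protection can be written with `std::net` alone. This sketch is not Zeph's implementation and is not exhaustive (for example, it does not cover carrier-grade NAT space); it also assumes the check runs on the resolved address, since validating only the hostname is open to DNS rebinding.

```rust
use std::net::IpAddr;

/// Returns true for addresses the web tool should refuse to contact:
/// loopback, RFC 1918 private, link-local, unspecified, and IPv6
/// unique-local (fc00::/7) or link-local (fe80::/10). IPv4-mapped IPv6
/// addresses are unwrapped first so `::ffff:127.0.0.1` cannot bypass the check.
pub fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private() || v4.is_loopback() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}
```

Link-local coverage matters in practice because cloud metadata endpoints such as `169.254.169.254` live in that range.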

### Availability

- Designed for single-user / small-team use; no high-availability or multi-tenant
  SLA targets in pre-v1.0.
- Graceful degradation: the agent keeps operating without Qdrant (falling back
  to SQLite-only memory) and without MCP servers.

### Usability

- All background operations in TUI must have a visible spinner with a descriptive
  status message — no silent background work.
- CLI ergonomics: `/help` lists all available slash commands.

---

## Scope & Boundaries

### In Scope

- Single-binary Rust agent with CLI, TUI, and Telegram channels.
- Multi-provider LLM routing with cost and complexity awareness.
- SKILL.md-based skill system with hot-reload and self-learning.
- Dual-backend memory (SQLite + Qdrant) with graph recall.
- Tool execution with shell, web scraping, and file operations.
- MCP client for third-party tool servers.
- A2A and ACP protocol support (feature-gated).
- Age-encrypted vault for all secrets.
- Optional HTTP gateway, cron scheduler, code indexer, benchmark harness.
- Prometheus metrics export (feature-gated with gateway).
- PostgreSQL database backend (feature-gated alternative to SQLite).

### Out of Scope

> [!danger] Explicit Exclusions
>
> - **Multi-tenant SaaS platform**: Zeph is not a hosted service; no
>   user accounts, billing, or tenant isolation.
> - **Web UI**: there is no browser-based interface; CLI, TUI, and Telegram
>   are the only channels.
> - **Windows support**: the primary supported platforms are macOS and Linux.
> - **Model training or fine-tuning**: Zeph does not train models; it only
>   infers.
> - **Python or Node.js runtime**: Zeph is a single Rust binary; no polyglot
>   runtime dependencies.
> - **Backward compatibility shims before v1.0**: breaking changes are
>   documented in CHANGELOG.md without deprecation warnings.
> - **OpenSSL**: `openssl-sys` is banned; rustls is the only TLS stack.

---

## Integrations & Dependencies

| System | Direction | Data | Status |
|--------|-----------|------|--------|
| Ollama (local) | Outbound | Chat completions, embeddings | Exists |
| Anthropic Claude API | Outbound | Chat completions, tool calls | Exists |
| OpenAI API | Outbound | Chat, tools, embeddings | Exists |
| OpenAI-compatible APIs | Outbound | Chat, embeddings | Exists |
| HuggingFace / candle | Outbound | Local embeddings, inference | Exists |
| Qdrant (local/remote) | Both | Vector store: embeddings, recall | Exists |
| SQLite (embedded) | Both | Conversation history, scheduler, experiments | Exists |
| PostgreSQL (optional) | Both | Alternative to SQLite for teams | Feature-gated |
| Telegram Bot API | Both | Inbound messages, outbound replies | Exists |
| MCP servers (any) | Both | Tool discovery, tool calls | Exists |
| A2A peers | Both | JSON-RPC 2.0 agent invocation | Feature-gated |
| ACP clients | Both | ACP session management | Feature-gated |
| age (encryption) | Both | Vault secret encryption/decryption | Exists |
| Prometheus / OpenMetrics | Outbound | Metrics scraping | Feature-gated |
| OTLP / Jaeger | Outbound | Distributed traces | Feature-gated |
| Pyroscope | Outbound | Continuous profiling | Feature-gated |

---

## Constraints & Assumptions

### Technical Constraints

- Language: Rust 1.94 (MSRV), Edition 2024, no `unsafe` blocks.
- Async: tokio; no `async-trait` crate in library crates.
- TLS: rustls only; `openssl-sys` banned.
- YAML: `serde_norway` only; `serde_yaml` / `serde_yml` banned.
- Database: SQLite (default) or PostgreSQL (opt-in); `sqlx::Any` banned.
- Feature flags: `default = []` (no optional features enabled by default); always-on capabilities are compiled unconditionally, without flags.
- Binary size: release binary must stay under 15 MiB.
- Unsafe code: `unsafe_code = "deny"` workspace-wide.

### Business Constraints

- Open-source project; no commercial license or paid support tier.
- Pre-v1.0: no backward-compatibility obligation; breaking changes documented.
- No dedicated infrastructure budget; the age vault is the only secret store.
- No fixed team size or deadline; development is community-driven.

> [!warning] Assumptions
> - Users accept that pre-v1.0 releases may have breaking configuration changes.
> - Ollama or at least one cloud provider API key is available for LLM inference.
> - Qdrant is optional; the agent degrades gracefully to SQLite-only memory.
> - Skills are authored by users as SKILL.md files; there is no GUI skill editor.

---

## Success Criteria

- [ ] A developer can install a single binary and start a productive CLI
      session with a local Ollama model within 5 minutes.
- [ ] Skills added as SKILL.md files are active within 500ms without restarting
      the agent.
- [ ] A topic discussed in session N is recalled in session N+1 via semantic
      memory with no user-side configuration beyond enabling Qdrant.
- [ ] The TUI shows a spinner for every background operation; no silent waits.
- [ ] All secrets are resolved from the age vault at startup; zero plaintext
      credentials appear in logs or TOML.
- [ ] The release binary stays under 15 MiB.
- [ ] `cargo nextest run --workspace --features full` passes with zero test
      failures on every commit.
- [ ] A team operator can expose Zeph's metrics to Prometheus and receive
      ~25 gauge/counter metrics without code changes.

---

## Open Questions

> [!question] Unresolved Items
>
> - [ ] What is the target v1.0 feature freeze milestone and date?
> - [ ] Is Windows support ever in scope, or permanently out of scope?
> - [ ] Should Zeph publish crates to crates.io, or remain a binary-only
>       distribution?
> - [ ] Will ACP and A2A remain feature-gated in v1.0, or become always-on?
> - [ ] Is there a plan for user documentation (mdBook) alongside the inline
>       rustdoc?

---

## Glossary

| Term | Definition |
|------|-----------|
| Agent | The Zeph runtime instance that handles a user session end-to-end |
| Channel | The I/O boundary abstraction (CLI, TUI, Telegram) |
| Skill | A SKILL.md file defining domain knowledge injected into the system prompt |
| Provider | An LLM backend (Ollama, Claude, OpenAI, candle) |
| Vault | The age-encrypted secret store managed by the `zeph-vault` crate |
| Compaction | The process of summarising old messages to reduce context size |
| MCP | Model Context Protocol — a standard for LLM tool servers |
| A2A | Agent-to-Agent — JSON-RPC 2.0 protocol for inter-agent calls |
| ACP | Agent Control Protocol — session-oriented agent control transport |
| SKILL.md | A Markdown file describing a skill's trigger patterns and instructions |
| BM25 | Sparse text ranking algorithm used in skill matching |
| Qdrant | Open-source vector database used for semantic memory recall |
| age | A modern file encryption tool used for the Zeph vault backend |
| EARS | Easy Approach to Requirements Syntax (WHEN…SHALL notation) |
| TUI | Terminal User Interface (ratatui-based dashboard) |
| DAG | Directed Acyclic Graph — used for multi-step orchestration |

---

## See Also

- [[SRS]] — functional requirements derived from this BRD
- [[NFR]] — non-functional / quality requirements
- [[constitution]] — project-wide non-negotiable principles
- [[MOC-specs]] — index of all technical specifications
- [[001-system-invariants/spec]] — architectural invariants