zeph 0.21.2

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
---
aliases:
  - Agent Loop
  - Turn Lifecycle
  - Context Pressure
tags:
  - sdd
  - spec
  - core
  - agent
  - contract
created: 2026-04-08
status: approved
related:
  - "[[MOC-specs]]"
  - "[[001-system-invariants/spec]]"
  - "[[003-llm-providers/spec]]"
  - "[[004-memory/spec]]"
---

# Spec: Agent Loop

> [!info]
> Agent main loop, turn lifecycle, context pressure management, and HiAgent subgoal-aware compaction.
> See [[001-system-invariants/spec#2. Agent Loop Contract]] for invariants.

## Sources

### External
- **Context Engineering in Manus** (Oct 2025) — soft/hard compaction stages, schema-based summarization: https://rlancemartin.github.io/2025/10/15/manus/
- **ACON** (ICLR 2026) — failure-driven compression guidelines, 26–54% token reduction: https://arxiv.org/abs/2510.00615
- **Effective Context Engineering** (Anthropic, 2025) — just-in-time retrieval, tool output overflow: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- **Efficient Context Management** (JetBrains Research, Dec 2025) — observation masking vs. summarization: https://blog.jetbrains.com/research/2025/12/efficient-context-management/
- **Claude Context Management & Compaction API** (Anthropic, 2026): https://platform.claude.com/docs/en/build-with-claude/context-management

### Internal
| File | Contents |
|---|---|
| `crates/zeph-core/src/agent/mod.rs` | `Agent<C>`, `run()`, `process_user_message()`, sub-state structs |
| `crates/zeph-core/src/agent/feedback_detector.rs` | `FeedbackDetector`, `CorrectionSignal` |
| `crates/zeph-core/src/agent/error.rs` | `AgentError` typed hierarchy |
| `crates/zeph-core/src/channel.rs` | `Channel` trait, `ChannelError` |

---

`crates/zeph-core/src/agent/mod.rs` — the single execution context per session.

## Core Structure

```
Agent<C: Channel> {
    provider: AnyProvider,           // LLM backend, swappable at runtime
    channel: C,                      // I/O boundary, owned
    tool_executor: Arc<dyn ErasedToolExecutor>,
    // All sub-state in dedicated structs — no loose fields:
    msg: MessageState,               // messages vec, message_queue, system prompt
    memory_state: MemoryState,
    skill_state: SkillState,
    tool_state: ToolState,
    security: SecurityState,
    mcp: McpState,
    index: IndexState,
    debug_state: DebugState,
    runtime: RuntimeConfig,
    // + ExperimentState, FeedbackState, InstructionState, LifecycleState,
    //   MetricsState, OrchestrationState, ProviderState, SessionState, ...
}
```

## Turn Lifecycle (invariant order)

1. **Drain message queue** — process any `QueuedMessage` before reading channel
2. **`tokio::select!`** — race between:
   - `channel.recv()` — user message
   - skill reload event
   - instruction reload event
   - config reload event
   - scheduled task fire
3. **Builtin command check**: `/exit`, `/clear`, `/compact`, `/plan`, etc. short-circuit; return `Some(bool)` to continue/exit
4. **`process_user_message()`** — main LLM round-trip:
   a. Inject active skills into system prompt
   b. Recall from memory (semantic + code context + graph)
   c. Build context, apply deferred tool pair summaries
   d. Send to LLM provider
   e. Parse response: text / tool calls / thinking blocks
   f. Execute tool calls (confirmation gate if required)
   g. Store turn in memory
   h. Emit response to channel
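
Steps 1 and 2 imply a simple priority rule: queued messages are consumed before the channel is polled. The following is a minimal synchronous sketch of that ordering only. `Input`, `next_input`, and the closure standing in for `channel.recv()` are illustrative names rather than the crate's API, and the real loop races several event sources inside `tokio::select!`.

```rust
use std::collections::VecDeque;

/// Where the next turn's input came from (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Input {
    Queued(String),
    User(String),
}

/// Returns the next input, always preferring the message queue.
/// `recv` stands in for `channel.recv()` and is only called when the queue is empty.
fn next_input(
    queue: &mut VecDeque<String>,
    recv: impl FnOnce() -> Option<String>,
) -> Option<Input> {
    if let Some(msg) = queue.pop_front() {
        return Some(Input::Queued(msg));
    }
    recv().map(Input::User)
}
```

Because the closure is never invoked while the queue is non-empty, an injected message cannot be overtaken by user input.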

## Key Invariants

- **System message is always `messages[0]`** — rebuilt each turn from config + skills + instructions
- **Thinking blocks are forwarded verbatim** to the next request — never stripped or summarized
- **Provider can be swapped at runtime** via `provider_override` without restarting the agent
- **Hot-reload events** (skills, instructions, config) are processed between turns, never mid-turn
- **Message queue takes priority** over channel recv — injected messages run before user input
- **Context prep timeout** (`[timeouts] context_prep_timeout_secs`, default 30 s): `advance_context_lifecycle` wrapped with wall-clock timeout; turn proceeds with degraded cached context on expiry instead of stalling (#3357, #3373)
- **NoProviders backoff** (`[timeouts] no_providers_backoff_secs`, default 2 s): after `NoProviders` error the agent records failure timestamp and sleeps; context preparation is skipped on the next turn if still within the backoff window, preventing busy-wait (#3357, #3373)
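
The NoProviders backoff rule reduces to a timestamp comparison. The sketch below shows one way to express it, assuming the failure time is kept as an `Option<Instant>`; the function name `in_backoff_window` is hypothetical and the actual field layout in the agent is not shown in this spec.

```rust
use std::time::{Duration, Instant};

/// Returns true while the agent is still inside the NoProviders backoff window.
/// `last_failure` is the timestamp recorded when the `NoProviders` error occurred;
/// `backoff` corresponds to `no_providers_backoff_secs`.
fn in_backoff_window(last_failure: Option<Instant>, now: Instant, backoff: Duration) -> bool {
    match last_failure {
        // Saturating subtraction avoids a panic if `now` is earlier than `at`.
        Some(at) => now.saturating_duration_since(at) < backoff,
        None => false,
    }
}
```

A caller would skip context preparation for the turn whenever this returns `true`.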

## Context Pressure Management

- Token counting via `tiktoken-rs` against provider's `context_window()`
- **Soft threshold (~60%)**: apply deferred tool pair summaries
- **Hard threshold (~90%)**: run full compaction (summarize old turns, evict by Ebbinghaus policy)
- Compaction result stored as `MessagePart::Compaction` — never removed from history
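
The two thresholds can be read as a three-way classification of the current token count. This is a sketch under the assumption that the cut-offs are exactly 60% and 90% (the spec gives them only approximately); `Pressure` and `classify_pressure` are illustrative names.

```rust
/// Pressure level of the current context (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Pressure {
    Normal,
    Soft, // apply deferred tool pair summaries
    Hard, // run full compaction
}

// Assumed exact cut-offs; the spec states roughly 60% and 90%.
const SOFT_PERCENT: u128 = 60;
const HARD_PERCENT: u128 = 90;

fn classify_pressure(used_tokens: u64, context_window: u64) -> Pressure {
    if context_window == 0 {
        // Unknown window: nothing to compare against.
        return Pressure::Normal;
    }
    // Cross-multiply in u128 to avoid float rounding and overflow.
    let used = used_tokens as u128 * 100;
    let window = context_window as u128;
    if used >= window * HARD_PERCENT {
        Pressure::Hard
    } else if used >= window * SOFT_PERCENT {
        Pressure::Soft
    } else {
        Pressure::Normal
    }
}
```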

## Error Handling

- `AgentError` typed error hierarchy (thiserror)
- LLM errors: transient (retry with backoff) vs permanent (surface to user)
- Tool errors: `ToolError::kind()``Transient` / `Permanent`
- Channel errors abort the current turn but do not exit the loop (unless `ChannelError::Fatal`)

---

## HiAgent Subgoal-Aware Compaction

`crates/zeph-core/src/agent/compaction_strategy.rs`, `crates/zeph-core/src/agent/mod.rs`. Issue #2022.

### Overview

HiAgent-inspired pruning strategies (`subgoal` and `subgoal_mig`) track the agent's current subgoal via fire-and-forget LLM extraction and partition tool outputs into three eviction tiers. This preserves active working context across hard compaction events while aggressively evicting stale outputs from completed or abandoned subgoals.

### Eviction Tiers

| Tier | Relevance Score | Description |
|---|---|---|
| Active | 1.0 | Currently-being-worked subgoal — never evicted by scoring |
| Completed | 0.3 | Finished subgoal — candidate for summarization |
| Outdated | 0.1 | Before any subgoal or between completed subgoals — highest priority for eviction |

### `SubgoalRegistry`

In-memory data structure with:
- `subgoals`: list of tracked subgoals, each with `SubgoalState (Active|Completed)` and message span `[start, end)`
- `extend_active(new_msgs)`: incremental O(new_msgs) update; on first subgoal creation, retroactively tags pre-extraction messages (S4 fix)
- `rebuild_after_compaction(offset)`: repairs index maps after drain/reinsert — uses offset arithmetic, not fragile index assumptions (S1 fix)
- `active_subgoal()`: returns the current active subgoal for `/status` display
- `subgoal_state(msg_index)`: returns tier for scoring
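
To make the tier lookup concrete, here is a reduced sketch of the registry covering only `subgoal_state` and `active_subgoal`. The field names, the `Tier` enum, and the rule that any message outside every span counts as Outdated are assumptions drawn from the tier table, not the actual implementation.

```rust
/// Eviction tier with the relevance scores from the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Active,
    Completed,
    Outdated,
}

impl Tier {
    fn relevance(self) -> f32 {
        match self {
            Tier::Active => 1.0,
            Tier::Completed => 0.3,
            Tier::Outdated => 0.1,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SubgoalState {
    Active,
    Completed,
}

/// One tracked subgoal covering the half-open message span `[start, end)`.
struct Subgoal {
    description: String,
    state: SubgoalState,
    start: usize,
    end: usize,
}

struct SubgoalRegistry {
    subgoals: Vec<Subgoal>,
}

impl SubgoalRegistry {
    /// Tier of the message at `msg_index`; messages outside every span are Outdated.
    fn subgoal_state(&self, msg_index: usize) -> Tier {
        self.subgoals
            .iter()
            .find(|sg| (sg.start..sg.end).contains(&msg_index))
            .map(|sg| match sg.state {
                SubgoalState::Active => Tier::Active,
                SubgoalState::Completed => Tier::Completed,
            })
            .unwrap_or(Tier::Outdated)
    }

    /// The current active subgoal, if any (used for `/status` display).
    fn active_subgoal(&self) -> Option<&Subgoal> {
        self.subgoals
            .iter()
            .find(|sg| sg.state == SubgoalState::Active)
    }
}
```

The linear scan keeps the sketch short; the spec's index maps suggest the real lookup is cheaper than this.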

### Subgoal Lifecycle

`maybe_refresh_subgoal()` runs as a two-phase fire-and-forget task:
1. Uses last 6 agent-visible messages as context (M2 fix)
2. LLM extracts current subgoal description
3. If LLM returns `COMPLETED:` signal → current Active subgoal transitions to Completed (S3 fix)
4. New subgoal auto-completes any existing Active subgoal as defense-in-depth (M3 fix)

### Compaction Integration

`compact_context()` with `subgoal`/`subgoal_mig` strategies:
1. Extracts active-subgoal messages before drain
2. Runs standard compaction (drain + summarize)
3. Re-inserts active-subgoal messages after pinned messages (S2 fix)
4. Index repair after `apply_deferred_summaries` insertions (S5 fix)

### `subgoal_mig` Variant

Combines subgoal tier relevance with MIG (Marginal Information Gain) pairwise redundancy scoring:
`score = subgoal_relevance − max_redundancy_with_any_higher_scored_block`

Active subgoal messages (tier 1.0) have their MIG reduction capped so they are never evicted.
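
A small worked sketch of the scoring formula may help. It makes several assumptions that the spec does not state: "higher-scored" is approximated by visiting blocks in descending tier relevance (ties broken by index), redundancy is supplied as a precomputed symmetric matrix, and the cap for Active blocks is a hypothetical constant `ACTIVE_MAX_REDUCTION`.

```rust
// Hypothetical cap: an Active block never loses more than this to redundancy.
const ACTIVE_MAX_REDUCTION: f32 = 0.5;
const ACTIVE_RELEVANCE: f32 = 1.0;

/// Computes `relevance - max redundancy with any earlier-ranked block` per block.
/// `redundancy[i][j]` is the pairwise redundancy between blocks i and j, in [0, 1].
fn mig_scores(relevance: &[f32], redundancy: &[Vec<f32>]) -> Vec<f32> {
    let n = relevance.len();
    // Visit blocks from highest to lowest tier relevance.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| {
        relevance[b]
            .partial_cmp(&relevance[a])
            .unwrap_or(std::cmp::Ordering::Equal)
            .then(a.cmp(&b))
    });

    let mut scores = vec![0.0_f32; n];
    let mut ranked_above: Vec<usize> = Vec::with_capacity(n);
    for &i in &order {
        let mut reduction = ranked_above
            .iter()
            .map(|&j| redundancy[i][j])
            .fold(0.0_f32, f32::max);
        if relevance[i] >= ACTIVE_RELEVANCE {
            // Active-tier blocks have their reduction capped.
            reduction = reduction.min(ACTIVE_MAX_REDUCTION);
        }
        scores[i] = relevance[i] - reduction;
        ranked_above.push(i);
    }
    scores
}
```

With one Active block and two Completed blocks, a Completed block that largely repeats another drops below its 0.3 baseline, while the Active block keeps its full score.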

### Constraints

- `subgoal` and `SideQuest` eviction strategies are **mutually exclusive** — hard startup error if both enabled
- Config: `pruning_strategy = "subgoal"` or `"subgoal_mig"` in `[memory.compression]`

### Debug Output

`{N}-subgoal-registry.txt` written at pruning time when `--debug-dump` is active. `/status` shows active subgoal description when strategy is `subgoal` or `subgoal_mig`.

### Key Invariants

- Subgoal extraction is always fire-and-forget — never block the agent turn on subgoal LLM call
- Active subgoal messages are extracted before compaction drain and re-inserted after — never lost in compaction
- `rebuild_after_compaction` uses offset arithmetic (not index scanning) — never recalculate by iterating messages
- Index repair must run after `apply_deferred_summaries` insertions — deferred summaries can shift indices
- `subgoal` and `SideQuest` strategies must never be active simultaneously — hard error at startup
- NEVER evict Active-tier messages by scoring — their relevance is 1.0 (protected)
- NEVER run subgoal extraction synchronously in the tool loop — only between turns

## Focus Strategy Auto-Consolidation

`run_focus_auto_consolidation` (#3313, #3388): when the Focus compression strategy is active,
a periodic auto-consolidation pass merges similar focus segments to reduce fragmentation.

- Controlled by `[memory.focus] auto_consolidate_min_window` (default 4 turns); `Config::validate()` rejects `0` at
  startup with a clear error (#3387, #3392)
- `FocusState::should_auto_consolidate()` returns `false` until the configured number of turns has elapsed
- The O(K²) pairwise MIG scoring loop is offloaded to `tokio::task::spawn_blocking` to prevent
  stalling the async executor on long sessions (#3386, #3398)
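
The validation rule and the turn-count guard can be sketched together. The struct layouts, the error text, and the reset-on-consolidation behaviour below are assumptions for illustration; only the names `auto_consolidate_min_window` and `should_auto_consolidate` come from the spec.

```rust
/// Reduced stand-in for the `[memory.focus]` config section.
struct FocusConfig {
    auto_consolidate_min_window: u32,
}

impl FocusConfig {
    /// Rejects a zero window at startup (error text is illustrative).
    fn validate(&self) -> Result<(), String> {
        if self.auto_consolidate_min_window == 0 {
            return Err("[memory.focus] auto_consolidate_min_window must be >= 1".to_string());
        }
        Ok(())
    }
}

/// Reduced stand-in for `FocusState`.
struct FocusState {
    min_window: u32,
    turns_since_consolidation: u32,
}

impl FocusState {
    fn on_turn_completed(&mut self) {
        self.turns_since_consolidation += 1;
    }

    /// False until the configured number of turns has elapsed.
    fn should_auto_consolidate(&self) -> bool {
        self.turns_since_consolidation >= self.min_window
    }

    /// Assumed behaviour: the counter restarts after a consolidation pass.
    fn mark_consolidated(&mut self) {
        self.turns_since_consolidation = 0;
    }
}
```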

### Key Invariants

- `auto_consolidate_min_window = 0` MUST be rejected at config validation, because it would trigger an LLM call on every compress call
- Auto-consolidation runs on `spawn_blocking` — never on the async executor thread pool
- Consolidation is guarded by turn count; no-op until the window has elapsed

## Provider Preference Persistence

Provider preference per channel is persisted to SQLite (#3308, #3385):

- Last-used provider (set via `/provider <name>`) saved after each successful switch
- Restored automatically on next session start
- Identity keyed by `(channel_type, channel_id)`; CLI/TUI use `channel_id = ""`
- Controlled by `[session] provider_persistence = true` (default enabled)
- Migrations: SQLite `079_channel_preferences.sql`, Postgres `075_channel_preferences.sql`

### Key Invariants

- Provider preference restore is best-effort — if the stored provider name no longer exists, fall back silently to the default
- NEVER block session startup on preference load failure
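
The best-effort restore can be illustrated with an in-memory map standing in for the SQLite table. `restore_provider`, `ChannelKey`, and the provider and channel names used with it are hypothetical; the sketch only demonstrates the silent fallback when the stored name is missing or unknown.

```rust
use std::collections::HashMap;

/// `(channel_type, channel_id)`; CLI/TUI use an empty `channel_id`.
type ChannelKey = (String, String);

/// Returns the stored provider if it still exists, otherwise the default.
/// Never fails: a missing or stale preference silently falls back.
fn restore_provider<'a>(
    prefs: &HashMap<ChannelKey, String>,
    key: &ChannelKey,
    available: &[&'a str],
    default: &'a str,
) -> &'a str {
    prefs
        .get(key)
        .and_then(|stored| {
            available
                .iter()
                .copied()
                .find(|name| *name == stored.as_str())
        })
        .unwrap_or(default)
}
```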

## Compaction Progress UX

`MetricsSnapshot` gains four fields for compaction observability (#3314, #3385):

| Field | Type | Meaning |
|---|---|---|
| `context_max_tokens` | `u64` | Effective context window for the active provider |
| `compaction_last_before` | `u64` | Token count before the last hard compaction |
| `compaction_last_after` | `u64` | Token count after the last hard compaction |
| `compaction_last_at_ms` | `u64` | Wall-clock timestamp of the last compaction (0 = never) |

- `Agent::publish_context_budget()` resolves effective context window from `context_manager.budget.max_tokens()` and publishes to `MetricsSnapshot` after provider pool construction and on every `/provider` switch
- `INFO` log (`tokens_before`, `tokens_after`, `saved`) and transient `send_status("Compacting: {b}→{a} tokens")` emitted after each successful hard compaction
- TUI: `context_gauge` widget (color-coded: green < 70%, yellow 70–90%, red > 90%); hidden when `context_max_tokens == 0`
- TUI: `compaction_badge` widget shows `"{before}k→{after}k (-{saved}k) {elapsed}"`; hidden until first compaction this session
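
The gauge colouring and the hidden-when-zero rule can be sketched as a pure function. `GaugeColor`, `gauge_color`, and `saved_tokens` are illustrative names; the boundaries follow the percentages listed above, with exactly 90% treated as yellow.

```rust
#[derive(Debug, PartialEq)]
enum GaugeColor {
    Green,
    Yellow,
    Red,
}

/// Returns `None` when the gauge should be hidden (`context_max_tokens == 0`).
fn gauge_color(used_tokens: u64, context_max_tokens: u64) -> Option<GaugeColor> {
    if context_max_tokens == 0 {
        return None;
    }
    // Cross-multiply in u128 so fractional percentages are compared exactly.
    let used = used_tokens as u128 * 100;
    let max = context_max_tokens as u128;
    Some(if used > max * 90 {
        GaugeColor::Red
    } else if used >= max * 70 {
        GaugeColor::Yellow
    } else {
        GaugeColor::Green
    })
}

/// Tokens reclaimed by the last compaction; never underflows.
fn saved_tokens(compaction_last_before: u64, compaction_last_after: u64) -> u64 {
    compaction_last_before.saturating_sub(compaction_last_after)
}
```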

## Hard Compaction Post-Processing: Orphaned `tool_result` Strip

After hard compaction the message list is drained and rebuilt. A `tool_result` message
references a prior `tool_use` by id. When the drain removes the originating `tool_use`
(e.g., the turn that produced it was summarized or evicted) the `tool_result` becomes
**orphaned** — its reference id points to a message no longer in the list. Sending an
orphaned `tool_result` to any provider causes a request validation error (Claude: 400,
OpenAI: 422).

Fix (#3256): after `apply_hard_compaction()` and after any `apply_deferred_summaries()`
step, run `strip_orphaned_tool_results(messages)`:

```
strip_orphaned_tool_results(messages: &mut Vec<Message>)
    collect_set of all tool_use ids present in messages
    remove any message where role == tool_result AND tool_use_id NOT IN that set
```
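
The pseudocode above translates fairly directly into Rust. The version below is a sketch over a simplified, hypothetical `Message` enum (the real message type carries roles and parts); it also returns the number of removed messages, which is an addition for testability rather than part of the documented signature.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the agent's message type.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Text(String),
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

/// Removes every `ToolResult` whose `tool_use_id` has no matching `ToolUse`
/// left in `messages`. Returns how many messages were removed.
fn strip_orphaned_tool_results(messages: &mut Vec<Message>) -> usize {
    // Pass 1: collect the ids of all tool_use messages still present.
    let live_ids: HashSet<String> = messages
        .iter()
        .filter_map(|m| match m {
            Message::ToolUse { id } => Some(id.clone()),
            _ => None,
        })
        .collect();

    // Pass 2: keep everything except tool results pointing at a missing id.
    let before = messages.len();
    messages.retain(|m| match m {
        Message::ToolResult { tool_use_id } => live_ids.contains(tool_use_id),
        _ => true,
    });
    before - messages.len()
}
```

Collecting the ids first and filtering second keeps the pass linear in the number of messages and preserves the order of the survivors.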

### Key Invariants

- `strip_orphaned_tool_results` runs after EVERY hard compaction event — no exceptions
- The strip runs AFTER `apply_deferred_summaries` (deferred insertions may add new tool_use messages)
- Removing an orphaned `tool_result` is silent (no WARN) unless `--debug-dump` is active
- This is a correctness invariant, not a heuristic — a single orphaned `tool_result` causes a provider 400/422 error
- NEVER send a `tool_result` whose `tool_use_id` is absent from the message list

---

## Goal Lifecycle (#3567)

The agent tracks a per-session *goal state* that reflects whether the current user
intent has been stated, is in progress, or has been completed. This is distinct from
the orchestration `TaskGraph` goal (which is a planned multi-step execution) — goal
lifecycle tracks the natural-language objective expressed by the user in conversation.

### GoalState Machine

```
Idle ──(user message with goal)──► Active(goal_text)
Active ──(agent signals completion)──► Completed(goal_text)
Completed ──(new user message)──► Active(new_goal_text)
Active ──(/clear or session reset)──► Idle
```

`GoalState` is stored on `LifecycleState`. The active goal text is made available
as a template variable in the system prompt (`{current_goal}`) when configured.
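
The four transitions in the diagram can be encoded as a pure function. This is a sketch: `GoalEvent` and `transition` are hypothetical names, and leaving the state unchanged for any pair not drawn in the diagram is an assumption (the spec says follow-up messages may extend or replace an active goal heuristically, which is not modelled here).

```rust
#[derive(Debug, Clone, PartialEq)]
enum GoalState {
    Idle,
    Active(String),
    Completed(String),
}

/// Inputs that can move the state machine (hypothetical type).
enum GoalEvent {
    UserMessage(String),
    AgentCompleted,
    Reset, // `/clear` or session reset
}

fn transition(state: GoalState, event: GoalEvent) -> GoalState {
    match (state, event) {
        (GoalState::Idle, GoalEvent::UserMessage(goal)) => GoalState::Active(goal),
        (GoalState::Active(goal), GoalEvent::AgentCompleted) => GoalState::Completed(goal),
        (GoalState::Completed(_), GoalEvent::UserMessage(goal)) => GoalState::Active(goal),
        (GoalState::Active(_), GoalEvent::Reset) => GoalState::Idle,
        // Assumption: pairs not drawn in the diagram leave the state unchanged.
        (state, _) => state,
    }
}
```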

### Goal Completion Detection

The agent detects goal completion via a lightweight heuristic:

1. If the last assistant response contains a completion signal phrase (configurable
   pattern list, e.g., "task complete", "done", "finished") and no tool calls were
   emitted in that turn → transition `Active → Completed`
2. If the orchestration `TaskGraph` plan completes → `Active → Completed`
3. Explicit `/done` slash command → `Active → Completed`

Completion transitions emit a `GoalCompleted` event to the channel (displayed as a
status message, not a user-facing message).
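
The first heuristic (a completion phrase and no tool calls) might look like the following. The function name and the case-insensitive substring matching are assumptions; the spec only says the pattern list is configurable.

```rust
/// Returns true when the response contains a completion phrase and the turn
/// emitted no tool calls. Matching is a case-insensitive substring search.
fn signals_completion(response: &str, tool_call_count: usize, phrases: &[&str]) -> bool {
    if tool_call_count > 0 {
        return false;
    }
    let lower = response.to_lowercase();
    phrases
        .iter()
        .any(|phrase| lower.contains(&phrase.to_lowercase()))
}
```

Plain substring matching is deliberately naive: a short phrase such as "done" also matches inside "abandoned", so a production matcher would probably want word boundaries.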

### Config

```toml
[agent.goal]
enabled                  = true
track_in_system_prompt   = false      # inject {current_goal} into system prompt
completion_phrases       = ["task complete", "done", "finished", "completed"]
```

### Key Invariants

- Goal lifecycle is informational — it does NOT block tool execution or LLM calls
- NEVER surface `GoalState` to the LLM directly; it is agent-internal and operator-visible only
- The goal text is extracted from the first user message of the conversation; subsequent messages extend or replace the active goal heuristically

---

## TACO Output Compression (#3591)

TACO (Tool-output Automatic Compression and Offload) compresses large tool outputs
before they are injected into the context window. This is a targeted pre-injection
pass, distinct from the turn-level compaction that runs at the 60/90% pressure gates.

### When It Fires

TACO is evaluated after each tool call result is received, before the result is
appended to `messages`:

1. Measure the raw tool output token count via `tiktoken-rs`
2. If `token_count > taco_threshold` AND the tool is in the compressible-tool set
   → run TACO compression
3. Compressed result replaces the raw result in `messages`
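
The gating steps above, combined with the best-effort fallback to raw output, can be sketched as two small functions. `TacoConfig` here is a reduced stand-in for the `[tools.taco]` section, and `apply_taco` with its `compress` closure (standing in for the LLM summarization call) is hypothetical.

```rust
/// Reduced stand-in for the `[tools.taco]` config section.
struct TacoConfig {
    enabled: bool,
    taco_threshold: usize,
    compressible_tools: Vec<String>,
}

/// True when the output is over the threshold and the tool is compressible.
fn should_compress(cfg: &TacoConfig, tool: &str, token_count: usize) -> bool {
    cfg.enabled
        && token_count > cfg.taco_threshold
        && cfg.compressible_tools.iter().any(|t| t == tool)
}

/// Returns the text to append to `messages` and whether it was compressed.
/// If `compress` fails, the raw output is used unchanged.
fn apply_taco<E>(
    cfg: &TacoConfig,
    tool: &str,
    raw: String,
    token_count: usize,
    compress: impl FnOnce(&str) -> Result<String, E>,
) -> (String, bool) {
    if !should_compress(cfg, tool, token_count) {
        return (raw, false);
    }
    match compress(&raw) {
        Ok(summary) => (summary, true),
        Err(_) => (raw, false), // best-effort: fall back to the raw output
    }
}
```

The boolean in the return value is where a caller would decide whether to tag the part as `MessagePart::TacoCompressed`.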

### Compression Strategy

TACO compression uses a fast prompt to summarize the tool output:

```
System: You are a concise tool-output summarizer. Preserve all data values,
file paths, exit codes, and structured content. Remove verbose headers and
repeated patterns. Target: under {target_tokens} tokens.
Tool output:
{raw_output}
```

The compressed result is tagged with `MessagePart::TacoCompressed` so the TUI and
audit log can distinguish it from raw output.

### Compressible Tool Set

Default: `["shell", "web_scrape", "read"]`. Configurable via
`[tools.taco] compressible_tools`. MCP tools are excluded from TACO by default
because their structured output schema is unknown.

### Config

```toml
[tools.taco]
enabled             = false           # default off (opt-in)
taco_threshold      = 2000            # tokens; compress outputs above this
target_tokens       = 500             # target compressed size
taco_provider       = ""              # [[llm.providers]] name; empty = primary
compressible_tools  = ["shell", "web_scrape", "read"]
```

### Key Invariants

- TACO fires only on output that exceeds `taco_threshold`; short outputs are passed through untouched
- On compression failure (provider error, timeout) the **raw output is used** — TACO is best-effort
- NEVER compress `tool_result` messages from `execute_tool_call_confirmed` (fenced-block path) — user-approved results must not be silently summarized
- NEVER apply TACO to thinking blocks or system prompt parts
- `taco_provider` is resolved via the provider registry at runtime; empty = primary provider
- Compressed results carry `MessagePart::TacoCompressed` to make compression auditable

---

## Per-Turn ExecutionContext (#3589)

`ShellExecutor` now receives a per-turn `ExecutionContext` that carries the resolved
working directory and environment overrides for that specific turn. This replaces the
previous model where the working directory was a global field on `ShellExecutor`.

### Contents

```rust
pub struct ExecutionContext {
    pub cwd:     PathBuf,             // resolved working directory for this turn
    pub env:     HashMap<String, String>,  // turn-scoped env overrides (e.g., from hooks)
    pub session: SessionId,           // for audit correlation
}
```

### Propagation

`ExecutionContext` is constructed at the start of `process_user_message()` from the
current `LifecycleState::cwd` and any active hook-injected env vars. It is passed to
`ShellExecutor::execute_with_context(&call, &ctx)` instead of reading from a shared
field.

### Key Invariants

- The `cwd` in `ExecutionContext` reflects the working directory **as of the start of the turn** — changes made by `set_working_directory` tool calls in the current turn take effect in the NEXT turn's context
- NEVER mutate the `ExecutionContext` during a turn — it is immutable after construction
- The `ExecutionContext` is not serialized or persisted — it is reconstructed each turn
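
The "snapshot at turn start" semantics can be shown with a small sketch. `SessionId` as a string newtype, the reduced `LifecycleState`, and the constructor name `for_turn` are assumptions; the point is that the context owns copies, so a later change to the lifecycle's working directory is only visible to the next turn's context.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical newtype; the real `SessionId` is not shown in this spec.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionId(pub String);

/// Reduced stand-in holding only the working directory.
pub struct LifecycleState {
    pub cwd: PathBuf,
}

pub struct ExecutionContext {
    pub cwd: PathBuf,
    pub env: HashMap<String, String>,
    pub session: SessionId,
}

impl ExecutionContext {
    /// Snapshots the current cwd and hook-injected env at the start of a turn.
    pub fn for_turn(
        lifecycle: &LifecycleState,
        hook_env: &HashMap<String, String>,
        session: &SessionId,
    ) -> Self {
        Self {
            cwd: lifecycle.cwd.clone(),
            env: hook_env.clone(),
            session: session.clone(),
        }
    }
}
```

Passing the context by shared reference (`&ctx`), as `execute_with_context` does, is what keeps it immutable for the duration of the turn.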

---

## Autonomous Goal Execution (#4320)

`/goal create --auto [--turns N]` runs the agent autonomously for up to N turns without
user input until the goal condition is met or the turn limit is reached.

### Architecture: AutonomousDriver

`AutonomousDriver` is driven cooperatively from within the existing `Agent::run` `select!`
loop — no separate spawned task. The `--auto` flag is communicated via a
`pending_start_arc` side-channel from `handle_goal` back to the main loop, preserving
`&mut self` exclusivity.

```
Agent::run select! loop
    └── AutonomousDriver (cooperative select! branch)
            ├── turn_delay_ms sleep between turns
            ├── deferred supervisor retry on 429 (no blocking sleep)
            └── exit when: goal met | turn limit | stop signal
```

### GoalSupervisor

Performs an independent LLM call every `verify_interval` turns to confirm goal
achievement. Returns a JSON verdict `{ achieved: bool, reason: String }`.

- Isolated from the main LLM provider via `supervisor_provider` config field
- Deferred retry on 429 — never blocks the autonomous loop with a blocking sleep
- Pauses after `max_supervisor_fails` consecutive failures (supervisor circuit-breaker)
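
The verification cadence and the circuit-breaker can be sketched as a small counter. `SupervisorGate` is a hypothetical type; the 1-based turn numbering, the integer types, and resetting the failure count on any success are assumptions.

```rust
/// Tracks when to verify and when to pause after repeated supervisor failures.
struct SupervisorGate {
    verify_interval: u64,
    max_supervisor_fails: u32,
    consecutive_fails: u32,
}

impl SupervisorGate {
    /// True once `max_supervisor_fails` consecutive failures have been seen.
    fn is_paused(&self) -> bool {
        self.consecutive_fails >= self.max_supervisor_fails
    }

    /// True on every `verify_interval`-th turn (turns counted from 1),
    /// unless the breaker has tripped.
    fn should_verify(&self, turn: u64) -> bool {
        !self.is_paused()
            && self.verify_interval > 0
            && turn > 0
            && turn % self.verify_interval == 0
    }

    /// Records the outcome of one supervisor call.
    fn record(&mut self, succeeded: bool) {
        if succeeded {
            self.consecutive_fails = 0;
        } else {
            self.consecutive_fails += 1;
        }
    }
}
```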

### AutonomousRegistry

`Arc<Mutex<HashMap<GoalId, AutonomousGoalState>>>` for fleet view and orphan detection.
Orphaned goals (from crashed sessions) are detected and marked on startup.

### GoalConfig Extensions

Seven new fields added to `GoalConfig` (#4320, #4355):

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `autonomous_enabled` | bool | false | Enable `--auto` flag on `/goal create` |
| `autonomous_max_turns` | u64 | 50 | Turn limit for autonomous runs |
| `supervisor_provider` | Option<ProviderName> | None | LLM provider for supervisor verification calls |
| `verify_interval` | u64 | 5 | Turns between supervisor verification calls |
| `supervisor_timeout_secs` | u64 | 30 | Per-call timeout for supervisor LLM calls |
| `max_stuck_count` | u64 | 3 | Consecutive identical-output turns before abort |
| `autonomous_turn_delay_ms` | u64 | 500 | Delay between autonomous turns |
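
As an illustration of how `max_stuck_count` might be applied, the sketch below counts consecutive turns with identical output. `StuckDetector` is a hypothetical helper, and comparing full output strings for equality is an assumption; the spec does not say how outputs are compared.

```rust
/// Counts consecutive turns that produced identical output.
struct StuckDetector {
    last_output: Option<String>,
    repeats: u64,
}

impl StuckDetector {
    fn new() -> Self {
        Self { last_output: None, repeats: 0 }
    }

    /// Records one turn's output. Returns true when the same output has been
    /// seen `max_stuck_count` times in a row and the run should abort.
    fn observe(&mut self, output: &str, max_stuck_count: u64) -> bool {
        if self.last_output.as_deref() == Some(output) {
            self.repeats += 1;
        } else {
            self.last_output = Some(output.to_string());
            self.repeats = 1;
        }
        self.repeats >= max_stuck_count
    }
}
```

With the default of 3, the third identical turn in a row triggers the abort, and any differing output restarts the count.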

### `/agents` Command Extension

`/agents` shows an **Autonomous Goals** fleet section listing active, completed, and
orphaned goals with their turn counts and supervisor verification status.

### Key Invariants

- `AutonomousDriver` MUST NOT be spawned as a separate task — must run inside `Agent::run`
  to preserve `&mut self` exclusivity
- Supervisor verification is fire-and-forget between turns: NEVER block a turn on a supervisor LLM call
- `autonomous_enabled = false` (default) means `--auto` is rejected at parse time; no behavior change
- Orphan detection runs once at startup via `AutonomousRegistry::reconcile_orphans()`
- Turn delay between autonomous turns uses `tokio::time::sleep` — NEVER `std::thread::sleep`

---

## Memory Retrieval Failure Logging (#3597)

The OmniMem self-improvement loop requires a dataset of memory retrieval failures.
Starting with PR #3597, `OmniMem::recall()` logs retrieval failures to the
`skill_outcomes` table (the existing SQLite table used by self-learning) with
`outcome_type = "memory_miss"`.

### Logged Fields

| Field | Value |
|-------|-------|
| `outcome_type` | `"memory_miss"` |
| `query` | The original recall query string (truncated to 512 chars) |
| `strategy` | Recall strategy that was attempted (e.g., `"semantic"`, `"graph"`, `"hybrid"`) |
| `error` | Error message or "no_results" |
| `session_id` | Current session UUID |
| `ts` | Unix timestamp |

### What Counts as a Failure

- Qdrant query returns 0 results above the similarity threshold
- Qdrant query returns an error (network, timeout)
- Graph BFS returns 0 edges above the confidence threshold
- Hybrid recall produces 0 non-empty results after merging
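
A record builder for these rows might look like the sketch below. The `MemoryMiss` struct and `memory_miss` function are hypothetical, and truncating by Unicode scalar values (rather than bytes) is an assumption about what "512 chars" means.

```rust
const MAX_QUERY_CHARS: usize = 512;

/// One `skill_outcomes` row for a retrieval failure (illustrative layout).
#[derive(Debug)]
struct MemoryMiss {
    outcome_type: &'static str,
    query: String,
    strategy: String,
    error: String,
    session_id: String,
    ts: u64,
}

/// Truncates on character boundaries, so multi-byte text is never split.
fn truncate_query(query: &str) -> String {
    query.chars().take(MAX_QUERY_CHARS).collect()
}

/// Builds the row; `error = None` means the query simply returned nothing.
fn memory_miss(
    query: &str,
    strategy: &str,
    error: Option<&str>,
    session_id: &str,
    ts: u64,
) -> MemoryMiss {
    MemoryMiss {
        outcome_type: "memory_miss",
        query: truncate_query(query),
        strategy: strategy.to_string(),
        error: error.unwrap_or("no_results").to_string(),
        session_id: session_id.to_string(),
        ts,
    }
}
```

To respect the fire-and-forget invariant, a caller would hand the finished record to a background writer instead of awaiting the insert on the recall path.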

### Key Invariants

- Failure logging is fire-and-forget — it MUST NOT block the recall return path
- Logged queries are truncated to 512 characters before storage — no unbounded writes
- Failure logs are NOT surfaced to the LLM or the user; they are operator/self-improvement data only
- `outcome_type = "memory_miss"` is a stable string — consumers (scheduler micro-benchmark) depend on it