zeph 0.21.2

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
# Sub-Agent Orchestration

Sub-agents let you delegate tasks to specialized helpers that work in the background while you continue chatting with Zeph. Each sub-agent has its own system prompt, tools, and skills — but cannot access anything you haven't explicitly allowed.

## Quick Start

1. Create a definition file:

```markdown
---
name: code-reviewer
description: Reviews code for correctness and style
---

You are a code reviewer. Analyze the provided code for bugs, performance issues, and idiomatic style.
```

2. Save it to `.zeph/agents/code-reviewer.md` in your project (or `~/.config/zeph/agents/` for global use).

3. Spawn the sub-agent:

```
> /agent spawn code-reviewer Review the authentication module
Sub-agent 'code-reviewer' started (id: a1b2c3d4)
```

Or use the shorthand `@mention` syntax:

```
> @code-reviewer Review the authentication module
Sub-agent 'code-reviewer' started (id: a1b2c3d4)
```

That's it. The sub-agent works in the background and reports results when done.

## Managing Sub-Agents

| Command | Description |
|---------|-------------|
| `/agent list` | Show available sub-agent definitions |
| `/agent spawn <name> <prompt>` | Start a sub-agent with a task |
| `/agent bg <name> <prompt>` | Alias for `spawn` |
| `/agent status` | Show active sub-agents with state and progress |
| `/agent cancel <id>` | Cancel a running sub-agent (accepts ID prefix) |
| `/agent resume <id> <prompt>` | Resume a completed sub-agent with its conversation history |
| `/agent approve <id>` | Approve a pending secret request |
| `/agent deny <id>` | Deny a pending secret request |
| `@name <prompt>` | Shorthand for `/agent spawn` |

### Checking Status

```
> /agent status
Active sub-agents:
  [a1b2c3d4] working  turns=3  elapsed=42s  Analyzing auth flow...
```

### Cancelling

The `cancel` command accepts a UUID prefix. If the prefix is ambiguous (matches multiple agents), you'll be asked for a longer prefix:

```
> /agent cancel a1b2
Cancelled sub-agent a1b2c3d4-...
```
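
A minimal sketch of how prefix matching with an ambiguity check could work. `resolve_prefix` and `Resolve` are hypothetical names used for illustration, not Zeph's internal API. The sketch assumes agent IDs are plain strings.

```rust
/// Result of matching a user-supplied ID prefix (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Resolve<'a> {
    Found(&'a str),
    NotFound,
    Ambiguous(usize),
}

/// Returns the single ID starting with `prefix`, or the reason none could be chosen.
pub fn resolve_prefix<'a>(ids: &[&'a str], prefix: &str) -> Resolve<'a> {
    let matches: Vec<&'a str> = ids
        .iter()
        .copied()
        .filter(|id| id.starts_with(prefix))
        .collect();
    match matches.as_slice() {
        [] => Resolve::NotFound,
        [only] => Resolve::Found(*only),
        // More than one match: the caller asks the user for a longer prefix.
        many => Resolve::Ambiguous(many.len()),
    }
}
```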

### Resuming

Resume a previously completed sub-agent session with `/agent resume`. The agent is re-spawned with its full conversation history loaded from the transcript, so it picks up where it left off:

```
> /agent resume a1b2 Fix the remaining two warnings
Resuming sub-agent a1b2c3d4-... (code-reviewer) with 12 messages
```

The `<id>` argument accepts a UUID prefix, just like `cancel`. The `<prompt>` is appended as a new user message after the restored history.

Resume requires transcript storage to be enabled (it is by default). If the transcript file for the given ID does not exist, the command returns an error.

### Transcript Storage

Every sub-agent session is recorded as a JSONL transcript file in `.zeph/subagents/` (configurable). Each line is a JSON object containing a sequence number, ISO 8601 timestamp, and the full message:

```
.zeph/subagents/
  a1b2c3d4-...-...-....jsonl        # conversation transcript
  a1b2c3d4-...-...-....meta.json    # sidecar metadata
```

The **meta sidecar** (`<agent_id>.meta.json`) stores structured metadata about the session:

```json
{
  "agent_id": "a1b2c3d4-...",
  "agent_name": "code-reviewer",
  "def_name": "code-reviewer",
  "status": "Completed",
  "started_at": "2026-03-05T10:00:00Z",
  "finished_at": "2026-03-05T10:01:38Z",
  "resumed_from": null,
  "turns_used": 5
}
```

When a session is resumed, the new meta sidecar records the original agent ID in `resumed_from`, creating a traceable chain.

Old transcript files are automatically cleaned up. When the file count exceeds `transcript_max_files`, the oldest transcripts (and their sidecars) are deleted on each spawn or resume.
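
The retention rule can be sketched as follows. The sketch assumes "oldest" means oldest modification time. `Transcript`, `to_prune`, and `sidecar_path` are hypothetical helpers. The sketch only selects which files to remove and leaves the actual deletion to the caller.

```rust
use std::time::SystemTime;

/// A transcript file on disk (hypothetical type for illustration).
#[derive(Debug, Clone)]
pub struct Transcript {
    pub path: String,
    pub modified: SystemTime,
}

/// Returns the transcripts to delete so that at most `max_files` remain.
/// `max_files == 0` means unlimited, so nothing is selected.
pub fn to_prune(mut files: Vec<Transcript>, max_files: usize) -> Vec<Transcript> {
    if max_files == 0 || files.len() <= max_files {
        return Vec::new();
    }
    // Sort oldest first so the excess entries sit at the front.
    files.sort_by_key(|t| t.modified);
    let excess = files.len() - max_files;
    files.truncate(excess);
    files
}

/// Maps `<agent_id>.jsonl` to its `<agent_id>.meta.json` sidecar.
pub fn sidecar_path(jsonl: &str) -> Option<String> {
    jsonl
        .strip_suffix(".jsonl")
        .map(|stem| format!("{stem}.meta.json"))
}
```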

#### Transcript Configuration

Configure transcript behavior in the `[agents]` section of `config.toml`:

```toml
[agents]
# Enable or disable transcript recording (default: true).
# When false, no transcript files are written and /agent resume is unavailable.
transcript_enabled = true

# Directory for transcript files (default: .zeph/subagents).
# transcript_dir = ".zeph/subagents"

# Maximum number of .jsonl files to keep (default: 50).
# Oldest files are deleted when the count exceeds this limit.
# Set to 0 for unlimited (no cleanup).
transcript_max_files = 50
```

## Writing Definitions

A definition is a markdown file with YAML frontmatter between `---` delimiters. The body after the closing `---` becomes the sub-agent's system prompt.

> **Note:** Prior to v0.13, definitions used TOML frontmatter (`+++`). That format is still accepted but deprecated and will be removed in v1.0.0. Migrate by replacing the `+++` delimiters with `---` and converting the frontmatter fields from TOML to YAML syntax.

### Minimal Definition

Only `name` and `description` are required. Everything else has sensible defaults:

```markdown
---
name: helper
description: General-purpose helper
---

You are a helpful assistant. Complete the given task concisely.
```
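
Here is a rough sketch of how the frontmatter can be split from the body in a file like the one above. `split_definition` and `Split` are hypothetical names. The sketch assumes `\n` line endings and does not parse YAML or TOML. It only recognizes the legacy `+++` form and flags it.

```rust
/// Frontmatter and body of a definition file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Split<'a> {
    pub frontmatter: &'a str,
    pub body: &'a str,
    /// True for the deprecated `+++` (TOML) delimiters.
    pub deprecated_toml: bool,
}

/// Splits `---` (YAML) or legacy `+++` (TOML) frontmatter from the system prompt body.
/// Assumes `\n` line endings and delimiters on their own lines.
pub fn split_definition(src: &str) -> Option<Split<'_>> {
    for (delim, deprecated_toml) in [("---", false), ("+++", true)] {
        let Some(rest) = src.strip_prefix(delim) else {
            continue;
        };
        // The opening delimiter must be followed by a newline.
        let rest = rest.strip_prefix('\n')?;
        // The closing delimiter must start a line.
        let close = format!("\n{delim}");
        let end = rest.find(close.as_str())?;
        let frontmatter = &rest[..end];
        // Everything after the closing delimiter line becomes the system prompt.
        let body = rest[end + close.len()..].trim_start_matches('\n');
        return Some(Split { frontmatter, body, deprecated_toml });
    }
    None
}
```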

### Full Definition

```markdown
---
name: code-reviewer
description: Reviews code changes for correctness and style
model: claude-sonnet-4-20250514
background: false
max_turns: 10
memory: project
tools:
  allow:
    - shell
    - web_scrape
  except:
    - shell_sudo
permissions:
  permission_mode: accept_edits
  secrets:
    - github-token
  timeout_secs: 300
  ttl_secs: 120
skills:
  include:
    - "git-*"
    - "rust-*"
  exclude:
    - "deploy-*"
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "./scripts/validate.sh"
  PostToolUse:
    - matcher: "Edit|Write"
      hooks:
        - type: command
          command: "./scripts/lint.sh"
---

You are a code reviewer. Analyze the provided code for:
- Correctness bugs
- Performance issues
- Idiomatic Rust style

Report findings as a structured list with severity (critical/warning/info).
```

### Field Reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | string | required | Unique identifier |
| `description` | string | required | Human-readable description |
| `model` | string | inherited | LLM model override |
| `background` | bool | `false` | Run as a background task; secret requests are auto-denied inline |
| `max_turns` | u32 | `20` | Maximum LLM turns before the agent is stopped |
| `memory` | string | unset | Persistent memory scope: `user`, `project`, or `local` (see [Persistent Memory](#persistent-memory)) |
| `tools.allow` | string[] | unset | Only these tools are available (mutually exclusive with `deny`) |
| `tools.deny` | string[] | unset | All tools except these (mutually exclusive with `allow`) |
| `tools.except` | string[] | `[]` | Additional denylist applied on top of `allow`/`deny`; deny always wins over allow; exact match on tool ID |
| `permissions.permission_mode` | enum | `default` | Tool call approval policy (see below) |
| `permissions.secrets` | string[] | `[]` | Vault keys the agent MAY request |
| `permissions.timeout_secs` | u64 | `600` | Hard kill deadline |
| `permissions.ttl_secs` | u64 | `300` | TTL for granted permissions |
| `skills.include` | string[] | all | Glob patterns to include (`*` wildcard) |
| `skills.exclude` | string[] | `[]` | Glob patterns to exclude (takes precedence) |
| `hooks.PreToolUse` | HookMatcher[] | `[]` | Hooks fired before tool execution (see [Hooks](#hooks)) |
| `hooks.PostToolUse` | HookMatcher[] | `[]` | Hooks fired after tool execution (see [Hooks](#hooks)) |

If neither `tools.allow` nor `tools.deny` is specified, the sub-agent inherits all tools from the main agent.

### `permission_mode` Values

| Value | Description |
|-------|-------------|
| `default` | Standard interactive prompts — the user is asked before each sensitive tool call |
| `accept_edits` | File edit and write operations are auto-accepted without prompting |
| `dont_ask` | All tool calls are auto-approved without any prompt |
| `bypass_permissions` | Same as `dont_ask` but emits a warning at definition load time |
| `plan` | The agent can see the tool catalog but cannot execute any tools; produces text-only output |

> [!CAUTION]
> `bypass_permissions` skips all tool-call approval prompts. Only use it in fully trusted, sandboxed environments.

> [!TIP]
> Use `plan` mode when you only need a structured action plan from the agent and want to review it before any tools are executed.
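
The `permission_mode` table can be read as a small decision function. This is a sketch under several assumptions. `PermissionMode`, `Decision`, `parse_mode`, and `decide` are hypothetical names. Every call passed to `decide` is treated as a sensitive call. `is_edit` stands in for however Zeph classifies file edit and write tools.

```rust
/// The documented `permission_mode` values (hypothetical Rust names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PermissionMode {
    Default,
    AcceptEdits,
    DontAsk,
    BypassPermissions,
    Plan,
}

/// What happens to a single tool call.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Prompt,
    Allow,
    Block,
}

/// Parses the snake_case value used in the YAML frontmatter.
pub fn parse_mode(value: &str) -> Option<PermissionMode> {
    Some(match value {
        "default" => PermissionMode::Default,
        "accept_edits" => PermissionMode::AcceptEdits,
        "dont_ask" => PermissionMode::DontAsk,
        "bypass_permissions" => PermissionMode::BypassPermissions,
        "plan" => PermissionMode::Plan,
        _ => return None,
    })
}

/// Decides how a sensitive tool call is handled; `is_edit` marks file edit/write tools.
pub fn decide(mode: PermissionMode, is_edit: bool) -> Decision {
    match mode {
        // Plan mode sees the tool catalog but never executes anything.
        PermissionMode::Plan => Decision::Block,
        PermissionMode::DontAsk | PermissionMode::BypassPermissions => Decision::Allow,
        PermissionMode::AcceptEdits if is_edit => Decision::Allow,
        PermissionMode::AcceptEdits | PermissionMode::Default => Decision::Prompt,
    }
}
```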

### `tools.except` — Additional Denylist

`tools.except` lets you block specific tool IDs regardless of what `allow` or `deny` says. Deny always wins over allow, so a tool listed in both `allow` and `except` is blocked.

```yaml
tools:
  allow:
    - shell
    - web_scrape
  except:
    - shell_sudo    # blocked even though shell is in allow
```

Use `except` to tighten an existing allow list without rewriting it.

### `background` — Fire-and-Forget Execution

When `background: true`, the agent runs without blocking the conversation. Secret requests that would normally open an interactive prompt are auto-denied inline instead, so the main session is never paused waiting for user input.

```yaml
---
name: nightly-linter
description: Runs cargo clippy on the workspace nightly
background: true
max_turns: 5
tools:
  allow:
    - shell
---

Run `cargo clippy --workspace -- -D warnings` and report any new warnings introduced since the last run.
```

Results appear in `/agent status` and the TUI panel when the task completes.

### `max_turns` — Turn Limit

`max_turns` caps the number of LLM turns the agent may take. The agent is stopped automatically when the limit is reached, preventing runaway inference loops.

```yaml
---
name: summarizer
description: Summarizes long documents
max_turns: 3
---

Summarize the provided content in three bullet points.
```

The default is `20`. Set a lower value for narrow, well-defined tasks.

### Definition Locations

| Path | Scope | Priority |
|------|-------|----------|
| `.zeph/agents/` | Project | Higher (wins on name conflict) |
| `~/.config/zeph/agents/` | User (global) | Lower |
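
The precedence rule amounts to a name-keyed merge in which project entries are applied last. This is a minimal sketch that reduces each definition to a `(name, path)` pair. `merge_definitions` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Merges definitions by name; project entries overwrite same-named user entries.
/// Each definition is reduced to a (name, source path) pair for illustration.
pub fn merge_definitions<'a>(
    user: &[(&'a str, &'a str)],
    project: &[(&'a str, &'a str)],
) -> HashMap<&'a str, &'a str> {
    let mut merged = HashMap::new();
    // Lower-priority (user) entries go in first, so project inserts replace them.
    for &(name, path) in user.iter().chain(project) {
        merged.insert(name, path);
    }
    merged
}
```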

## Managing Definitions

Use the `zeph agents` subcommand to list, inspect, create, edit, and delete sub-agent definitions from the command line.

### List

```
$ zeph agents list
NAME             SCOPE                    DESCRIPTION                       MODEL
code-reviewer    project/code-reviewer…   Reviews code for correctness      claude-sonnet-4-20250514
test-writer      user/test-writer.md      Generates unit tests              -
```

### Show

```
$ zeph agents show code-reviewer
Name:        code-reviewer
Description: Reviews code for correctness
Source:      project/code-reviewer.md
Model:       claude-sonnet-4-20250514
Mode:        Default
Max turns:   10
Background:  false
Tools:       allow ["shell", "web_scrape"]

System prompt:
You are a code reviewer...
```

### Create

```
$ zeph agents create reviewer --description "Code review helper"
Created .zeph/agents/reviewer.md

$ zeph agents create reviewer --description "Code review helper" --model claude-sonnet-4-20250514
Created .zeph/agents/reviewer.md

$ zeph agents create reviewer --description "Global helper" --dir ~/.config/zeph/agents/
Created /Users/you/.config/zeph/agents/reviewer.md
```

Options:
- `--description` / `-d` — short description (required)
- `--model` — model override (optional)
- `--dir` — target directory (default: `.zeph/agents/`)

### Edit

Opens the definition file in `$VISUAL` or `$EDITOR` (falls back to `vi`). After the editor closes, Zeph re-parses the file to validate it:

```
$ zeph agents edit reviewer
# $EDITOR opens .zeph/agents/reviewer.md
Updated /path/to/.zeph/agents/reviewer.md
```
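
The editor fallback order can be sketched as below. `resolve_editor` is a hypothetical helper, and skipping empty variables is an assumption. In real use, the lookup closure would be something like `|k| std::env::var(k).ok()`.

```rust
/// Picks the editor: `$VISUAL`, then `$EDITOR`, then `vi`.
/// `lookup` abstracts environment access; skipping empty values is an assumption.
pub fn resolve_editor(lookup: impl Fn(&str) -> Option<String>) -> String {
    ["VISUAL", "EDITOR"]
        .iter()
        .filter_map(|key| lookup(*key))
        .find(|value| !value.trim().is_empty())
        .unwrap_or_else(|| "vi".to_string())
}
```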

### Delete

```
$ zeph agents delete reviewer
Delete /path/to/.zeph/agents/reviewer.md? [y/N] y
Deleted reviewer
```

Use `--yes` / `-y` to skip the confirmation prompt.

### TUI Panel

The TUI command palette (`/`) includes `agents:*` entries. Selecting one either opens the agent manager overlay or fills the input bar with the corresponding `/agent` command. To open the overlay directly, type `/agents` in the command palette and select `agents:list`.

The agent manager overlay provides keyboard navigation over all loaded definitions:

| Key | Action |
|-----|--------|
| `j` / `k` or arrows | Navigate list |
| `Enter` | Open detail view |
| `c` | Create new definition (wizard form) |
| `e` (in detail view) | Edit via form |
| `d` (in detail view) | Delete with confirmation |
| `Esc` | Go back / close panel |

> **Note:** The TUI wizard edits `name`, `description`, `model`, and `max_turns` fields only. To edit `hooks`, `memory`, `skills`, or the system prompt, use `zeph agents edit` with `$EDITOR`.
>
> Saving via the TUI form rewrites the file and removes YAML comments. Use the CLI `edit` command to preserve hand-written formatting.

## Persistent Memory

Sub-agents can maintain persistent state across sessions via a `MEMORY.md` file and topic-specific files in a dedicated memory directory. This lets agents build knowledge over time without starting from scratch on every spawn.

### Enabling Memory

Add the `memory` field to a definition's YAML frontmatter:

```yaml
---
name: code-reviewer
description: Reviews code for correctness and style
memory: project
---
```

Or set a global default in `config.toml` (applies to all agents without an explicit `memory` field):

```toml
[agents]
default_memory_scope = "project"
```

### Memory Scopes

| Scope | Directory | Use Case |
|-------|-----------|----------|
| `user` | `~/.zeph/agent-memory/<name>/` | Cross-project memory shared between same-named agents. Do not store project-specific secrets here. |
| `project` | `.zeph/agent-memory/<name>/` | Project-scoped memory, suitable for version control. |
| `local` | `.zeph/agent-memory-local/<name>/` | Project-scoped but not committed. Add `.zeph/agent-memory-local/` to `.gitignore`. |

The memory directory is created automatically on first spawn. If the directory already exists, its contents are preserved.

### How It Works

1. **Directory creation** — At spawn time, Zeph creates the memory directory if it does not exist.
2. **MEMORY.md injection** — The first 200 lines of `MEMORY.md` are loaded and injected into the system prompt after the behavioral prompt, wrapped in `<agent-memory>` tags. Lines beyond 200 are truncated with a pointer to the full file.
3. **File tool access** — The agent uses Read, Write, and Edit tools to maintain `MEMORY.md` and create topic-specific files (e.g., `patterns.md`, `debugging.md`).
4. **Prompt ordering** — The behavioral system prompt (from the definition body) always takes precedence over memory content.
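
Here is a sketch of the injection steps. It uses an HTML-entity style escape for the wrapper tags and a made-up truncation notice, both of which are assumptions. `memory_block` and `compose_prompt` are hypothetical names, and Zeph's exact escaping and pointer text may differ.

```rust
const MAX_MEMORY_LINES: usize = 200;

/// Builds the `<agent-memory>` block placed after the behavioral prompt.
/// The escape scheme and truncation notice are assumptions for illustration.
pub fn memory_block(memory_md: &str, full_path: &str) -> String {
    let lines: Vec<&str> = memory_md.lines().collect();
    let kept = lines.len().min(MAX_MEMORY_LINES);
    let mut text = lines[..kept].join("\n");
    if lines.len() > MAX_MEMORY_LINES {
        let hidden = lines.len() - MAX_MEMORY_LINES;
        text.push_str(&format!("\n[truncated: {hidden} more lines in {full_path}]"));
    }
    // Neutralize wrapper tags so memory content cannot close or reopen the block.
    let escaped = text
        .replace("<agent-memory>", "&lt;agent-memory&gt;")
        .replace("</agent-memory>", "&lt;/agent-memory&gt;");
    format!("<agent-memory>\n{escaped}\n</agent-memory>")
}

/// The behavioral prompt always comes first; memory (if any) is appended after it.
pub fn compose_prompt(behavioral: &str, memory: Option<&str>) -> String {
    match memory {
        Some(block) => format!("{behavioral}\n\n{block}"),
        None => behavioral.to_string(),
    }
}
```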

### Auto-Enabled File Tools

When an agent uses `tools.allow` (allowlist mode) and has memory enabled, Zeph automatically adds `Read`, `Write`, and `Edit` to the allowed tool list. A warning is logged so you know the tools were implicitly added:

```
WARN auto-enabled file tools for memory access — add ["Read", "Write", "Edit"]
     to tools.allow to suppress this warning
```

To silence the warning, explicitly include the file tools in your allowlist:

```yaml
tools:
  allow:
    - shell
    - Read
    - Write
    - Edit
```

If all three file tools are blocked (via `tools.except` or `tools.deny`), memory is silently disabled — the directory is not created and no content is injected.
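
The allowlist reconciliation described above can be sketched as follows. `reconcile` and `MemoryTools` are hypothetical names. `blocked` is assumed to hold the combined `tools.deny` and `tools.except` entries, and `allow` is `Some` only in allowlist mode.

```rust
const FILE_TOOLS: [&str; 3] = ["Read", "Write", "Edit"];

/// Result of reconciling memory with the tool configuration (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct MemoryTools {
    pub allow: Option<Vec<String>>,
    pub memory_enabled: bool,
    pub warn: bool,
}

/// `allow` is `Some` only in allowlist mode; `blocked` holds `deny` plus `except` entries.
pub fn reconcile(mut allow: Option<Vec<String>>, blocked: &[&str], memory_requested: bool) -> MemoryTools {
    let all_file_tools_blocked = FILE_TOOLS.iter().all(|t| blocked.contains(t));
    if !memory_requested || all_file_tools_blocked {
        // Memory is off (or silently disabled): nothing is added and no warning is logged.
        return MemoryTools { allow, memory_enabled: false, warn: false };
    }
    let mut warn = false;
    if let Some(list) = allow.as_mut() {
        for tool in FILE_TOOLS {
            if !list.iter().any(|t| t == tool) {
                list.push(tool.to_string());
                warn = true; // implicitly added, so a warning is due
            }
        }
    }
    MemoryTools { allow, memory_enabled: true, warn }
}
```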

### Sandbox and File Tool Access

Sub-agents run in a restricted sandbox that prevents file writes outside the agent's working directory. When an agent declares `memory: user`, Zeph automatically allows writes to the user-scoped memory directory (`~/.zeph/agent-memory/<name>/`) as an exception to the sandbox boundary.

This allows agents with `memory: user` to persist state across projects while remaining sandboxed from accidental writes to system directories or other project data. File paths are validated and canonicalized to prevent traversal attacks.

### Security

- **Agent name validation** — Names must match `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$`. Path traversal attempts (e.g., `../etc/passwd`) are rejected.
- **Symlink boundary check**`MEMORY.md` is canonicalized before reading. If the resolved path escapes the memory directory (e.g., via a symlink), the file is silently skipped.
- **Size cap** — Files larger than 256 KiB are rejected.
- **Null byte guard** — Files containing null bytes are rejected.
- **Tag escaping**`<agent-memory>` tags in memory content are escaped to prevent prompt injection. Since `MEMORY.md` is agent-written (not user-written), this stricter escaping is applied by default.
- **Local scope .gitignore check** — When using `local` scope, Zeph warns if `.zeph/agent-memory-local/` is not in `.gitignore`.
- **Path canonicalization** — Memory directory paths are canonicalized to detect and block symlink-based escape attempts.
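
The name rule and the size and null-byte guards can be written without a regex crate. This is a minimal sketch with hypothetical function names. The UTF-8 check is an added assumption. The symlink and canonicalization checks are omitted because they depend on the filesystem.

```rust
const MAX_MEMORY_BYTES: usize = 256 * 1024;

/// Same rule as `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$`, without a regex crate.
pub fn is_valid_agent_name(name: &str) -> bool {
    match name.as_bytes().split_first() {
        Some((first, rest)) => {
            first.is_ascii_alphanumeric()
                && rest.len() <= 63
                && rest
                    .iter()
                    .all(|b| b.is_ascii_alphanumeric() || *b == b'_' || *b == b'-')
        }
        None => false, // empty names are rejected
    }
}

/// Size cap and null-byte guard for memory files; the UTF-8 check is an added assumption.
pub fn check_memory_file(bytes: &[u8]) -> Result<&str, &'static str> {
    if bytes.len() > MAX_MEMORY_BYTES {
        return Err("file larger than 256 KiB");
    }
    if bytes.contains(&0) {
        return Err("file contains null bytes");
    }
    std::str::from_utf8(bytes).map_err(|_| "file is not valid UTF-8")
}
```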

## Tool and Skill Access

### Tool Filtering

Control which tools a sub-agent can use:

- **Allow list** — only listed tools are available:
  ```yaml
  tools:
    allow:
      - shell
      - web_scrape
  ```
- **Deny list** — all tools except listed:
  ```yaml
  tools:
    deny:
      - shell
  ```
- **Except list** — additional block on top of allow or deny (deny always wins):
  ```yaml
  tools:
    allow:
      - shell
      - web_scrape
    except:
      - shell_sudo
  ```
- **Inherit all** — omit both `allow` and `deny`

Filtering is enforced at the executor level. The sub-agent's LLM only sees tool definitions it can actually call. Blocked tool calls return an error.
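
Here is a sketch of these filtering rules. It assumes tool IDs are compared by exact string match, as documented for `tools.except`. `ToolFilter`, `permits`, and `visible` are hypothetical names. If both `allow` and `deny` were set, this sketch simply lets `allow` win, although definitions are expected to use only one of them.

```rust
/// Tool filter mirroring the YAML `tools` section (hypothetical type).
#[derive(Debug, Default)]
pub struct ToolFilter<'a> {
    pub allow: Option<Vec<&'a str>>,
    pub deny: Option<Vec<&'a str>>,
    pub except: Vec<&'a str>,
}

fn listed(list: &[&str], tool: &str) -> bool {
    list.iter().any(|t| *t == tool) // exact match on tool ID
}

impl ToolFilter<'_> {
    /// True if the sub-agent may see and call `tool`.
    pub fn permits(&self, tool: &str) -> bool {
        if listed(&self.except, tool) {
            return false; // deny always wins over allow
        }
        match (&self.allow, &self.deny) {
            (Some(allow), _) => listed(allow, tool),
            (None, Some(deny)) => !listed(deny, tool),
            (None, None) => true, // inherit every parent tool
        }
    }

    /// Narrows the parent's catalog to what the sub-agent's LLM is shown.
    pub fn visible<'t>(&self, catalog: &[&'t str]) -> Vec<&'t str> {
        catalog.iter().copied().filter(|t| self.permits(t)).collect()
    }
}
```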

### Skill Filtering

Skills are filtered by glob patterns with `*` wildcard:

```yaml
skills:
  include:
    - "git-*"
    - "rust-*"
  exclude:
    - "deploy-*"
```

- Empty `include` = all skills pass (unless excluded)
- `exclude` always takes precedence over `include`
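
Here is a sketch of a minimal `*`-only glob matcher together with the include/exclude rule. `glob_match` and `skill_passes` are hypothetical helpers. They do not support `?`, character classes, or escaping.

```rust
/// Minimal glob matcher supporting only `*` (no `?`, classes, or escapes).
pub fn glob_match(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == name; // no wildcard: exact match
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if name.len() < first.len() + last.len() || !name.starts_with(first) || !name.ends_with(last) {
        return false;
    }
    // Match middle segments left to right inside the region between prefix and suffix.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for part in &parts[1..parts.len() - 1] {
        match rest.find(*part) {
            Some(i) => rest = &rest[i + part.len()..],
            None => return false,
        }
    }
    true
}

/// `exclude` wins; an empty `include` lets every remaining skill through.
pub fn skill_passes(name: &str, include: &[&str], exclude: &[&str]) -> bool {
    if exclude.iter().any(|p| glob_match(p, name)) {
        return false;
    }
    include.is_empty() || include.iter().any(|p| glob_match(p, name))
}
```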

## Security Model

Sub-agents follow a zero-trust principle: they start with **zero permissions** and can only access what you explicitly grant.

### How It Works

1. **Definitions declare capabilities, not permissions.** Writing `secrets: [github-token]` means the agent _may request_ that secret — it doesn't get it automatically.

2. **Secrets require your approval.** When a sub-agent needs a secret, Zeph prompts you:

   > Sub-agent 'code-reviewer' requests 'github-token' (TTL: 120s). Allow? [y/n]

3. **Everything expires.** Granted permissions and secrets are automatically revoked after `ttl_secs` or when the sub-agent finishes — whichever comes first.

4. **Secrets stay in memory only.** They are never written to disk, message history, or logs.

### Permission Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Request
    Request --> UserApproval
    UserApproval --> Denied
    UserApproval --> Grant: approved (with TTL)
    Grant --> Active
    Active --> Expired
    Active --> Revoked
    Expired --> [*]: cleared from memory
    Revoked --> [*]: cleared from memory
    Denied --> [*]
```
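
The grant lifecycle in the diagram can be sketched as an in-memory store keyed by vault key. `Grants` is a hypothetical type. Time is passed in explicitly so that expiry is testable. Removing an entry only drops the value; zeroizing memory is out of scope for this sketch.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory grant store keyed by vault key (hypothetical type).
#[derive(Default)]
pub struct Grants {
    active: HashMap<String, (String, Instant)>, // key -> (secret, expiry)
}

impl Grants {
    /// Records an approved secret that expires `ttl` after `now`.
    pub fn grant(&mut self, key: &str, secret: String, ttl: Duration, now: Instant) {
        self.active.insert(key.to_string(), (secret, now + ttl));
    }

    /// Returns the secret while its TTL is running; an expired entry is removed.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<&str> {
        let expired = matches!(self.active.get(key), Some((_, expiry)) if now >= *expiry);
        if expired {
            self.active.remove(key);
        }
        self.active.get(key).map(|(secret, _)| secret.as_str())
    }

    /// Revokes everything, e.g. on completion, cancellation, or crash.
    pub fn revoke_all(&mut self) {
        self.active.clear();
    }
}
```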

### Safety Guarantees

- Concurrency limit prevents resource exhaustion
- `permissions.timeout_secs` provides a hard kill deadline
- `max_turns` prevents runaway LLM loops
- Background agents auto-deny secret requests so the main session is never blocked
- All grants are revoked on completion, cancellation, or crash
- Secret key names are redacted in logs

## Hooks

Hooks let you run shell commands at specific points in a sub-agent's lifecycle. Use them to validate tool inputs, run linters after file edits, set up resources on agent start, or clean up on agent stop.

There are two hook scopes:

- **Per-agent hooks** — defined in the agent's YAML frontmatter, scoped to tool use events (`PreToolUse`, `PostToolUse`)
- **Config-level hooks** — defined in `config.toml`, scoped to agent lifecycle events (`SubagentStart`, `SubagentStop`)

### Per-Agent Hooks (PreToolUse / PostToolUse)

Add a `hooks` section to the agent's YAML frontmatter. Each event contains a list of matchers, and each matcher specifies which tools it applies to and what commands to run:

```yaml
---
name: code-reviewer
description: Reviews code for correctness and style
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "./scripts/validate.sh"
          timeout_secs: 10
          fail_closed: true
  PostToolUse:
    - matcher: "Edit|Write"
      hooks:
        - type: command
          command: "./scripts/lint.sh"
---
```

**`PreToolUse`** fires before a tool is executed. Set `fail_closed: true` to block execution if the hook exits non-zero.

**`PostToolUse`** fires after a tool finishes. Useful for linting, formatting, or auditing changes.

### Matcher Syntax

The `matcher` field is a pipe-separated list of tokens. A tool matches when its name contains any of the listed tokens (case-sensitive substring match):

| Matcher | Matches | Does not match |
|---------|---------|----------------|
| `"Bash"` | `Bash` | `Edit`, `Write` |
| `"Edit\|Write"` | `Edit`, `WriteFile` | `Bash`, `Read` |
| `"Shell"` | `Shell`, `ShellExec` | `Bash` |
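
The matching rule fits in a few lines. This sketch uses a hypothetical `matcher_matches` helper. It ignores empty tokens (for example, from a trailing `|`), which is an assumption, since an empty substring would otherwise match every tool.

```rust
/// True if any pipe-separated token is a case-sensitive substring of `tool`.
pub fn matcher_matches(matcher: &str, tool: &str) -> bool {
    matcher
        .split('|')
        .filter(|token| !token.is_empty()) // assumption: empty tokens match nothing
        .any(|token| tool.contains(token))
}
```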

### Hook Definition Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Hook type — currently only `"command"` is supported |
| `command` | string | required | Shell command to execute (passed to `sh -c`) |
| `timeout_secs` | u64 | `30` | Maximum execution time before the hook is killed |
| `fail_closed` | bool | `false` | When `true`, a non-zero exit or timeout causes the calling operation to fail; when `false`, errors are logged and execution continues |

### Config-Level Hooks (SubagentStart / SubagentStop)

Define lifecycle hooks in `config.toml` under `[agents.hooks]`. These run for every sub-agent:

```toml
[agents.hooks]

[[agents.hooks.start]]
type = "command"
command = "echo agent started"
timeout_secs = 10

[[agents.hooks.stop]]
type = "command"
command = "./scripts/cleanup.sh"
```

**`start`** hooks fire after a sub-agent is spawned. **`stop`** hooks fire after a sub-agent finishes or is cancelled. Both are fire-and-forget — errors are logged but do not affect the agent's operation.

**Common use cases:**

| Hook | Use case |
|------|----------|
| `start` | Send a Slack/webhook notification that a sub-agent started; initialize a working directory; write a lock file |
| `stop` | Post results to a dashboard; remove temp files; log task duration |

Each hook definition accepts the same fields as per-agent hooks:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Currently only `"command"` is supported |
| `command` | string | required | Shell command executed via `sh -c` |
| `timeout_secs` | u64 | `30` | Hook is killed after this many seconds |
| `fail_closed` | bool | `false` | When `true`, a non-zero exit blocks the operation; when `false`, errors are logged and execution continues |

Multiple hooks per event are supported — they run sequentially in definition order:

```toml
[agents.hooks]

[[agents.hooks.start]]
type = "command"
command = "curl -s -X POST https://hooks.example.com/agent-start -d agent=$ZEPH_AGENT_NAME"
timeout_secs = 5

[[agents.hooks.start]]
type = "command"
command = "mkdir -p /tmp/zeph-work/$ZEPH_AGENT_ID"
timeout_secs = 5

[[agents.hooks.stop]]
type = "command"
command = "rm -rf /tmp/zeph-work/$ZEPH_AGENT_ID"
timeout_secs = 10
```

### Environment Variables

Hook processes receive a clean environment with only the `PATH` variable preserved from the parent process. The following Zeph-specific variables are set:

| Variable | Description |
|----------|-------------|
| `ZEPH_AGENT_ID` | UUID of the sub-agent instance |
| `ZEPH_AGENT_NAME` | Name from the agent definition |
| `ZEPH_TOOL_NAME` | Tool name (only for `PreToolUse` / `PostToolUse`) |

### Security

Hooks follow a trust-boundary model:

- **Project-level definitions** (`.zeph/agents/`) may contain hooks — they are trusted because they live in the project repository.
- **User-level definitions** (`~/.config/zeph/agents/`) have all hooks stripped on load. This prevents untrusted global definitions from running arbitrary commands in any project.
- Hook processes run with a **cleared environment** (`env_clear()`). Only `PATH` is preserved from the parent to prevent accidental secret leakage.
- Child processes are **explicitly killed on timeout** to prevent orphan processes.

> **Note:** If you need hooks on a globally shared agent, move the definition into the project's `.zeph/agents/` directory instead.
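
The execution rules in this section can be sketched with `std::process`. They are: `sh -c`, a cleared environment that keeps only `PATH`, the `ZEPH_*` variables, an explicit kill on timeout, and `fail_closed`. `run_hook`, `HookResult`, and `should_block` are hypothetical names. The sketch has several limitations:

- It is Unix-only.
- It polls instead of using an async runtime.
- It reads stdout only after the child exits, so a hook that writes a large amount of output could stall until the timeout.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Outcome of one hook run (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum HookResult {
    Success(String),
    Failed(Option<i32>),
    TimedOut,
}

/// Runs `command` via `sh -c` with a cleared environment: only `PATH` is copied from the
/// parent, plus the given `ZEPH_*` variables. The child is killed if `timeout` elapses.
pub fn run_hook(command: &str, vars: &[(&str, &str)], timeout: Duration) -> std::io::Result<HookResult> {
    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg(command).env_clear().stdout(Stdio::piped());
    if let Some(path) = std::env::var_os("PATH") {
        cmd.env("PATH", path);
    }
    for &(key, value) in vars {
        cmd.env(key, value);
    }
    let mut child = cmd.spawn()?;
    let started = Instant::now();
    loop {
        if let Some(status) = child.try_wait()? {
            // Output is read after exit, which is fine for short hook output.
            let mut out = String::new();
            if let Some(mut stdout) = child.stdout.take() {
                stdout.read_to_string(&mut out)?;
            }
            return Ok(if status.success() {
                HookResult::Success(out)
            } else {
                HookResult::Failed(status.code())
            });
        }
        if started.elapsed() >= timeout {
            child.kill()?; // explicit kill so no orphan is left behind
            child.wait()?; // reap the killed process
            return Ok(HookResult::TimedOut);
        }
        std::thread::sleep(Duration::from_millis(20));
    }
}

/// With `fail_closed`, a failure or timeout blocks the operation; otherwise it is only logged.
pub fn should_block(result: &HookResult, fail_closed: bool) -> bool {
    fail_closed && !matches!(result, HookResult::Success(_))
}
```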

## Global Agent Defaults

The `[agents]` section in `config.toml` sets defaults that apply to all sub-agents unless overridden by the individual definition:

```toml
[agents]
# Default permission mode for sub-agents that do not set one explicitly.
# "default" and omitting this field are equivalent — both result in standard
# interactive prompts.
# Valid values: "default", "accept_edits", "dont_ask"
# (bypass_permissions and plan are not useful as global defaults)
default_permission_mode = "default"

# Tool IDs blocked for all sub-agents, regardless of what their definition allows.
# Appended on top of any per-definition tool filtering.
default_disallowed_tools = []

# Must be true to allow any sub-agent definition to use bypass_permissions mode.
# When false (the default), any definition with permission_mode: bypass_permissions
# is rejected at load time with an error.
allow_bypass_permissions = false

# Enable JSONL transcript recording for sub-agent sessions (default: true).
# When false, /agent resume is unavailable.
transcript_enabled = true

# Directory for transcript files (default: .zeph/subagents).
# transcript_dir = ".zeph/subagents"

# Maximum number of transcript files to keep (default: 50).
# Set to 0 for unlimited.
transcript_max_files = 50

# Default memory scope for agents that do not set `memory` in their frontmatter.
# Valid values: "user", "project", "local"
# Omit or set to null to disable memory by default.
# default_memory_scope = "project"

# Lifecycle hooks — run for every sub-agent start/stop.
# See the Hooks section above for the full schema.
# [agents.hooks]
# [[agents.hooks.start]]
# type = "command"
# command = "echo started"
# [[agents.hooks.stop]]
# type = "command"
# command = "./scripts/cleanup.sh"
```

> **Note:** `default_permission_mode = "default"` and omitting the field are equivalent — both leave per-agent prompting behavior unchanged.

> **Caution:** Set `allow_bypass_permissions = true` only in fully trusted, sandboxed environments. Without this flag, any definition requesting `bypass_permissions` mode is rejected at load time.

## Context Propagation

Sub-agents inherit context from the parent agent to reduce cold-start overhead:

- **Conversation history**: the parent's recent conversation history is forwarded to the sub-agent's initial context, giving it awareness of what has been discussed
- **Cancellation**: the parent's cancellation token is propagated so that cancelling the parent also cancels active sub-agents
- **Model inheritance**: sub-agents inherit the parent's active model unless overridden in the definition's `model` field

Sub-agents no longer exit after a single text-only LLM response — they continue the conversation loop until the task is complete or `max_turns` is reached.
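
Here is a sketch of these loop semantics, assuming the caller can tell when a turn marks the task as finished. `Turn`, `Outcome`, and `run_loop` are hypothetical names. `Turn::Done` stands in for whatever completion signal Zeph actually uses.

```rust
/// What one LLM turn produced (hypothetical; real turns carry messages and tool calls).
pub enum Turn {
    ToolCalls(usize),
    Text(String),
    Done(String),
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Completed(String),
    TurnLimit { turns: u32 },
}

/// Runs turns until the task reports completion or `max_turns` is reached.
/// A text-only turn does not end the loop on its own.
pub fn run_loop(max_turns: u32, mut step: impl FnMut(u32) -> Turn) -> Outcome {
    for turn in 1..=max_turns {
        match step(turn) {
            Turn::Done(answer) => return Outcome::Completed(answer),
            Turn::ToolCalls(_) | Turn::Text(_) => continue,
        }
    }
    Outcome::TurnLimit { turns: max_turns }
}
```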

### Sub-Agent Context Injection

`context_injection_mode` controls exactly how parent conversation history is injected into the sub-agent's task prompt. Configure it globally under `[agents]`:

```toml
[agents]
context_window_turns   = 10   # recent parent turns forwarded to the sub-agent
context_injection_mode = "last_assistant_turn"  # default
```

| Mode | Behavior |
|------|----------|
| `none` | No parent context injected. The sub-agent starts with only its system prompt and the task string. Use for fully isolated workers where parent history would be noise. |
| `last_assistant_turn` | The last assistant turn from the parent history is prepended to the task prompt as a preamble (default). Gives the sub-agent single-turn awareness — the most recent state — at zero extra LLM cost. |
| `summary` | A compact LLM-generated summary of the recent parent turns is injected. Suitable for long multi-turn sessions where full history injection would consume too many tokens. Requires a provider to generate the summary. |

`context_window_turns` caps how many of the most recent parent turns are considered, regardless of mode. Setting it to `0` disables history propagation entirely: no mode injects any parent context, so the effect matches `none`.
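The following is a minimal sketch of how the three modes might select parent context, not Zeph's implementation. `ContextInjectionMode`, `Turn`, `build_task_prompt` and the preamble format are hypothetical, and the `summarize` closure stands in for the LLM call that `summary` mode requires.

```rust
/// Hypothetical mirror of the `context_injection_mode` setting.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContextInjectionMode {
    None,
    LastAssistantTurn,
    Summary,
}

/// Hypothetical parent turn; `role` is "user" or "assistant".
pub struct Turn {
    pub role: &'static str,
    pub content: String,
}

/// Builds a sub-agent task prompt from the parent history (illustrative only).
pub fn build_task_prompt(
    task: &str,
    history: &[Turn],
    mode: ContextInjectionMode,
    window_turns: usize,
    summarize: impl Fn(&[Turn]) -> String, // stand-in for the summary LLM call
) -> String {
    // Only the most recent `window_turns` turns are eligible; 0 leaves none.
    let start = history.len().saturating_sub(window_turns);
    let recent = &history[start..];
    if recent.is_empty() {
        return task.to_string();
    }
    let preamble = match mode {
        ContextInjectionMode::None => return task.to_string(),
        // Newest assistant turn inside the window, if there is one.
        ContextInjectionMode::LastAssistantTurn => {
            match recent.iter().rev().find(|t| t.role == "assistant") {
                Some(turn) => turn.content.clone(),
                None => return task.to_string(),
            }
        }
        ContextInjectionMode::Summary => summarize(recent),
    };
    format!("Parent context:\n{preamble}\n\nTask:\n{task}")
}
```

Taking the window first and then applying the mode is what makes `context_window_turns = 0` behave like `none` for every mode.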

**Model inheritance**: sub-agents use the parent's active provider unless the definition's `model` field specifies an override. This means a sub-agent spawned during a `gpt-5.4` session will use `gpt-5.4` unless pinned to a different model in the definition.

### MCP Tool Awareness

Sub-agent system prompts are automatically annotated with the names of available MCP tools from connected servers. This helps the sub-agent's LLM understand what external capabilities are available without injecting full tool schemas.
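As an illustration, annotating a prompt with tool names only (no schemas) can be as small as the sketch below. The function name and the exact annotation text are hypothetical; the real prompt wording may differ.

```rust
/// Hypothetical helper: appends MCP tool names, not their schemas, to a system prompt.
pub fn annotate_with_mcp_tools(system_prompt: &str, tool_names: &[&str]) -> String {
    if tool_names.is_empty() {
        return system_prompt.to_string();
    }
    // Sort and deduplicate so the annotation is stable across runs.
    let mut names: Vec<&str> = tool_names.to_vec();
    names.sort_unstable();
    names.dedup();
    format!("{system_prompt}\n\nAvailable MCP tools: {}", names.join(", "))
}
```

Listing names keeps the prompt overhead to a few tokens per tool, whereas full JSON schemas can cost hundreds of tokens each.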

## Interactive TUI Sidebar

When the `tui` feature is enabled, pressing `Tab` in Normal mode cycles to the sub-agent sidebar. The sidebar provides:

- Live status for all active sub-agents with color-coded indicators
- A transcript viewer that shows the full conversation history of a selected sub-agent
- Keyboard navigation: `j`/`k` to select agents, `Enter` to open the transcript, `Esc` to close

## TUI Dashboard Panel

When the `tui` feature is enabled, a Sub-Agents panel appears in the sidebar showing active agents with color-coded status:

```
┌ Sub-Agents (2) ──────────────────────────────────────┐
│  code-reviewer [plan]         WORKING    3/20   42s  │
│  test-writer [bg] [bypass!]   COMPLETED  10/20  100s │
└──────────────────────────────────────────────────────┘
```

Colors: yellow = working, green = completed, red = failed, cyan = input required.

Permission mode badges: `[plan]`, `[accept_edits]`, `[dont_ask]`, `[bypass!]`. The `default` mode shows no badge.

## Architecture

Sub-agents run as in-process tokio tasks — not separate processes. The main agent communicates with them via lightweight primitives:

```mermaid
sequenceDiagram
    participant M as SubAgentManager
    participant S as Sub-Agent (tokio task)
    M->>S: tokio::spawn(run_agent_loop)
    S-->>M: watch::send(Working)
    S-->>M: watch::send(Working, msg)
    M->>S: CancellationToken::cancel()
    S-->>M: watch::send(Completed)
    S-->>M: JoinHandle.await → Result
```

| Primitive | Direction | Purpose |
|-----------|-----------|---------|
| `watch::channel` | Agent → Manager | Real-time status updates |
| `JoinHandle` | Agent → Manager | Final result collection |
| `CancellationToken` | Manager → Agent | Graceful cancellation |
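To see how the three primitives divide the work without pulling in tokio, here is a std-only analogue under stated substitutions: a thread replaces the tokio task, `std::sync::mpsc` replaces `watch::channel`, and an `Arc<AtomicBool>` replaces `CancellationToken`. `Status` and `spawn_agent` are hypothetical names. One real difference: an mpsc receiver sees every update in order, while a `watch` receiver only ever sees the latest value.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Working(u32),
    Completed,
    Cancelled,
}

/// Runs a toy agent loop; the cancel flag is checked before every turn.
pub fn spawn_agent(
    max_turns: u32,
    cancel: Arc<AtomicBool>,        // stand-in for CancellationToken
    status_tx: mpsc::Sender<Status>, // stand-in for watch::Sender
) -> thread::JoinHandle<Result<u32, String>> {
    thread::spawn(move || {
        for turn in 1..=max_turns {
            if cancel.load(Ordering::SeqCst) {
                let _ = status_tx.send(Status::Cancelled);
                return Err("cancelled".to_string());
            }
            let _ = status_tx.send(Status::Working(turn));
            thread::sleep(Duration::from_millis(1)); // stand-in for an LLM turn
        }
        let _ = status_tx.send(Status::Completed);
        Ok(max_turns) // final result, delivered through the JoinHandle
    })
}
```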

### `@mention` vs File References

The TUI uses `@` for both sub-agent mentions and file references. Zeph resolves ambiguity by checking the token after `@` against known agent names:

```
@code-reviewer review src/main.rs   → sub-agent mention
@src/main.rs                        → file reference
```
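The disambiguation rule can be sketched as a small classifier. `AtToken` and `classify_at` are hypothetical, and the assumption that the token ends at the first whitespace character is ours, not taken from Zeph's parser.

```rust
#[derive(Debug, PartialEq)]
pub enum AtToken<'a> {
    /// `@name rest` where `name` is a known sub-agent.
    AgentMention { agent: &'a str, prompt: &'a str },
    /// Any other token after `@` is treated as a file path.
    FileRef(&'a str),
}

/// Hypothetical classifier: compares the token after `@` with known agent names.
pub fn classify_at<'a>(input: &'a str, known_agents: &[&str]) -> Option<AtToken<'a>> {
    let rest = input.strip_prefix('@')?;
    // Assumption: the token ends at the first whitespace character.
    let (token, tail) = match rest.find(char::is_whitespace) {
        Some(i) => (&rest[..i], rest[i..].trim_start()),
        None => (rest, ""),
    };
    if known_agents.contains(&token) {
        Some(AtToken::AgentMention { agent: token, prompt: tail })
    } else {
        Some(AtToken::FileRef(token))
    }
}
```

Checking agent names first means a file whose path happens to equal an agent name is read as a mention.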

## API Reference

For programmatic use, `SubAgentManager` provides the full lifecycle API:

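```rust
// At most 4 sub-agents may run at the same time.
let mut manager = SubAgentManager::new(/* max_concurrent */ 4);

// Directories are searched in order and the first definition with a given
// name wins, so project-level agents shadow user-level ones.
manager.load_definitions(&[
    project_dir.join(".zeph/agents"),
    dirs::config_dir().unwrap().join("zeph/agents"),
])?;

// The last argument is the optional skills set.
let task_id = manager.spawn("code-reviewer", "Review src/main.rs", provider, executor, None)?;

// Snapshot of every active sub-agent's state.
let statuses = manager.statuses();

// Cancel the task and revoke all of its grants.
manager.cancel(&task_id)?;

// Await the result and remove the task from the active set.
let result = manager.collect(&task_id).await?;
```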

| Method | Description |
|--------|-------------|
| `load_definitions(&[PathBuf])` | Load `.md` definitions (first-wins deduplication) |
| `spawn(name, prompt, provider, executor, skills)` | Spawn a sub-agent, returns task ID |
| `cancel(task_id)` | Cancel and revoke all grants |
| `collect(task_id)` | Await result and remove from active set |
| `statuses()` | Snapshot of all active sub-agent states |
| `approve_secret(task_id, key, ttl)` | Grant a vault secret after user approval |
| `shutdown_all()` | Cancel all active sub-agents (used on exit) |
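The "first-wins deduplication" in `load_definitions` can be illustrated with an entry-based merge. This sketch works on in-memory `(directory, names)` pairs instead of reading `.md` files; `Definition` and `merge_first_wins` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical loaded definition: its name and the directory it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Definition {
    pub name: String,
    pub source_dir: String,
}

/// Merges definitions in directory priority order; later duplicates are ignored.
pub fn merge_first_wins(dirs: &[(&str, Vec<&str>)]) -> HashMap<String, Definition> {
    let mut merged = HashMap::new();
    for (dir, names) in dirs {
        for name in names {
            // `or_insert_with` only runs for names not seen in an earlier directory.
            merged.entry(name.to_string()).or_insert_with(|| Definition {
                name: name.to_string(),
                source_dir: dir.to_string(),
            });
        }
    }
    merged
}
```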

### Error Types

| Variant | When |
|---------|------|
| `Parse` | Invalid frontmatter or YAML/TOML |
| `Invalid` | Validation failure (empty name, mutual exclusion) |
| `NotFound` | Unknown definition name or task ID |
| `Spawn` | Concurrency limit reached or task panic |
| `Cancelled` | Sub-agent was cancelled |
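A caller often needs to separate problems in the definition file from runtime conditions. The enum below is a hypothetical mirror of the variants in the table (the real type's name and payloads may differ), with `Parse` and `Invalid` treated as definition errors.

```rust
use std::fmt;

/// Hypothetical mirror of the sub-agent error variants.
#[derive(Debug)]
pub enum SubAgentError {
    Parse(String),
    Invalid(String),
    NotFound(String),
    Spawn(String),
    Cancelled,
}

impl SubAgentError {
    /// True when the fix is to edit the agent definition file.
    pub fn is_definition_error(&self) -> bool {
        matches!(self, SubAgentError::Parse(_) | SubAgentError::Invalid(_))
    }
}

impl fmt::Display for SubAgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SubAgentError::Parse(msg) => write!(f, "invalid frontmatter: {msg}"),
            SubAgentError::Invalid(msg) => write!(f, "validation failed: {msg}"),
            SubAgentError::NotFound(what) => write!(f, "not found: {what}"),
            SubAgentError::Spawn(msg) => write!(f, "spawn failed: {msg}"),
            SubAgentError::Cancelled => write!(f, "sub-agent was cancelled"),
        }
    }
}

impl std::error::Error for SubAgentError {}
```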

## Background Lifecycle (Phase 5 — Planned)

> **Planned** — The features in this section are part of Phase 5 (#1145) and not yet available.

Phase 5 closes the gap between fire-and-forget background agents and a full lifecycle model with timeout enforcement, result persistence, completion notifications, and new CLI commands for inspecting agent output.

### Timeout Enforcement

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

The `permissions.timeout_secs` field is currently parsed from agent definitions but **not enforced at runtime**. A runaway background agent can consume resources indefinitely.

Phase 5 wraps the agent loop in `tokio::time::timeout` so agents are killed when the deadline expires:

```rust
let timeout_dur = Duration::from_secs(def.permissions.timeout_secs);
let join_handle = tokio::spawn(async move {
    match tokio::time::timeout(timeout_dur, run_agent_loop(args)).await {
        Ok(result) => result,
        Err(_elapsed) => {
            tracing::warn!("sub-agent timed out after {timeout_dur:?}");
            Err(anyhow::anyhow!("sub-agent timed out after {}s", timeout_dur.as_secs()))
        }
    }
});
```

The default timeout is **600 seconds** (10 minutes). Override it per agent:

```yaml
---
name: long-running-task
description: Agent with a custom timeout
permissions:
  timeout_secs: 1800  # 30 minutes
---
```

Timeout is wall-clock time, independent of `max_turns`. Both limits are enforced simultaneously — whichever fires first stops the agent.
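The "whichever fires first" rule can be made concrete with a synchronous sketch. Note the swapped-in technique: this loop checks the deadline only between turns, whereas the planned `tokio::time::timeout` wrapper can also stop an agent in the middle of a turn. `StopReason` and `run_with_limits` are hypothetical.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum StopReason {
    Completed,
    MaxTurns,
    TimedOut,
}

/// Runs `turn` until it reports completion, the turn budget is used up,
/// or the wall-clock deadline passes, whichever happens first.
pub fn run_with_limits(
    max_turns: u32,
    timeout: Duration,
    mut turn: impl FnMut(u32) -> bool, // returns true when the task is done
) -> StopReason {
    let deadline = Instant::now() + timeout;
    for n in 1..=max_turns {
        if Instant::now() >= deadline {
            return StopReason::TimedOut;
        }
        if turn(n) {
            return StopReason::Completed;
        }
    }
    StopReason::MaxTurns
}
```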

### Completion Notifications

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

Currently the parent agent must poll `/agent status` to discover when a background agent finishes. Phase 5 introduces a `CompletionEvent` that fires when any agent reaches a terminal state (completed, failed, cancelled, or timed out):

```rust
pub struct CompletionEvent {
    pub task_id: String,
    pub agent_name: String,
    pub state: SubAgentState,
    pub elapsed: Duration,
}
```

The event carries only metadata — no result summary. Consumers read the full output from the persisted output file or SQLite table.

Delivery uses a **cooperative sweep-on-access** model rather than a dedicated background task. The agent loop calls the manager's `reap_completed()` method, which collects all finished handles, persists their results, and returns the corresponding completion events. Because `SubAgentManager` is not wrapped in `Arc<Mutex<_>>`, a separate background task could not easily share it; sweeping from the owning loop avoids introducing shared ownership.
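A std-only sketch of sweep-on-access, assuming threads instead of tokio tasks: `JoinHandle::is_finished()` lets the owner join only finished work and never block on running tasks. `Manager`, `TerminalState` and the trimmed `CompletionEvent` (no `agent_name`) are hypothetical simplifications, and persistence is reduced to a comment.

```rust
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum TerminalState {
    Completed,
    Failed,
}

#[derive(Debug)]
pub struct CompletionEvent {
    pub task_id: String,
    pub state: TerminalState,
    pub elapsed: Duration, // measured at reap time in this sketch
}

struct Running {
    task_id: String,
    started: Instant,
    handle: JoinHandle<Result<String, String>>,
}

#[derive(Default)]
pub struct Manager {
    running: Vec<Running>,
}

impl Manager {
    pub fn spawn(&mut self, task_id: &str, work: impl FnOnce() -> Result<String, String> + Send + 'static) {
        let handle = thread::spawn(work);
        self.running.push(Running { task_id: task_id.to_string(), started: Instant::now(), handle });
    }

    /// Called from the owner's loop: joins finished tasks, leaves running ones alone.
    pub fn reap_completed(&mut self) -> Vec<CompletionEvent> {
        let (done, still_running): (Vec<_>, Vec<_>) =
            self.running.drain(..).partition(|r| r.handle.is_finished());
        self.running = still_running;
        done.into_iter()
            .map(|r| {
                let state = match r.handle.join() {
                    Ok(Ok(_output)) => TerminalState::Completed, // persist `_output` here
                    _ => TerminalState::Failed, // error result or panic
                };
                CompletionEvent { task_id: r.task_id, state, elapsed: r.started.elapsed() }
            })
            .collect()
    }
}
```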

### Result Persistence

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

Background agent results are currently ephemeral: they are held in memory as strings and are lost on process exit unless explicitly collected. Phase 5 adds dual persistence:

**Output files** — The final result is written to `.zeph/agent-output/<task_id>.txt` with a 1 MiB cap and 24-hour retention. Files are cleaned up by the reaper on the next sweep.

**SQLite table** — A `background_results` table stores structured metadata:

```sql
CREATE TABLE IF NOT EXISTS background_results (
    task_id     TEXT PRIMARY KEY,
    agent_name  TEXT NOT NULL,
    success     INTEGER NOT NULL,
    result_text TEXT NOT NULL,
    turns_used  INTEGER NOT NULL,
    elapsed_ms  INTEGER NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
```

Configure persistence in `config.toml`:

```toml
[agents]
output_dir = ".zeph/agent-output"       # default
output_retention_secs = 86400           # 24h, default
output_max_bytes = 1048576              # 1 MiB, default
```
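The two limits above can be sketched as pure functions. This assumes that `output_max_bytes` truncates oversized output (rather than rejecting it) and that retention is judged by the file's write time; `cap_output` and `is_expired` are hypothetical helpers.

```rust
use std::time::{Duration, SystemTime};

/// Truncates `text` to at most `max_bytes` bytes without splitting a UTF-8 character.
pub fn cap_output(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Walk back to a character boundary; index 0 is always a boundary.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

/// True when a file written at `written_at` is older than the retention window.
pub fn is_expired(written_at: SystemTime, now: SystemTime, retention: Duration) -> bool {
    now.duration_since(written_at)
        .map(|age| age > retention)
        .unwrap_or(false) // clock went backwards: keep the file
}
```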

### New CLI Commands

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

| Command | Description |
|---------|-------------|
| `/agent output <id>` | Print the persisted output file for a completed agent |
| `/agent collect <id>` | Collect a specific agent's result |
| `/agent collect` | Collect all completed agents at once |

`/agent collect` without arguments collects all agents in a terminal state (completed, failed, timed out). Active agents are skipped — the command never blocks waiting for a running agent to finish. `/agent collect <id>` collects a specific agent by ID prefix.
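Both forms of `/agent collect` can be modelled over a simple list of `(id, state)` pairs. The types and functions are hypothetical, and two behaviors are assumptions not stated above: an ambiguous prefix is rejected, and `/agent collect <id>` on a running agent returns an error instead of waiting.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Working,
    Completed,
    Failed,
    TimedOut,
}

impl State {
    fn is_terminal(self) -> bool {
        !matches!(self, State::Working)
    }
}

#[derive(Debug, PartialEq)]
pub enum CollectError {
    NotFound,
    Ambiguous,    // assumption: prefix matches several agents
    StillRunning, // assumption: never block on a running agent
}

/// `/agent collect`: removes every terminal agent and skips running ones.
pub fn collect_all(agents: &mut Vec<(String, State)>) -> Vec<String> {
    let mut collected = Vec::new();
    agents.retain(|(id, state)| {
        if state.is_terminal() {
            collected.push(id.clone());
            false // drop from the active list
        } else {
            true
        }
    });
    collected
}

/// `/agent collect <id>`: resolves a unique ID prefix.
pub fn collect_one(agents: &mut Vec<(String, State)>, prefix: &str) -> Result<String, CollectError> {
    let matches: Vec<usize> = agents
        .iter()
        .enumerate()
        .filter(|(_, (id, _))| id.starts_with(prefix))
        .map(|(i, _)| i)
        .collect();
    match matches.as_slice() {
        [] => Err(CollectError::NotFound),
        [i] if !agents[*i].1.is_terminal() => Err(CollectError::StillRunning),
        [i] => Ok(agents.remove(*i).0),
        _ => Err(CollectError::Ambiguous),
    }
}
```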

Example workflow:

```
> /agent bg code-reviewer Review the auth module
Sub-agent 'code-reviewer' started (id: a1b2c3d4)

> /agent status
Active sub-agents:
  [a1b2c3d4] completed  turns=5  elapsed=38s

> /agent output a1b2
--- Output for a1b2c3d4 (code-reviewer) ---
Found 2 issues in the auth module:
1. [critical] Token expiry check missing in refresh_token()
2. [warning] Redundant clone on line 42
---

> /agent collect
Collected 1 completed agent(s).
```

### Structured Result Type

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

The current `run_agent_loop` returns a raw `String`. Phase 5 replaces it with a structured `AgentResult`:

```rust
pub struct AgentResult {
    pub final_response: String,
    pub conversation: Vec<Message>,  // full message history
    pub turns_used: u32,
    pub elapsed: Duration,
    pub timed_out: bool,
}
```

This enables `/agent output` to show the full result, and `collect()` to return structured data for programmatic use. The `JoinHandle` type changes from `Result<String>` to `Result<AgentResult>`.

### Progress Streaming

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

The `last_message` field in `SubAgentStatus` is currently truncated to 120 characters, providing minimal visibility into agent progress. Phase 5 makes two improvements:

1. **Increased truncation limit**: `last_message` truncation increases from 120 to 500 characters, an immediate benefit that requires no breaking changes.

2. **Dedicated progress channel**: a separate `mpsc::Sender<ProgressUpdate>` channel carries full per-turn output alongside the existing `watch` channel:

```rust
pub struct ProgressUpdate {
    pub turn: u32,
    pub content: String,            // full LLM response for this turn
    pub tool_output: Option<String>, // tool result if applicable
}
```

The `watch` channel remains for lightweight status polling (no breaking change to `SubAgentStatus`). The progress channel has a capacity of 32 messages — unread messages are dropped when the buffer is full to prevent OOM.

Access progress updates via `SubAgentManager::drain_progress(task_id) -> Vec<ProgressUpdate>`.
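A std-only sketch of a bounded progress channel with a non-blocking drain follows. It uses `std::sync::mpsc::sync_channel` as a stand-in for the tokio channel. Here a full buffer causes the *newest* update to be dropped (`try_send` semantics), which may differ from the planned drop policy. The free functions are hypothetical counterparts of `drain_progress`.

```rust
use std::sync::mpsc::{self, Receiver, SyncSender, TrySendError};

#[derive(Debug, Clone, PartialEq)]
pub struct ProgressUpdate {
    pub turn: u32,
    pub content: String,
    pub tool_output: Option<String>,
}

/// Bounded channel; the planned capacity is 32.
pub fn progress_channel(capacity: usize) -> (SyncSender<ProgressUpdate>, Receiver<ProgressUpdate>) {
    mpsc::sync_channel(capacity)
}

/// Never blocks the agent loop; returns false if the update was dropped.
pub fn send_progress(tx: &SyncSender<ProgressUpdate>, update: ProgressUpdate) -> bool {
    match tx.try_send(update) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => false,         // buffer full: drop this update
        Err(TrySendError::Disconnected(_)) => false, // receiver is gone
    }
}

/// Takes everything currently buffered without waiting for more.
pub fn drain_progress(rx: &Receiver<ProgressUpdate>) -> Vec<ProgressUpdate> {
    rx.try_iter().collect()
}
```

Because the sender never blocks, a slow consumer can only lose progress updates; it cannot stall the sub-agent or grow memory without bound.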

### Hook Improvements

> **Planned** — This feature is part of Phase 5 (#1145) and not yet available.

Phase 5 adds a new environment variable to `SubagentStop` hooks:

| Variable | Description |
|----------|-------------|
| `ZEPH_AGENT_EXIT_REASON` | Exit reason: `completed`, `failed`, `canceled`, or `timed_out` |

This allows stop hooks to take different actions based on how the agent ended — for example, sending a notification only on failure or cleaning up resources only on timeout.
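For example, a stop hook written as a small Rust program (configured with `type = "command"`) could branch on the variable as sketched below. `HookAction`, `action_for` and the chosen reactions are hypothetical; only the variable name and its documented values come from the table above.

```rust
use std::env;

/// What this hypothetical stop hook does for each exit reason.
#[derive(Debug, PartialEq)]
pub enum HookAction {
    Notify,
    Cleanup,
    Nothing,
}

pub fn action_for(reason: Option<&str>) -> HookAction {
    match reason {
        Some("failed") => HookAction::Notify,
        Some("timed_out") => HookAction::Cleanup,
        // `completed`, `canceled`, unknown values, or the variable is unset.
        _ => HookAction::Nothing,
    }
}

/// Reads the reason from the environment the hook was started with.
pub fn current_action() -> HookAction {
    let reason = env::var("ZEPH_AGENT_EXIT_REASON").ok();
    action_for(reason.as_deref())
}
```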

Phase 5 also fixes a bug where `SubagentStop` hooks fire twice when a running agent is cancelled and then collected. The fix ensures the hook fires exactly once at the first terminal state transition.
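One common way to guarantee "exactly once at the first terminal transition" is an atomic compare-and-swap guard, sketched below. `StopHookGuard` is a hypothetical name, and this is not necessarily how the fix is implemented.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Ensures a stop hook runs only for the first terminal transition.
#[derive(Default)]
pub struct StopHookGuard {
    fired: AtomicBool,
}

impl StopHookGuard {
    /// Runs `hook` on the first call only; later calls (e.g. collect after cancel) do nothing.
    pub fn fire_once(&self, hook: impl FnOnce()) -> bool {
        // Only one caller can flip `fired` from false to true.
        if self
            .fired
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            hook();
            true
        } else {
            false
        }
    }
}
```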