zeph 0.20.2

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
# Security

Zeph implements defense-in-depth security for safe AI agent operations in production environments.

## Age Vault

Zeph can store secrets in an [age](https://age-encryption.org/)-encrypted vault file instead of environment variables. This is the recommended approach for production and shared environments.

### Setup

```bash
zeph vault init                        # generate keypair + empty vault
zeph vault set ZEPH_CLAUDE_API_KEY sk-ant-...
zeph vault set ZEPH_TELEGRAM_TOKEN 123456:ABC...
zeph vault list                        # show stored keys
zeph vault get ZEPH_CLAUDE_API_KEY     # retrieve a value
zeph vault rm ZEPH_CLAUDE_API_KEY      # remove a key
```

Enable the vault backend in config:

```toml
[vault]
backend = "age"
```

The vault file path defaults to `~/.zeph/vault.age`. The private key path defaults to `~/.zeph/key.txt`.

### Custom Secrets

Beyond built-in provider keys, you can store arbitrary secrets for skill authentication using the `ZEPH_SECRET_` prefix:

```bash
zeph vault set ZEPH_SECRET_GITHUB_TOKEN ghp_yourtokenhere
zeph vault set ZEPH_SECRET_STRIPE_KEY sk_live_...
```

Skills declare which secrets they require via `x-requires-secrets` in their frontmatter. Skills with unsatisfied secrets are excluded from the prompt automatically — they will not be matched or executed until the secret is available.

When a skill with `x-requires-secrets` is active, its secrets are injected as environment variables into shell commands it runs. The prefix is stripped and the name is uppercased:

| Vault key | Env var injected |
|-----------|-----------------|
| `ZEPH_SECRET_GITHUB_TOKEN` | `GITHUB_TOKEN` |
| `ZEPH_SECRET_STRIPE_KEY` | `STRIPE_KEY` |

Only the secrets declared by the currently active skill are injected — not all vault secrets.
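
The name mapping in the table above can be sketched as a small helper. The function name `env_name_for_secret` and the constant `SECRET_PREFIX` are hypothetical and are not taken from Zeph's source; the sketch only illustrates the strip-prefix-and-uppercase rule.

```rust
/// Prefix that marks a vault key as a custom skill secret.
const SECRET_PREFIX: &str = "ZEPH_SECRET_";

/// Maps a vault key to the environment variable name injected into shell
/// commands. Returns `None` for keys that are not custom secrets.
/// (Hypothetical helper, for illustration only.)
fn env_name_for_secret(vault_key: &str) -> Option<String> {
    vault_key
        .strip_prefix(SECRET_PREFIX)
        .filter(|rest| !rest.is_empty())
        .map(|rest| rest.to_uppercase())
}
```

Built-in provider keys such as `ZEPH_CLAUDE_API_KEY` do not carry the prefix, so this helper returns `None` for them.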

See [Add Custom Skills — Secret-Gated Skills](../guides/custom-skills.md#secret-gated-skills) for how to declare requirements in a skill.

### Docker

Mount the vault and key files as read-only volumes:

```yaml
volumes:
  - ~/.zeph/vault.age:/home/zeph/.zeph/vault.age:ro
  - ~/.zeph/key.txt:/home/zeph/.zeph/key.txt:ro
```

### File Permissions

All sensitive files created by Zeph are written with mode `0600` (owner read/write only), independent of the process umask, so secrets are not accidentally readable by other users on the system.

Protected files include:
- Vault files (`~/.zeph/vault.age`, `~/.zeph/key.txt`)
- SQLite databases (conversation history, embeddings, metrics)
- Debug dumps (when enabled)
- Audit logs (tool execution records, JSONL format)
- Configuration files (`config.toml`, router state, ACP permissions)
- MCP server list (`mcpls.toml`)

**Checking permissions manually:**

```bash
ls -la ~/.zeph/vault.age   # Should show: -rw------- (mode 0600)
ls -la ~/.zeph/key.txt     # Should show: -rw------- (mode 0600)
```

Run `zeph doctor` to verify file modes are correct across all sensitive Zeph files.

## Plugin Manifest Integrity

Zeph records a SHA-256 digest of each installed plugin's `.plugin.toml` manifest and verifies it at startup and during hot-reload. The integrity registry is stored in `~/.local/share/zeph/.plugin-integrity.toml` (outside the plugins directory to prevent TOCTOU races).

**Protection scope:**
- Detects if a plugin manifest has been modified outside of Zeph's control (e.g., accidentally edited, maliciously replaced)
- Missing entries from pre-feature installs are permitted with a debug-level log
- Mismatches cause the plugin to be skipped with an "integrity mismatch" reason visible in `zeph plugin list --overlay`

**To re-protect after a valid change:**
```bash
zeph plugin remove <name>
zeph plugin add /path/to/<name>
```

This stores a fresh digest, allowing the plugin to load normally.

**Known limits:**
- Not cryptographically signed: the digest detects accidental corruption but does not stop a determined adversary
- Concurrent installs may race (last writer wins on the `.plugin-integrity.toml` file)

## Shell Command Filtering

All shell commands from LLM responses pass through a security filter before execution. Shell command detection uses a tokenizer-based pipeline that splits input into tokens, handles wrapper commands (e.g., `env`, `nohup`, `timeout`), and applies word-boundary matching against blocked patterns. This replaces the prior substring-based approach for more accurate detection with fewer false positives. Commands matching blocked patterns are rejected with detailed error messages.

**12 blocked patterns by default:**

| Pattern | Risk Category | Purpose |
|---------|---------------|----------|
| `rm -rf /`, `rm -rf /*` | Filesystem destruction | Prevents accidental system wipe |
| `sudo`, `su` | Privilege escalation | Blocks unauthorized root access |
| `mkfs`, `fdisk` | Filesystem operations | Prevents disk formatting |
| `dd if=`, `dd of=` | Low-level disk I/O | Blocks dangerous write operations |
| `curl \| bash`, `wget \| sh` | Arbitrary code execution | Prevents remote code injection |
| `nc`, `ncat`, `netcat` | Network backdoors | Blocks reverse shell attempts |
| `shutdown`, `reboot`, `halt` | System control | Prevents service disruption |

**Configuration:**
```toml
[tools.shell]
timeout = 30
blocked_commands = ["custom_pattern"]  # Additional patterns (additive to defaults)
allowed_paths = ["/home/user/workspace"]  # Restrict filesystem access
allow_network = true  # false blocks curl/wget/nc
confirm_patterns = ["rm ", "git push -f"]  # Destructive command patterns
```

Custom blocked patterns are **additive** — you cannot weaken default security. Matching is case-insensitive.
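
The difference between token matching and substring matching can be shown with a minimal sketch. The names `effective_command` and `find_blocked` are hypothetical, and the tokenizer is reduced to whitespace splitting; it ignores quoting, pipelines, and multi-word patterns such as `rm -rf /`, which a real filter has to handle.

```rust
/// Wrapper commands that launch another command; the filter looks past them.
const WRAPPERS: &[&str] = &["env", "nohup", "timeout"];

/// Returns the lowercased command word that would actually run, skipping
/// wrappers and their arguments (flags, VAR=value pairs, numeric durations).
fn effective_command(input: &str) -> Option<String> {
    let mut after_wrapper = false;
    for tok in input.split_whitespace() {
        let lower = tok.to_lowercase();
        if WRAPPERS.contains(&lower.as_str()) {
            after_wrapper = true;
            continue;
        }
        let is_wrapper_arg = tok.starts_with('-')
            || tok.contains('=')
            || tok.starts_with(|c: char| c.is_ascii_digit());
        if after_wrapper && is_wrapper_arg {
            continue;
        }
        return Some(lower);
    }
    None
}

/// Whole-token, case-insensitive comparison against the blocklist.
fn find_blocked<'a>(input: &str, blocklist: &[&'a str]) -> Option<&'a str> {
    let cmd = effective_command(input)?;
    blocklist
        .iter()
        .copied()
        .find(|blocked| blocked.eq_ignore_ascii_case(&cmd))
}
```

Because the comparison is made on whole tokens, `sudoku` and `ncdu` pass while `timeout 5 SUDO ls` is rejected, which is the kind of false positive a substring check cannot avoid.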

### Subshell Detection

The blocklist scanner detects blocked commands wrapped inside subshell constructs. The tokenizer extracts the command token from backtick substitution (`` `cmd` ``), `$(cmd)`, `<(cmd)`, and `>(cmd)` process substitution forms. A blocked command name within any of these constructs is rejected before the shell sees it.

For example, `` `sudo rm -rf /` ``, `$(sudo rm -rf /)`, `<(sudo cat /etc/shadow)`, and `>(nc evil.example.com)` are all blocked when `sudo`, `rm -rf /`, or `nc` appear in the blocklist.
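
A rough sketch of how command tokens might be pulled out of these constructs follows. The helper names are hypothetical, and the code ignores quoting and nesting; it only shows the idea of inspecting the first word after each opener.

```rust
/// First word of a fragment, stopping at whitespace or shell punctuation.
fn first_word(fragment: &str) -> Option<String> {
    let word: String = fragment
        .trim_start()
        .chars()
        .take_while(|&c| !c.is_whitespace() && !matches!(c, ')' | ';' | '|' | '&'))
        .collect();
    if word.is_empty() {
        None
    } else {
        Some(word)
    }
}

/// Collects the command word inside `$(...)`, `<(...)`, `>(...)`, and
/// backtick substitutions so each can be checked against the blocklist.
fn substitution_commands(input: &str) -> Vec<String> {
    let mut found = Vec::new();
    for opener in ["$(", "<(", ">("] {
        for (idx, _) in input.match_indices(opener) {
            found.extend(first_word(&input[idx + opener.len()..]));
        }
    }
    // After splitting on backticks, text inside a pair sits at odd positions.
    for (i, segment) in input.split('`').enumerate() {
        if i % 2 == 1 {
            found.extend(first_word(segment));
        }
    }
    found
}
```

Each returned word would then go through the same blocklist comparison as a top-level command.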

### Known Limitations

`find_blocked_command` operates on tokenized command text and cannot detect blocked commands embedded inside indirect execution constructs:

| Construct | Example | Why it bypasses |
|-----------|---------|-----------------|
| Here-strings | `bash <<< 'sudo rm -rf /'` | The payload string is opaque to the filter |
| `eval` / `bash -c` / `sh -c` | `eval 'sudo rm -rf /'` | String argument is not parsed |
| Variable expansion | `cmd=sudo; $cmd rm -rf /` | Variables are not resolved during tokenization |

**Mitigation:** The default `confirm_patterns` in `ShellConfig` include `<(`, `>(`, `<<<`, `eval `, `$(`, and `` ` `` — commands containing these constructs trigger a confirmation prompt before execution. For high-security deployments, complement this filter with OS-level sandboxing (Linux namespaces, seccomp, or similar).

## Shell Sandbox

Commands are validated against a configurable filesystem allowlist before execution:

- `allowed_paths = []` (default) restricts access to the working directory only
- Paths are canonicalized to prevent traversal attacks (`../../etc/passwd`)
- Relative paths containing `..` segments are rejected before canonicalization as an additional defense layer
- `allow_network = false` blocks network tools (`curl`, `wget`, `nc`, `ncat`, `netcat`)
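
The path rules above can be approximated in a few lines. This is a sketch under stated assumptions: `is_path_allowed` and `normalize` are hypothetical names, and lexical normalization stands in for real canonicalization, which also resolves symlinks and requires filesystem access.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically folds `.` and `..` segments. A stand-in for canonicalization,
/// which would also resolve symlinks but needs the path to exist on disk.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// An empty `allowed` list means "working directory only".
fn is_path_allowed(requested: &str, cwd: &Path, allowed: &[PathBuf]) -> bool {
    let requested = Path::new(requested);
    // Relative paths with `..` are rejected before any resolution happens.
    if requested.is_relative()
        && requested.components().any(|c| c == Component::ParentDir)
    {
        return false;
    }
    // `join` keeps `requested` unchanged when it is already absolute.
    let absolute = normalize(&cwd.join(requested));
    if allowed.is_empty() {
        absolute.starts_with(cwd)
    } else {
        allowed.iter().any(|root| absolute.starts_with(root))
    }
}
```

`Path::starts_with` compares whole components, so `/home/user/workspace2` is not treated as being inside `/home/user/workspace`.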

## Destructive Command Confirmation

Commands matching `confirm_patterns` trigger an interactive confirmation before execution:

- **CLI:** `y/N` prompt on stdin
- **Telegram:** inline keyboard with Confirm/Cancel buttons
- Default patterns: `rm`, `git push -f`, `git push --force`, `drop table`, `drop database`, `truncate`, `$(`, `` ` ``, `<(`, `>(`, `<<<`, `eval`
- Configurable via `tools.shell.confirm_patterns` in TOML

## File Executor Sandbox

`FileExecutor` enforces the same `allowed_paths` sandbox as the shell executor for all file operations (`read`, `write`, `edit`, `glob`, `grep`).

**Path validation:**
- All paths are resolved to absolute form and canonicalized before access
- Absolute paths are rejected when the operation is not explicitly authorized (e.g., the `/image` slash command rejects absolute paths like `/etc/passwd` and only permits relative paths)
- Non-existing paths (e.g., for `write`) use ancestor-walk canonicalization: the resolver walks up the path tree to the nearest existing ancestor, canonicalizes it, then re-appends the remaining segments. This prevents symlink and `..` traversal on paths that do not yet exist on disk
- If the resolved path does not fall under any entry in `allowed_paths`, the operation is rejected with a `SandboxViolation` error
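
The ancestor-walk step for paths that do not exist yet can be sketched as follows. The function name is hypothetical, and the sketch assumes the input is absolute and free of `..` segments (those are expected to be rejected earlier).

```rust
use std::ffi::OsString;
use std::io;
use std::path::{Path, PathBuf};

/// Canonicalizes a path that may not exist yet: walk up to the nearest
/// existing ancestor, canonicalize that, then re-append the missing tail.
fn canonicalize_with_ancestor_walk(path: &Path) -> io::Result<PathBuf> {
    let mut existing = path;
    let mut tail: Vec<OsString> = Vec::new();
    loop {
        match existing.canonicalize() {
            Ok(base) => {
                // Segments were collected innermost first, so reverse them.
                return Ok(tail.iter().rev().fold(base, |acc, seg| acc.join(seg)));
            }
            Err(err) => match (existing.parent(), existing.file_name()) {
                (Some(parent), Some(name)) => {
                    tail.push(name.to_os_string());
                    existing = parent;
                }
                // No ancestor left to try, or the last segment has no name.
                _ => return Err(err),
            },
        }
    }
}
```

Because the existing ancestor is canonicalized before the tail is re-attached, a symlinked parent directory is resolved to its real location before the `allowed_paths` comparison.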

**Glob and grep enforcement:**
- `glob` results are post-filtered: matched paths outside the sandbox are silently excluded
- `grep` validates the search root directory before scanning begins

**Configuration** is shared with the shell sandbox:
```toml
[tools.shell]
allowed_paths = ["/home/user/workspace"]  # Empty = cwd only
```

## File Read Sandbox

The `[tools.file]` section exposes per-path glob filters that are applied independently of the `allowed_paths` filesystem sandbox. They operate on the canonicalized absolute path, making them symlink-safe.

**Evaluation order: deny first, then allow.**

| Field | Purpose |
|-------|---------|
| `deny_read` | Glob patterns that are always blocked. Evaluated before `allow_read`. |
| `allow_read` | Glob patterns that are permitted even when a `deny_read` rule would match. Empty list means "allow all paths that are not denied." |

If a path matches `deny_read` and does **not** match `allow_read`, the read is rejected with a `SandboxViolation` error. If `deny_read` is empty, no paths are blocked (the allow list has no effect).

**Example: block secret files everywhere except inside one project tree:**

```toml
[tools.file]
deny_read  = ["**/.env", "**/secrets/**", "**/*.key"]
allow_read = ["/home/user/projects/**"]
```

In this configuration, `.env` files, anything under a `secrets/` directory, and `*.key` files are denied by default. The exception is `/home/user/projects/`: paths under it are permitted even when they match a deny pattern.

Patterns are matched against the canonicalized path, so a symlink under an allowed directory that resolves to a denied path elsewhere is still blocked.
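
The evaluation order can be expressed as a single boolean rule. In this sketch the `Rule` enum is a hypothetical stand-in for compiled glob patterns (the standard library has no glob matcher), and `read_permitted` is an illustrative name.

```rust
/// Stand-in for a compiled glob; real matching would use a glob library.
enum Rule {
    /// Matches any path whose final segment equals the name, like `**/.env`.
    FileName(&'static str),
    /// Matches any path below the directory, like `/home/user/projects/**`.
    Under(&'static str),
}

impl Rule {
    fn matches(&self, path: &str) -> bool {
        match self {
            Rule::FileName(name) => path.rsplit('/').next() == Some(*name),
            Rule::Under(dir) => path
                .strip_prefix(*dir)
                .map_or(false, |rest| rest.starts_with('/')),
        }
    }
}

/// Deny first, then allow: a read is rejected only when a deny rule matches
/// and no allow rule overrides it.
fn read_permitted(path: &str, deny: &[Rule], allow: &[Rule]) -> bool {
    let denied = deny.iter().any(|rule| rule.matches(path));
    let overridden = allow.iter().any(|rule| rule.matches(path));
    !denied || overridden
}
```

With an empty `deny` slice the function always returns `true`, which mirrors the statement that the allow list has no effect when nothing is denied.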

## MCP Tool Name Collision

Each MCP tool is identified internally by a `sanitized_id` derived from its `qualified_name` (`server_id:tool_name`). The colon and any characters outside `[a-zA-Z0-9_-]` are replaced with `_`. This means two different `(server_id, tool_name)` pairs can produce the same `sanitized_id` — for example, `a.b:c` and `a:b_c` both sanitize to `a_b_c`.

**Detection:** Zeph runs `detect_collisions` against the full tool list whenever servers are loaded or a new server is added. Every collision pair is reported at `WARN` level:

```
WARN zeph_mcp: MCP tool sanitized_id collision: 'a_b_c' shadows 'a:b_c' — executor will always dispatch to the first-registered tool
```

**Resolution:** The first-registered tool wins dispatch. Subsequent tools with the same `sanitized_id` are unreachable — the executor cannot route calls to them.
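
The sanitization rule and the first-registered-wins behavior can be reproduced with a short sketch. The signatures below are assumptions made for illustration; only the character class and the `a.b:c` / `a:b_c` example come from the text above.

```rust
use std::collections::HashMap;

/// Replaces every character outside `[a-zA-Z0-9_-]`, including the `:`
/// separator, with `_`.
fn sanitized_id(qualified_name: &str) -> String {
    qualified_name
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
                c
            } else {
                '_'
            }
        })
        .collect()
}

/// Returns `(sanitized_id, winner, shadowed)` for every tool whose id was
/// already claimed by an earlier registration. Hypothetical signature.
fn detect_collisions<'a>(qualified_names: &[&'a str]) -> Vec<(String, &'a str, &'a str)> {
    let mut first_seen: HashMap<String, &'a str> = HashMap::new();
    let mut collisions = Vec::new();
    for &name in qualified_names {
        let id = sanitized_id(name);
        match first_seen.get(&id).copied() {
            Some(winner) => collisions.push((id, winner, name)),
            None => {
                first_seen.insert(id, name);
            }
        }
    }
    collisions
}
```

Registration order matters here: the earlier name stays in `first_seen` and every later name with the same id is reported as shadowed.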

**Security implication:** A malicious or misconfigured MCP server that is registered first could expose a tool whose `sanitized_id` collides with a trusted server's tool, making the trusted tool unreachable. Zeph does not allow this silently: the collision is logged with both the `qualified_name` and the trust level of each conflicting tool, so the operator can identify and remove the offending server.

**Mitigation:** Choose server IDs that are unique and do not produce overlapping sanitized names. If two legitimate servers expose tools with colliding names, rename one server's ID in the Zeph config:

```toml
[[mcp.servers]]
id = "github-primary"   # unique prefix prevents sanitized_id collision
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
```

## Autonomy Levels

The `security.autonomy_level` setting controls the agent's tool access scope:

| Level | Tools Available | Confirmations |
|-------|----------------|---------------|
| `readonly` | `read`, `find_path`, `list_directory`, `grep`, `web_scrape`, `fetch` | N/A (write tools hidden) |
| `supervised` | All tools per permission policy | Yes, for destructive patterns |
| `full` | All tools | No confirmations |

Default is `supervised`. In `readonly` mode, write-capable tools are excluded from the LLM system prompt and rejected at execution time (defense-in-depth).

```toml
[security]
autonomy_level = "supervised"  # readonly, supervised, full
```

## Permission Policy

The `[tools.permissions]` config section provides fine-grained, pattern-based access control for each tool. Rules are evaluated in order (first match wins) using case-insensitive glob patterns against the tool input. See [Tool System — Permissions](../advanced/tools.md#permissions) for configuration details.

Key security properties:
- Tools with all-deny rules are excluded from the LLM system prompt, preventing the model from attempting to use them
- Legacy `blocked_commands` and `confirm_patterns` are auto-migrated to equivalent permission rules when `[tools.permissions]` is absent
- Default action when no rule matches is `Ask` (confirmation required)

## Audit Logging

Zeph can write a structured JSON audit log with one entry per tool execution:

```toml
[tools.audit]
enabled = true
destination = ".zeph/data/audit.jsonl"  # or "stdout"
```

Each entry includes timestamp, tool name, command, result (success/blocked/error/timeout), and duration in milliseconds.

## Secret Redaction

LLM responses are scanned for secret patterns using compiled regexes before display:

- Detected prefixes: `sk-`, `AKIA`, `ghp_`, `gho_`, `xoxb-`, `xoxp-`, `sk_live_`, `sk_test_`, `-----BEGIN`, `AIza` (Google API), `glpat-` (GitLab), `hf_` (HuggingFace), `npm_` (npm), `dckr_pat_` (Docker)
- Regex-based matching replaces detected secrets with `[REDACTED]`, preserving original whitespace formatting
- Enabled by default (`security.redact_secrets = true`), applied to both streaming and non-streaming responses
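
To illustrate whitespace-preserving redaction, here is a deliberately simplified sketch. Zeph is described as using compiled regexes; this version swaps in plain prefix checks on whitespace-delimited tokens so that it needs no external crate. The function names and the shortened prefix list are assumptions.

```rust
/// Subset of the prefixes listed above.
const SECRET_PREFIXES: &[&str] = &["sk-", "AKIA", "ghp_", "xoxb-", "glpat-", "hf_"];

/// Appends the pending token to `out`, redacted if it looks like a secret.
fn flush(token: &mut String, out: &mut String) {
    if SECRET_PREFIXES.iter().any(|prefix| token.starts_with(*prefix)) {
        out.push_str("[REDACTED]");
    } else {
        out.push_str(token);
    }
    token.clear();
}

/// Replaces tokens that start with a known secret prefix and copies every
/// whitespace character through unchanged.
fn redact_secrets(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut token = String::new();
    for ch in text.chars() {
        if ch.is_whitespace() {
            flush(&mut token, &mut out);
            out.push(ch);
        } else {
            token.push(ch);
        }
    }
    flush(&mut token, &mut out);
    out
}
```

A prefix check on tokens misses secrets embedded in longer strings (for example `key=sk-...`), which is one reason a regex-based scanner is the better fit in practice.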

## Credential Scrubbing in Context

In addition to output redaction, Zeph scrubs credential patterns from conversation history **before** injecting it into the LLM context window. The `scrub_content()` function in the context builder detects the same secret prefixes and replaces them with `[REDACTED]`. This prevents credentials that appeared in past messages from leaking into future LLM prompts.

```toml
[memory]
redact_credentials = true  # default: true
```

This is independent of `security.redact_secrets` — output redaction sanitizes LLM *responses*, while credential scrubbing sanitizes LLM *inputs* from stored history.

## Config Validation

`Config::validate()` enforces bounds at startup to catch configuration errors early:

- `memory.history_limit` <= 10,000
- `memory.context_budget_tokens` <= 1,000,000 (when non-zero)
- `agent.max_tool_iterations` <= 100
- `a2a.rate_limit` > 0
- `gateway.rate_limit` > 0
- `gateway.max_body_size` <= 10,485,760 (10 MiB)

The agent exits with an error message if any bound is violated.
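
A minimal sketch of such a bounds check is shown below. The `Limits` struct and the error strings are assumptions made for this example and do not reflect the real `Config` layout; only the numeric bounds come from the list above.

```rust
/// Only the fields covered by the bounds above (hypothetical layout).
struct Limits {
    history_limit: u32,
    context_budget_tokens: u64,
    max_tool_iterations: u32,
    a2a_rate_limit: u32,
    gateway_rate_limit: u32,
    gateway_max_body_size: u64,
}

/// Returns the first violated bound as an error message.
fn validate(l: &Limits) -> Result<(), String> {
    let checks = [
        (l.history_limit <= 10_000, "memory.history_limit must be <= 10000"),
        // Zero also satisfies this bound, so no special case is needed.
        (l.context_budget_tokens <= 1_000_000, "memory.context_budget_tokens must be <= 1000000"),
        (l.max_tool_iterations <= 100, "agent.max_tool_iterations must be <= 100"),
        (l.a2a_rate_limit > 0, "a2a.rate_limit must be > 0"),
        (l.gateway_rate_limit > 0, "gateway.rate_limit must be > 0"),
        (l.gateway_max_body_size <= 10 * 1024 * 1024, "gateway.max_body_size must be <= 10485760"),
    ];
    match checks.iter().find(|check| !check.0) {
        Some(failed) => Err(failed.1.to_string()),
        None => Ok(()),
    }
}
```

Keeping the checks in one table makes it easy to see that four of them are upper bounds and two are lower bounds.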

## Timeout Policies

Configurable per-operation timeouts prevent hung connections:

```toml
[timeouts]
llm_seconds = 120       # LLM chat completion
embedding_seconds = 30  # Embedding generation
a2a_seconds = 30        # A2A remote calls
```

## A2A and Gateway Bearer Authentication

Both the A2A server and the HTTP gateway use bearer token authentication backed by constant-time comparison (`subtle::ConstantTimeEq`) to prevent timing side-channel attacks.
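
The idea behind constant-time comparison can be shown without any dependency. This is an illustration only: it is not Zeph's code, the function names are hypothetical, and production code should rely on a vetted crate such as `subtle`, because a compiler may optimize hand-written loops in ways that reintroduce timing differences.

```rust
/// Compares two byte strings without returning early on the first mismatch,
/// so the loop runs over every byte regardless of where they differ.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        // The length is not treated as secret in this sketch.
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Checks an `Authorization` header value against the configured token.
fn bearer_matches(header: &str, expected_token: &str) -> bool {
    match header.strip_prefix("Bearer ") {
        Some(presented) => constant_time_eq(presented.as_bytes(), expected_token.as_bytes()),
        None => false,
    }
}
```

A plain `==` on strings may stop at the first differing byte, which lets an attacker measure how much of a guessed token is correct.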

### A2A Server

Configure via `config.toml` or environment variable:

```toml
[a2a]
auth_token = "secret"  # or use vault: ZEPH_A2A_AUTH_TOKEN
```

The `/.well-known/agent.json` endpoint is intentionally public and bypasses auth to allow agent discovery.

If `auth_token` is `None` at startup, the server logs a `WARN`-level message:

```
WARN zeph_a2a: A2A server started without auth_token — endpoint is unauthenticated
```

### HTTP Gateway

Configure via `config.toml` or environment variable:

```toml
[gateway]
auth_token = "secret"  # or use vault: ZEPH_GATEWAY_TOKEN
```

The ACP HTTP `GET /health` endpoint is intentionally public and bypasses auth so IDEs can poll server readiness before authenticating or opening a session.

If `auth_token` is `None` at startup, the server logs a `WARN`-level message:

```
WARN zeph_gateway: Gateway started without auth_token — endpoint is unauthenticated
```

**Recommendation:** Always set `auth_token` when binding to a non-loopback interface. Use the [Age Vault](security.md#age-vault) to store the token rather than embedding it in plain text in `config.toml`.

## SSRF Protection for Web Scraping

`WebScrapeExecutor` defends against Server-Side Request Forgery (SSRF) at every stage of a request, including multi-hop redirect chains.

### URL Validation

Before any network connection is made, `validate_url` checks:

- **HTTPS only:** HTTP, `file://`, `javascript:`, `data:`, and all other schemes are rejected with `ToolError::Blocked`.
- **Private hostnames:** The following hostname patterns are blocked regardless of DNS resolution:
  - `localhost` and `*.localhost` subdomains
  - `*.internal` TLD (cloud/Kubernetes internal DNS)
  - `*.local` TLD (mDNS/Bonjour)
  - IPv4 literals in RFC 1918 ranges (`10.x.x.x`, `172.16–31.x.x`, `192.168.x.x`)
  - IPv4 link-local (`169.254.x.x`), loopback (`127.x.x.x`), unspecified (`0.0.0.0`), and broadcast (`255.255.255.255`)
  - IPv6 loopback (`::1`), link-local (`fe80::/10`), unique-local (`fc00::/7`), and unspecified (`::`)
  - IPv4-mapped IPv6 addresses (`::ffff:x.x.x.x`) — the inner IPv4 is checked against all private ranges above
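
The hostname and address rules above map closely onto the standard library's `std::net` types. The sketch below is an assumption about how such a check could look, not the actual `validate_url` implementation; `is_blocked_host` and its helpers are hypothetical names.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // IPv4-mapped addresses are judged by their inner IPv4 address.
    if let Some(inner) = ip.to_ipv4_mapped() {
        return is_blocked_v4(inner);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        || (first & 0xfe00) == 0xfc00 // unique-local, fc00::/7
}

/// `host` is the host component of a URL; IPv6 literals arrive in brackets.
fn is_blocked_host(host: &str) -> bool {
    let host = host.to_ascii_lowercase();
    if host == "localhost"
        || host.ends_with(".localhost")
        || host.ends_with(".internal")
        || host.ends_with(".local")
    {
        return true;
    }
    let literal = host.trim_start_matches('[').trim_end_matches(']');
    match literal.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => is_blocked_v4(ip),
        Ok(IpAddr::V6(ip)) => is_blocked_v6(ip),
        Err(_) => false, // an ordinary hostname; DNS results are checked later
    }
}
```

A hostname that passes this check is not yet trusted: the addresses it resolves to still have to be validated.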

### DNS Rebinding Prevention

After URL validation, `resolve_and_validate` performs a DNS lookup and checks every returned IP address against the same private-range rules. The validated socket addresses are then pinned to the `reqwest` client via `resolve_to_addrs`, eliminating the TOCTOU window between DNS validation and the actual TCP connection.

If DNS resolves to a private IP, the request is rejected with:

```
ToolError::Blocked { command: "SSRF protection: private IP <ip> for host <host>" }
```

### Redirect Chain Defense

`WebScrapeExecutor` disables `reqwest`'s automatic redirect following (`redirect::Policy::none()`). Redirects are followed manually, up to a limit of **3 hops**. For every redirect:

1. The `Location` header value is extracted.
2. Relative URLs are resolved against the current request URL.
3. `validate_url` runs on the resolved target — blocking private hostnames and non-HTTPS schemes.
4. `resolve_and_validate` runs on the target — blocking DNS-based rebinding.
5. A new `reqwest` client is built, pinned to the validated addresses for the next hop.

This prevents the classic "open redirect to internal service" SSRF bypass: even if the initial URL passes validation, a redirect to `https://169.254.169.254/` (AWS metadata endpoint) or `https://10.0.0.1/` is blocked before the connection is made.

If more than 3 redirects occur, the request fails with `ToolError::Execution("too many redirects")`.
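
The manual redirect loop can be modeled independently of any HTTP client. In this sketch `validate` stands in for `validate_url` plus `resolve_and_validate`, and `fetch` stands in for a request made with automatic redirects disabled; the `Hop` enum and the function name are hypothetical.

```rust
/// Outcome of a single request; a stand-in for an HTTP client response.
enum Hop {
    /// Final response body.
    Done(String),
    /// Absolute URL taken from the `Location` header.
    Redirect(String),
}

const MAX_REDIRECTS: usize = 3;

/// Follows redirects by hand and validates every target before fetching it.
fn fetch_with_redirects(
    start: &str,
    validate: impl Fn(&str) -> bool,
    mut fetch: impl FnMut(&str) -> Hop,
) -> Result<String, String> {
    let mut url = start.to_string();
    let mut redirects = 0;
    loop {
        // The same checks run on the first URL and on every redirect target.
        if !validate(&url) {
            return Err(format!("blocked: {}", url));
        }
        match fetch(&url) {
            Hop::Done(body) => return Ok(body),
            Hop::Redirect(next) => {
                redirects += 1;
                if redirects > MAX_REDIRECTS {
                    return Err("too many redirects".to_string());
                }
                url = next;
            }
        }
    }
}
```

The key property is that `fetch` is never called on a URL that has not passed `validate`, so a redirect to an internal address fails before any connection is attempted.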

## A2A Network Security

- **TLS enforcement:** `a2a.require_tls = true` rejects HTTP endpoints (HTTPS only)
- **SSRF protection:** `a2a.ssrf_protection = true` blocks private IP ranges (RFC 1918, loopback, link-local) via DNS resolution
- **Payload limits:** `a2a.max_body_size` caps request body (default: 1 MiB)

**Safe execution model:**
- Commands are checked against blocked patterns, then validated against the sandbox, then checked for confirmation patterns
- Timeout enforcement (default: 30s, configurable)
- Full errors logged to system; user-facing messages pass through `sanitize_paths()` which replaces absolute filesystem paths (`/home/`, `/Users/`, `/root/`, `/tmp/`, `/var/`) with `[PATH]` to prevent information disclosure
- Audit trail for all tool executions (when enabled)

## Container Security

| Security Layer | Implementation | Status |
|----------------|----------------|--------|
| **Base image** | Oracle Linux 9 Slim | Production-hardened |
| **Vulnerability scanning** | Trivy in CI/CD | **0 HIGH/CRITICAL CVEs** |
| **User privileges** | Non-root `zeph` user (UID 1000) | Enforced |
| **Attack surface** | Minimal package installation | Distroless-style |

**Continuous security:**
- Every release scanned with [Trivy](https://trivy.dev/) before publishing
- Automated Dependabot PRs for dependency updates
- `cargo-deny` checks in CI for license/vulnerability compliance

## Secret Memory Hygiene

Zeph uses the [`zeroize`](https://crates.io/crates/zeroize) crate to ensure that secret material is erased from process memory as soon as it is no longer needed.

**`Secret` type:**

```rust
// Internal representation — wraps Zeroizing<String> instead of plain String
Secret(Zeroizing<String>)
```

`Zeroizing<T>` implements `Drop` to overwrite heap memory with zeros before deallocation, preventing secrets from lingering in freed pages.

**`AgeVaultProvider`:**

All decrypted values in the in-memory secrets map are stored as `BTreeMap<String, Zeroizing<String>>`. Using `BTreeMap` instead of `HashMap` ensures that secrets are serialized in deterministic key order when `vault.save()` re-encrypts the vault. This makes repeated save operations produce consistent JSON output, which is important for diffing and auditing encrypted vault changes. Key-file content and intermediate decrypt buffers are also wrapped in `Zeroizing` so they are cleared when the local binding is dropped.

**`Clone` intentionally removed:**

`Secret` no longer derives `Clone`. This is a deliberate trade-off: preventing accidental cloning reduces the number of live copies of a secret value in memory at any given time.

If you need to pass a secret to a function, accept `&Secret` or extract the inner `&str` directly rather than cloning.

## VIGIL Intent-Anchoring Gate

VIGIL is a pre-sanitizer tripwire that scans tool outputs for prompt injection patterns before they reach the LLM context. It operates independently of the DeBERTa/AlignSentinel/TurnCausalAnalyzer stack and uses regex-based pattern matching for low-latency detection.

### Configuration

```toml
[security.vigil]
enabled = true                          # Master switch (default: true)
strict_mode = false                     # Deny on any pattern match; false = log + sanitize (default: false)
exempt_tools = ["read_file", "shell"]   # Tools exempt from VIGIL checks (default: ["load_skill", "invoke_skill"])
extra_patterns = []                     # Additional regex patterns to detect (must compile without ReDoS risk)
```

### Behavior

- **Block mode** (strict_mode = true): Replace flagged content with a sentinel value and log the event
- **Sanitize mode** (strict_mode = false, default): Truncate flagged content at the injection point and append an annotation note like `[Injection-flagged content truncated by VIGIL]`
- **Exempt tools**: Tools in the `exempt_tools` list skip VIGIL checks entirely (useful for tools that legitimately process untrusted content)
- **Subagents**: Sub-agent responses bypass VIGIL checks to avoid cascading denials

### Pattern Detection

VIGIL scans for common prompt injection markers:
- Prompt switching cues: "ignore previous instructions", "pretend you are", "you are now"
- System prompt leaks: "system:", "instructions:", "as an AI assistant"
- Jailbreak attempts: "DAN", "do anything now", "roleplay"
- Role assumption: "act as", "respond as if", "in the role of"

User-supplied extra patterns are validated for ReDoS resistance (DFA and regex size limits enforced at config validation time).
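
The two modes and the exemption list can be summarized in a small sketch. It uses a case-insensitive substring search instead of regexes, and the `Verdict` enum, `vigil_check`, and the shortened marker list are hypothetical names chosen for illustration.

```rust
const NOTE: &str = "[Injection-flagged content truncated by VIGIL]";

/// A few of the phrases listed above.
const MARKERS: &[&str] = &[
    "ignore previous instructions",
    "pretend you are",
    "you are now",
];

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Nothing flagged, or the tool is exempt.
    Clean(String),
    /// Sanitize mode: content cut at the first marker, annotation appended.
    Sanitized(String),
    /// Strict mode: the caller substitutes a sentinel value.
    Blocked,
}

fn vigil_check(tool: &str, output: &str, strict_mode: bool, exempt_tools: &[&str]) -> Verdict {
    if exempt_tools.contains(&tool) {
        return Verdict::Clean(output.to_string());
    }
    // ASCII lowercasing keeps byte offsets aligned with the original text.
    let lowered = output.to_ascii_lowercase();
    let first_hit = MARKERS.iter().filter_map(|m| lowered.find(*m)).min();
    match first_hit {
        None => Verdict::Clean(output.to_string()),
        Some(_) if strict_mode => Verdict::Blocked,
        Some(idx) => Verdict::Sanitized(format!("{}{}", &output[..idx], NOTE)),
    }
}
```

Taking the minimum match offset means the content is cut at the earliest marker, so nothing after the first suspected injection reaches the LLM context.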

### Egress Network Logging

When the web scrape tool makes outbound HTTP requests, Zeph records each request to an audit trail with:

- Request timestamp and correlation ID
- Target domain and HTTP method
- Response status code and latency
- Whether content was flagged by VIGIL

Access the audit trail through the `view:cost` command palette entry, or inspect the recorded metrics directly.

## Indirect Prompt Injection (IPI) Defense

Zeph includes a multi-layer defense against indirect prompt injection — malicious instructions embedded in tool outputs, web pages, or MCP server responses that attempt to hijack the agent's behavior.

### Detection Pipeline

Three classifiers operate in sequence on every piece of external content before it enters the LLM context:

| Classifier | Method | Purpose |
|------------|--------|---------|
| **DeBERTa soft-signal** | Local NER model (feature-gated) | Fast token-level detection of injection patterns |
| **AlignSentinel** (3-class) | Lightweight LLM classifier | Classifies content as `safe`, `suspicious`, or `malicious` |
| **TurnCausalAnalyzer** | Heuristic + LLM | Detects whether a tool output is attempting to influence subsequent agent actions |

When any classifier flags content as malicious, the content is quarantined before reaching the LLM. Suspicious content is passed through with a warning annotation. The DeBERTa classifier requires the `candle` feature; without it, detection falls back to regex patterns and the LLM classifiers.

### Cross-Tool Injection Correlation

Zeph tracks injection signals across consecutive tool calls within a single turn. If multiple tool outputs in the same turn contain injection indicators, the correlation engine escalates the severity — even if individual signals are below the blocking threshold. This defends against split-payload attacks where malicious instructions are distributed across multiple tool responses.

### MCP/A2A Security Hardening

- **Tool collision detection**: when multiple MCP servers expose tools with the same name, Zeph detects the collision and either prefixes with the server ID or blocks the duplicate
- **SMCP lifecycle**: Secure MCP session lifecycle management with token-based authentication for dynamic server connections
- **IBCT tokens**: Identity-Bound Capability Tokens for A2A agent authentication
- **MCP to ACP confused-deputy enforcement**: prevents MCP tool results from being used to bypass ACP permission boundaries

### Credential Environment Scrubbing

Shell commands executed by the agent run in a scrubbed environment. Variables matching credential patterns (API keys, tokens, passwords) are removed from the subprocess environment before execution. This prevents skills or tool calls from exfiltrating secrets via environment variable inspection.
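
A sketch of environment scrubbing follows. The marker list is an assumption (the text above only says "API keys, tokens, passwords"), and `scrubbed_env` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Name fragments treated as credential markers (assumed for this example).
const CREDENTIAL_MARKERS: &[&str] = &["KEY", "TOKEN", "PASSWORD", "SECRET"];

/// Drops every variable whose name contains a credential marker,
/// compared case-insensitively.
fn scrubbed_env(
    vars: impl IntoIterator<Item = (String, String)>,
) -> BTreeMap<String, String> {
    vars.into_iter()
        .filter(|(name, _)| {
            let upper = name.to_uppercase();
            !CREDENTIAL_MARKERS.iter().any(|marker| upper.contains(*marker))
        })
        .collect()
}
```

The result could be applied to a subprocess with `std::process::Command::env_clear()` followed by `envs(...)`, with `std::env::vars()` as the input, so the child starts from the filtered set instead of inheriting the full parent environment.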

### PII Protection

A configurable NER-based PII detection system can identify and redact personally identifiable information in tool outputs before they enter the LLM context. A circuit breaker protects against runaway cost from paginated reads that trigger repeated PII scans.

## Code Security

Rust-native memory safety guarantees:

- **Workspace-level `unsafe` ban:** `unsafe_code = "deny"` is set in `[workspace.lints.rust]` in the root `Cargo.toml`, propagating the restriction to every crate in the workspace automatically. The single audited exception is an `#[allow(unsafe_code)]`-annotated block behind the `candle` feature flag for memory-mapped safetensors loading.
- **No panic in production:** `unwrap()` and `expect()` are flagged by Clippy lints
- **Reduced attack surface:** Unused database backends (MySQL) and transitive dependencies (RSA) are excluded from the build
- **Secure dependencies:** All crates audited with `cargo-deny`
- **MSRV policy:** Rust 1.94+ (Edition 2024) for latest security patches

## Reporting Vulnerabilities

Do not open a public issue. Use [GitHub Security Advisories](https://github.com/bug-ops/zeph/security/advisories/new) to submit a private report.

Include: description, steps to reproduce, potential impact, suggested fix. Expect an initial response within 72 hours.