zeph 0.20.1 - Docs.rs

# Configuration Reference

Complete reference for the Zeph configuration file and environment variables. For the interactive setup wizard, see [Configuration Wizard](../getting-started/wizard.md).

## Config File Resolution

Zeph loads `config/default.toml` at startup and applies environment variable overrides.

```bash
# CLI argument (highest priority)
zeph --config /path/to/custom.toml

# Environment variable
ZEPH_CONFIG=/path/to/custom.toml zeph

# Default (fallback)
# config/default.toml
```

Priority: `--config` > `ZEPH_CONFIG` > `config/default.toml`.

## Validation

`Config::validate()` runs at startup and rejects out-of-range values:

| Field | Constraint |
|-------|-----------|
| `memory.history_limit` | <= 10,000 |
| `memory.context_budget_tokens` | <= 1,000,000 (when > 0) |
| `memory.soft_compaction_threshold` | 0.0–1.0, must be < `hard_compaction_threshold` |
| `memory.hard_compaction_threshold` | 0.0–1.0, must be > `soft_compaction_threshold` |
| `memory.graph.temporal_decay_rate` | finite, in [0.0, 10.0]; NaN and Inf rejected at deserialization |
| `memory.compression.threshold_tokens` | >= 1,000 (proactive only) |
| `memory.compression.max_summary_tokens` | >= 128 (proactive only) |
| `memory.compression.probe.threshold` | (0.0, 1.0], must be > `hard_fail_threshold` |
| `memory.compression.probe.hard_fail_threshold` | [0.0, 1.0), must be < `threshold` |
| `memory.compression.probe.max_questions` | >= 1 |
| `memory.compression.probe.timeout_secs` | >= 1 |
| `memory.semantic.importance_weight` | finite, in [0.0, 1.0] |
| `memory.graph.spreading_activation.decay_lambda` | in (0.0, 1.0] |
| `memory.graph.spreading_activation.max_hops` | >= 1 |
| `memory.graph.spreading_activation.activation_threshold` | < `inhibition_threshold` |
| `memory.graph.spreading_activation.inhibition_threshold` | > `activation_threshold` |
| `memory.graph.spreading_activation.seed_structural_weight` | in [0.0, 1.0] |
| `memory.graph.note_linking.link_weight_decay_lambda` | finite, in (0.0, 1.0] |
| `llm.semantic_cache_threshold` | finite, in [0.0, 1.0] |
| `orchestration.plan_cache.similarity_threshold` | in [0.5, 1.0] |
| `orchestration.plan_cache.max_templates` | in [1, 10000] |
| `orchestration.plan_cache.ttl_days` | in [1, 365] |
| `memory.token_safety_margin` | > 0.0 |
| `agent.max_tool_iterations` | <= 100 |
| `a2a.rate_limit` | > 0 |
| `acp.max_sessions` | > 0 |
| `acp.session_idle_timeout_secs` | > 0 |
| `acp.permission_file` | valid file path (optional) |
| `acp.lsp.request_timeout_secs` | > 0 |
| `gateway.rate_limit` | > 0 |
| `gateway.max_body_size` | <= 10,485,760 (10 MiB) |

## Hot-Reload

Zeph watches the config file for changes and applies runtime-safe fields without restart (500ms debounce).

**Reloadable fields:**

| Section | Fields |
|---------|--------|
| `[security]` | `redact_secrets` |
| `[timeouts]` | `llm_seconds`, `embedding_seconds`, `a2a_seconds` |
| `[memory]` | `history_limit`, `summarization_threshold`, `context_budget_tokens`, `soft_compaction_threshold`, `hard_compaction_threshold`, `compaction_preserve_tail`, `prune_protect_tokens`, `cross_session_score_threshold` |
| `[memory.semantic]` | `recall_limit` |
| `[index]` | `repo_map_ttl_secs`, `watch` |
| `[agent]` | `max_tool_iterations` |
| `[skills]` | `max_active_skills` |

**Not reloadable** (require restart): LLM provider/model, SQLite path, Qdrant URL, vector backend, Telegram token, MCP servers, A2A config, ACP config (including `[acp.lsp]`), agents config, skill paths, LSP context injection config (`[agent.lsp]`), compaction probe config (`[memory.compression.probe]`).

> **Breaking change (v0.17.0):** The old `[llm.cloud]`, `[llm.orchestrator]`, and `[llm.router]` config sections have been removed. Run `zeph --migrate-config` to automatically convert your config file.

## Configuration File

```toml
[agent]
name = "Zeph"
max_tool_iterations = 10  # Max tool loop iterations per response (default: 10)
auto_update_check = true  # Query GitHub Releases API for newer versions (default: true)

[agent.instructions]
auto_detect    = true    # Auto-detect provider-specific files: CLAUDE.md, AGENTS.md, GEMINI.md (default: true)
extra_files    = []      # Additional instruction files (absolute or relative to cwd)
max_size_bytes = 262144  # Per-file size cap in bytes (default: 256 KiB)
# zeph.md and .zeph/zeph.md are always loaded regardless of auto_detect.
# Use --instruction-file <path> at the CLI to supply extra files at startup.

# LSP context injection — requires lsp-context feature and mcpls MCP server.
# Enable with --lsp-context CLI flag or by setting enabled = true here.
# [agent.lsp]
# enabled = false                # Enable LSP context injection hooks (default: false)
# mcp_server_id = "mcpls"       # MCP server ID providing LSP tools (default: "mcpls")
# token_budget = 2000            # Max tokens to spend on injected LSP context per turn (default: 2000)
#
# [agent.lsp.diagnostics]
# enabled = true                 # Inject diagnostics after write_file (default: true when agent.lsp is enabled)
# max_per_file = 20              # Max diagnostics per file (default: 20)
# max_files = 5                  # Max files per injection batch (default: 5)
# min_severity = "error"         # Minimum severity: "error", "warning", "info", or "hint" (default: "error")
#
# [agent.lsp.hover]
# enabled = false                # Pre-fetch hover info after read_file (default: false)
# max_symbols = 10               # Max symbols to fetch hover for per file (default: 10)
#
# [agent.lsp.references]
# enabled = true                 # Inject reference list before rename_symbol (default: true)
# max_refs = 50                  # Max references to show per symbol (default: 50)

[agent.learning]
correction_detection = true           # Enable implicit correction detection (default: true)
correction_confidence_threshold = 0.7 # Jaccard token overlap threshold for correction candidates (default: 0.7)
correction_recall_limit = 3           # Max corrections injected into system prompt (default: 3)
correction_min_similarity = 0.75      # Min cosine similarity for correction recall from Qdrant (default: 0.75)

[llm]
# routing = "none"      # none (default), ema, thompson, cascade, task, triage
# router_ema_enabled = false         # EMA-based provider latency routing (default: false)
# router_ema_alpha = 0.1             # EMA smoothing factor, 0.0–1.0 (default: 0.1)
# router_reorder_interval = 10       # Re-order providers every N requests (default: 10)
# thompson_state_path = "~/.zeph/router_thompson_state.json"  # Thompson state persistence path
# response_cache_enabled = false     # SQLite-backed LLM response cache (default: false)
# response_cache_ttl_secs = 3600     # Cache TTL in seconds (default: 3600)
# semantic_cache_enabled = false     # Embedding-based similarity cache (default: false)
# semantic_cache_threshold = 0.95    # Cosine similarity for cache hit (default: 0.95)
# semantic_cache_max_candidates = 10 # Max entries to examine per lookup (default: 10)

# Dedicated provider for tool-pair summarization and context compaction (optional).
# String shorthand — pick one format, or use [llm.summary_provider] below.
# summary_model = "ollama/qwen3:1.7b"              # ollama/<model>
# summary_model = "claude"                         # Claude, model from the claude provider entry
# summary_model = "claude/claude-haiku-4-5-20251001"
# summary_model = "openai/gpt-4o-mini"
# summary_model = "compatible/<name>"              # [[llm.providers]] entry name for compatible type
# summary_model = "candle"

# Structured summary provider. Takes precedence over summary_model when both are set.
# [llm.summary_provider]
# type = "claude"                        # claude, openai, compatible, ollama, candle
# model = "claude-haiku-4-5-20251001"   # model override
# base_url = "..."                       # endpoint override (ollama / openai only)
# embedding_model = "..."               # embedding model override (ollama / openai only)
# device = "cpu"                         # cpu, cuda, metal (candle only)

# Cascade routing options (when routing = "cascade").
# [llm.cascade]
# quality_threshold = 0.5             # Score below which response is degenerate (default: 0.5)
# max_escalations = 2                 # Max escalation steps per request (default: 2)
# classifier_mode = "heuristic"       # "heuristic" (default) or "judge" (LLM-backed)
# max_cascade_tokens = 0              # Cumulative token cap across escalation levels; 0 = unlimited
# cost_tiers = ["ollama", "claude"]   # Explicit cost ordering (cheapest first)

# Quality gate for Thompson/EMA routing — post-selection embedding similarity check.
# quality_gate = 0.0    # Cosine threshold; 0.0 = disabled (default: 0.0). Applies to thompson/ema only.

# ASI coherence tracking — penalizes providers with low response coherence.
# [llm.routing.asi]
# enabled             = false
# window_size         = 10      # Sliding window of response embeddings per provider (default: 10)
# coherence_threshold = 0.5     # Warn when rolling mean drops below this (default: 0.5)
# penalty_weight      = 0.3     # Multiplier applied to Thompson/EMA scores (default: 0.3)
# embedding_provider  = ""      # Provider name for response embeddings; empty = primary

# Complexity triage routing options (when routing = "triage").
# [llm.complexity_routing]
# triage_provider = "fast"            # Provider name used for classification (required)
# bypass_single_provider = true       # Skip triage when all tiers map to the same provider (default: true)
# triage_timeout_secs = 5             # Triage call timeout; falls back to simple tier on expiry (default: 5)
# max_triage_tokens = 50              # Max tokens in triage response (default: 50)
# fallback_strategy = "cascade"       # Optional hybrid mode: triage + quality escalation ("cascade" only)
#
# [llm.complexity_routing.tiers]
# simple  = "fast"                    # Provider name for trivial requests; also used as triage fallback
# medium  = "default"                 # Provider name for moderate requests
# complex = "smart"                   # Provider name for multi-step / code-heavy requests
# expert  = "expert"                  # Provider name for research-grade requests

# Provider list — each [[llm.providers]] entry defines one LLM backend.
[[llm.providers]]
type = "ollama"                        # ollama, claude, openai, gemini, candle, compatible
# name = "local"                       # optional: identifier for multi-provider routing; required for compatible
base_url = "http://localhost:11434"
model = "qwen3:8b"
embedding_model = "qwen3-embedding"    # model for text embeddings
# vision_model = "llava:13b"          # Ollama only: dedicated model for image requests
# embed = true                         # mark as embedding provider for skill matching and semantic memory
# default = true                       # mark as primary chat provider
# embed_concurrency = 4                # Max concurrent embedding requests via semaphore (default: 4, 0 = unlimited)

# Additional provider examples:
# [[llm.providers]]
# name = "cloud"
# type = "claude"
# model = "claude-sonnet-4-6"
# max_tokens = 4096
# server_compaction = false            # Enable Claude server-side context compaction (compact-2026-01-12 beta)
# enable_extended_context = false      # Enable Claude 1M context window (context-1m-2025-08-07 beta, Sonnet/Opus 4.6)
# prompt_cache_ttl = "1h"              # "1h" = extended TTL beta (writes ~2× cost); omit or "ephemeral" for default ~5 min
# default = true

# [[llm.providers]]
# type = "openai"
# base_url = "https://api.openai.com/v1"
# model = "gpt-5.2"
# max_tokens = 4096
# embedding_model = "text-embedding-3-small"
# reasoning_effort = "medium"  # low, medium, high (for reasoning models)

# [[llm.providers]]
# type = "gemini"
# model = "gemini-2.0-flash"
# max_tokens = 8192
# embedding_model = "text-embedding-004"  # enable Gemini embeddings (optional)
# thinking_level = "medium"             # minimal, low, medium, high (Gemini 2.5+ only)
# thinking_budget = 8192               # token budget; -1 = dynamic, 0 = disabled (Gemini 2.5+ only)
# include_thoughts = true              # surface thinking chunks in TUI
# base_url = "https://generativelanguage.googleapis.com/v1beta"

# [[llm.providers]]
# name = "groq"
# type = "compatible"
# base_url = "https://api.groq.com/openai/v1"
# model = "llama-3.3-70b-versatile"
# max_tokens = 4096

[llm.stt]
provider = "whisper"
model = "whisper-1"
# base_url = "http://127.0.0.1:8080/v1"  # optional: OpenAI-compatible server
# language = "en"                          # optional: ISO-639-1 code or "auto"
# Requires `stt` feature. When base_url is set, targets a local server (no API key needed).
# When omitted, uses the OpenAI API key from the openai [[llm.providers]] entry or ZEPH_OPENAI_API_KEY.

[skills]
# Defaults to the user config dir when omitted
# (for example ~/.config/zeph/skills on Linux,
# ~/Library/Application Support/Zeph/skills on macOS,
# %APPDATA%\zeph\skills on Windows).
# paths = ["/absolute/path/to/skills"]
max_active_skills = 5              # Top-K skills per query via embedding similarity
disambiguation_threshold = 0.05    # LLM disambiguation when top-2 score delta < threshold (0.0 = disabled)
prompt_mode = "auto"               # Skill prompt format: "full", "compact", or "auto" (default: "auto")
cosine_weight = 0.7                # Cosine signal weight in BM25+cosine fusion (default: 0.7)
hybrid_search = false              # Enable BM25+cosine hybrid skill matching (default: false)

[skills.learning]
enabled = true                     # Enable self-learning skill improvement (default: true)
auto_activate = false              # Require manual approval for new versions (default: false)
min_failures = 3                   # Failures before triggering improvement (default: 3)
improve_threshold = 0.7            # Success rate below which improvement starts (default: 0.7)
rollback_threshold = 0.5           # Auto-rollback when success rate drops below this (default: 0.5)
min_evaluations = 5                # Minimum evaluations before rollback decision (default: 5)
max_versions = 10                  # Max auto-generated versions per skill (default: 10)
cooldown_minutes = 60              # Cooldown between improvements for same skill (default: 60)
detector_mode = "regex"            # Correction detector: "regex" (default) or "judge" (LLM-backed)
judge_model = ""                   # Model for judge calls; empty = use primary provider
judge_adaptive_low = 0.5           # Regex confidence below this bypasses judge (default: 0.5)
judge_adaptive_high = 0.8          # Regex confidence at/above this bypasses judge (default: 0.8)

[memory]
# Defaults to the user data dir when omitted
# (for example ~/.local/share/zeph/data/zeph.db on Linux,
# ~/Library/Application Support/Zeph/data/zeph.db on macOS,
# %LOCALAPPDATA%\Zeph\data\zeph.db on Windows).
# sqlite_path = "/absolute/path/to/zeph.db"
history_limit = 50
summarization_threshold = 100  # Trigger summarization after N messages
context_budget_tokens = 0      # 0 = unlimited (proportional split: 15% summaries, 25% recall, 60% recent)
soft_compaction_threshold = 0.60  # Soft tier: prune tool outputs + apply deferred summaries (no LLM); default: 0.60
hard_compaction_threshold = 0.90  # Hard tier: full LLM summarization when usage exceeds this fraction; default: 0.90
compaction_preserve_tail = 4   # Keep last N messages during compaction
prune_protect_tokens = 40000   # Protect recent N tokens from tool output pruning
cross_session_score_threshold = 0.35  # Minimum relevance for cross-session results
vector_backend = "qdrant"     # Vector store: "qdrant" (default) or "sqlite" (embedded)
sqlite_pool_size = 5          # SQLite connection pool size (default: 5)
response_cache_cleanup_interval_secs = 3600  # Interval for purging expired LLM response cache entries (default: 3600)
token_safety_margin = 1.0     # Multiplier for token budget safety margin (default: 1.0)
redact_credentials = true     # Scrub credential patterns from LLM context (default: true)
autosave_assistant = false    # Persist assistant responses to SQLite and embed (default: false)
autosave_min_length = 20      # Min content length for assistant embedding (default: 20)
tool_call_cutoff = 6          # Summarize oldest tool pair when visible pairs exceed this (default: 6)
# key_facts_dedup_threshold = 0.95  # Cosine similarity threshold for near-duplicate key_facts suppression (default: 0.95)

# Persona memory — extract and inject stable user preference and domain facts.
# [memory.persona]
# enabled                 = false
# persona_provider        = "fast"   # cheap extraction model; falls back to primary
# min_confidence          = 0.6      # facts below this are discarded (default: 0.6)
# min_messages            = 3        # minimum user messages before first extraction (default: 3)
# max_messages            = 10       # messages fed to LLM per extraction pass (default: 10)
# extraction_timeout_secs = 10       # timeout for extraction LLM call (default: 10)
# context_budget_tokens   = 500      # token budget for injected persona facts (default: 500)

# Trajectory memory — extract procedural/episodic entries from tool-call turns.
# [memory.trajectory]
# enabled                 = false
# trajectory_provider     = "fast"   # cheap extraction model; falls back to primary
# context_budget_tokens   = 400      # token budget for injected trajectory hints (default: 400)
# recall_top_k            = 5        # procedural entries retrieved per turn (default: 5)
# min_confidence          = 0.6      # entries below this are discarded (default: 0.6)
# max_messages            = 10       # messages fed to LLM per extraction pass (default: 10)
# extraction_timeout_secs = 10       # timeout for extraction LLM call (default: 10)

# Category-aware memory — tag messages with a category from active skill/tool context.
# [memory.category]
# enabled  = false
# auto_tag = true    # derive category from active skill or tool type automatically (default: true)

# TiMem temporal-hierarchical memory tree — hierarchical summary consolidation.
# [memory.tree]
# enabled                = false
# consolidation_provider = "fast"  # falls back to primary
# sweep_interval_secs    = 300     # background consolidation interval (default: 300)
# batch_size             = 20      # leaves processed per sweep (default: 20)
# similarity_threshold   = 0.8     # cosine threshold for clustering (default: 0.8)
# max_level              = 3       # maximum tree depth above leaves (default: 3)
# context_budget_tokens  = 400     # token budget for tree traversal in context (default: 400)
# recall_top_k           = 5       # nodes retrieved per turn (default: 5)
# min_cluster_size       = 2       # minimum cluster size to trigger LLM consolidation (default: 2)

# Time-based microcompact — clear stale low-value tool outputs after an idle gap.
# [memory.microcompact]
# enabled               = false
# gap_threshold_minutes = 60   # idle gap in minutes before clearing stale outputs (default: 60)
# keep_recent           = 3    # most recent low-value tool outputs to preserve (default: 3)

# autoDream — background memory consolidation after session-count and time gates pass.
# [memory.autodream]
# enabled                = false
# min_sessions           = 3     # sessions since last consolidation (default: 3)
# min_hours              = 24    # hours since last consolidation (default: 24)
# consolidation_provider = ""    # provider name; falls back to primary
# max_iterations         = 8     # safety bound for consolidation sweep (default: 8)

[memory.semantic]
enabled = false               # Enable semantic search via Qdrant
recall_limit = 5              # Number of semantically relevant messages to inject
temporal_decay_enabled = false        # Attenuate scores by message age (default: false)
temporal_decay_half_life_days = 30    # Half-life for temporal decay in days (default: 30)
mmr_enabled = false                   # MMR re-ranking for result diversity (default: false)
mmr_lambda = 0.7                      # MMR relevance-diversity trade-off, 0.0-1.0 (default: 0.7)
importance_enabled = false            # Write-time importance scoring for recall boost (default: false)
importance_weight = 0.15              # Blend weight for importance in ranking, [0.0, 1.0] (default: 0.15)

[memory.routing]
strategy = "heuristic"        # Routing strategy for memory backend selection (default: "heuristic")

# [memory.admission]
# enabled = false                    # Enable A-MAC adaptive memory admission control (default: false)
# threshold = 0.40                   # Composite score threshold; messages below this are rejected (default: 0.40)
# fast_path_margin = 0.15            # Admit immediately when score >= threshold + margin (default: 0.15)
# admission_provider = "fast"        # Provider for LLM-assisted admission decisions (optional, default: "")
# admission_strategy = "heuristic"   # "heuristic" (default) or "rl" (preview — falls back to heuristic)
# rl_min_samples = 500               # Training samples required before RL model activates (default: 500)
# rl_retrain_interval_secs = 3600    # Background RL retraining interval in seconds (default: 3600)
#
# [memory.admission.weights]
# future_utility = 0.30              # LLM-estimated future reuse probability (heuristic mode only)
# factual_confidence = 0.15          # Inverse of hedging markers
# semantic_novelty = 0.30            # 1 - max similarity to existing memories
# temporal_recency = 0.10            # Always 1.0 at write time
# content_type_prior = 0.15          # Role-based prior

[memory.compression]
strategy = "reactive"         # "reactive" (default) or "proactive"
# Proactive strategy fields (required when strategy = "proactive"):
# threshold_tokens = 80000   # Fire compression when context exceeds this token count (>= 1000)
# max_summary_tokens = 4000  # Cap for the compressed summary (>= 128)
# model = ""                 # Reserved — currently unused
# archive_tool_outputs = false  # Archive tool output bodies to SQLite before compaction (default: false)

[memory.compression.probe]
# enabled = false           # Enable compaction probe validation (default: false)
# model = ""                # Model for probe LLM calls; empty = summary provider (default: "")
# threshold = 0.6           # Minimum score for Pass verdict (default: 0.6)
# hard_fail_threshold = 0.35 # Score below this blocks compaction (default: 0.35)
# max_questions = 3         # Factual questions per probe (default: 3)
# timeout_secs = 15         # Timeout for both LLM calls in seconds (default: 15)

[memory.compression_guidelines]
enabled = false                # Enable failure-driven compression guidelines (default: false)
# update_threshold = 5        # Minimum unused failure pairs before triggering a guidelines update (default: 5)
# max_guidelines_tokens = 500 # Token budget for the guidelines document (default: 500)
# max_pairs_per_update = 10   # Failure pairs consumed per update cycle (default: 10)
# detection_window_turns = 10 # Turns after hard compaction to watch for context loss (default: 10)
# update_interval_secs = 300  # Interval in seconds between background updater checks (default: 300)
# max_stored_pairs = 100      # Maximum unused failure pairs retained before cleanup (default: 100)
# categorized_guidelines = false  # Maintain separate guideline documents per content category (default: false)

[memory.graph]
enabled = false                        # Enable graph memory (default: false, requires graph-memory feature)
extract_model = ""                     # LLM model for entity extraction; empty = agent's model
max_entities_per_message = 10          # Max entities extracted per message (default: 10)
max_edges_per_message = 15             # Max edges extracted per message (default: 15)
community_refresh_interval = 100       # Messages between community recalculation (default: 100)
entity_similarity_threshold = 0.85     # Cosine threshold for entity dedup (default: 0.85)
extraction_timeout_secs = 15           # Timeout for background extraction (default: 15)
use_embedding_resolution = false       # Use embedding-based entity resolution (default: false)
max_hops = 2                           # BFS traversal depth for graph recall (default: 2)
recall_limit = 10                      # Max graph facts injected into context (default: 10)
temporal_decay_rate = 0.0              # Recency boost for graph recall; 0.0 = disabled (default: 0.0)
                                       # Range: [0.0, 10.0]. Formula: 1/(1 + age_days * rate)
edge_history_limit = 100               # Max historical edge versions per source+predicate pair (default: 100)

[memory.graph.spreading_activation]
# enabled = false                     # Replace BFS with spreading activation (default: false)
# decay_lambda = 0.85                 # Per-hop decay factor, (0.0, 1.0] (default: 0.85)
# max_hops = 3                        # Maximum propagation depth (default: 3)
# activation_threshold = 0.1          # Minimum activation for inclusion (default: 0.1)
# inhibition_threshold = 0.8          # Lateral inhibition threshold (default: 0.8)
# max_activated_nodes = 50            # Cap on activated nodes (default: 50)

[memory.quality_gate]
# enabled = false                              # Enable write quality gate (default: false)
# information_value_threshold = 0.3            # Cosine similarity vs recent context (default: 0.3)
# reference_completeness_threshold = 0.5       # Pronoun/deictic completeness heuristic (default: 0.5)
# contradiction_risk_threshold = 0.7           # Graph edge conflict risk (default: 0.7)
# Fail-open contract: embed/LLM/graph errors yield neutral defaults. Requires graph memory for contradiction scoring.

[session.recap]
on_resume = true              # Auto-generate recap when resuming a stored conversation (default: true)
# recap_provider = ""           # Provider name for recap generation; empty = primary provider (default: "")
max_tokens = 500              # Max tokens for the recap summary (default: 500)
max_input_messages = 50       # Max messages included in recap context (default: 50)

[tools]
enabled = true
summarize_output = false      # LLM-based summarization for long tool outputs
# max_tool_calls_per_session = 50  # Hard cap on tool executions per session; resets on /clear (default: unset = unlimited)

[tools.shell]
timeout = 30
blocked_commands = []
allowed_commands = []
allowed_paths = []          # Directories shell can access (empty = cwd only)
allow_network = true        # false blocks curl/wget/nc
confirm_patterns = ["rm ", "git push -f", "git push --force", "drop table", "drop database", "truncate ", "$(", "`", "<(", ">(", "<<<", "eval "]

[tools.file]
allowed_paths = []          # Directories file tools can access (empty = cwd only)

[tools.scrape]
timeout = 15
max_body_bytes = 1048576  # 1MB

[tools.sandbox]
# OS-level subprocess isolation for shell commands (macOS: Seatbelt, Linux: bubblewrap + Landlock/seccomp)
# Defaults: deny (no access), opt-in allow for specific paths
# disabled = false             # Disable sandboxing and run shell commands unsandboxed (default: false)
# allow_read = []              # Paths/globs readable by sandboxed commands
# allow_write = []             # Paths/globs writable by sandboxed commands
# allow_network = true         # Allow outbound network access (default: true)

[tools.filters]
enabled = true              # Enable smart output filtering for tool results

# [tools.filters.test]
# enabled = true
# max_failures = 10         # Truncate after N test failures
# truncate_stack_trace = 50 # Max stack trace lines per failure

# [tools.filters.git]
# enabled = true
# max_log_entries = 20      # Max git log entries
# max_diff_lines = 500      # Max diff lines

# [tools.filters.clippy]
# enabled = true

# [tools.filters.cargo_build]
# enabled = true

# [tools.filters.dir_listing]
# enabled = true

# [tools.filters.log_dedup]
# enabled = true

# [tools.filters.security]
# enabled = true
# extra_patterns = []       # Additional regex patterns to redact

# Per-tool permission rules (glob patterns with allow/ask/deny actions).
# Overrides legacy blocked_commands/confirm_patterns when set.
# [tools.permissions]
# shell = [
#   { pattern = "/tmp/*", action = "allow" },
#   { pattern = "/etc/*", action = "deny" },
#   { pattern = "*sudo*", action = "deny" },
#   { pattern = "cargo *", action = "allow" },
#   { pattern = "*", action = "ask" },
# ]

# Declarative policy compiler for tool call authorization (requires policy-enforcer feature).
# See docs/src/advanced/policy-enforcer.md for the full guide.
# [tools.policy]
# enabled = false           # Enable policy enforcement (default: false)
# default_effect = "deny"   # Fallback when no rule matches: "allow" or "deny" (default: "deny")
# policy_file = "policy.toml"  # Optional external rules file; overrides inline rules when set
#
# Inline rules (can also be loaded from policy_file):
# [[tools.policy.rules]]
# effect = "deny"           # "allow" or "deny"
# tool = "shell"            # Glob pattern for tool name (case-insensitive)
# paths = ["/etc/*", "/root/*"]  # Path globs matched against file_path param (CRIT-01: normalized)
# trust_level = "verified"  # Optional: rule only applies when context trust <= this level
# args_match = ".*sudo.*"   # Optional: regex matched against individual string param values
#
# [[tools.policy.rules]]
# effect = "allow"
# tool = "shell"
# paths = ["/tmp/*"]

# Supplementary OAP authorization layer (requires policy-enforcer feature).
# Rules are merged into PolicyEnforcer after [tools.policy.rules] (policy takes precedence).
# [tools.authorization]
# enabled = false            # Enable OAP authorization (default: false)
#
# [[tools.authorization.rules]]
# effect    = "deny"         # "allow" or "deny"
# tool      = "bash"         # Glob pattern for tool name
# args_match = ".*sudo.*"   # Optional: regex matched against string param values
#
# [[tools.authorization.rules]]
# effect = "allow"
# tool   = "read"
# paths  = ["/home/*"]

[tools.result_cache]
# enabled = true             # Enable tool result caching (default: true)
# ttl_secs = 300             # Cache entry lifetime in seconds, 0 = no expiry (default: 300)

[tools.tafc]
# enabled = false            # Enable TAFC schema augmentation (default: false)
# complexity_threshold = 0.6 # Complexity threshold for augmentation (default: 0.6)

[tools.dependencies]
# enabled = false            # Enable dependency gating (default: false)
# boost_per_dep = 0.15       # Boost per satisfied soft dependency (default: 0.15)
# max_total_boost = 0.2      # Maximum total soft boost (default: 0.2)
# [tools.dependencies.rules.deploy]
# requires = ["build", "test"]
# prefers = ["lint"]

[tools.overflow]
threshold = 50000           # Offload output larger than N chars to SQLite overflow table (default: 50000)
retention_days = 7          # Days to retain overflow entries before age-based cleanup (default: 7)

[tools.audit]
enabled = false             # Structured JSON audit log for tool executions
destination = "stdout"      # "stdout" or file path

# MagicDocs — auto-maintained markdown files with a "# MAGIC DOC:" header.
# [magic_docs]
# enabled                   = false
# min_turns_between_updates = 5    # turns between updates for the same file (default: 5)
# update_provider           = ""   # provider name; falls back to primary
# max_iterations            = 4    # max iterations per update LLM call (default: 4)

[security]
redact_secrets = true       # Redact API keys/tokens in LLM responses

[security.content_isolation]
enabled = true              # Master switch for untrusted content sanitizer
max_content_size = 65536    # Max bytes per source before truncation (default: 64 KiB)
flag_injection_patterns = true  # Detect and flag injection patterns
spotlight_untrusted = true  # Wrap untrusted content in XML delimiters

[security.content_isolation.quarantine]
enabled = false             # Opt-in: route high-risk sources through quarantine LLM
sources = ["web_scrape", "a2a_message"]  # Source kinds to quarantine
model = "claude"            # Provider/model for quarantine extraction

[security.exfiltration_guard]
block_markdown_images = true  # Strip external markdown images from LLM output
validate_tool_urls = true     # Flag tool calls using URLs from injection-flagged content
guard_memory_writes = true    # Skip Qdrant embedding for injection-flagged content

[timeouts]
llm_seconds = 120           # LLM chat completion timeout
embedding_seconds = 30      # Embedding generation timeout
a2a_seconds = 30            # A2A remote call timeout

[vault]
backend = "env"  # "env" (default) or "age"; CLI --vault overrides this

[observability]
exporter = "none"           # "none" or "otlp" (requires `otel` feature)
endpoint = "http://localhost:4317"

[cost]
enabled = false
max_daily_cents = 500       # Daily budget in cents (USD), UTC midnight reset

[a2a]
enabled = false
host = "0.0.0.0"
port = 8080
# public_url = "https://agent.example.com"
# auth_token = "secret"     # Bearer token for A2A server auth (from vault ZEPH_A2A_AUTH_TOKEN); warn logged at startup if unset
rate_limit = 60

[acp]
enabled = false                    # Auto-start ACP server on plain `zeph` startup using the configured transport (default: false)
max_sessions = 4                   # Max concurrent ACP sessions; LRU eviction when exceeded (default: 4)
session_idle_timeout_secs = 1800   # Idle session reaper timeout in seconds (default: 1800)
broadcast_capacity = 256           # Skill/config reload broadcast backlog shared by ACP sessions (default: 256)
# permission_file = "~/.config/zeph/acp-permissions.toml"  # Path to persisted permission decisions (default: ~/.config/zeph/acp-permissions.toml)
# auth_bearer_token = ""           # Bearer token for ACP HTTP/WS auth (env: ZEPH_ACP_AUTH_TOKEN, CLI: --acp-auth-token); omit for open mode (local use only)
discovery_enabled = true           # Expose GET /.well-known/acp.json manifest endpoint (env: ZEPH_ACP_DISCOVERY_ENABLED, default: true)

[acp.lsp]
enabled = true                     # Enable LSP extension when IDE advertises meta["lsp"] (default: true)
auto_diagnostics_on_save = true    # Fetch diagnostics on lsp/didSave notification (default: true)
max_diagnostics_per_file = 20      # Max diagnostics accepted per file (default: 20)
max_diagnostic_files = 5           # Max files in DiagnosticsCache, LRU eviction (default: 5)
max_references = 100               # Max reference locations returned (default: 100)
max_workspace_symbols = 50         # Max workspace symbol search results (default: 50)
request_timeout_secs = 10          # Timeout for LSP ext_method calls in seconds (default: 10)

[mcp]
allowed_commands = ["npx", "uvx", "node", "python", "python3"]
max_dynamic_servers = 10

# [[mcp.servers]]
# id = "filesystem"
# command = "npx"
# args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
# env = {}                  # Environment variables passed to the child process
# timeout = 30
# trust_level = "untrusted" # trusted, untrusted (default), or sandboxed
# tool_allowlist = []       # Tools to expose from this server; empty = all (untrusted) or none (sandboxed)

[agents]
enabled = false            # Enable sub-agent system (default: false)
max_concurrent = 1         # Max concurrent sub-agents (default: 1)
extra_dirs = []            # Additional directories to scan for agent definitions
# default_memory_scope = "project"  # Default memory scope for agents without explicit `memory` field
                                    # Valid: "user", "project", "local". Omit to disable.
# Lifecycle hooks — see Sub-Agent Orchestration > Hooks for details
# [agents.hooks]
# [[agents.hooks.start]]
# type = "command"
# command = "echo started"
# [[agents.hooks.stop]]
# type = "command"
# command = "./scripts/cleanup.sh"

[orchestration]
enabled = false                          # Enable task orchestration (default: false, requires `orchestration` feature)
max_tasks = 20                           # Max tasks per graph (default: 20)
max_parallel = 4                         # Max concurrent task executions (default: 4)
default_failure_strategy = "abort"       # abort, retry, skip, or ask (default: "abort")
default_max_retries = 3                  # Retries for the "retry" strategy (default: 3)
task_timeout_secs = 300                  # Per-task timeout in seconds, 0 = no timeout (default: 300)
# planner_provider = "quality"            # Provider name from [[llm.providers]] for planning LLM calls; empty = primary provider
planner_max_tokens = 4096                # Max tokens for planner LLM response (default: 4096; reserved — not yet enforced)
dependency_context_budget = 16384       # Character budget for cross-task context injection (default: 16384)
confirm_before_execute = true           # Show task summary and require /plan confirm before executing (default: true)
aggregator_max_tokens = 4096            # Token budget for the aggregation LLM call (default: 4096)
# topology_selection = false            # Enable topology classification and adaptive dispatch (default: false, requires experiments feature)
# verify_provider = ""                  # Provider name from [[llm.providers]] for post-task completeness verification; empty = primary provider

[orchestration.plan_cache]
# enabled = false                       # Enable plan template caching (default: false)
# similarity_threshold = 0.90           # Min cosine similarity for cache hit (default: 0.90)
# ttl_days = 30                         # Days since last access before eviction (default: 30)
# max_templates = 100                    # Maximum cached templates (default: 100)

[gateway]
enabled = false
bind = "127.0.0.1"
port = 8090
# auth_token = "secret"     # Bearer token for gateway auth (from vault ZEPH_GATEWAY_TOKEN); warn logged at startup if unset
rate_limit = 120
max_body_size = 1048576     # 1 MiB

[logging]
file = "/absolute/path/to/zeph.log"  # Optional override; omit to use the platform default in the user data dir (%LOCALAPPDATA%\Zeph\logs\zeph.log on Windows)
level = "info"                # File log level (default: "info"); does not affect stderr/RUST_LOG
rotation = "daily"            # Rotation strategy: daily, hourly, or never (default: "daily")
max_files = 7                 # Rotated log files to retain (default: 7)

[debug]
enabled = false             # Enable debug dump at startup (default: false)
output_dir = "/absolute/path/to/debug"  # Optional override; omit to use the platform default in the user data dir (%LOCALAPPDATA%\Zeph\debug on Windows)

# Requires `classifiers` feature.
# ML-backed injection detection and PII detection via Candle/DeBERTa models.
# When `enabled = false` (the default), the existing regex-based detection runs unchanged.
# [classifiers]
# enabled = false
# timeout_ms = 5000                                             # Per-inference timeout in ms (default: 5000)
# injection_model = "protectai/deberta-v3-small-prompt-injection-v2"  # HuggingFace repo ID
# injection_threshold = 0.8                                    # Minimum score to treat result as injection (default: 0.8)
# injection_model_sha256 = ""                                  # Optional SHA-256 hex for tamper detection
# pii_enabled = false                                          # Enable NER-based PII detection (default: false)
# pii_model = "iiiorg/piiranha-v1-detect-personal-information" # HuggingFace repo ID
# pii_threshold = 0.75                                         # Minimum per-token confidence for a PII label (default: 0.75)
# pii_model_sha256 = ""                                        # Optional SHA-256 hex for tamper detection

# Requires `experiments` feature.
# [experiments]
# enabled = false
# eval_model = "claude-sonnet-4-20250514"  # Model for LLM-as-judge (default: agent's model)
# benchmark_file = "benchmarks/eval.toml"  # Prompt set for A/B comparison
# max_experiments = 20                     # Max variations per session (default: 20)
# max_wall_time_secs = 3600               # Wall-clock budget per session (default: 3600)
# min_improvement = 0.5                   # Min score delta to accept (default: 0.5)
# eval_budget_tokens = 100000             # Token budget for judge calls (default: 100000)
# auto_apply = false                      # Write accepted variations to live config (default: false)
#
# [experiments.schedule]
# enabled = false                          # Cron-based automatic runs (default: false)
# cron = "0 3 * * *"                       # 5-field cron expression (default: daily 03:00)
# max_experiments_per_run = 20             # Cap per scheduled run (default: 20)
# max_wall_time_secs = 1800               # Wall-time cap per run (default: 1800)

[quality]
self_check = false                         # Enable MARCH Proposer+Checker self-check pipeline (default: false)
# trigger = "always"                       # "always", "smart", or "manual" (default: "always")
# latency_budget_ms = 5000                 # Per-turn budget for self-check in milliseconds (default: 5000)
# per_call_timeout_ms = 3000              # Timeout per LLM call (proposer/checker) in milliseconds (default: 3000)
# max_assertions = 10                      # Max atomic assertions extracted by Proposer (default: 10)
# min_evidence = 0.6                       # Min evidence confidence from Checker [0.0-1.0] (default: 0.6)
# flag_marker = "--- MARCH CHECK"          # Marker appended to response when check completes (default: "--- MARCH CHECK")

[cli]
# bare = false                             # Strip to essentials for scripted usage (default: false)
# json = false                             # Emit JSONL events to stdout (default: false)
# auto = false                             # Skip tool confirmation prompts (default: false)

[cli.loop]
min_interval_secs = 5                      # Minimum loop interval in seconds (default: 5)
max_iterations = 1000                      # Max repetitions before loop auto-stops (default: 1000)
```

### Provider Entry Fields

Each `[[llm.providers]]` entry supports:

| Field | Type | Description |
|-------|------|-------------|
| `type` | string | Provider backend (`ollama`, `claude`, `openai`, `gemini`, `candle`, `compatible`) |
| `name` | string? | Identifier for routing; required for `compatible` type |
| `model` | string? | Chat model |
| `base_url` | string? | API endpoint (Ollama / Compatible) |
| `embedding_model` | string? | Embedding model |
| `embed` | bool | Mark as the embedding provider for skill matching and semantic memory |
| `default` | bool | Mark as the primary chat provider |
| `filename` | string? | GGUF filename (Candle only) |
| `device` | string? | Compute device: `cpu`, `metal`, `cuda` (Candle only) |

See [Model Orchestrator](../advanced/orchestrator.md) for multi-provider routing examples and [Complexity Triage Routing](../advanced/complexity-triage.md) for pre-inference classification routing.

## Environment Variables

| Variable | Description |
|----------|-------------|
| `ZEPH_LLM_PROVIDER` | `ollama`, `claude`, `openai`, `candle`, `compatible`, `orchestrator`, or `router` |
| `ZEPH_LLM_BASE_URL` | Ollama API endpoint |
| `ZEPH_LLM_MODEL` | Model name for Ollama |
| `ZEPH_LLM_EMBEDDING_MODEL` | Embedding model for Ollama (default: `qwen3-embedding`) |
| `ZEPH_LLM_VISION_MODEL` | Vision model for Ollama image requests (optional) |
| `ZEPH_CLAUDE_API_KEY` | Anthropic API key (required for Claude) |
| `ZEPH_OPENAI_API_KEY` | OpenAI API key (required for OpenAI provider) |
| `ZEPH_GEMINI_API_KEY` | Google Gemini API key (required for Gemini provider) |
| `ZEPH_TELEGRAM_TOKEN` | Telegram bot token (enables Telegram mode) |
| `ZEPH_SQLITE_PATH` | SQLite database path |
| `ZEPH_QDRANT_URL` | Qdrant server URL (default: `http://localhost:6334`) |
| `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD` | Trigger summarization after N messages (default: 100) |
| `ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS` | Context budget for proportional token allocation (default: 0 = unlimited) |
| `ZEPH_MEMORY_SOFT_COMPACTION_THRESHOLD` | Soft compaction tier: prune tool outputs + apply deferred summaries (no LLM) when context usage exceeds this fraction (default: 0.60, must be < hard threshold) |
| `ZEPH_MEMORY_HARD_COMPACTION_THRESHOLD` | Hard compaction tier: full LLM summarization when context usage exceeds this fraction (default: 0.90). Also accepted as `ZEPH_MEMORY_COMPACTION_THRESHOLD` for backward compatibility. |
| `ZEPH_MEMORY_COMPACTION_PRESERVE_TAIL` | Messages preserved during compaction (default: 4) |
| `ZEPH_MEMORY_PRUNE_PROTECT_TOKENS` | Tokens protected from Tier 1 tool output pruning (default: 40000) |
| `ZEPH_MEMORY_CROSS_SESSION_SCORE_THRESHOLD` | Minimum relevance score for cross-session memory (default: 0.35) |
| `ZEPH_MEMORY_VECTOR_BACKEND` | Vector backend: `qdrant` or `sqlite` (default: `qdrant`) |
| `ZEPH_MEMORY_TOKEN_SAFETY_MARGIN` | Token budget safety margin multiplier (default: 1.0) |
| `ZEPH_MEMORY_REDACT_CREDENTIALS` | Scrub credentials from LLM context (default: true) |
| `ZEPH_MEMORY_AUTOSAVE_ASSISTANT` | Persist assistant responses to SQLite (default: false) |
| `ZEPH_MEMORY_AUTOSAVE_MIN_LENGTH` | Min content length for assistant embedding (default: 20) |
| `ZEPH_MEMORY_TOOL_CALL_CUTOFF` | Max visible tool pairs before oldest is summarized (default: 6) |
| `ZEPH_LLM_RESPONSE_CACHE_ENABLED` | Enable SQLite-backed LLM response cache (default: false) |
| `ZEPH_LLM_RESPONSE_CACHE_TTL_SECS` | Response cache TTL in seconds (default: 3600) |
| `ZEPH_LLM_SEMANTIC_CACHE_ENABLED` | Enable semantic similarity-based response caching (default: false) |
| `ZEPH_LLM_SEMANTIC_CACHE_THRESHOLD` | Cosine similarity threshold for semantic cache hit (default: 0.95) |
| `ZEPH_LLM_SEMANTIC_CACHE_MAX_CANDIDATES` | Max entries examined per semantic cache lookup (default: 10) |
| `ZEPH_MEMORY_SQLITE_POOL_SIZE` | SQLite connection pool size (default: 5) |
| `ZEPH_MEMORY_RESPONSE_CACHE_CLEANUP_INTERVAL_SECS` | Interval for purging expired LLM response cache entries in seconds (default: 3600) |
| `ZEPH_MEMORY_SEMANTIC_ENABLED` | Enable semantic memory (default: false) |
| `ZEPH_MEMORY_RECALL_LIMIT` | Max semantically relevant messages to recall (default: 5) |
| `ZEPH_MEMORY_SEMANTIC_TEMPORAL_DECAY_ENABLED` | Enable temporal decay scoring (default: false) |
| `ZEPH_MEMORY_SEMANTIC_TEMPORAL_DECAY_HALF_LIFE_DAYS` | Half-life for temporal decay in days (default: 30) |
| `ZEPH_MEMORY_SEMANTIC_MMR_ENABLED` | Enable MMR re-ranking (default: false) |
| `ZEPH_MEMORY_SEMANTIC_MMR_LAMBDA` | MMR relevance-diversity trade-off (default: 0.7) |
| `ZEPH_SKILLS_MAX_ACTIVE` | Max skills per query via embedding match (default: 5) |
| `ZEPH_AGENT_MAX_TOOL_ITERATIONS` | Max tool loop iterations per response (default: 10) |
| `ZEPH_TOOLS_SUMMARIZE_OUTPUT` | Enable LLM-based tool output summarization (default: false) |
| `ZEPH_TOOLS_TIMEOUT` | Shell command timeout in seconds (default: 30) |
| `ZEPH_TOOLS_SCRAPE_TIMEOUT` | Web scrape request timeout in seconds (default: 15) |
| `ZEPH_TOOLS_SCRAPE_MAX_BODY` | Max response body size in bytes (default: 1048576) |
| `ZEPH_ACP_MAX_SESSIONS` | Max concurrent ACP sessions (default: 4) |
| `ZEPH_ACP_SESSION_IDLE_TIMEOUT_SECS` | Idle session reaper timeout in seconds (default: 1800) |
| `ZEPH_ACP_PERMISSION_FILE` | Path to persisted ACP permission decisions (default: `~/.config/zeph/acp-permissions.toml`) |
| `ZEPH_ACP_AUTH_TOKEN` | Bearer token for ACP HTTP/WS authentication; omit for open mode (local use only) |
| `ZEPH_ACP_DISCOVERY_ENABLED` | Expose `GET /.well-known/acp.json` manifest endpoint (default: `true`) |
| `ZEPH_A2A_ENABLED` | Enable A2A server (default: false) |
| `ZEPH_A2A_HOST` | A2A server bind address (default: `0.0.0.0`) |
| `ZEPH_A2A_PORT` | A2A server port (default: `8080`) |
| `ZEPH_A2A_PUBLIC_URL` | Public URL for agent card discovery |
| `ZEPH_A2A_AUTH_TOKEN` | Bearer token for A2A server authentication |
| `ZEPH_A2A_RATE_LIMIT` | Max requests per IP per minute (default: 60) |
| `ZEPH_A2A_REQUIRE_TLS` | Require HTTPS for outbound A2A connections (default: true) |
| `ZEPH_A2A_SSRF_PROTECTION` | Block private/loopback IPs in A2A client (default: true) |
| `ZEPH_A2A_MAX_BODY_SIZE` | Max request body size in bytes (default: 1048576) |
| `ZEPH_AGENTS_ENABLED` | Enable sub-agent system (default: false) |
| `ZEPH_AGENTS_MAX_CONCURRENT` | Max concurrent sub-agents (default: 1) |
| `ZEPH_GATEWAY_ENABLED` | Enable HTTP gateway (default: false) |
| `ZEPH_GATEWAY_BIND` | Gateway bind address (default: `127.0.0.1`) |
| `ZEPH_GATEWAY_PORT` | Gateway HTTP port (default: `8090`) |
| `ZEPH_GATEWAY_TOKEN` | Bearer token for gateway authentication; warn logged at startup if unset |
| `ZEPH_GATEWAY_RATE_LIMIT` | Max requests per IP per minute (default: 120) |
| `ZEPH_GATEWAY_MAX_BODY_SIZE` | Max request body size in bytes (default: 1048576) |
| `ZEPH_TOOLS_FILE_ALLOWED_PATHS` | Comma-separated directories file tools can access (empty = cwd) |
| `ZEPH_TOOLS_SHELL_ALLOWED_PATHS` | Comma-separated directories shell can access (empty = cwd) |
| `ZEPH_TOOLS_SHELL_ALLOW_NETWORK` | Allow network commands from shell (default: true) |
| `ZEPH_TOOLS_AUDIT_ENABLED` | Enable audit logging for tool executions (default: false) |
| `ZEPH_TOOLS_AUDIT_DESTINATION` | Audit log destination: `stdout` or file path |
| `ZEPH_SECURITY_REDACT_SECRETS` | Redact secrets in LLM responses (default: true) |
| `ZEPH_TIMEOUT_LLM` | LLM call timeout in seconds (default: 120) |
| `ZEPH_TIMEOUT_EMBEDDING` | Embedding generation timeout in seconds (default: 30) |
| `ZEPH_TIMEOUT_A2A` | A2A remote call timeout in seconds (default: 30) |
| `ZEPH_OBSERVABILITY_EXPORTER` | Tracing exporter: `none` or `otlp` (default: `none`, requires `otel` feature) |
| `ZEPH_OBSERVABILITY_ENDPOINT` | OTLP gRPC endpoint (default: `http://localhost:4317`) |
| `ZEPH_COST_ENABLED` | Enable cost tracking (default: false) |
| `ZEPH_COST_MAX_DAILY_CENTS` | Daily spending limit in cents (default: 500) |
| `ZEPH_STT_PROVIDER` | STT provider: `whisper` or `candle-whisper` (default: `whisper`, requires `stt` feature) |
| `ZEPH_STT_MODEL` | STT model name (default: `whisper-1`) |
| `ZEPH_STT_BASE_URL` | STT server base URL (e.g. `http://127.0.0.1:8080/v1` for local whisper.cpp) |
| `ZEPH_STT_LANGUAGE` | STT language: ISO-639-1 code or `auto` (default: `auto`) |
| `ZEPH_LOG_FILE` | Override `logging.file` (log file path; empty string disables file logging) |
| `ZEPH_LOG_LEVEL` | Override `logging.level` (file log level, e.g. `debug`, `warn`) |
| `ZEPH_CONFIG` | Path to config file (default: `config/default.toml`) |
| `ZEPH_TUI` | Enable TUI dashboard: `true` or `1` (requires `tui` feature) |
| `ZEPH_AUTO_UPDATE_CHECK` | Enable automatic update checks: `true` or `false` (default: `true`) |