# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Build & Run
```bash
cargo build # debug build
cargo build --release # release build
cargo build --features discord # with Discord channel
cargo build --features slack # with Slack channel
cargo build --features browser # with headless Chrome tool
cargo build --features encryption # with SQLCipher encryption
cargo build --features "discord,slack,browser" # multiple features
```
```bash
cargo test # run all tests
cargo test router # run router tests only
cargo test --lib memory # run memory tests only
cargo test <test_name> # run a single test by name
```
```bash
cargo clippy # lint
cargo fmt --check # check formatting
cargo fmt # auto-format
```
No `rustfmt.toml` — uses default Rust formatting conventions.
## Releasing & Publishing
**Pre-commit checklist (MUST pass before committing):**
1. `cargo fmt` — auto-format all code
2. `cargo clippy --all-features -- -D warnings` — zero warnings
3. `cargo test` — all tests pass
**Release steps:**
1. Bump version in `Cargo.toml`
2. Add a changelog entry to `CHANGELOG.md` following [Keep a Changelog](https://keepachangelog.com/) format (Added/Changed/Fixed/Security sections)
3. Run the pre-commit checklist above
4. Stage all changes including `Cargo.lock` and commit
5. Push to `master`
6. Tag with `git tag vX.Y.Z` and push the tag — **only after the commit passes fmt + clippy + tests**
7. Create a GitHub release via `gh release create`
8. CI handles `cargo publish` and Homebrew tap update automatically
**IMPORTANT**: `cargo publish` packages only git-tracked files. Before publishing:
- Ensure all changes are committed and pushed — do NOT use `--allow-dirty`
- Ensure `.gitignore` excludes non-Rust artifacts (`node_modules/`, temp files, etc.)
- crates.io enforces a 10MB upload limit — if publishing fails with "Payload Too Large", check what's being packaged with `cargo package --list`
## Architecture
**aidaemon** is a personal AI agent daemon (single Rust binary) accessible via Telegram/Slack/Discord with agentic tool use, MCP integration, and persistent memory.
### Core Flow
```
main.rs → config loading → core.rs (subsystem init) → spawn channels + agent + background tasks
```
The **agent loop** (`agent/`) is the heart: user message → build history → router selects model → LLM call → if tool calls, execute and loop → return response. It has stall detection (same tool 3+ times), repetition detection, and hard iteration limits. The agent was decomposed in v0.9.0 into submodules: `agent/loop/` (phases), `agent/runtime/` (LLM calls, history, system prompt), `agent/consultant/` (multi-pass analysis), `agent/intent/` (intent classification), `agent/policy/` (guardrails).
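The loop's control flow, with its stall and iteration guards, can be sketched roughly as follows (names, limits, and signatures here are illustrative only — the real phases live in `agent/loop/`):

```rust
// Illustrative sketch of the agent loop: call LLM, execute requested tools,
// loop until a plain reply, a stall, or the hard iteration limit.
fn agent_loop(mut llm: impl FnMut(&[String]) -> (String, Vec<String>)) -> String {
    const MAX_ITERS: usize = 10; // hard iteration limit (value hypothetical)
    const STALL_LIMIT: usize = 3; // same tool 3+ times in a row => stall
    let mut history = vec!["user message".to_string()];
    let mut last_tool: Option<String> = None;
    let mut repeats = 0;
    for _ in 0..MAX_ITERS {
        // The LLM returns a reply plus any tool calls it wants executed.
        let (reply, tool_calls) = llm(&history);
        if tool_calls.is_empty() {
            return reply; // no tools requested: final answer
        }
        for tool in tool_calls {
            if last_tool.as_deref() == Some(tool.as_str()) {
                repeats += 1;
                if repeats >= STALL_LIMIT {
                    return format!("stalled: `{tool}` repeated {repeats} times");
                }
            } else {
                repeats = 1;
                last_tool = Some(tool.clone());
            }
            // Execute the tool and feed the result back into history.
            history.push(format!("tool result: {tool}"));
        }
    }
    "iteration limit reached".to_string()
}
```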
### Key Abstractions (traits.rs)
Four core traits drive the architecture:
- **`Tool`** — anything the LLM can call (`name()`, `schema()`, `call()`)
- **`Channel`** — input sources (`send_text()`, `send_media()`, `request_approval()`)
- **`StateStore`** — persistence layer (SQLite impl in `state/sqlite.rs`)
- **`ModelProvider`** — LLM backends (`chat()`, `list_models()`)
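A rough, synchronous sketch of the trait shapes (illustrative only — the real traits in `traits.rs` are async and use `serde_json::Value`, and the `StateStore` method names below are hypothetical):

```rust
// Simplified shapes only; real definitions are async and richer.
trait Tool {
    fn name(&self) -> &str;
    fn schema(&self) -> String; // real code returns serde_json::Value
    fn call(&self, args: &str) -> Result<String, String>;
}

trait Channel {
    fn send_text(&self, session_id: &str, text: &str);
    fn send_media(&self, session_id: &str, path: &str);
    fn request_approval(&self, session_id: &str, prompt: &str) -> bool;
}

trait StateStore {
    // Method names hypothetical — see state/sqlite.rs for the real interface.
    fn save(&mut self, key: &str, value: &str);
    fn load(&self, key: &str) -> Option<String>;
}

trait ModelProvider {
    fn chat(&self, messages: &[String]) -> Result<String, String>;
    fn list_models(&self) -> Vec<String>;
}

// Minimal example impl showing the Tool contract.
struct EchoTool;
impl Tool for EchoTool {
    fn name(&self) -> &str { "echo" }
    fn schema(&self) -> String { r#"{"name":"echo"}"#.to_string() }
    fn call(&self, args: &str) -> Result<String, String> { Ok(args.to_string()) }
}
```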
#### Tool Schema Format (IMPORTANT)
`schema()` must return the **full OpenAI function object** with `name`, `description`, and `parameters`. Do NOT return just the parameters object — the LLM won't know what the tool is called or what it does.
```rust
// CORRECT — includes name, description, and parameters wrapper
fn schema(&self) -> Value {
    json!({
        "name": "my_tool",
        "description": "What this tool does and when to use it",
        "parameters": {
            "type": "object",
            "properties": { ... },
            "additionalProperties": false
        }
    })
}

// WRONG — missing name/description, LLM can't identify or select this tool
fn schema(&self) -> Value {
    json!({
        "type": "object",
        "properties": { ... }
    })
}
```
#### Dynamic Bots (IMPORTANT)
Bots can be added two ways: **config-based** (in `config.toml`) or **dynamic** (added via `/connect` command, stored in `dynamic_bots` SQLite table). When registering tools or features that depend on channel tokens (e.g., `ReadChannelHistoryTool` needs Slack bot_tokens), you MUST check BOTH sources:
- `config.all_slack_bots()` — config-based bots only
- `state.get_dynamic_bots().await` — dynamic bots from DB
Failing to check dynamic bots will cause features to silently fail to register even though the channel is connected and working.
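For illustration, the required merge looks roughly like this (the `Bot` type and the synchronous shape are hypothetical — the real lookups are `config.all_slack_bots()` and `state.get_dynamic_bots().await`):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a bot record from either source.
struct Bot { name: String, token: String }

// Tools that depend on channel tokens must merge BOTH bot sources
// before registering, or /connect-ed bots will be missed.
fn collect_bot_tokens(config_bots: Vec<Bot>, dynamic_bots: Vec<Bot>) -> HashMap<String, String> {
    let mut tokens = HashMap::new();
    // Config-based bots first...
    for b in config_bots {
        tokens.insert(b.name, b.token);
    }
    // ...then dynamic bots from the DB (config wins on name collision here).
    for b in dynamic_bots {
        tokens.entry(b.name).or_insert(b.token);
    }
    tokens
}
```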
#### Keyword Matching (IMPORTANT)
When matching keywords or phrases in user/LLM text (intent classification, deferred action detection, trigger matching, etc.), **always use word-boundary matching, never substring matching**. Substring matching causes false positives: e.g., `"deploy"` matches `"deployed"`, `"implement"` matches `"implementation"`.
Use the `contains_keyword_as_words()` helper in `agent.rs` — it splits text on whitespace, trims surrounding punctuation (preserving apostrophes for contractions), and checks for exact consecutive word sequences. Both text and keyword sides are normalized the same way, so commas/brackets in keywords are handled correctly.
```rust
// CORRECT — word-boundary matching
contains_keyword_as_words("check deployed sites", "deploy") // false ✓
contains_keyword_as_words("deploy the app", "deploy") // true ✓
contains_keyword_as_words("set up monitoring", "set up") // true ✓
contains_keyword_as_words("I'll check the report", "i'll check") // true ✓
// WRONG — substring matching catches derived forms
"check deployed sites".contains("deploy") // true ✗ — false positive
```
**Exceptions where substring matching (`.contains()`) is appropriate:**
- Structural format markers like `[tool_use:`, `[tool_call:`, `[consultation]`, `[INTENT_GATE]` — format patterns, not natural language.
- Multi-word phrases with enough specificity to avoid false positives (e.g., `"you are now"`, `"pretend to be"`, `"ignore previous instructions"`). These are safe because the phrase itself provides sufficient context. For single-word-ish patterns, add a trailing space or extra context to prevent substring overlap (e.g., `"act as a "` not `"act as"`, `"jailbreak mode"` not `"jailbreak"`).
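A simplified sketch of the word-boundary approach (the real implementation is `contains_keyword_as_words()` in `agent.rs`; this mirrors its described behavior, not its exact code):

```rust
// Lowercase each whitespace-separated word and trim surrounding punctuation,
// preserving apostrophes so contractions like "i'll" survive intact.
fn normalize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| {
            w.trim_matches(|c: char| !c.is_alphanumeric() && c != '\'')
                .to_lowercase()
        })
        .filter(|w| !w.is_empty())
        .collect()
}

// Match an exact consecutive word sequence — never a substring.
fn contains_keyword_as_words(text: &str, keyword: &str) -> bool {
    let words = normalize(text);
    let key = normalize(keyword);
    if key.is_empty() || key.len() > words.len() {
        return false;
    }
    words.windows(key.len()).any(|w| w == key.as_slice())
}
```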
### Module Map
- **`core.rs`** — orchestrates startup: creates state store, event store, provider, router, tools, agent, channels, dashboard. Handles the deferred wiring for `SpawnAgentTool` (circular dep: Agent ↔ SpawnAgentTool resolved via weak reference + `set_agent()`).
- **`agent/`** — modular agent system decomposed into submodules:
- `mod.rs` (~3000 lines) — agent struct, message handling, tool registration.
- `loop/` — phase-based agent loop: `main_loop.rs`, `llm_phase.rs`, `message_build_phase.rs`, `stopping_phase.rs`, `bootstrap/`, `tool_execution/` (budget, guards, result learning), `consultant_*` phases.
- `runtime/` — LLM calls (`llm.rs`), history management (`history.rs`), system prompt construction (`system_prompt.rs`), graceful shutdown (`graceful.rs`), model selection (`models.rs`), post-task processing (`post_task.rs`).
- `consultant/` — multi-pass analysis for complex queries.
- `intent/` — intent classification and gate logic.
- `policy/` — guardrails and safety signal detection.
- **`channels/hub.rs`** — `ChannelHub` routes messages between session IDs and channels via `SessionMap: Arc<RwLock<HashMap<String, String>>>`.
- **`state/sqlite/`** (~400KB across 20 files) — multi-layer memory: `mod.rs` (core CRUD), `facts.rs` (fact storage with semantic dedup + embeddings), `episodes.rs` (conversation summaries with channel scoping), `migrations.rs` (schema migrations), `messages.rs`, `goals.rs`, `people.rs`, `skills.rs`, `dynamic_bots.rs`, `dynamic_cli_agents.rs`, `learning.rs`, `token_usage.rs`, `tests.rs`, and more.
- **`router.rs`** — routes queries to `default_model` or `fallback_models`. Legacy Fast/Primary/Smart tiers are auto-migrated to the default+fallback model chain on startup.
- **`events/`** — event sourcing: all agent activity is immutable events. `consolidation.rs` processes events into facts/procedures daily. `context.rs` compiles session context from events.
- **`memory/`** — background consolidation (embeddings every 5s, fact extraction every 6h, decay daily). Uses `fastembed` AllMiniLML6V2 for vector embeddings. `manager.rs` handles episode creation for both idle and active long-running sessions. `comprehensive_tests.rs` has 17 test subsystems (A-M) covering canonical keys, supersession, privacy, retrieval, episodes, procedures, concurrency, and more.
- **`plans/`** — persistent multi-step task plans with detection, generation, tracking, and crash recovery.
- **`tools/terminal.rs`** — shell execution with risk assessment (`command_risk.rs`) and inline approval flow (Allow Once / Allow Always / Deny).
- **`tools/sanitize.rs`** — reply sanitization pipeline: strips model identity leaks, internal tool name references, and diagnostic/system blocks from user-facing output. Also provides `sanitize_external_content()` for input sanitization and `redact_secrets()`.
- **`tools/config_manager.rs`** — runtime provider switching with presets (OpenAI, Anthropic, Google, Moonshot, MiniMax, etc.) and config management actions.
- **`tools/memory.rs`** — `RememberFactTool` with batch fact storage (multiple facts in one call) and `ManageMemoriesTool` with fuzzy forget (canonical, case-insensitive, substring matching).
- **`providers/`** — `openai_compatible.rs` (with optional Cloudflare AI Gateway support), `google_genai.rs`, `anthropic_native.rs` — pluggable LLM backends.
- **`skills/mod.rs`** — advanced skill system with trigger-based matching, bundled resources, and dynamic management. Skills can come from filesystem (`.md` files), URLs, inline content, remote registries, or auto-promotion from successful procedures. `SharedSkillRegistry` (`Arc<RwLock<Vec<Skill>>>`) allows runtime add/remove. Each skill has metadata (name, description, triggers, source, source_url, enabled flag) and can bundle resource files (scripts, references, configs) in a directory structure. Matching is whole-word + case-insensitive with optional LLM confirmation via fast model.
- **`skills/resources.rs`** — `ResourceEntry` and `ResourceResolver` trait (`FileSystemResolver` impl) for loading bundled skill files on demand with path traversal protection and 32KB size cap.
- **`tools/use_skill.rs`** — `UseSkillTool` lets the agent activate skills on demand by name.
- **`tools/manage_skills.rs`** — `ManageSkillsTool` with 10 actions: add (from URL), add_inline, list, remove, enable, disable, browse (search registries), install (from registry), update (re-fetch from source). Includes SSRF protection.
- **`tools/skill_resources.rs`** — `SkillResourcesTool` with list/read actions for loading bundled resource files from skills.
- **`tools/skill_registry.rs`** — registry client for browsing/searching/installing skills from remote JSON manifests configured in `[skills.registries]`.
- **`memory/skill_promotion.rs`** — `SkillPromoter` background task (12h cycle) that auto-converts successful procedures (≥5 uses, ≥80% success rate) into skills via LLM generation.
- **`config.rs`** — loads `config.toml` with secret resolution: `"keychain"` → OS credential store, `"${ENV_VAR}"` → env var, or plain value.
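The secret-resolution rules in `config.rs` can be sketched like this (illustrative only — the keychain branch is stubbed here, since the real code goes through the `keyring` crate):

```rust
// Resolve a config secret per the rules above: "keychain" => OS credential
// store, "${ENV_VAR}" => environment variable, anything else => literal value.
fn resolve_secret(raw: &str) -> Result<String, String> {
    if raw == "keychain" {
        // Real code queries the OS credential store; stubbed in this sketch.
        return Err("keychain lookup not implemented in this sketch".to_string());
    }
    if let Some(var) = raw.strip_prefix("${").and_then(|s| s.strip_suffix('}')) {
        return std::env::var(var).map_err(|e| format!("env var {var}: {e}"));
    }
    // Plain value: use as-is.
    Ok(raw.to_string())
}
```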
### Concurrency Model
- Tokio async runtime throughout
- `Arc<RwLock<...>>` for shared state
- Background tasks via `tokio::spawn` (memory consolidation, event pruning, health probes, scheduler ticks)
- Channels run their own event loops (Telegram polling, Discord gateway, Slack Socket Mode)
### Feature Flags
- `browser` — `chromiumoxide` for headless Chrome
- `discord` — `serenity` for Discord bot
- `slack` — `tokio-tungstenite` for Slack Socket Mode
- `encryption` — `libsqlite3-sys/bundled-sqlcipher` for SQLCipher
### Platform-Specific
Keyring crate uses platform-native backends: `apple-native` (macOS), `sync-secret-service` (Linux), `windows-native` (Windows). These are selected via `[target.'cfg(...)'.dependencies]` in Cargo.toml.
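The selection has roughly this shape in `Cargo.toml` (version numbers illustrative — check the actual manifest for the real entries):

```toml
# Illustrative shape only — see Cargo.toml for actual versions/features.
[target.'cfg(target_os = "macos")'.dependencies]
keyring = { version = "3", features = ["apple-native"] }

[target.'cfg(target_os = "linux")'.dependencies]
keyring = { version = "3", features = ["sync-secret-service"] }

[target.'cfg(target_os = "windows")'.dependencies]
keyring = { version = "3", features = ["windows-native"] }
```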
### Testing
Tests are spread across 40+ files as `#[cfg(test)]` modules, totaling 1300+ tests. Key test areas:
- **Unit tests:** router classification, memory/embedding math, plan detection, event context, command risk patterns, skill matching, scheduler parsing, SQLite state store CRUD, provider message conversion, terminal output formatting, channel hub routing, content sanitization, markdown formatting, semantic fact dedup, episode lifecycle
- **Integration tests:** 13 test files (`part_00` through `part_11` + `scheduler_flaw`) exercising the full agent loop with mock LLM
- **Comprehensive memory tests:** `memory/comprehensive_tests.rs` — 17 subsystems (A-M) covering canonical keys, supersession chains, privacy/channel scoping, retrieval, episodes, procedures, patterns, people, decay, cleanup, concurrency
- **Property-based tests:** `proptest` for fuzz-testing command risk classification, string truncation, content sanitization, and markdown formatting
- **Dev-dependencies:** `tempfile`, `proptest`, `insta` (with `yaml` feature)
```bash
cargo test # run all tests
cargo test integration_tests # run integration tests only
cargo test test_tool_execution # run a single test by name
cargo test --lib memory # run memory-related tests only
cargo test proptest # run property-based tests
```
#### CI/CD
The project uses GitHub Actions for continuous integration and release gating.
**CI pipeline** (`.github/workflows/ci.yml`) — runs on push to `master` and all PRs:
- `check` job: `cargo fmt --check` (continue-on-error) + `cargo clippy --all-features -- -D warnings`
- `test` job: `cargo test --all-features` on ubuntu-latest and macos-14
- `build-check` job: `cargo build --release --features "browser,slack,discord"`
- `coverage` job: `cargo-llvm-cov` → Codecov (continue-on-error, visibility only)
**Release gating** (`.github/workflows/release.yml`):
- `quality-gate` job runs `cargo test --all-features` before any build/release job
- All downstream jobs (build, GitHub Release, crates.io, Homebrew) are blocked if tests fail
To generate local coverage: `cargo llvm-cov --all-features --lcov --output-path lcov.info`
#### Integration Tests
Integration tests exercise the real agent loop (`Agent::handle_message`) with a mock LLM provider and temp-file SQLite DB. They verify the same code path all channels use.
```bash
cargo test integration_tests # run integration tests only
cargo test test_tool_execution # run a single integration test
```
**What they test:** Agent loop, tool execution, memory/state persistence, multi-turn history, session isolation, channel auth simulation, memory privacy (channel-scoped, private, global), security (sanitization, prompt injection defense), stall detection, multi-step workflows, system prompt structure.
**Test infrastructure** (`src/testing.rs`):
- **`MockProvider`** — mock `ModelProvider` with scripted responses and call logging. Use `MockProvider::new()` for default "Mock response", or `MockProvider::with_responses(vec![...])` for scripted sequences. Helpers: `text_response()`, `tool_call_response()`.
- **`TestChannel`** — mock `Channel` that captures outgoing messages. Not wired to ChannelHub — tests call `agent.handle_message()` directly.
- **`setup_test_agent(provider)`** — creates a fully wired `Agent` with real `SqliteStateStore` (temp file), real `EventStore`/`PlanStore`, real `EmbeddingService`, and `SystemInfoTool` only. Returns `TestHarness { agent, state, provider, channel }`. Each call creates an isolated DB for safe parallel execution.
First run downloads the fastembed model (~25MB, cached in `~/.cache/`).
## Debugging with db_probe
The database is encrypted with SQLCipher. To inspect it, use `src/bin/db_probe.rs` — a CLI tool that connects with the encryption key and dumps diagnostic data.
**Prerequisites:** Requires `AIDAEMON_ENCRYPTION_KEY` env var (or in `.env` file). Optionally set `AIDAEMON_DB_PATH` (defaults to `aidaemon.db`).
```bash
# Build and run (encryption feature required)
cargo run --bin db_probe --features encryption
# Search message history for a keyword (with surrounding context)
cargo run --bin db_probe --features encryption -- --search "dogs-project"
cargo run --bin db_probe --features encryption -- --search "error" --search-limit 20 --search-context 10
# Inspect a specific session's events and messages
cargo run --bin db_probe --features encryption -- --session "telegram:12345"
# Inspect a specific task's full event stream
cargo run --bin db_probe --features encryption -- --task "task-uuid-here"
# Inspect a specific CLI agent invocation
cargo run --bin db_probe --features encryption -- --invocation 42
# Repair stale CLI agent invocations (no completion recorded, older than N hours)
cargo run --bin db_probe --features encryption -- --repair-stale-cli 24
# Change token usage lookback window (default: 7 hours, max: 720)
cargo run --bin db_probe --features encryption -- --token-hours 24
```
**Default output includes:** recent CLI agent invocations, open (incomplete) invocations, token usage stats (totals, per-session, hourly), recent task events, recent `cli_agent` tool events, recent messages, and dynamic CLI agent config.
## MCP Tools
- When using chrome-devtools, prefer `take_screenshot` over `take_snapshot` to save tokens. Only use `take_snapshot` when you specifically need element UIDs for interaction (clicking, filling, etc.).