# Moonshot Plan: Maximal Agent Power
## Context
cqs has 35 CLI commands, 15 in batch mode, pipeline syntax, token budgeting, and a reference system. Agents currently call 3-8 cqs commands per task and manually assemble understanding. The moonshot: **one query, complete implementation context, zero iteration**.
---
## Where We Are
**35 commands.** 15 available in batch mode. Pipeline syntax chains them. Token budgeting on 7 commands. Scout, gather, and impact exist independently but share nothing — each loads its own call graph, test chunks, and staleness data. Where is search-only (no call graph).
**Parser extracts:** Functions, methods, classes, structs, enums, traits, interfaces, constants (+ markdown sections). Call sites (callee name + line). Signatures as raw strings. Type references: parameter types, return types, field types, trait bounds, generic parameters via tree-sitter queries across 7 languages (Rust, Python, TypeScript, Go, Java, C, SQL).
**Schema v11:** `chunks`, `calls`, `function_calls`, `notes`, `type_edges` tables. Type-level dependency tracking via `type_edges`: source_chunk_id → target_type_name with edge_kind classification (Param, Return, Field, Impl, Bound, Alias). `cqs deps` command for forward/reverse queries.
**Search pipeline:** FTS5 keyword → semantic embedding → RRF fusion → HNSW acceleration. Notes boost code rankings via sentiment-based score multipliers but are not returned as search results.
**Token budgeting:** Flat greedy knapsack by score. No information-type priority. Exception: `explain` hardcodes target-first, similar-second.
**Embeddings:** 768-dim (E5-base-v2), stored as BLOB in SQLite. `full_cosine_similarity()` exists for cross-store comparison. Reference system stores separate Store+HNSW per reference.
---
## Phase Ordering Principle
**Foundation first.** Things that change parser, schema, store, or search scoring come before things that wire existing internals into commands. Foundation layers are hard to redo later — if you skip them, future phases need reindexing, schema migrations, or architectural rework. Wiring is easy anytime.
---
## Phase 1: Foundation
*Schema, parser, search — hard to redo later.*
### 1a. Type extraction + schema v11 + `cqs deps` — DONE
Parser type extraction across 7 languages (PR #440). Schema v11 with `type_edges` table, store methods, `cqs deps` CLI + batch mode (PR #442).
**What shipped:**
- `parse_file_relationships()` returns `(Vec<FunctionCalls>, Vec<ChunkTypeRefs>)` from single tree-sitter parse
- `type_edges` table: source_chunk_id → target_type_name with edge_kind + line_number, FK CASCADE to chunks
- Store: `upsert_type_edges`, `get_type_users`, `get_types_used_by`, batch variants, stats, graph, shared_type_users, prune
- CLI: `cqs deps <type>` (forward) / `cqs deps --reverse <function>` (reverse)
- Batch: `deps` command, pipeable (forward mode)
- Stats: `type_graph` section in `cqs stats`
### 1b. Type integration into existing commands
Wire type_edges into commands that currently only walk call edges.
- `cqs related` — upgrade `shared_types` from signature LIKE-matching (`search_chunks_by_signatures_batch`) to `type_edges` traversal. Far more accurate.
- `cqs impact` — `--include-types` flag: also trace type-level edges. BFS currently only walks call edges; needs unified graph traversal over call + type edges, or two-pass approach. Non-trivial change to `bfs.rs`.
- `cqs dead` — type-reference awareness: struct with 0 type_edges + 0 callers = more confidently dead.
- Unified BFS over call + type graph for downstream consumers.
**Estimated scope:** ~300-400 lines across `related.rs`, `impact/bfs.rs`, `dead` logic.
### 1c. Note-boosted search ranking — DONE
Notes now influence code search ranking. In `search_filtered()` and `search_by_candidate_ids()`, after scoring each chunk, notes whose mentions match the chunk's file path or name apply a multiplicative boost: `adjusted_score = base_score * (1.0 + sentiment * 0.15)`. Multiple matching notes: strongest absolute sentiment wins (preserving sign).
**What shipped:**
- `note_boost()` helper in `src/search.rs` — computes per-chunk boost from note mentions
- Wired into both brute-force (`search_filtered`) and HNSW-guided (`search_by_candidate_ids`) paths
- Notes loaded once per search via `list_notes_summaries()` (cheap, no embeddings)
- 7 unit tests for boost logic (no match, file match, name match, strongest-wins)
- ~40 lines added. No schema change.
### 1d. Embedding model evaluation
Benchmark E5-base-v2 against CodeSage, UniXcoder, Nomic Code on the existing eval harness. Quantify the retrieval quality gap. If another model is significantly better on code, upgrade — this changes embedding dim and requires full reindex.
This is research, not code. Output: evaluation report with precision@K metrics.
---
## Phase 2: Features
*New commands + batch wiring. All use Phase 1 foundations.*
### 2a. Batch completeness
*Close the gaps that force agents out of batch mode.*
#### What's missing and why
| `scout` | Nothing — BatchContext already has Store + Embedder + root | Add `BatchCmd::Scout` variant + dispatch handler |
| `where` | Nothing — same as scout | Add `BatchCmd::Where` variant + dispatch handler |
| `notes list` | CLI reads `docs/notes.toml` directly, not Store | Use `store.list_notes_summaries()` instead — consistent, already indexed. **Caveat:** Store data may lag if notes.toml edited but not re-indexed; add mtime freshness check. |
| `read` | Needs filesystem access + audit mode + notes injection | Add audit mode flag to BatchContext. File read via `root` for path resolution. Cache parsed notes. |
| `stale` | Needs `enumerate_files()` which takes `&Parser` | `enumerate_files()` only uses `parser.supported_extensions()` → `REGISTRY.supported_extensions()`. Refactor to accept extensions slice instead of Parser. Add lazy file set to BatchContext. |
| `health` | Same as stale — needs file set for staleness check | Same fix: lazy `OnceLock<HashSet<PathBuf>>` for file set. Shares refactored `enumerate_files()`. |
#### Architecture changes
```
BatchContext {
store: Store, // existing
embedder: OnceLock<Embedder>, // existing
hnsw: OnceLock<Option<Box<dyn VectorIndex>>>, // existing
refs: RefCell<HashMap<String, ReferenceIndex>>, // existing
root: PathBuf, // existing
cqs_dir: PathBuf, // existing
+ file_set: OnceLock<HashSet<PathBuf>>, // new: lazy, for stale/health
+ audit_mode: OnceLock<bool>, // new: lazy, check .audit-mode file once
+ notes_cache: OnceLock<Vec<NoteEntry>>, // new: lazy, parse docs/notes.toml once
}
```
#### New batch commands
| Command | Pipeable? | Output format |
|---------|-----------|---------------|
| `scout <query>` | Yes — outputs function names in file_groups.chunks | `ScoutResult` JSON |
| `where <description>` | No — outputs file paths, not function names | `PlacementResult` JSON |
| `read <path> [--focus <fn>]` | No — outputs file content | `{file, content, notes_injected}` |
| `stale` | No | `StaleReport` JSON |
| `health` | No | `HealthReport` JSON |
| `notes [--warnings] [--patterns]` | No | `[{text, sentiment, mentions}]` |
Pipeline additions: `scout` added to PIPEABLE_COMMANDS. `extract_names()` must walk nested `file_groups[].chunks[].name` (current implementation only checks top-level array fields — needs recursive or special-case handling). Only `modify_target` role chunks should be extracted for piping (not dependencies or test-to-update).
Note: `enumerate_files()` currently requires `&Parser` but only calls `parser.supported_extensions()` which delegates to `REGISTRY.supported_extensions()`. Refactor signature to accept `&[&str]` extensions slice — decouples from Parser, makes BatchContext lightweight.
**Estimated scope:** ~250 lines in `batch.rs` (6 new dispatch handlers + 3 new BatchContext fields). No new files, no new dependencies. ~10 integration tests.
### 2b. `cqs onboard "concept"` — guided codebase tour
**What it does:** Given a concept, produces an ordered reading list: entry point → call chain → key types → tests. One command replaces 10 minutes of manual exploration.
**Architecture:**
1. `scout(query)` for initial relevant code (reuse existing)
2. Pick highest-scored modify_target as entry point
3. BFS expansion from entry point via `bfs_expand()` + `fetch_and_assemble()` (NOT `gather()` — gather re-searches, we already have the entry point)
4. For each gathered chunk, fetch test mapping via `find_affected_tests()`
5. Order: entry point → callees (depth-first) → callers → tests
6. Token-budget the ordered list
**Reuses:** scout, gather's internal `bfs_expand()` + `fetch_and_assemble()`, impact/test-finding, token_pack. New: ordering logic + OnboardResult type.
**Prerequisite:** `bfs_expand()` and `fetch_and_assemble()` in `gather.rs` are currently private (`fn`, not `pub fn`). Must be made `pub(crate)` for reuse by onboard and later by `cqs task` (Phase 3).
**New files:** `src/onboard.rs` (~150 lines), `src/cli/commands/onboard.rs` (~80 lines).
### 2c. Auto-stale notes
Detect when notes reference deleted/renamed functions. `notes list` gains a `--check` flag that verifies each mention still exists in the index via `store.search_by_names_batch()`. Stale mentions flagged in output.
`cqs suggest` extended to detect stale notes as a pattern category.
~100 lines.
### 2d. `cqs drift` — semantic change detection
**What it does:** Compare embeddings of same-named functions across two snapshots. Surface functions where embedding distance exceeds threshold.
**Algorithm:**
1. For each function in current index, find matching name in reference index
2. Retrieve embeddings from both stores via `get_chunk_with_embedding()`
3. Compute `full_cosine_similarity()` (already exists in `src/math.rs`)
4. If similarity < threshold (default 0.95), flag as drifted
5. Sort by drift magnitude (most changed first)
**Output:** `{drifted: [{name, file, similarity, delta}], threshold, reference}`
**Prerequisite:** Reference must be a snapshot of the same codebase at an earlier point (`cqs ref add v1.0 .`), not an external library. Drift compares same-named functions across snapshots — external references have different function sets.
~200 lines. New file `src/drift.rs`, CLI in `src/cli/commands/drift.rs`.
### 2e. `cqs patterns` — convention extraction
Analyze codebase for recurring patterns: error handling style, naming conventions, import patterns, test organization. Uses `where_to_add.rs`'s `LocalPatterns` extraction across all indexed files instead of just search results.
~300 lines. Deferred to last in phase — largest effort, least agent impact.
---
## Phase 3: Moonshot
*Orchestrates everything — benefits from all prior phases.*
### `cqs task "description"` — single-call implementation brief
**What it does:** Given a task description, returns everything an agent needs: relevant code, impact analysis, placement suggestions, test requirements, risk assessment, relevant notes — in one token-budgeted response.
**Architecture:**
```
cqs task "add authentication middleware" --tokens 8000 --json
┌─────────────────────────────────────────────┐
│ 1. Shared resource loading (once) │
│ Store, Embedder, embed_query(task), │
│ get_call_graph(), find_test_chunks() │
├─────────────────────────────────────────────┤
│ 2. Scout phase — relevant code + metadata │
│ scout_with_graph() — pre-loaded graph │
│ Output: ScoutResult │
├─────────────────────────────────────────────┤
│ 3. Gather phase — BFS from modify targets │
│ bfs_expand() + fetch_and_assemble() │
│ Seeded from scout, not fresh search │
│ Output: GatherResult │
├─────────────────────────────────────────────┤
│ 4. Impact phase — what breaks │
│ analyze_impact_with_graph() for targets │
│ Output: Vec<ImpactResult> │
├─────────────────────────────────────────────┤
│ 5. Where phase — placement suggestion │
│ suggest_placement() │
│ Output: PlacementResult │
├─────────────────────────────────────────────┤
│ 6. Test-map phase — tests to run/write │
│ find_affected_tests() with pre-loaded │
│ Output: test names + files │
├─────────────────────────────────────────────┤
│ 7. Adaptive token budgeting │
│ Per-section budget with waterfall │
│ Output: unified JSON with token_count │
└─────────────────────────────────────────────┘
```
**Key insight:** Today an agent calling scout + gather + impact separately loads the call graph 3 times and test chunks 2 times. Adding where adds another embedding query. `cqs task` loads everything once.
**What exists vs what's new:**
| Semantic search | Yes | `search_filtered()` | None |
| Scout grouping | Yes | `scout()` | Need `scout_with_graph()` variant accepting pre-loaded `&CallGraph` + `&[ChunkSummary]` |
| Gather BFS | Yes | `bfs_expand()` + `fetch_and_assemble()` | Seed from scout targets, not fresh search. **Both are currently private** in `gather.rs` — make `pub(crate)` (same prereq as Phase 2b). |
| Impact analysis | Yes | `analyze_impact()` | Need `analyze_impact_with_graph()` — `compute_hints_with_graph()` already accepts pre-loaded graph |
| Where-to-add | Yes | `suggest_placement()` | None — already standalone |
| Test mapping | Yes | `find_affected_tests()` | Accepts pre-loaded `&CallGraph` but still loads test_chunks internally. Need variant accepting pre-loaded test chunks, or cache in orchestrator. |
| Notes lookup | Yes | `find_relevant_notes()` | None |
| Token packing | Yes | `token_pack()` | None — already generic |
| Adaptive budgeting | **No** | — | New: section-aware budget allocator |
| Unified output | **No** | — | New: `TaskResult` combining all sections |
| CLI command | **No** | — | New: `cmd_task()` |
**Adaptive token budgeting:**
```rust
struct BudgetAllocation {
scout_pct: f32, // 15% — overview/metadata (cheap)
gather_pct: f32, // 50% — code content (most tokens)
impact_pct: f32, // 15% — callers/tests
where_pct: f32, // 10% — placement suggestions
notes_pct: f32, // 10% — relevant notes
}
```
Algorithm: compute per-section budget → pack each section independently with `token_pack()` → redistribute unused budget to next section (waterfall) → always include at least 1 item from each non-empty section.
**Estimated scope:** ~500-650 lines new code:
- `src/task.rs` (~200-250 lines) — orchestrator, TaskResult, BudgetAllocation, waterfall redistribution, error handling
- `src/cli/commands/task.rs` (~100 lines) — CLI wiring, JSON output
- Refactoring for shared resources (~80 lines):
- `gather.rs`: make `bfs_expand()` + `fetch_and_assemble()` pub(crate) (already needed by Phase 2b)
- `scout.rs`: add `scout_with_graph()` accepting pre-loaded `&CallGraph` + `&[ChunkSummary]`
- `impact/analysis.rs`: add `find_affected_tests_with_chunks()` accepting pre-loaded test chunks
- Tests (~100-150 lines)
- CLI registration, lib.rs re-exports, docs
### `cqs verify <diff>` — post-implementation validation
Given a diff, check against the index:
- Did the change update all affected callers?
- Are there missing test updates for changed functions?
- Does the new code follow local conventions (via where_to_add patterns)?
- Any new dead code introduced?
Builds on `impact-diff`, `dead`, and `where_to_add` pattern extraction. ~300 lines.
---
## Phase 4: Reach
*Ongoing. Breadth after depth.*
| C# language support | tree-sitter-c-sharp | Biggest missing language by market |
| Pre-built release binaries | GitHub Actions CI/CD | Users shouldn't compile from source |
| `cqs ref install <name>` | Pre-built reference packages | One command to add tokio/express/django |
---
## Dependency Graph
```
Phase 1a (Type extraction + schema + deps CLI) — DONE
└── foundation for 1b, enriches Phase 3
Phase 1b (Type integration)
└── depends on 1a (needs type_edges table)
Phase 1c (Note-boosted ranking)
└── independent — changes search scoring
└── improves Phase 3 scout seeds
Phase 1d (Embedding eval)
└── independent — pure research
└── if model changes, full reindex required
Phase 2a (Batch completeness)
└── no dependencies, uses existing commands
Phase 2b (Onboard)
└── no new dependencies, uses existing scout + gather + impact
└── prereq: make gather internals pub(crate) (also needed by Phase 3)
Phase 2c (Auto-stale notes)
└── independent
Phase 2d (Drift)
└── independent — uses existing reference system + cosine similarity
Phase 2e (Patterns)
└── independent — uses existing LocalPatterns
Phase 3 (Task + Verify)
└── benefits from Phase 1b (type deps enrich impact analysis)
└── benefits from Phase 1c (note-boosted ranking improves scout seeds)
└── benefits from Phase 2a (batch completeness for testing)
└── can start without them — just better with them
```
---
## Progress Tracker
| 1a | Type extraction + schema v11 + deps | **Done** (PRs #440, #442) | 3 |
| 1b | Type integration (related, impact, dead) | **Done** (PR #447) | 1 |
| 1c | Note-boosted search ranking | **Done** | 0.5 |
| 1d | Embedding model eval | **Done** (E5-base-v2 confirmed: 90.9% R@1, 0.941 MRR) | 1 |
| 2a | Batch completeness (scout, where, read, stale, health, notes, --tokens) | **Done** (PRs #463, #467) | 2 |
| 2b | `cqs onboard` | **Done** (PR #457) | 1 |
| 2c | Auto-stale notes | **Done** | 0.5 |
| 2d | `cqs drift` | **Done** | 0.5 |
| 2e | `cqs patterns` | Parked | — |
| 3 | `cqs task` | **Done** | 1 |
| 3 | `cqs verify` | Not started | 1-2 |
| 4 | C#, binaries, ref install | Not started | ongoing |
---
## What This Means for Agents
| Today | 3-8 | ~40% of window on exploration |
| After Phase 1 | Type + call graph = complete structural understanding | Notes shape search ranking |
| After Phase 2 | Batch covers all workflows, onboard eliminates ramp-up | Pipeline chains cover most workflows |
| After Phase 3 | **1 call = complete implementation brief** | ~90% of window on actual work |
---
*Architecture details derived from: `src/cli/batch.rs` (BatchContext, dispatch, pipeline), `src/store/` (Store, schema v11, search pipeline), `src/parser/` (chunk extraction, call extraction, type extraction), `src/scout.rs`, `src/gather.rs`, `src/impact/`, `src/where_to_add.rs`, `src/search.rs`, `src/note.rs`, `src/math.rs`, `src/embedder.rs`.*
*Created 2026-02-14. Reorganized 2026-02-15. See `ROADMAP.md` for current sprint work.*