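use std::path::Path;

use anyhow::{Context, Result};

/// Opening delimiter of the managed block inside `CLAUDE.md`.
const MARKER: &str = "<!-- mati:vector-c -->";
/// Closing delimiter; a block that lacks it is handled as a legacy stub.
const END_MARKER: &str = "<!-- /mati:vector-c -->";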
const VECTOR_C_BODY: &str = "\
## mati context store
This project uses mati. Before reading any file, call mem_get(\"file:<path>\").
High-confidence records (confidence >= 0.6, confirmed=true) replace file reads.
The PreToolUse hook enforces this at the environment level.
Run `mati status` to see current knowledge health.
## mati Knowledge Capture
When the developer says any of these:
- \"add that as a gotcha\" / \"that's a gotcha\" / \"remember this\"
- \"note that down\" / \"mati note: ...\" / \"we decided to...\"
Call mem_set immediately. Do not ask for confirmation.
Single gotcha from developer request: mem_set then `mati gotcha confirm <key>`.
Batch /mati-enrich directory: leave unconfirmed, remind to run `mati review`.
## /mati-enrich
Run /mati-enrich [path] to enrich a file or directory.
Before enriching each file, call mem_get(\"file:<path>\"). If the record has
source \"claude_enrich\" or \"developer_manual\" and confidence >= 0.60, skip it —
already enriched. Only re-enrich if the user explicitly passes the file path.
Per-file flow: mem_get → Read file → extract purpose + gotchas → mem_set file → mem_set each gotcha.
Single file: mem_set + `mati gotcha confirm <key>` for each gotcha.
Directory/batch: mem_set only (confirmed=false).
When enrichment is complete, print a summary:
Enriched: X files (Y skipped — already enriched)
Gotcha candidates extracted: Z
Run `mati review` to confirm candidates and activate hook enforcement.
Run `mati stats` to see updated coverage and onboarding score.
## /mati-enrich — extraction pipeline (v0.2)
The four-stage pipeline below is the operational instruction set for
extracting gotcha candidates. It SUPERSEDES the brief overview above
for the actual extraction steps; the intro stays as the high-level
intent. Apply all four stages per file.
### Stage 1 — Setup (before reading)
1. `mem_query mode=\"text\" query=\"<dirname-of-file>\" limit=5`
→ top 5 confirmed gotchas as POSITIVE EXEMPLARS. If zero exist
(cold start), continue with schema-only guidance.
2. `mem_get(\"file:<path>\")` — mints the consultation receipt, returns
existing gotcha_keys, AND returns the `enrichment_depth_hint` field
(D2-α: one of \"fast\", \"standard\", \"deep\"). Use it to pick the
tier branch below. If absent (older daemon), default to \"deep\".
3. **Deep tier only**: call via Bash
`mati ls tombstoned --dir <dirname-of-file> --recent 30d --json`
to retrieve NEGATIVE EXEMPLARS — rules that were proposed for
this directory and then tombstoned. Use them in Stage 2 to
calibrate AGAINST proposing similar rules. If `count` is 0,
skip the negative block. Record whether the block was actually
used — controls the `with-neg-exemplars` tag in Stage 4.
4. **SOTA path** (replaces the LLM file scan — preferred): call
`mati extract-signals --file <path>` via Bash for deterministic,
AST-aware signal extraction across all 12 supported languages.
Returns JSON
`{ file, language, signal_count, signals: [{ file_line, tier,
kind, evidence }, ...] }`. If `signal_count > 0`, use these
as the candidate list and SKIP the manual file scan in Stage 2;
tag mem_set with `signal-source:ast` (Stage 4 step 4).
Otherwise fall back to the legacy LLM file scan and tag
`signal-source:llm`.
### Tier branches (D2)
| Tier | Stage 2 | Stage 3 critique | Negative exemplars |
| --------- | ----------- | ---------------- | ------------------ |
| fast | schema only | skip | no |
| standard | positive | Round 1 + 2 | no |
| deep | positive | Rounds 1, 2, 3 | yes |
`fast` is for trivial files (LoC < 100, isolated blast radius, no cluster).
`standard` is the default. `deep` runs the full pipeline including
negative exemplars for hotspot / signal-rich files.
### Stage 2 — Enumeration (maximize recall)
Read the file. Output a JSON array of candidates, using the POSITIVE
EXEMPLARS as calibration for this project's specific bar.
Signal ranking (extract from highest first):
HIGH: WARNING / FIXME / HACK / SAFETY / IMPORTANT comments;
panic!/assert!/expect(\"…\") with non-trivial messages;
comments explaining \"why this looks weird\" or \"do not\".
MEDIUM: Defensive guards (early returns, custom error paths);
non-obvious literal arguments (e.g. with_versioning(true, 0));
error handling that diverges from the rest of the file.
LOW: Raw API usage with no comment context.
Schema (strict JSON, one element per candidate):
[
{ \"candidate_id\": \"C1\",
\"signal_tier\": \"high\" | \"medium\" | \"low\",
\"file_line\": \"L42\",
\"evidence_quote\": \"exact text from file at that line\",
\"draft_rule\": \"imperative verb + specific target\",
\"draft_reason\": \"what breaks and why\",
\"draft_severity\": \"critical\" | \"high\" | \"normal\" | \"low\" } ]
Goal: maximize recall. Weak candidates are OK — filtered next.
### Stage 3 — Critique loop (bounded, 3 rounds)
ROUND 1 — Specificity. Discard candidates failing ANY of:
Specific — names a concrete API, value, or pattern
(NOT \"be careful\", \"review carefully\", \"complex code\")
Enforceable — could a hook deny a real mistake based on this rule?
Non-obvious — would a reviewer learn something not derivable from
type signatures alone?
Causal — does the reason state WHAT breaks with \"because\"/\"since\"?
ROUND 2 — Cross-reference verification (DETERMINISTIC, D-α).
For each Round 1 survivor, call `mati verify-evidence` via Bash:
mati verify-evidence \\
--file <path> \\
--line <candidate.file_line> \\
--quote \"<candidate.evidence_quote>\" \\
--pattern \"<api/literal named in candidate.draft_rule>\"
The CLI returns JSON. Parse it:
{ \"verified\": true, ... } → keep, add \"verified\": true
{ \"verified\": false, ... } → DISCARD (hallucinated citation, or
rule generalizes beyond visible scope)
Do NOT trust self-critique here. The CLI is the source of truth.
ROUND 3 — Stability check. If Round 2 discarded nothing (its survivor
set equals Round 1's), proceed. If Round 2 discarded items, re-run
Round 2 on the new survivor set (cascading discards). Cap at 3
iterations total.
### Stage 4 — Refinement and write
For each verified candidate:
1. Tighten rule: imperative verb first; concrete names not pronouns;
≤ 80 chars where possible.
2. Verify reason uses \"because\"/\"since\"/\"as\" — add if missing.
3. Assign severity via HYBRID CLASSIFIER (D-β). Two passes:
3a. KEYWORD pass (deterministic):
contains \"panic\" / \"data loss\" / \"corruption\" / \"security\"
→ critical
contains \"regress\" / \"wrong result\" / \"silent failure\" / \"race\" /
\"silently\" / \"lose\" / \"lost\" / \"unbounded\" / \"indefinite\"
→ high
contains \"performance\" / \"warning\" / \"deprecation\" / \"slow\" /
\"lock\" / \"exclusive\" / \"contention\" / \"stale state\" /
\"false positive\" / \"inconsistent\"
→ normal
else
→ low
3b. SEMANTIC pass (LLM judgment) using rubric:
critical — data loss, corruption, security, unbounded growth
high — wrong result, silent failure, race, broken invariant
normal — performance, workflow blocker, non-obvious cleanup
low — informational, stylistic, minor inconvenience
3c. If 3a and 3b agree → use that severity.
If they disagree → use the HIGHER + add tag \"severity-disputed\".
4. Call `mem_set`:
key: `gotcha:<slug>`
rule, reason, severity (from step 3)
affected_files: [<path>]
tags: [\"enriched\", \"depth:<tier>\"]
+ [\"signal-source:ast\"] if Stage 1 step 4 used extract-signals,
otherwise [\"signal-source:llm\"]
+ [\"with-neg-exemplars\"] if Stage 1 step 3 used negatives
+ [\"severity-disputed\"] if step 3c flagged a disagreement
confirmed: false
The `depth:<tier>` tag (D3) drives per-tier accuracy in
`mati doctor`. The `signal-source:*` and `with-neg-exemplars`
tags (SOTA-γ) drive per-config A/B comparison so reviewers can
check whether the SOTA pipeline outperforms the legacy LLM scan.
### Notes
- Per-file token budget: ~8K tokens for Stages 2-3 combined. If you
exceed it, truncate Stage 2 candidates to the top 10 by signal_tier.
- The Rust-side quality gate still applies at write time. The
pipeline maximizes what gets through; the gate enforces the floor.
";
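
/// Builds the full stub: the body wrapped in `MARKER` / `END_MARKER` so a later
/// run can locate and replace exactly this block.
fn vector_c_stub() -> String {
    format!("{MARKER}\n{VECTOR_C_BODY}\n{END_MARKER}\n")
}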

/// Writes or refreshes the mati stub in `<project_root>/.claude/CLAUDE.md`.
///
/// Returns `NoClaude` without writing anything when `.claude/` is missing.
/// Otherwise the stub is created, appended to an existing file, or swapped
/// in place of a previously written block.
pub fn write_claude_md_stub(project_root: &Path) -> Result<WriteResult> {
    let claude_dir = project_root.join(".claude");
    if !claude_dir.is_dir() {
        return Ok(WriteResult::NoClaude);
    }
    let path = claude_dir.join("CLAUDE.md");
    let stub = vector_c_stub();
    if path.exists() {
        let content = std::fs::read_to_string(&path)
            .with_context(|| format!("failed to read {}", path.display()))?;
        if let Some(start) = content.find(MARKER) {
            let updated = if let Some(end_rel) = content[start..].find(END_MARKER) {
                // Complete block: replace MARKER..=END_MARKER, keep the text on both sides.
                let end = start + end_rel + END_MARKER.len();
                let mut next = String::with_capacity(content.len() + stub.len());
                next.push_str(&content[..start]);
                next.push_str(&stub);
                // The stub already ends with '\n'; skip the old block's newline so
                // repeated runs do not accumulate blank lines.
                if content[end..].starts_with('\n') {
                    next.push_str(&content[end + 1..]);
                } else {
                    next.push_str(&content[end..]);
                }
                next
            } else {
                // Legacy block without END_MARKER: its extent is unknown, so everything
                // from MARKER to the end of the file is replaced by the new stub.
                let mut next = String::with_capacity(content.len() + stub.len());
                next.push_str(&content[..start]);
                next.push_str(&stub);
                next
            };
            if updated == content {
                return Ok(WriteResult::AlreadyPresent);
            }
            std::fs::write(&path, updated)
                .with_context(|| format!("failed to write {}", path.display()))?;
            return Ok(WriteResult::Updated);
        }
        // No marker yet: append the stub, separated from existing text by a blank line.
        let mut appended = content;
        if !appended.ends_with('\n') {
            appended.push('\n');
        }
        appended.push('\n');
        appended.push_str(&stub);
        std::fs::write(&path, appended)
            .with_context(|| format!("failed to write {}", path.display()))?;
        Ok(WriteResult::Appended)
    } else {
        std::fs::write(&path, &stub)
            .with_context(|| format!("failed to write {}", path.display()))?;
        Ok(WriteResult::Created)
    }
}
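
/// Outcome of [`write_claude_md_stub`].
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteResult {
    /// `CLAUDE.md` did not exist and was created with the stub.
    Created,
    /// `CLAUDE.md` existed without the marker; the stub was appended.
    Appended,
    /// An existing stub block was replaced with the current one.
    Updated,
    /// The file already contains the current stub; nothing was written.
    AlreadyPresent,
    /// The project has no `.claude/` directory; nothing was written.
    NoClaude,
}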

#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::TempDir;

    #[test]
    fn creates_file_when_claude_dir_exists() {
        let dir = TempDir::new().unwrap();
        std::fs::create_dir_all(dir.path().join(".claude")).unwrap();
        let result = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(result, WriteResult::Created);
        let content = std::fs::read_to_string(dir.path().join(".claude/CLAUDE.md")).unwrap();
        assert!(content.contains(MARKER));
        assert!(content.contains(END_MARKER));
        assert!(content.contains("mem_get(\"file:<path>\")"));
        assert!(content.contains("PreToolUse hook enforces this"));
    }

    #[test]
    fn skips_when_no_claude_dir() {
        let dir = TempDir::new().unwrap();
        assert!(!dir.path().join(".claude").exists());
        let result = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(result, WriteResult::NoClaude);
        assert!(!dir.path().join(".claude").exists());
    }

    #[test]
    fn appends_to_existing_file_without_marker() {
        let dir = TempDir::new().unwrap();
        let claude_dir = dir.path().join(".claude");
        std::fs::create_dir_all(&claude_dir).unwrap();
        let existing = "# My Project\n\nExisting instructions.\n";
        std::fs::write(claude_dir.join("CLAUDE.md"), existing).unwrap();
        let result = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(result, WriteResult::Appended);
        let content = std::fs::read_to_string(claude_dir.join("CLAUDE.md")).unwrap();
        assert!(content.starts_with("# My Project"));
        assert!(content.contains(MARKER));
        assert!(content.contains("Existing instructions."));
    }

    #[test]
    fn idempotent_on_rerun() {
        let dir = TempDir::new().unwrap();
        std::fs::create_dir_all(dir.path().join(".claude")).unwrap();
        let first = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(first, WriteResult::Created);
        let second = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(second, WriteResult::AlreadyPresent);
        let content = std::fs::read_to_string(dir.path().join(".claude/CLAUDE.md")).unwrap();
        let marker_count = content.matches(MARKER).count();
        assert_eq!(marker_count, 1);
    }

    #[test]
    fn appended_stub_has_blank_line_separator() {
        let dir = TempDir::new().unwrap();
        let claude_dir = dir.path().join(".claude");
        std::fs::create_dir_all(&claude_dir).unwrap();
        std::fs::write(claude_dir.join("CLAUDE.md"), "# Title\n").unwrap();
        write_claude_md_stub(dir.path()).unwrap();
        let content = std::fs::read_to_string(claude_dir.join("CLAUDE.md")).unwrap();
        assert!(content.contains("# Title\n\n<!-- mati:vector-c -->"));
    }

    #[test]
    fn updates_existing_legacy_stub_block() {
        let dir = TempDir::new().unwrap();
        let claude_dir = dir.path().join(".claude");
        std::fs::create_dir_all(&claude_dir).unwrap();
        let legacy = format!("{MARKER}\n## old mati block\nstale instructions\n");
        std::fs::write(claude_dir.join("CLAUDE.md"), legacy).unwrap();
        let result = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(result, WriteResult::Updated);
        let content = std::fs::read_to_string(claude_dir.join("CLAUDE.md")).unwrap();
        assert!(content.contains("## mati context store"));
        assert!(content.contains(END_MARKER));
        assert!(!content.contains("## old mati block"));
    }
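
    // The two tests below are a sketch of extra regression coverage and are not
    // part of the original suite; the test names and fixture strings are
    // illustrative. They exercise the complete-block branch of
    // `write_claude_md_stub`: text on both sides of a MARKER..END_MARKER block
    // should survive an update, and a second run on an appended file should
    // report `AlreadyPresent` without changing the file.
    #[test]
    fn replaces_complete_block_and_keeps_surrounding_text() {
        let dir = TempDir::new().unwrap();
        let claude_dir = dir.path().join(".claude");
        std::fs::create_dir_all(&claude_dir).unwrap();
        let existing = format!("# Before\n\n{MARKER}\nold body\n{END_MARKER}\n\n# After\n");
        std::fs::write(claude_dir.join("CLAUDE.md"), existing).unwrap();
        let result = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(result, WriteResult::Updated);
        let content = std::fs::read_to_string(claude_dir.join("CLAUDE.md")).unwrap();
        assert!(content.starts_with("# Before\n\n"));
        assert!(content.ends_with("\n# After\n"));
        assert!(!content.contains("old body"));
        assert_eq!(content.matches(MARKER).count(), 1);
        assert_eq!(content.matches(END_MARKER).count(), 1);
    }

    #[test]
    fn rerun_after_append_is_idempotent() {
        let dir = TempDir::new().unwrap();
        let claude_dir = dir.path().join(".claude");
        std::fs::create_dir_all(&claude_dir).unwrap();
        std::fs::write(claude_dir.join("CLAUDE.md"), "# Title\n").unwrap();
        let first = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(first, WriteResult::Appended);
        let before = std::fs::read_to_string(claude_dir.join("CLAUDE.md")).unwrap();
        let second = write_claude_md_stub(dir.path()).unwrap();
        assert_eq!(second, WriteResult::AlreadyPresent);
        let after = std::fs::read_to_string(claude_dir.join("CLAUDE.md")).unwrap();
        assert_eq!(before, after);
    }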
}