Canon MCP
Git tracks what changed. Canon tracks what the AI was given.
Canon Protocol is a verified knowledge layer for AI-assisted development. When an AI tool queries your codebase through Canon, every search result comes with a cryptographic proof — a signed Merkle receipt proving exactly what information was served, from what state of the knowledge base, at what point in time.
You can't see inside a black-box model. But you can prove exactly what information it was given to work with.
What's provable vs what's not
| Claim | Provable? | How |
|---|---|---|
| What the user asked | Yes | Hook-captured prompts, signed in session proof |
| What Canon served the AI | Yes | Search proof receipts with Merkle inclusion proofs |
| What files the AI read/wrote | Yes | Hook-captured tool events with BLAKE3 file hashes |
| What commands the AI ran | Yes | Hook-captured Bash events with output |
| What the AI "thought" or "knew" | No | Model internals are opaque — no one can prove this |
Canon proves what went in and what came out. The model in between is a black box.
Quick start
canon-mcp init wires everything up:
- .mcp.json: MCP server config (auto-detected by Claude Code, Cursor, Windsurf)
- .canon/hooks/: session capture hooks (prompt, tool calls, stop)
- .claude/settings.local.json: hook wiring for Claude Code
Start your AI tool normally. Proofs accumulate in .canon/proofs/.
Build from source
Usage
# Run as MCP server (default — spawned automatically by Claude Code / Cursor / Windsurf)
# Generate a session proof from the accumulated hook log
# Verify any proof (search receipt or session proof)
MCP tools
Five tools are exposed to the AI via the Model Context Protocol:
| Tool | What it does |
|---|---|
| canon_index | Index files into the substrate: parses, chunks, embeds, and stores them with cryptographic commitments |
| canon_search | Hybrid semantic + lexical search with automatic signed proof receipt generation |
| canon_state | Return the current Merkle state root, document/chunk/embedding counts, and device ID |
| canon_proof | Generate a standalone proof receipt for a specific query |
| canon_verify | Verify any proof receipt: checks the Ed25519 signature and Merkle inclusion proofs |
How proofs work
Search proof receipts
When the AI calls canon_search, Canon:
- Embeds the query using all-MiniLM-L6-v2 (local, CPU-only)
- Runs hybrid vector (HNSW) + keyword (FTS5) search against the substrate
- Merges results with integer Reciprocal Rank Fusion for deterministic ranking
- Returns ranked chunks to the AI
- Generates a signed proof receipt and saves it to .canon/proofs/
The receipt contains:
┌─────────────────────────────────────────────────────┐
│ Search Proof Receipt v1 │
├─────────────────────────────────────────────────────┤
│ query "ERC20 burn function" │
│ query_hash BLAKE3(query) │
│ timestamp 2026-02-19T14:30:00Z │
│ state_root Merkle root of entire substrate │
│ chunk_tree_root Merkle root of all chunk hashes │
│ context_hash BLAKE3(assembled context) │
│ │
│ chunk_proofs[] Per-chunk Merkle inclusion: │
│ chunk_id BLAKE3-16 content address │
│ chunk_text_hash BLAKE3(chunk text) │
│ index Position in sorted tree │
│ siblings[] Merkle path to root │
│ │
│ sources[] Human-readable references: │
│ document_path src/CanonCoin.sol │
│ chunk_text "function burn(uint256 amount)" │
│ relevance_score 0.847 │
│ │
│ git commit, branch, author, dirty │
│ signature Ed25519 over all of the above │
│ signer_public_key Device's Ed25519 public key │
│ device_id BLAKE3-16(public_key) │
└─────────────────────────────────────────────────────┘
Verification chain: Anyone with the receipt JSON can independently verify:
- Ed25519 signature is valid for the signing bytes
- Each chunk's Merkle proof resolves to the chunk_tree_root
- Git context is cryptographically bound (changing commit/branch breaks the signature)
This is the ground truth a receipt establishes: Canon served these results in response to the AI's query, and anyone can verify the cryptographic chain independently.
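The Merkle inclusion check in the verification chain can be sketched as follows. This is an illustration under stated assumptions, not Canon's code: the standard library has no BLAKE3, so BLAKE2b-256 stands in for it, and the function names, the left/right ordering derived from the index parity, and the duplication of the last node on odd levels are all hypothetical choices made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash: BLAKE2b-256 from the standard library, NOT BLAKE3.
    return hashlib.blake2b(data, digest_size=32).digest()


def merkle_root_and_proof(leaves: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Build a binary tree over non-empty leaf hashes; return (root, sibling path)."""
    level = list(leaves)
    siblings = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        siblings.append(level[index ^ 1])  # the neighbour sharing the same parent
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], siblings


def verify_inclusion(leaf: bytes, index: int, siblings: list[bytes], root: bytes) -> bool:
    """Recompute the path from leaf to root and compare against the expected root."""
    node = leaf
    for sibling in siblings:
        # Even index: node is the left child. Odd index: node is the right child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [h(text) for text in (b"chunk a", b"chunk b", b"chunk c")]
root, path = merkle_root_and_proof(leaves, 2)
print(verify_inclusion(leaves[2], 2, path, root))         # True
print(verify_inclusion(h(b"tampered"), 2, path, root))    # False
```

A verifier only needs the leaf hash, its index, and the sibling list from the receipt; it never needs the rest of the substrate to confirm that the chunk was part of the tree.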
Session audit trails
Claude Code hooks capture the observable actions of the AI during each turn:
- UserPromptSubmit hook: logs what the user asked
- PostToolUse hook: logs every Read, Write, Edit, Bash, Grep, Glob call with input/output previews
- Stop hook: triggers canon-mcp session-proof to seal the session
The session proof contains:
┌─────────────────────────────────────────────────────┐
│ Session Proof v3 │
├─────────────────────────────────────────────────────┤
│ user_prompts[] What the user asked │
│ │
│ events[] Chronological tool calls: │
│ tool: Read file_path, output_preview │
│ tool: Write file_path, content_preview │
│ tool: Edit file_path, diff_preview │
│ tool: Bash command, output_preview │
│ tool: Grep pattern, results_preview │
│ │
│ files_read[] BLAKE3 hash + snippet per file │
│ files_written[] BLAKE3 hash + snippet per file │
│ files_root Merkle root of all file hashes │
│ │
│ git commit, branch, author, dirty │
│ signature Ed25519 over all of the above │
└─────────────────────────────────────────────────────┘
What session proofs capture vs what they don't:
Session proofs record observable actions — the files the AI opened, the commands it ran, the code it wrote. They do NOT claim to know what the model "thought" or what context it used internally. Model internals are opaque and Canon does not pretend otherwise.
For proof of what information Canon served to the AI, see the search proof receipts generated by canon_search MCP calls during the same session.
Architecture
canon-mcp/
├── crates/
│ ├── core/ canon-core
│ │ Data models, cryptographic primitives, proof structures.
│ │ Ed25519 identity, BLAKE3 hashing, Merkle trees, HLC timestamps.
│ │ Zero network dependencies — pure computation.
│ │
│ ├── store/ canon-store
│ │ SQLite graph database + HNSW vector index.
│ │ Documents, chunks, embeddings, edges, state roots.
│ │ FTS5 full-text search. usearch for approximate nearest neighbors.
│ │
│ ├── embed/ canon-embed
│ │ Local embedding generation. all-MiniLM-L6-v2 via Candle.
│ │ Pure Rust, CPU-only, no Python. 384-dimensional vectors.
│ │ i16 quantized for deterministic cross-platform results.
│ │
│ └── mcp/ canon-mcp
│ MCP server binary. JSON-RPC over stdio.
│ File parser, hybrid search engine, background file watcher.
│ Session proof generation from hook logs.
Data flow
┌──────────┐
Files ────> │ Parser │──> Documents ──> Chunks ──> Embeddings
└──────────┘ │ │ │
▼ ▼ ▼
┌──────────────────────────────────┐
│ SQLite + HNSW (GraphStore) │
│ │
│ Merkle root = H(all chunk H's) │
└──────────┬───────────────────────┘
│
┌──────────────────┐│┌──────────────────┐
│ canon_search │││ canon_proof │
│ (MCP tool call) ││╎ (MCP tool call) │
└────────┬─────────┘│└────────┬─────────┘
│ │ │
▼ │ ▼
┌──────────────────────────────────────┐
│ Signed Proof Receipt │
│ query + state_root + Merkle proofs │
│ + Ed25519 signature + git context │
└──────────────────────────────────────┘
Content addressing
All IDs are deterministic BLAKE3-16 hashes (first 16 bytes of BLAKE3), stored as UUIDs:
- Document ID = BLAKE3-16(canonicalized content)
- Chunk ID = BLAKE3-16(doc_id || sequence number) — stable across re-chunking
- Embedding ID = BLAKE3-16(chunk_id || model_hash || embedding_version)
- Device ID = BLAKE3-16(Ed25519 public key)
Same content always produces the same ID. No randomness, no platform differences.
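A rough sketch of how such deterministic IDs could be derived is shown below. It is only an illustration: BLAKE2b from the standard library stands in for BLAKE3, the helper names are hypothetical, and the 4-byte big-endian encoding of the sequence number is an assumption, so the IDs it prints will not match the ones Canon produces.

```python
import hashlib
import uuid


def hash16(data: bytes) -> uuid.UUID:
    # Stand-in: first 16 bytes of BLAKE2b-256 (stdlib). Canon itself uses BLAKE3.
    digest = hashlib.blake2b(data, digest_size=32).digest()
    return uuid.UUID(bytes=digest[:16])


def document_id(canonical_text: str) -> uuid.UUID:
    """Document ID = hash16(canonicalized content)."""
    return hash16(canonical_text.encode("utf-8"))


def chunk_id(doc_id: uuid.UUID, sequence: int) -> uuid.UUID:
    """Chunk ID = hash16(doc_id || sequence number)."""
    # Assumption: the sequence number is encoded as a 4-byte big-endian integer.
    return hash16(doc_id.bytes + sequence.to_bytes(4, "big"))


doc = document_id("function burn(uint256 amount)\n")
print(doc)               # identical on every run and every platform
print(chunk_id(doc, 0))
print(chunk_id(doc, 1))
```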
Text canonicalization
Before hashing, all text is normalized:
- Line endings → `\n` (no `\r\n` or `\r`)
- Unicode NFC normalization
- Trailing whitespace stripped per line
- Single trailing newline if non-empty
This ensures byte-identical hashes across macOS, Linux, and Windows.
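The four rules above map onto a short function. The sketch below is a hedged illustration rather than Canon's implementation: the name `canonicalize`, the order in which the steps run, the set of characters treated as trailing whitespace, and the collapsing of multiple trailing blank lines into one newline are assumptions.

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Normalize text so the same content hashes identically on every OS."""
    # 1. Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 2. Unicode NFC: compose characters such as "e" + combining accent into "é".
    text = unicodedata.normalize("NFC", text)
    # 3. Strip trailing whitespace on every line.
    lines = [line.rstrip() for line in text.split("\n")]
    # 4. Exactly one trailing newline when non-empty (assumption: extra blank
    #    lines at the end are dropped); empty input stays empty.
    body = "\n".join(lines).rstrip("\n")
    return body + "\n" if body else ""


print(repr(canonicalize("a  \r\nb\r")))   # 'a\nb\n'
print(repr(canonicalize("e\u0301")))      # 'é\n'
```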
Hybrid search
Search combines two strategies with integer Reciprocal Rank Fusion:
- Semantic — query embedded via all-MiniLM-L6-v2, searched against HNSW index (cosine similarity, integer arithmetic on i16-quantized vectors)
- Lexical — query tokenized and searched against FTS5 with Porter stemmer
Scores fused as: Score(d) = 1,000,000 / (60 + rank_semantic) + 1,000,000 / (60 + rank_lexical)
Integer math throughout — no floating-point non-determinism. Same query + same substrate = same results on any platform.
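The fusion formula can be sketched in a few lines of Python. This is a minimal illustration, not Canon's implementation: the function name `rrf_fuse`, 1-based ranks, floor division, a zero contribution for a document missing from one list, and tie-breaking by ID are all assumptions made for the example.

```python
def rrf_fuse(semantic: list[str], lexical: list[str],
             k: int = 60, scale: int = 1_000_000) -> list[tuple[str, int]]:
    """Fuse two ranked ID lists with integer Reciprocal Rank Fusion."""
    scores: dict[str, int] = {}
    for ranking in (semantic, lexical):
        # Assumption: ranks are 1-based, so the top hit contributes scale // (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + scale // (k + rank)
    # Assumption: ties are broken by ID so the ordering is fully deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
print(fused)
# [('b', 32522), ('a', 32266), ('d', 16129), ('c', 15873)]
```

Here "b" wins because it ranks second semantically and first lexically (16129 + 16393), while "a" ranks first and third (16393 + 15873). Every term is an integer quotient, so no rounding behavior can reorder near-ties.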
Cryptographic primitives
| Primitive | Algorithm | Purpose |
|---|---|---|
| Hashing | BLAKE3 | Content addressing, Merkle trees, text hashing |
| Signatures | Ed25519 | Device identity, proof signing, tamper detection |
| Merkle trees | Binary BLAKE3 | State roots, chunk inclusion proofs |
| Timestamps | Hybrid Logical Clock | Causal ordering without clock sync |
| Embeddings | all-MiniLM-L6-v2 | 384-dim vectors, i16 quantized (scale 32767) |
| Vector index | usearch HNSW | Approximate nearest neighbor search |
| Text search | SQLite FTS5 | Porter-stemmed full-text search |
| Scoring | Integer RRF | Deterministic cross-platform rank fusion |
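To make the "i16 quantized (scale 32767)" entry concrete, here is a small sketch of quantizing a float vector and comparing two vectors with integer arithmetic only. It assumes components in the range [-1, 1] (for example, unit-normalized embeddings), clamping of out-of-range values, and Python's built-in rounding; the function names are hypothetical and Canon's actual rounding rule may differ.

```python
SCALE = 32767  # largest magnitude that fits in a signed 16-bit integer


def quantize(vec: list[float]) -> list[int]:
    """Map floats in [-1, 1] to integers in [-32767, 32767]."""
    # Assumption: out-of-range components are clamped rather than rejected.
    return [max(-SCALE, min(SCALE, round(x * SCALE))) for x in vec]


def int_dot(a: list[int], b: list[int]) -> int:
    """Dot product in exact integer arithmetic (no floating-point rounding)."""
    return sum(x * y for x, y in zip(a, b))


q1 = quantize([1.0, 0.0, 0.25])
q2 = quantize([1.0, 0.0, 0.0])
print(q1)               # [32767, 0, 8192]
print(int_dot(q1, q2))  # 1073676289
```

Once vectors are quantized, ranking by `int_dot` gives the same order everywhere, because integer multiplication and addition have no platform-dependent rounding.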
Verification
# Verify a search proof receipt
# Verify a session proof
Verification checks:
- Ed25519 signature — proves the receipt hasn't been tampered with
- Merkle inclusion proofs — proves each chunk existed in the substrate at search time
- Files Merkle root — proves session file hashes are consistent (session proofs)
- Git context binding — proves git state was part of the signed payload
Project layout
.canon/ Created by `canon-mcp init`
├── graph.db SQLite database (documents, chunks, embeddings, edges)
├── cp.key Ed25519 device identity seed (32 bytes)
├── canon.usearch HNSW vector index (mmap)
├── canon.usearch.map UUID-to-index-key mappings
├── session.jsonl Hook event log (cleared after each session proof)
├── hooks/
│ ├── prompt.sh UserPromptSubmit hook
│ ├── post-tool.sh PostToolUse hook
│ └── stop.sh Stop hook (triggers session proof)
├── proofs/ All generated proofs
│ ├── 2026-02-19T14-30-00Z_a1b2c3d4.json Search proof receipts
│ └── session_2026-02-19T14-30-00Z_a1b2c3d4.json Session proofs
└── .gitignore Keeps substrate out of git (proofs ARE committed)
License
MIT