canon-mcp 0.2.0

Canon Protocol — cryptographic audit trails for AI-assisted development

Canon MCP

Git tracks what changed. Canon tracks what the AI was given.

Canon Protocol is a verified knowledge layer for AI-assisted development. When an AI tool queries your codebase through Canon, every search result comes with a cryptographic proof — a signed Merkle receipt proving exactly what information was served, from what state of the knowledge base, at what point in time.

You can't see inside a black-box model. But you can prove exactly what information it was given to work with.

What's provable vs what's not

Claim                            Provable?  How
What the user asked              Yes        Hook-captured prompts, signed in session proof
What Canon served the AI         Yes        Search proof receipts with Merkle inclusion proofs
What files the AI read/wrote     Yes        Hook-captured tool events with BLAKE3 file hashes
What commands the AI ran         Yes        Hook-captured Bash events with output
What the AI "thought" or "knew"  No         Model internals are opaque; no one can prove this

Canon proves what went in and what came out. The model in between is a black box.

Quick start

cargo install canon-mcp
cd your-project
canon-mcp init

canon-mcp init wires everything up:

  • .mcp.json — MCP server config (auto-detected by Claude Code, Cursor, Windsurf)
  • .canon/hooks/ — session capture hooks (prompt, tool calls, stop)
  • .claude/settings.local.json — hook wiring for Claude Code

Start your AI tool normally. Proofs accumulate in .canon/proofs/.

Build from source

git clone https://github.com/canon-ai-protocol/canon-mcp.git
cd canon-mcp
cargo build --release
cp target/release/canon-mcp /usr/local/bin/

Usage

# Run as MCP server (default — spawned automatically by Claude Code / Cursor / Windsurf)
canon-mcp --watch . --data-dir .canon

# Generate a session proof from the accumulated hook log
canon-mcp session-proof --data-dir .canon

# Verify any proof (search receipt or session proof)
canon-mcp verify .canon/proofs/<proof>.json

MCP tools

Five tools are exposed to the AI via the Model Context Protocol:

Tool          What it does
canon_index   Index files into the substrate: parses, chunks, embeds, and stores them with cryptographic commitments
canon_search  Hybrid semantic + lexical search with automatic signed proof receipt generation
canon_state   Return current Merkle state root, document/chunk/embedding counts, device ID
canon_proof   Generate a standalone proof receipt for a specific query
canon_verify  Verify any proof receipt: checks Ed25519 signature and Merkle inclusion proofs
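
Because the server speaks JSON-RPC over stdio, each tool invocation is a single JSON message. The sketch below builds a `tools/call` request in the shape the Model Context Protocol defines. The argument name `query` and the helper `build_tool_call` are assumptions for illustration, not taken from Canon's tool schema, and a real client would complete the MCP `initialize` handshake before sending it.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact separators keep the message on one line, which suits
    # newline-delimited framing on stdio.
    return json.dumps(message, separators=(",", ":"))


# "query" is a hypothetical argument name; check the schema the server
# advertises via `tools/list` before relying on it.
line = build_tool_call(1, "canon_search", {"query": "ERC20 burn function"})
print(line)
```

In normal use the AI tool sends these messages itself; writing one by hand is mainly useful for debugging the server.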

How proofs work

Search proof receipts

When the AI calls canon_search, Canon:

  1. Embeds the query using all-MiniLM-L6-v2 (local, CPU-only)
  2. Runs hybrid vector (HNSW) + keyword (FTS5) search against the substrate
  3. Merges results with integer Reciprocal Rank Fusion for deterministic ranking
  4. Returns ranked chunks to the AI
  5. Generates a signed proof receipt and saves it to .canon/proofs/

The receipt contains:

┌─────────────────────────────────────────────────────┐
│ Search Proof Receipt v1                             │
├─────────────────────────────────────────────────────┤
│ query              "ERC20 burn function"            │
│ query_hash         BLAKE3(query)                    │
│ timestamp          2026-02-19T14:30:00Z             │
│ state_root         Merkle root of entire substrate  │
│ chunk_tree_root    Merkle root of all chunk hashes  │
│ context_hash       BLAKE3(assembled context)        │
│                                                     │
│ chunk_proofs[]     Per-chunk Merkle inclusion:      │
│   chunk_id         BLAKE3-16 content address        │
│   chunk_text_hash  BLAKE3(chunk text)               │
│   index            Position in sorted tree          │
│   siblings[]       Merkle path to root              │
│                                                     │
│ sources[]          Human-readable references:       │
│   document_path    src/CanonCoin.sol                │
│   chunk_text       "function burn(uint256 amount)"  │
│   relevance_score  0.847                            │
│                                                     │
│ git                commit, branch, author, dirty    │
│ signature          Ed25519 over all of the above    │
│ signer_public_key  Device's Ed25519 public key      │
│ device_id          BLAKE3-16(public_key)            │
└─────────────────────────────────────────────────────┘

Verification chain: Anyone with the receipt JSON can independently verify:

  1. Ed25519 signature is valid for the signing bytes
  2. Each chunk's Merkle proof resolves to the chunk_tree_root
  3. Git context is cryptographically bound (changing commit/branch breaks the signature)
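
Step 2 of the chain can be illustrated with a small, self-contained sketch of Merkle inclusion checking. It is not Canon's implementation. It assumes a binary tree whose parent is H(left || right), duplicates the last node on odd-sized levels, and uses BLAKE2b from the Python standard library as a stand-in for BLAKE3, which the standard library does not ship. The function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash: BLAKE2b instead of BLAKE3 (not available in the stdlib).
    return hashlib.blake2b(data, digest_size=32).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree: leaves first, single root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]  # assumption: duplicate last node on odd levels
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hash at each level on the path from leaf to root."""
    siblings = []
    for level in levels[:-1]:
        sib = index ^ 1
        # A missing right sibling means the node was paired with itself.
        siblings.append(level[sib] if sib < len(level) else level[index])
        index //= 2
    return siblings


def verify_inclusion(leaf: bytes, index: int, siblings: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its path, then compare."""
    node = leaf
    for sib in siblings:
        # Even index: node is the left child. Odd index: it is the right child.
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root


chunks = [b"function burn(uint256 amount)", b"function mint(address to)", b"event Transfer"]
leaves = sorted(h(c) for c in chunks)  # leaves kept in sorted order
levels = build_levels(leaves)
root = levels[-1][0]

idx = leaves.index(h(chunks[0]))
path = prove(levels, idx)
print(verify_inclusion(leaves[idx], idx, path, root))      # genuine chunk
print(verify_inclusion(h(b"tampered"), idx, path, root))   # altered chunk
```

The verifier needs only the leaf hash, its index, the sibling list, and the root, which mirrors the per-chunk fields in the receipt.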

This is the ground truth Canon provides: the receipt shows which results Canon served for the query, and anyone can verify the cryptographic chain independently.

Session audit trails

Claude Code hooks capture the observable actions of the AI during each turn:

  • UserPromptSubmit hook — logs what the user asked
  • PostToolUse hook — logs every Read, Write, Edit, Bash, Grep, Glob call with input/output previews
  • Stop hook — triggers canon-mcp session-proof to seal the session

The session proof contains:

┌─────────────────────────────────────────────────────┐
│ Session Proof v3                                    │
├─────────────────────────────────────────────────────┤
│ user_prompts[]     What the user asked              │
│                                                     │
│ events[]           Chronological tool calls:        │
│   tool: Read       file_path, output_preview        │
│   tool: Write      file_path, content_preview       │
│   tool: Edit       file_path, diff_preview          │
│   tool: Bash       command, output_preview          │
│   tool: Grep       pattern, results_preview         │
│                                                     │
│ files_read[]       BLAKE3 hash + snippet per file   │
│ files_written[]    BLAKE3 hash + snippet per file   │
│ files_root         Merkle root of all file hashes   │
│                                                     │
│ git                commit, branch, author, dirty    │
│ signature          Ed25519 over all of the above    │
└─────────────────────────────────────────────────────┘

What session proofs capture vs what they don't:

Session proofs record observable actions — the files the AI opened, the commands it ran, the code it wrote. They do NOT claim to know what the model "thought" or what context it used internally. Model internals are opaque and Canon does not pretend otherwise.

For proof of what information Canon served to the AI, see the search proof receipts generated by canon_search MCP calls during the same session.
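
The files_read[], files_written[], and files_root fields can be pictured with the sketch below: hash each file, then fold the hashes into one root. It is only an illustration under stated assumptions. BLAKE2b stands in for BLAKE3 (not in the Python standard library), leaves are sorted before folding, the last node is duplicated on odd-sized levels, and the names `hash_file` and `files_root` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path) -> bytes:
    """Hash a file in 64 KiB blocks so large files are never loaded whole."""
    hasher = hashlib.blake2b(digest_size=32)  # stand-in for BLAKE3
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            hasher.update(block)
    return hasher.digest()


def files_root(digests: list[bytes]) -> bytes:
    """Fold file hashes pairwise into a single root."""
    if not digests:
        raise ValueError("no files recorded")
    level = sorted(digests)  # assumption: sorted, so access order is irrelevant
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate last node on odd levels
        level = [
            hashlib.blake2b(level[i] + level[i + 1], digest_size=32).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


with tempfile.TemporaryDirectory() as tmp:
    contract = Path(tmp, "CanonCoin.sol")
    readme = Path(tmp, "README.md")
    contract.write_bytes(b"function burn(uint256 amount) {}\n")
    readme.write_bytes(b"# CanonCoin\n")
    root_before = files_root([hash_file(contract), hash_file(readme)])

    # Any edit to any recorded file changes the root.
    contract.write_bytes(b"function burn(uint256 amount) { revert(); }\n")
    root_after = files_root([hash_file(contract), hash_file(readme)])

print(root_before.hex())
print(root_before != root_after)
```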

Architecture

canon-mcp/
├── crates/
│   ├── core/       canon-core
│   │               Data models, cryptographic primitives, proof structures.
│   │               Ed25519 identity, BLAKE3 hashing, Merkle trees, HLC timestamps.
│   │               Zero network dependencies — pure computation.
│   │
│   ├── store/      canon-store
│   │               SQLite graph database + HNSW vector index.
│   │               Documents, chunks, embeddings, edges, state roots.
│   │               FTS5 full-text search. usearch for approximate nearest neighbors.
│   │
│   ├── embed/      canon-embed
│   │               Local embedding generation. all-MiniLM-L6-v2 via Candle.
│   │               Pure Rust, CPU-only, no Python. 384-dimensional vectors.
│   │               i16 quantized for deterministic cross-platform results.
│   │
│   └── mcp/        canon-mcp
│                   MCP server binary. JSON-RPC over stdio.
│                   File parser, hybrid search engine, background file watcher.
│                   Session proof generation from hook logs.

Data flow

              ┌──────────┐
  Files ────> │  Parser  │──> Documents ──> Chunks ──> Embeddings
              └──────────┘        │            │            │
                                  ▼            ▼            ▼
                            ┌──────────────────────────────────┐
                            │     SQLite + HNSW (GraphStore)   │
                            │                                  │
                            │  Merkle root = H(all chunk H's)  │
                            └──────────┬───────────────────────┘
                                       │
                    ┌──────────────────┐│┌──────────────────┐
                    │  canon_search    │││  canon_proof     │
                    │  (MCP tool call) │││  (MCP tool call) │
                    └────────┬─────────┘│└────────┬─────────┘
                             │          │         │
                             ▼          │         ▼
                    ┌──────────────────────────────────────┐
                    │         Signed Proof Receipt         │
                    │  query + state_root + Merkle proofs  │
                    │  + Ed25519 signature + git context   │
                    └──────────────────────────────────────┘

Content addressing

All IDs are deterministic BLAKE3-16 hashes (first 16 bytes of BLAKE3), stored as UUIDs:

  • Document ID = BLAKE3-16(canonicalized content)
  • Chunk ID = BLAKE3-16(doc_id || sequence number) — stable across re-chunking
  • Embedding ID = BLAKE3-16(chunk_id || model_hash || embedding_version)
  • Device ID = BLAKE3-16(Ed25519 public key)

Same content always produces the same ID. No randomness, no platform differences.
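
A minimal sketch of this ID scheme, under stated assumptions: BLAKE2b stands in for BLAKE3 (not in the Python standard library), the sequence number is encoded as 4 big-endian bytes, and the raw 16 bytes are carried in a UUID without setting version bits. Canon's actual byte encoding may differ, and the function names are hypothetical.

```python
import hashlib
import uuid


def hash16(data: bytes) -> uuid.UUID:
    """Take the first 16 bytes of a 32-byte digest and carry them as a UUID."""
    digest = hashlib.blake2b(data, digest_size=32).digest()  # stand-in for BLAKE3
    return uuid.UUID(bytes=digest[:16])


def document_id(canonical_text: str) -> uuid.UUID:
    # The input is expected to be canonicalized already.
    return hash16(canonical_text.encode("utf-8"))


def chunk_id(doc_id: uuid.UUID, sequence: int) -> uuid.UUID:
    # Assumption: sequence number encoded as 4-byte big-endian.
    return hash16(doc_id.bytes + sequence.to_bytes(4, "big"))


doc = document_id("contract CanonCoin {}\n")
print(doc)
print(chunk_id(doc, 0))
print(chunk_id(doc, 1))
```

Because every ID is a pure function of its inputs, re-indexing unchanged content yields the same IDs.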

Text canonicalization

Before hashing, all text is normalized:

  1. Line endings → \n (no \r\n or \r)
  2. Unicode NFC normalization
  3. Trailing whitespace stripped per line
  4. Single trailing newline if non-empty

This ensures byte-identical hashes across macOS, Linux, and Windows.
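
The four steps translate fairly directly into code. The sketch below is one reading of them, not Canon's implementation: it treats any trailing Unicode whitespace as strippable and collapses trailing blank lines into the single final newline, both of which are assumptions.

```python
import unicodedata


def canonicalize(text: str) -> str:
    # 1. Normalize line endings to "\n" ("\r\n" first, then any lone "\r").
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 2. Unicode NFC normalization, so composed and decomposed forms agree.
    text = unicodedata.normalize("NFC", text)
    # 3. Strip trailing whitespace from every line.
    lines = [line.rstrip() for line in text.split("\n")]
    # 4. End with exactly one newline, unless the result is empty.
    body = "\n".join(lines).rstrip("\n")
    return body + "\n" if body else ""


windows = "cafe\u0301  \r\nline2\t\r\n"   # decomposed accent, CRLF, stray whitespace
unix = "caf\u00e9\nline2\n"               # precomposed accent, LF
print(canonicalize(windows) == canonicalize(unix))
```

Hashing the canonical form rather than the raw bytes means a checkout with CRLF line endings gets the same hash as one with LF.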

Hybrid search

Search combines two strategies with integer Reciprocal Rank Fusion:

  1. Semantic — query embedded via all-MiniLM-L6-v2, searched against HNSW index (cosine similarity, integer arithmetic on i16-quantized vectors)
  2. Lexical — query tokenized and searched against FTS5 with Porter stemmer

Scores fused as: Score(d) = 1,000,000 / (60 + rank_semantic) + 1,000,000 / (60 + rank_lexical)

Integer math throughout — no floating-point non-determinism. Same query + same substrate = same results on any platform.
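
The fusion formula can be sketched in a few lines. This is an illustration rather than Canon's code: it assumes ranks start at 1, division truncates, a chunk missing from one ranking contributes nothing for it, and ties break on chunk ID. The name `rrf_fuse` is hypothetical.

```python
def rrf_fuse(
    semantic: list[str],
    lexical: list[str],
    k: int = 60,
    scale: int = 1_000_000,
) -> list[tuple[str, int]]:
    """Fuse two rankings with integer Reciprocal Rank Fusion."""
    scores: dict[str, int] = {}
    for ranking in (semantic, lexical):
        for rank, chunk in enumerate(ranking, start=1):
            # Integer division only: no floating point anywhere.
            scores[chunk] = scores.get(chunk, 0) + scale // (k + rank)
    # Highest score first; chunk ID breaks ties so the order is total.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["a", "b", "c"]   # best semantic match first
lexical = ["b", "d", "a"]    # best lexical match first
for chunk, score in rrf_fuse(semantic, lexical):
    print(chunk, score)
# b 32522
# a 32266
# d 16129
# c 15873
```

Chunk b wins because it ranks near the top of both lists, even though a is the best semantic match.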

Cryptographic primitives

Primitive     Algorithm             Purpose
Hashing       BLAKE3                Content addressing, Merkle trees, text hashing
Signatures    Ed25519               Device identity, proof signing, tamper detection
Merkle trees  Binary BLAKE3         State roots, chunk inclusion proofs
Timestamps    Hybrid Logical Clock  Causal ordering without clock sync
Embeddings    all-MiniLM-L6-v2      384-dim vectors, i16 quantized (scale 32767)
Vector index  usearch HNSW          Approximate nearest neighbor search
Text search   SQLite FTS5           Porter-stemmed full-text search
Scoring       Integer RRF           Deterministic cross-platform rank fusion
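
To make the "i16 quantized (scale 32767)" entry concrete, here is a small sketch of quantizing a vector and comparing vectors with integer arithmetic only. The rounding rule (half away from zero), the clamping to [-1, 1], and the use of a plain dot product are assumptions. The dot product orders candidates the same way as cosine similarity only when the embeddings are unit-normalized, which is also assumed here.

```python
SCALE = 32767  # largest positive i16 value


def quantize(vector: list[float]) -> list[int]:
    """Map floats in [-1.0, 1.0] to i16 integers."""
    out = []
    for x in vector:
        x = max(-1.0, min(1.0, x))  # clamp so the result always fits in i16
        scaled = x * SCALE
        # Round half away from zero, written out explicitly so it does not
        # depend on a language's default rounding mode.
        out.append(int(scaled + 0.5) if scaled >= 0 else -int(-scaled + 0.5))
    return out


def dot(a: list[int], b: list[int]) -> int:
    """Integer dot product; identical on every platform."""
    return sum(x * y for x, y in zip(a, b))


query = quantize([0.6, 0.8])
near = quantize([0.8, 0.6])
far = quantize([-0.6, -0.8])

print(query)                                # [19660, 26214]
print(dot(query, near) > dot(query, far))   # True
```

Once the vectors are integers, every later comparison is exact, so ranking no longer depends on how a platform rounds floating-point sums.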

Verification

# Verify a search proof receipt
canon-mcp verify .canon/proofs/2026-02-19T14-30-00Z_a1b2c3d4.json

# Verify a session proof
canon-mcp verify .canon/proofs/session_2026-02-19T14-30-00Z_a1b2c3d4.json

Verification checks:

  • Ed25519 signature: proves the receipt hasn't been tampered with since it was signed
  • Merkle inclusion proofs: prove each chunk existed in the substrate at search time
  • Files Merkle root: proves session file hashes are consistent (session proofs only)
  • Git context binding: proves git state was part of the signed payload

Project layout

.canon/                        Created by `canon-mcp init`
├── graph.db                   SQLite database (documents, chunks, embeddings, edges)
├── cp.key                     Ed25519 device identity seed (32 bytes)
├── canon.usearch              HNSW vector index (mmap)
├── canon.usearch.map          UUID-to-index-key mappings
├── session.jsonl              Hook event log (cleared after each session proof)
├── hooks/
│   ├── prompt.sh              UserPromptSubmit hook
│   ├── post-tool.sh           PostToolUse hook
│   └── stop.sh                Stop hook (triggers session proof)
├── proofs/                    All generated proofs
│   ├── 2026-02-19T14-30-00Z_a1b2c3d4.json       Search proof receipts
│   └── session_2026-02-19T14-30-00Z_a1b2c3d4.json  Session proofs
└── .gitignore                 Keeps substrate out of git (proofs ARE committed)

License

MIT