mcpkill 0.1.0

Universal MCP proxy — semantic cache + chunking to kill token waste

mcpkill

Your MCP servers are dumping thousands of tokens into Claude's context. mcpkill intercepts every response and returns only what's relevant.



The problem

When Claude calls an MCP tool — read_file, search_docs, list_directory — the server returns the entire result. If you ask "how does the auth middleware work?" and the file is 600 lines, Claude gets 600 lines. Every time. Whether it asked the same question before or not.

That's slow, expensive, and fills the context window with noise.

What mcpkill does

It wraps any MCP server and sits invisibly between Claude and the server:

Claude  →  mcpkill  →  MCP server
              ↑
    intercepts every tools/call response
    chunks it, embeds it, caches it
    returns only the relevant parts

Claude still talks to the same MCP server. Nothing changes from its perspective. The difference is what it gets back.

Numbers from a real session

Five questions about a Rust codebase, using server-filesystem:

                        Raw       Filtered   Saved
  ─────────────────────────────────────────────────
  read_file proxy.rs    1 637t     390t      − 76%
  read_file filter.rs   1 504t     374t      − 75%
  read_file cache.rs    2 763t     348t      − 87%
  read_file proxy.rs *    same      390t      − 76%   ← cache hit, 0ms lookup
  read_file filter.rs *   same      374t      − 75%   ← cache hit, 0ms lookup
  ─────────────────────────────────────────────────
  Total                 9 045t   1 876t      − 79%

  * similar question on the same file, served from cache

80% cache hit rate from the first session.
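
The Saved column in the table above is plain arithmetic: one minus filtered over raw, rounded to a whole percent. The sketch below reproduces it from the token counts listed in the table. The function names `saved_percent` and `session_totals` are hypothetical and only serve this illustration; they are not part of mcpkill.

```rust
/// Percentage of tokens saved, rounded to the nearest whole percent.
/// `raw` is the token count of the original response, `filtered` is what was returned.
fn saved_percent(raw: u32, filtered: u32) -> u32 {
    if raw == 0 {
        return 0; // nothing to save on an empty response
    }
    let saved = raw.saturating_sub(filtered) as f64;
    (saved / raw as f64 * 100.0).round() as u32
}

/// Sum of (raw, filtered) over the five calls in the table.
/// The two cache-hit rows reuse the raw sizes of proxy.rs and filter.rs.
fn session_totals() -> (u32, u32) {
    let calls = [(1637, 390), (1504, 374), (2763, 348), (1637, 390), (1504, 374)];
    calls
        .iter()
        .fold((0, 0), |(r, f), &(raw, filt)| (r + raw, f + filt))
}
```

With these inputs, `session_totals()` gives 9 045 raw and 1 876 filtered tokens, and `saved_percent(9045, 1876)` gives 79, matching the Total row.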

Install

curl -fsSL https://raw.githubusercontent.com/rustkit-ai/mcpkill/main/install.sh | sh

Or via Homebrew:

brew tap rustkit-ai/tap && brew install mcpkill

Or via Cargo:

cargo install mcpkill

Add it to any MCP server — one command

mcpkill install filesystem -- npx -y @modelcontextprotocol/server-filesystem ~/projects
mcpkill install context7   -- npx -y @upstash/context7-mcp
mcpkill install github     -- npx -y @modelcontextprotocol/server-github

This pre-warms the embedding model and registers the server in Claude Code. Restart Claude and you're done.

Or manually in .claude/settings.json — just prefix the command with mcpkill --:

{
  "mcpServers": {
    "filesystem": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "~/projects"]
    }
  }
}

See what you're saving

mcpkill stats
┌─────────────────────────────────────────┐
│           mcpkill cache stats            │
├─────────────────────────────────────────┤
│  Cache entries    47                     │
│  Stored chunks    312                    │
│  Cache hits       132  (80%)             │
├─────────────────────────────────────────┤
│  Tokens (original)68 200                 │
│  Tokens (returned) 4 100                 │
│  Tokens saved     ~64 100  (94%)         │
├─────────────────────────────────────────┤
│  DB size          1.80 MB                │
└─────────────────────────────────────────┘

How the cache works

On the first call to a tool:

  1. The response arrives from the MCP server
  2. mcpkill chunks it intelligently (by Markdown headers, JSON keys, or paragraphs)
  3. Each chunk is embedded locally using all-MiniLM-L6-v2 — ~5 ms, runs on your machine
  4. The top-K most relevant chunks are returned to Claude
  5. Everything is stored in ~/.mcpkill.db (SQLite, no external services)
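
Steps 2 to 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not mcpkill's implementation: `embed` is a hashed bag-of-words stand-in for all-MiniLM-L6-v2 (a real sentence embedding captures meaning, this one only captures shared words), only the paragraph chunking strategy is shown, and the names `DIM`, `chunk_paragraphs`, and `top_k` are hypothetical.

```rust
const DIM: usize = 256;

/// Stand-in embedding: hashed bag-of-words. NOT all-MiniLM-L6-v2.
fn embed(text: &str) -> [f32; DIM] {
    let mut v = [0.0f32; DIM];
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        // FNV-1a hash of the lowercased word selects a bucket.
        let mut h: u64 = 0xcbf29ce484222325;
        for b in word.to_lowercase().bytes() {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
        v[(h % DIM as u64) as usize] += 1.0;
    }
    v
}

/// Cosine similarity; returns 0.0 if either vector is all zeros.
fn cosine(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Split on blank lines (the "paragraphs" strategy only).
fn chunk_paragraphs(text: &str) -> Vec<&str> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|c| !c.is_empty())
        .collect()
}

/// Return the `k` chunks most similar to the query, best first.
fn top_k<'a>(query: &str, chunks: &[&'a str], k: usize) -> Vec<&'a str> {
    let q = embed(query);
    let mut scored: Vec<(f32, &'a str)> = chunks
        .iter()
        .map(|c| (cosine(&q, &embed(c)), *c))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, c)| c).collect()
}
```

A caller would chunk the tool response once with `chunk_paragraphs`, then pass the user's question and the chunks to `top_k` with `k` set to the `--max-chunks` value.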

On any subsequent call to the same tool with the same or a similar query:

  1. The query is embedded (~5 ms)
  2. Cosine similarity search against stored queries
  3. If similarity ≥ 0.85 → cache hit: re-rank stored chunks, return top-K instantly
  4. The MCP server response is discarded — no tokens counted
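
The hit-or-miss decision in steps 2 and 3 can be sketched as a nearest-neighbour search with a cutoff. This is a simplified illustration, assuming embeddings are already computed and held in memory; the `CacheEntry` struct and the `lookup` function are hypothetical names, the real store is SQLite, and the re-ranking of stored chunks is left out.

```rust
/// One cached tool response: the embedding of the query that produced it,
/// plus the chunks stored for it. (Hypothetical layout.)
struct CacheEntry {
    query_embedding: Vec<f32>,
    chunks: Vec<String>,
}

/// Cosine similarity; returns 0.0 if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the best-matching entry if its similarity reaches `threshold`
/// (0.85 by default according to the options table), otherwise `None`.
fn lookup<'a>(
    cache: &'a [CacheEntry],
    query: &[f32],
    threshold: f32,
) -> Option<&'a CacheEntry> {
    cache
        .iter()
        .map(|e| (cosine(&e.query_embedding, query), e))
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .filter(|(score, _)| *score >= threshold)
        .map(|(_, e)| e)
}
```

A `Some` result corresponds to a cache hit; `None` means the response goes through the first-call path.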

Options

  Flag             Default         Description
  ──────────────────────────────────────────────────────────────────────
  --max-chunks     4               Chunks returned per response
  --threshold      0.85            Similarity threshold for cache hits
  --ttl-days       7               Evict entries not used in N days
  --max-db-mb      100             Max DB size before LRU eviction
  --cache-db       ~/.mcpkill.db   SQLite path
  --dry-run        off             Log decisions, return original response
  -v / --verbose   off             Show hit/miss on stderr

Persist any option in ~/.mcpkill.toml:

threshold  = 0.80
max_chunks = 6
ttl_days   = 14
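
The `--ttl-days` and `--max-db-mb` options describe two eviction rules: drop entries that have not been used recently, then drop least-recently-used entries until the database fits its size budget. The sketch below shows one way the two rules could compose. It is an assumption-laden illustration: the `Entry` struct, its fields, the `evict` function, and the order in which the rules run are all hypothetical and not taken from mcpkill's source.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: String,
    last_used_day: u64, // day number of last use (hypothetical field)
    size_bytes: u64,    // storage taken by this entry (hypothetical field)
}

/// Apply TTL eviction, then LRU eviction down to `max_bytes`.
/// Returns the surviving entries, most recently used first.
fn evict(mut entries: Vec<Entry>, today: u64, ttl_days: u64, max_bytes: u64) -> Vec<Entry> {
    // 1. TTL: keep only entries used within the last `ttl_days` days.
    entries.retain(|e| today.saturating_sub(e.last_used_day) <= ttl_days);

    // 2. LRU: walk from most to least recently used and keep entries
    //    while the running total stays within the budget. Once the total
    //    exceeds the budget it never shrinks, so all older entries go too.
    entries.sort_by(|a, b| b.last_used_day.cmp(&a.last_used_day));
    let mut total = 0u64;
    entries.retain(|e| {
        total += e.size_bytes;
        total <= max_bytes
    });
    entries
}
```

For example, with a 7-day TTL and a 100-byte budget, three recent 40-byte entries and one entry last used 9 days ago reduce to the two most recently used entries: the old one fails the TTL check and the third recent one exceeds the budget.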

Cache management

mcpkill clear --expired         # remove entries past TTL
mcpkill clear --older-than 14   # unused for 14+ days
mcpkill clear --all             # wipe everything

MIT — rustkit-ai