mcpkill
Your MCP servers are dumping thousands of tokens into Claude's context. mcpkill intercepts every response and returns only what's relevant.
The problem
When Claude calls an MCP tool — read_file, search_docs, list_directory —
the server returns the entire result. If you ask "how does the auth middleware work?"
and the file is 600 lines, Claude gets 600 lines. Every time. Whether it asked the
same question before or not.
That's slow, expensive, and fills the context window with noise.
What mcpkill does
It wraps any MCP server and sits invisibly between Claude and the server:
```
Claude → mcpkill → MCP server
            ↑
     intercepts every tools/call response
     chunks it, embeds it, caches it
     returns only the relevant parts
```
Claude still talks to the same MCP server. Nothing changes from its perspective. The difference is what it gets back.
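One way to picture the interception is as a proxy that forwards every message unchanged except responses to `tools/call`. JSON-RPC 2.0 responses carry the request's `id` but not its `method`, so a proxy like this has to remember which ids belonged to `tools/call` requests. The sketch below is an illustration, not mcpkill's source. It uses a simplified, hypothetical `Message` enum instead of real JSON parsing and a toy line filter in place of the embedding-based one. Treating the call's arguments as the relevance query is also an assumption.

```rust
use std::collections::HashMap;

/// Simplified stand-in for JSON-RPC messages (hypothetical; real traffic is JSON).
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Request { id: u64, method: String, args: String },
    Response { id: u64, text: String },
}

/// Forwards traffic in both directions, rewriting only `tools/call` results.
struct Proxy<F: Fn(&str, &str) -> String> {
    /// Ids of in-flight `tools/call` requests, mapped to their arguments.
    pending: HashMap<u64, String>,
    /// Relevance filter applied to each tool result (hypothetical signature).
    filter: F,
}

impl<F: Fn(&str, &str) -> String> Proxy<F> {
    fn new(filter: F) -> Self {
        Proxy { pending: HashMap::new(), filter }
    }

    /// Client → server: forward unchanged, but remember `tools/call` ids.
    fn on_client_message(&mut self, msg: Message) -> Message {
        if let Message::Request { id, method, args } = &msg {
            if method == "tools/call" {
                self.pending.insert(*id, args.clone());
            }
        }
        msg
    }

    /// Server → client: filter results of remembered `tools/call` ids only.
    fn on_server_message(&mut self, msg: Message) -> Message {
        match msg {
            Message::Response { id, text } => match self.pending.remove(&id) {
                Some(args) => Message::Response { id, text: (self.filter)(&args, &text) },
                None => Message::Response { id, text },
            },
            // Server-initiated requests pass through untouched.
            other => other,
        }
    }
}

fn main() {
    // Toy filter: keep only lines that mention the call's arguments.
    let mut proxy = Proxy::new(|args: &str, text: &str| {
        text.lines().filter(|l| l.contains(args)).collect::<Vec<_>>().join("\n")
    });
    proxy.on_client_message(Message::Request { id: 1, method: "tools/call".into(), args: "auth".into() });
    let reply = proxy.on_server_message(Message::Response {
        id: 1,
        text: "fn auth() {}\nfn unrelated() {}".into(),
    });
    println!("{reply:?}");
}
```

The client side never sees anything different in the protocol. Only the `text` of tool results changes, which matches the "nothing changes from its perspective" point above.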
Numbers from a real session
Five questions about a Rust codebase, using server-filesystem:
```
                         Raw   Filtered   Saved
───────────────────────────────────────────────
read_file proxy.rs    1 637t       390t   − 76%
read_file filter.rs   1 504t       374t   − 75%
read_file cache.rs    2 763t       348t   − 87%
read_file proxy.rs *    same       390t   − 76%   ← cache hit, 0ms lookup
read_file filter.rs *   same       374t   − 75%   ← cache hit, 0ms lookup
───────────────────────────────────────────────
Total                 9 045t     1 876t   − 79%
```
* similar question on the same file, served from cache
80% cache hit rate from the first session.
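The Saved column is 1 − filtered/raw, rounded to a whole percent. The short Rust check below recomputes the table's figures from its raw and filtered token counts. The `saved_pct` helper exists only for illustration.

```rust
/// Percentage of tokens saved, rounded to the nearest whole percent.
fn saved_pct(raw: u32, filtered: u32) -> u32 {
    ((raw - filtered) as f64 / raw as f64 * 100.0).round() as u32
}

fn main() {
    // (raw, filtered) token counts copied from the session table.
    let rows: [(u32, u32); 5] = [(1637, 390), (1504, 374), (2763, 348), (1637, 390), (1504, 374)];
    let raw: u32 = rows.iter().map(|r| r.0).sum();
    let filtered: u32 = rows.iter().map(|r| r.1).sum();
    for (r, f) in rows {
        println!("{r:>5}t -> {f:>4}t  -{}%", saved_pct(r, f));
    }
    println!("total {raw}t -> {filtered}t  -{}%", saved_pct(raw, filtered));
}
```

The same helper also gives 94 for the cache-stats totals (68 200 → 4 100 tokens).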
Install
Or via Homebrew:
Or via Cargo:
Add it to any MCP server — one command
This pre-warms the embedding model and registers the server in Claude Code. Restart Claude and you're done.
Or register it manually in `.claude/settings.json` by prefixing the server's command with `mcpkill --`:
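Prefixing a command with `mcpkill --` follows the common Unix convention where everything after `--` is passed through untouched. The sketch below shows how a wrapper might split its own flags from the wrapped server command and prepare the child process with piped stdio so it can read and rewrite the traffic. The function names are hypothetical, and the assumption that mcpkill's own flags go before `--` is ours, not something this README states.

```rust
use std::process::{Command, Stdio};

/// Split `mcpkill [flags] -- <server command...>` at the first `--`.
/// Returns (the wrapper's own flags, the wrapped server's argv).
fn split_at_separator(args: &[String]) -> (&[String], &[String]) {
    match args.iter().position(|a| a == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        // No `--`: everything is a wrapper flag and the server argv is empty.
        None => (args, &args[..0]),
    }
}

/// Build (but do not spawn) the child process for the wrapped server, with
/// stdin/stdout piped so the proxy can sit between it and the client.
fn server_command(argv: &[String]) -> Option<Command> {
    let (program, rest) = argv.split_first()?;
    let mut cmd = Command::new(program);
    cmd.args(rest)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit());
    Some(cmd)
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let (own_flags, server_argv) = split_at_separator(&args);
    println!("wrapper flags: {own_flags:?}");
    match server_command(server_argv) {
        Some(cmd) => println!("would spawn: {cmd:?}"),
        None => eprintln!("usage: mcpkill [flags] -- <server command...>"),
    }
}
```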
See what you're saving
```
┌─────────────────────────────────────────┐
│ mcpkill cache stats                     │
├─────────────────────────────────────────┤
│ Cache entries      47                   │
│ Stored chunks      312                  │
│ Cache hits         132 (80%)            │
├─────────────────────────────────────────┤
│ Tokens (original)  68 200               │
│ Tokens (returned)  4 100                │
│ Tokens saved       ~64 100 (94%)        │
├─────────────────────────────────────────┤
│ DB size            1.80 MB              │
└─────────────────────────────────────────┘
```
How the cache works
On the first call to a tool:
- The response arrives from the MCP server
- mcpkill chunks it intelligently (by Markdown headers, JSON keys, or paragraphs)
- Each chunk is embedded locally using `all-MiniLM-L6-v2` (~5 ms, runs on your machine)
- The top-K most relevant chunks are returned to Claude
- Everything is stored in `~/.mcpkill.db` (SQLite, no external services)
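The chunking step could look roughly like the code below. This is a minimal standard-library sketch, not mcpkill's actual splitter. It splits Markdown before each ATX-style `#` header line and otherwise falls back to blank-line paragraphs. JSON-key splitting is left out because it needs a JSON parser, and the names `chunk` and `is_header` are hypothetical.

```rust
/// True for ATX-style Markdown headers: 1 to 6 `#` followed by a space.
fn is_header(line: &str) -> bool {
    let hashes = line.chars().take_while(|&c| c == '#').count();
    // `#` is one byte, so `hashes` is also a valid byte index.
    (1..=6).contains(&hashes) && line[hashes..].starts_with(' ')
}

/// Split a tool response into chunks: one per Markdown section if the text
/// has headers, otherwise one per blank-line-separated paragraph.
fn chunk(text: &str) -> Vec<String> {
    if !text.lines().any(is_header) {
        return text
            .split("\n\n")
            .map(str::trim)
            .filter(|p| !p.is_empty())
            .map(String::from)
            .collect();
    }
    let mut chunks = Vec::new();
    let mut current = String::new();
    for line in text.lines() {
        // A header starts a new chunk unless nothing has been collected yet.
        if is_header(line) && !current.trim().is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let doc = "# Auth\nChecks the bearer token.\n\n# Logging\nWrites to stderr.\n";
    for (i, c) in chunk(doc).iter().enumerate() {
        println!("chunk {i}: {c:?}");
    }
}
```

Each chunk would then be embedded and scored against the query. That step needs the `all-MiniLM-L6-v2` model and is not reproduced here.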
On any subsequent call to the same or a similar tool:
- The query is embedded (~5 ms)
- Cosine similarity search against stored queries
- If similarity ≥ 0.85 (the default `--threshold`) → cache hit: re-rank the stored chunks against the new query and return the top-K instantly
- The MCP server response is discarded — no tokens counted
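Below is a sketch of the hit/miss decision. It assumes embeddings are already available as `Vec<f32>`. In mcpkill they come from `all-MiniLM-L6-v2`, but here they are toy vectors. The `Entry` type and `lookup` function are hypothetical names, and the cache is an in-memory slice rather than SQLite.

```rust
/// Cosine similarity of two equal-length embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// One cached tool call: the embedding of the query that produced it,
/// plus its stored chunks as (text, embedding) pairs.
struct Entry {
    query: Vec<f32>,
    chunks: Vec<(String, Vec<f32>)>,
}

/// Find the most similar stored query. If it clears `threshold`, re-rank that
/// entry's chunks against the new query and return the top `k`.
/// `None` means a cache miss.
fn lookup<'a>(cache: &'a [Entry], query: &[f32], threshold: f32, k: usize) -> Option<Vec<&'a str>> {
    let best = cache
        .iter()
        .map(|e| (cosine(&e.query, query), e))
        .max_by(|a, b| a.0.total_cmp(&b.0))?;
    if best.0 < threshold {
        return None;
    }
    let mut ranked: Vec<(f32, &str)> = best
        .1
        .chunks
        .iter()
        .map(|(text, emb)| (cosine(emb, query), text.as_str()))
        .collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0)); // highest similarity first
    Some(ranked.into_iter().take(k).map(|(_, t)| t).collect())
}

fn main() {
    // Toy 2-D embeddings; real ones would come from the embedding model.
    let cache = vec![Entry {
        query: vec![1.0, 0.0],
        chunks: vec![
            ("auth middleware".to_string(), vec![0.9, 0.1]),
            ("logging setup".to_string(), vec![0.1, 0.9]),
        ],
    }];
    match lookup(&cache, &[0.95, 0.05], 0.85, 1) {
        Some(top) => println!("cache hit: {top:?}"),
        None => println!("cache miss: forward to the MCP server"),
    }
}
```

Re-ranking against the new query, rather than replaying the old answer, is what lets a similar but not identical question get a different top-K from the same cached chunks.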
Options
| Flag | Default | Description |
|---|---|---|
| `--max-chunks` | 4 | Chunks returned per response |
| `--threshold` | 0.85 | Similarity threshold for cache hits |
| `--ttl-days` | 7 | Evict entries not used in N days |
| `--max-db-mb` | 100 | Max DB size (MB) before LRU eviction |
| `--cache-db` | `~/.mcpkill.db` | SQLite path |
| `--dry-run` | off | Log decisions, return original response |
| `-v` / `--verbose` | off | Show hit/miss on stderr |
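`--ttl-days` and `--max-db-mb` together describe a two-stage eviction policy. The sketch below shows one way it could work: drop entries idle longer than the TTL, then evict least-recently-used entries until the total fits under the size cap. `Options`, `EntryMeta`, and `evict` are hypothetical names. Summing entry sizes stands in for the real SQLite file size, and treating "MB" as 1024 × 1024 bytes is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// Defaults mirroring the options table (field names are hypothetical).
struct Options {
    max_chunks: usize,
    threshold: f32,
    ttl_days: u64,
    max_db_mb: u64,
}

impl Default for Options {
    fn default() -> Self {
        Options { max_chunks: 4, threshold: 0.85, ttl_days: 7, max_db_mb: 100 }
    }
}

/// Hypothetical per-entry metadata used for eviction decisions.
struct EntryMeta {
    last_used: SystemTime,
    size_bytes: u64,
}

/// Stage 1: drop entries idle longer than `ttl_days`.
/// Stage 2: evict least-recently-used entries until the total is under `max_db_mb`.
fn evict(entries: &mut Vec<EntryMeta>, opts: &Options, now: SystemTime) {
    let ttl = Duration::from_secs(opts.ttl_days * 24 * 60 * 60);
    // An entry "used in the future" (clock skew) counts as fresh.
    entries.retain(|e| now.duration_since(e.last_used).map_or(true, |idle| idle <= ttl));

    entries.sort_by_key(|e| e.last_used); // oldest first
    let limit = opts.max_db_mb * 1024 * 1024;
    let mut total: u64 = entries.iter().map(|e| e.size_bytes).sum();
    let mut n_evict = 0;
    while total > limit && n_evict < entries.len() {
        total -= entries[n_evict].size_bytes;
        n_evict += 1;
    }
    entries.drain(..n_evict);
}

fn main() {
    let opts = Options::default();
    let now = SystemTime::now();
    let day = Duration::from_secs(24 * 60 * 60);
    let mut entries = vec![
        EntryMeta { last_used: now - day * 10, size_bytes: 4_096 }, // idle > 7 days
        EntryMeta { last_used: now - day, size_bytes: 8_192 },
    ];
    evict(&mut entries, &opts, now);
    println!(
        "{} entries kept (threshold {}, max {} chunks)",
        entries.len(),
        opts.threshold,
        opts.max_chunks
    );
}
```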
Persist any option in ~/.mcpkill.toml:
= 0.80
= 6
= 14
Cache management
MIT — rustkit-ai