<div align="center">
# mcpkill
**Your MCP servers are dumping thousands of tokens into Claude's context.<br>mcpkill intercepts every response and returns only what's relevant.**
</div>
---
## The problem
When Claude calls an MCP tool — `read_file`, `search_docs`, `list_directory` —
the server returns the **entire** result. If you ask *"how does the auth middleware work?"*
and the file is 600 lines, Claude gets 600 lines. Every time. Whether it asked the
same question before or not.
That's slow, expensive, and fills the context window with noise.
## What mcpkill does
It wraps any MCP server and sits invisibly between Claude and the server:
```
Claude  →  mcpkill  →  MCP server
              ↑
        intercepts every tools/call response
        chunks it, embeds it, caches it
        returns only the relevant parts
```
Claude still talks to the same MCP server. Nothing changes from its perspective.
The difference is what it gets back.
## Numbers from a real session
Five questions about a Rust codebase, using `server-filesystem`:
```
                          Raw    Filtered   Saved
─────────────────────────────────────────────────
read_file proxy.rs     1 637t       390t   − 76%
read_file filter.rs    1 504t       374t   − 75%
read_file cache.rs     2 763t       348t   − 87%
read_file proxy.rs *     same       390t   − 76%   ← cache hit, 0ms lookup
read_file filter.rs *    same       374t   − 75%   ← cache hit, 0ms lookup
─────────────────────────────────────────────────
Total                  9 045t     1 876t   − 79%

* similar question on the same file, served from cache
```
**79% fewer tokens in the first session, with two of the five calls served from cache.**
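The totals in the table can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation, not code from mcpkill; `saved_percent` and `totals` are hypothetical helper names. Cache-hit rows count the raw size of the file they would otherwise have returned.

```rust
/// Percentage of tokens saved when `raw` tokens are reduced to `filtered`.
fn saved_percent(raw: u32, filtered: u32) -> f64 {
    100.0 * (1.0 - f64::from(filtered) / f64::from(raw))
}

/// Sum the (raw, filtered) token counts over every call in a session.
fn totals(rows: &[(u32, u32)]) -> (u32, u32) {
    rows.iter()
        .fold((0, 0), |(raw, filtered), (r, f)| (raw + r, filtered + f))
}
```

Feeding in the five rows above, `(1637, 390)`, `(1504, 374)`, `(2763, 348)`, `(1637, 390)` and `(1504, 374)`, gives totals of 9 045 and 1 876 tokens, which rounds to 79% saved.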
## Install
```bash
curl -fsSL https://raw.githubusercontent.com/rustkit-ai/mcpkill/main/install.sh | sh
```
Or via Homebrew:
```bash
brew tap rustkit-ai/tap && brew install mcpkill
```
Or via Cargo:
```bash
cargo install mcpkill
```
## Add it to any MCP server — one command
```bash
mcpkill install filesystem -- npx -y @modelcontextprotocol/server-filesystem ~/projects
mcpkill install context7 -- npx -y @upstash/context7-mcp
mcpkill install github -- npx -y @modelcontextprotocol/server-github
```
This pre-warms the embedding model and registers the server in Claude Code.
Restart Claude and you're done.
Or manually in `.claude/settings.json` — just prefix the command with `mcpkill --`:
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "~/projects"]
    }
  }
}
```
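The `--` separator is what lets a wrapper tell its own flags apart from the command it wraps. As an illustration of the convention only (this is not mcpkill's argument parser, and `split_at_double_dash` is a hypothetical name), the split might look like this:

```rust
/// Split an argument list at the first `--`.
/// Everything before it belongs to the wrapper; everything after it is the
/// wrapped command and its arguments.
fn split_at_double_dash(args: &[String]) -> (Vec<String>, Vec<String>) {
    match args.iter().position(|a| a == "--") {
        Some(i) => (args[..i].to_vec(), args[i + 1..].to_vec()),
        // No separator: treat everything as wrapper flags, no wrapped command.
        None => (args.to_vec(), Vec::new()),
    }
}
```

With `["--threshold", "0.80", "--", "npx", "-y", "server"]`, the first half holds the wrapper's options and the second half is the command to spawn.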
## See what you're saving
```bash
mcpkill stats
```
```
┌─────────────────────────────────────────┐
│           mcpkill cache stats           │
├─────────────────────────────────────────┤
│  Cache entries       47                 │
│  Stored chunks       312                │
│  Cache hits          132 (80%)          │
├─────────────────────────────────────────┤
│  Tokens (original)   68 200             │
│  Tokens (returned)    4 100             │
│  Tokens saved       ~64 100 (94%)       │
├─────────────────────────────────────────┤
│  DB size             1.80 MB            │
└─────────────────────────────────────────┘
```
## How the cache works
On the first call to a tool:
1. The response arrives from the MCP server
2. mcpkill splits it into chunks along structural boundaries (Markdown headers, JSON keys, or paragraphs)
3. Each chunk is embedded locally using `all-MiniLM-L6-v2` — ~5 ms, runs on your machine
4. The top-K most relevant chunks are returned to Claude
5. Everything is stored in `~/.mcpkill.db` (SQLite, no external services)
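The chunk, embed, and rank steps above can be sketched in a few functions. This is a simplified illustration, not mcpkill's implementation: the `embed` function here is a toy word-count vector standing in for `all-MiniLM-L6-v2`, the chunker only handles Markdown headers and blank-line paragraphs, and the SQLite storage step is left out. All function names are hypothetical.

```rust
use std::collections::HashMap;

/// Sparse term-frequency vector: a toy stand-in for a real sentence embedding.
type Embedding = HashMap<String, f32>;

/// Count lowercase words. A real embedding model captures meaning, not just overlap.
fn embed(text: &str) -> Embedding {
    let mut counts = Embedding::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        *counts.entry(word.to_lowercase()).or_insert(0.0) += 1.0;
    }
    counts
}

/// Cosine similarity between two sparse vectors (0.0 if either is empty).
fn cosine(a: &Embedding, b: &Embedding) -> f32 {
    let dot: f32 = a
        .iter()
        .map(|(word, x)| x * b.get(word).copied().unwrap_or(0.0))
        .sum();
    let norm = |e: &Embedding| e.values().map(|x| x * x).sum::<f32>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

/// Start a new chunk at every Markdown header and after every blank line.
fn chunk(text: &str) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for line in text.lines() {
        let blank = line.trim().is_empty();
        if (blank || line.starts_with('#')) && !current.is_empty() {
            chunks.push(current.trim_end().to_string());
            current.clear();
        }
        if !blank {
            current.push_str(line);
            current.push('\n');
        }
    }
    if !current.is_empty() {
        chunks.push(current.trim_end().to_string());
    }
    chunks
}

/// Return the `k` chunks most similar to the query, best match first.
fn top_k(query: &str, chunks: &[String], k: usize) -> Vec<String> {
    let q = embed(query);
    let mut scored: Vec<(f32, &String)> = chunks
        .iter()
        .map(|c| (cosine(&q, &embed(c)), c))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, c)| c.clone()).collect()
}
```

Calling `top_k(question, &chunk(response), 4)` mirrors the default of four chunks per response; swapping the toy `embed` for a real model changes the quality of the ranking but not the shape of the pipeline.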
On any subsequent call with the same or a similar query:
1. The query is embedded (~5 ms)
2. Cosine similarity search against stored queries
3. If similarity ≥ 0.85 → cache hit: re-rank stored chunks, return top-K instantly
4. The MCP server response is discarded — no tokens counted
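The hit-or-miss decision above comes down to one threshold comparison. Here is a minimal sketch, assuming query embeddings are plain `Vec<f32>` values held in memory; the real cache lives in SQLite, and `CacheEntry` and `lookup` are hypothetical names used only for illustration.

```rust
/// One cached tool call: the embedding of the original query plus its chunks.
struct CacheEntry {
    query_embedding: Vec<f32>,
    chunks: Vec<String>,
}

/// Cosine similarity between two dense vectors (0.0 if either has zero length).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the most similar cached entry, but only if it clears the threshold.
/// `None` means a cache miss: fall through to the full first-call pipeline.
fn lookup<'a>(
    cache: &'a [CacheEntry],
    query: &[f32],
    threshold: f32,
) -> Option<&'a CacheEntry> {
    cache
        .iter()
        .map(|entry| (cosine(&entry.query_embedding, query), entry))
        .filter(|(similarity, _)| *similarity >= threshold)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, entry)| entry)
}
```

Lowering the threshold (for example to `0.80`) trades precision for more hits: loosely related questions start reusing each other's chunks.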
## Options
| Flag | Default | Description |
|------|---------|-------------|
| `--max-chunks` | `4` | Chunks returned per response |
| `--threshold` | `0.85` | Similarity threshold for cache hits |
| `--ttl-days` | `7` | Evict entries not used in N days |
| `--max-db-mb` | `100` | Max DB size before LRU eviction |
| `--cache-db` | `~/.mcpkill.db` | SQLite path |
| `--dry-run` | off | Log decisions, return original response |
| `-v`, `--verbose` | off | Show hit/miss on stderr |
Persist any option in `~/.mcpkill.toml`:
```toml
threshold = 0.80
max_chunks = 6
ttl_days = 14
```
## Cache management
```bash
mcpkill clear --expired # remove entries past TTL
mcpkill clear --older-than 14 # unused for 14+ days
mcpkill clear --all # wipe everything
```
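The `--ttl-days` and `--max-db-mb` options describe two eviction rules: drop entries unused for too long, then drop the least recently used entries until the cache fits its size budget. The sketch below shows one way to combine them over an in-memory list. It is an assumption about the general approach, not mcpkill's code, and the `Entry` struct and `evict` function are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: String,
    size_bytes: u64,
    /// Day the entry was last read, counted from an arbitrary epoch.
    last_used_day: u64,
}

/// Apply TTL eviction first, then LRU eviction down to `max_bytes`.
fn evict(mut entries: Vec<Entry>, today: u64, ttl_days: u64, max_bytes: u64) -> Vec<Entry> {
    // 1. TTL: drop anything not used within the last `ttl_days` days.
    entries.retain(|e| today.saturating_sub(e.last_used_day) <= ttl_days);

    // 2. LRU: walk from most to least recently used and keep entries
    //    until the running total would exceed the size budget.
    entries.sort_by(|a, b| b.last_used_day.cmp(&a.last_used_day));
    let mut total = 0u64;
    entries.retain(|e| {
        total += e.size_bytes;
        total <= max_bytes
    });
    entries
}
```

Because the running total keeps growing once the budget is exceeded, every entry older than the first one that does not fit is dropped as well, which is what makes this strictly least-recently-used.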
---
<div align="center">
MIT — [rustkit-ai](https://github.com/rustkit-ai)
</div>