mcpkill 0.1.0

Universal MCP proxy — semantic cache + chunking to kill token waste
<div align="center">

# mcpkill

**Your MCP servers are dumping thousands of tokens into Claude's context.<br>mcpkill intercepts every response and returns only what's relevant.**

[![CI](https://github.com/rustkit-ai/mcpkill/actions/workflows/ci.yml/badge.svg)](https://github.com/rustkit-ai/mcpkill/actions/workflows/ci.yml)
[![crates.io](https://img.shields.io/crates/v/mcpkill.svg)](https://crates.io/crates/mcpkill)
[![MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/built_with-Rust-dea584.svg)](https://www.rust-lang.org)

</div>

---

## The problem

When Claude calls an MCP tool — `read_file`, `search_docs`, `list_directory` —
the server returns the **entire** result. If you ask *"how does the auth middleware work?"*
and the file is 600 lines, Claude gets 600 lines. Every time. Whether it asked the
same question before or not.

That's slow, expensive, and fills the context window with noise.

## What mcpkill does

It wraps any MCP server and sits invisibly between Claude and the server:

```
Claude  →  mcpkill  →  MCP server
    intercepts every tools/call response
    chunks it, embeds it, caches it
    returns only the relevant parts
```

Claude still talks to the same MCP server. Nothing changes from its perspective.
The difference is what it gets back.
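
To make the interception step concrete, here is a minimal Python sketch of how a proxy could recognise `tools/call` responses. MCP messages are JSON-RPC 2.0, and a response carries only the request `id`, so the sketch remembers which ids belonged to `tools/call` requests. `ToolCallTracker` and the `keep_relevant` callback are hypothetical names for illustration; this is not mcpkill's actual code.

```python
import json


class ToolCallTracker:
    """Remembers which JSON-RPC request ids were tools/call requests."""

    def __init__(self):
        self.pending = {}  # request id -> params of the tools/call request

    def on_request(self, raw: str) -> str:
        msg = json.loads(raw)
        if msg.get("method") == "tools/call" and "id" in msg:
            self.pending[msg["id"]] = msg.get("params", {})
        return raw  # requests are forwarded unchanged

    def on_response(self, raw: str, keep_relevant) -> str:
        msg = json.loads(raw)
        params = self.pending.pop(msg.get("id"), None)
        if params is None or "result" not in msg:
            return raw  # not a tools/call result: pass through untouched
        # Rewrite only the text items of the tool result.
        for item in msg["result"].get("content", []):
            if item.get("type") == "text":
                item["text"] = keep_relevant(item["text"], params)
        return json.dumps(msg)
```

Everything that is not a successful `tools/call` result (initialisation, `tools/list`, errors) passes through byte-for-byte, which is what keeps the wrapper invisible to the client.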

## Numbers from a real session

Five questions about a Rust codebase, using `server-filesystem`:

```
                        Raw       Filtered   Saved
  ─────────────────────────────────────────────────
  read_file proxy.rs    1 637t     390t      − 76%
  read_file filter.rs   1 504t     374t      − 75%
  read_file cache.rs    2 763t     348t      − 87%
  read_file proxy.rs *    same      390t      − 76%   ← cache hit, 0ms lookup
  read_file filter.rs *   same      374t      − 75%   ← cache hit, 0ms lookup
  ─────────────────────────────────────────────────
  Total                 9 045t   1 876t      − 79%

  * similar question on the same file, served from cache
```
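
The percentages in the table can be reproduced with a few lines of Python. The sketch assumes the two cached rows are counted at their original raw size, which is what the table's total implies; `saved_percent` is a helper name made up for this example.

```python
# (raw tokens, returned tokens) for the five calls in the table
calls = [(1637, 390), (1504, 374), (2763, 348), (1637, 390), (1504, 374)]


def saved_percent(raw: int, returned: int) -> int:
    """Share of tokens removed, rounded to the nearest percent."""
    return round(100 * (raw - returned) / raw)


total_raw = sum(raw for raw, _ in calls)
total_returned = sum(ret for _, ret in calls)
per_call = [saved_percent(raw, ret) for raw, ret in calls]
total_saved = saved_percent(total_raw, total_returned)
```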

**80% cache hit rate from the first session.**

## Install

```bash
curl -fsSL https://raw.githubusercontent.com/rustkit-ai/mcpkill/main/install.sh | sh
```

Or via Homebrew:

```bash
brew tap rustkit-ai/tap && brew install mcpkill
```

Or via Cargo:

```bash
cargo install mcpkill
```

## Add it to any MCP server — one command

```bash
mcpkill install filesystem -- npx -y @modelcontextprotocol/server-filesystem ~/projects
mcpkill install context7   -- npx -y @upstash/context7-mcp
mcpkill install github     -- npx -y @modelcontextprotocol/server-github
```

This pre-warms the embedding model and registers the server in Claude Code.
Restart Claude and you're done.

Or manually in `.claude/settings.json` — just prefix the command with `mcpkill --`:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "~/projects"]
    }
  }
}
```
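
If you have several servers to convert, the same edit can be scripted. The following Python sketch rewrites one entry of an already-parsed settings dictionary; `wrap_with_mcpkill` is a hypothetical helper, not something mcpkill ships.

```python
import copy


def wrap_with_mcpkill(settings: dict, name: str) -> dict:
    """Return a copy of `settings` where server `name` is launched through mcpkill."""
    wrapped = copy.deepcopy(settings)
    server = wrapped["mcpServers"][name]
    if server["command"] != "mcpkill":  # avoid wrapping twice
        # The original command becomes the first argument after "--".
        server["args"] = ["--", server["command"], *server.get("args", [])]
        server["command"] = "mcpkill"
    return wrapped
```

Applied to an entry with `"command": "npx"`, this should produce the same shape as the JSON shown above.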

## See what you're saving

```bash
mcpkill stats
```

```
┌─────────────────────────────────────────┐
│           mcpkill cache stats            │
├─────────────────────────────────────────┤
│  Cache entries    47                     │
│  Stored chunks    312                    │
│  Cache hits       132  (80%)             │
├─────────────────────────────────────────┤
│  Tokens (original)68 200                 │
│  Tokens (returned) 4 100                 │
│  Tokens saved     ~64 100  (94%)         │
├─────────────────────────────────────────┤
│  DB size          1.80 MB                │
└─────────────────────────────────────────┘
```

## How the cache works

On the first call to a tool:

1. The response arrives from the MCP server
2. mcpkill splits it into chunks along structural boundaries (Markdown headers, JSON keys, or paragraphs)
3. Each chunk is embedded locally using `all-MiniLM-L6-v2` (~5 ms, runs on your machine)
4. The top-K most relevant chunks are returned to Claude
5. Everything is stored in `~/.mcpkill.db` (SQLite, no external services)
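
As a rough illustration of the chunking step, the Python sketch below splits on Markdown headers when the text has any and falls back to paragraphs otherwise. It is a simplified stand-in, not mcpkill's chunker: JSON-key splitting is left out, and `chunk_text` is a name chosen for this example.

```python
import re


def chunk_text(text: str) -> list[str]:
    """Split on Markdown headers if any exist, otherwise on blank lines."""
    if re.search(r"^#{1,6} ", text, flags=re.MULTILINE):
        # Zero-width split: each chunk starts at a header line.
        parts = re.split(r"(?m)^(?=#{1,6} )", text)
    else:
        parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]
```

Keeping each header together with its body means a chunk stays readable on its own when it is returned without its neighbours.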

On any subsequent call to the same or a similar tool:

1. The query is embedded (~5 ms)
2. mcpkill runs a cosine similarity search against the stored queries
3. If similarity ≥ 0.85 → cache hit: re-rank stored chunks, return top-K instantly
4. The MCP server response is discarded — no tokens counted
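
The lookup above can be sketched in Python with plain lists standing in for embeddings. The data layout (`cache` as a list of query vectors paired with their chunks) and the function names are assumptions made for this example; only the 0.85 threshold and the default of 4 chunks come from this README.

```python
import math

THRESHOLD = 0.85   # default --threshold
MAX_CHUNKS = 4     # default --max-chunks


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(query_vec, cache, threshold=THRESHOLD, k=MAX_CHUNKS):
    """cache: list of (stored_query_vec, [(chunk_vec, chunk_text), ...]).

    Returns the top-k chunk texts on a hit, or None on a miss.
    """
    best = max(cache, key=lambda entry: cosine(query_vec, entry[0]), default=None)
    if best is None or cosine(query_vec, best[0]) < threshold:
        return None  # miss: the caller falls back to the first-call path
    # Hit: re-rank the stored chunks against the new query.
    ranked = sorted(best[1], key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

Because the chunks are re-ranked against the new query, two similar questions about the same file can share a cache entry and still get differently ordered results.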

## Options

| Flag | Default | Description |
|------|---------|-------------|
| `--max-chunks` | `4` | Chunks returned per response |
| `--threshold` | `0.85` | Similarity threshold for cache hits |
| `--ttl-days` | `7` | Evict entries not used in N days |
| `--max-db-mb` | `100` | Max DB size before LRU eviction |
| `--cache-db` | `~/.mcpkill.db` | SQLite path |
| `--dry-run` | off | Log decisions, return original response |
| `-v`, `--verbose` | off | Show hit/miss on stderr |

Persist any option in `~/.mcpkill.toml`:

```toml
threshold  = 0.80
max_chunks = 6
ttl_days   = 14
```

## Cache management

```bash
mcpkill clear --expired         # remove entries past TTL
mcpkill clear --older-than 14   # unused for 14+ days
mcpkill clear --all             # wipe everything
```
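
The age-based selection behind `--expired` and `--older-than` can be pictured with the Python sketch below. It assumes each entry records a last-used timestamp and that an entry exactly N days old counts as stale; both are guesses for illustration, and `stale_keys` is not part of mcpkill.

```python
from datetime import datetime, timedelta


def stale_keys(last_used: dict[str, datetime], days: int, now: datetime) -> list[str]:
    """Keys whose last use is at least `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sorted(k for k, ts in last_used.items() if ts <= cutoff)
```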

---

<div align="center">

MIT — [rustkit-ai](https://github.com/rustkit-ai)

</div>