mcpkill 0.1.0

Universal MCP proxy — semantic cache + chunking to kill token waste
# Use Case — Building a Next.js App with context7

A typical one-hour development session using the context7 MCP server
(documentation lookup) **without** and **with** mcpkill.

---

## The scenario

You are building a Next.js 16 app. You query context7 repeatedly throughout
the session to look up docs on Server Actions, caching, and routing. Each
context7 response is a large Markdown dump, typically **6 000–9 000 tokens**.

---

## Setup (one-time, ~30 s)

```bash
# Install mcpkill and register context7 through it
mcpkill install context7 -- npx -y @upstash/context7-mcp
```

```
[mcpkill] Step 1/2 — warming up embedding model …
[mcpkill] ✓ Model ready (~/.fastembed_cache)
[mcpkill] Step 2/2 — registering 'context7' in Claude Code (scope: user) …
[mcpkill] ✓ Done! Restart Claude Code (or open a new chat) to activate.
  Verify with: claude mcp list
```

That's it. Claude now sees `context7` exactly as before — mcpkill is
invisible in the middle.

---

## Session walkthrough

### Query 1 — cold start

**Prompt:** *"How do Server Actions work in Next.js 16?"*

context7 returns a 47-section Markdown file covering the entire Server Actions
API. mcpkill intercepts the response:

```
[mcpkill] CACHE MISS [8 340t] — chunking 42 108 bytes
[mcpkill] → returning 4/4 chunks (~610t)
```

Claude receives **610 tokens** of the most relevant sections instead of 8 340.
The full response is chunked, embedded, and stored in `~/.mcpkill.db`.
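
The miss path described above (chunk, embed, rank, return the top few) can be sketched as follows. This is a minimal illustration, not mcpkill's actual code: `chunk_markdown`, `embed`, and `top_chunks` are hypothetical names, chunks are assumed to be split at Markdown headings, and a toy bag-of-words vector stands in for the real embedding model so the example runs without any dependencies.

```python
import math
import re
from collections import Counter


def chunk_markdown(text: str) -> list[str]:
    """Split a Markdown document into one chunk per heading-led section."""
    # Zero-width split just before every line that starts with 1-6 '#' chars.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stand-in for a sentence embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(query: str, chunks: list[str], max_chunks: int = 4) -> list[str]:
    """Rank chunks by similarity to the query and keep the best max_chunks."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:max_chunks]


doc = """# Server Actions
Server Actions are async functions that run on the server.

## Revalidating data
Call revalidatePath or revalidateTag after a mutation.

## Forms
Pass a Server Action to the form action prop.
"""

chunks = chunk_markdown(doc)
best = top_chunks("revalidate data after a mutation", chunks, max_chunks=1)
print(best[0].splitlines()[0])
```

Only the selected chunks would be returned to the client; the full chunk list is what gets stored for later queries.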

---

### Query 2 — semantic cache hit

**Prompt:** *"Show me how to revalidate data after a Server Action."*

Same tool call to context7, same large Markdown response, but this time
mcpkill recognises that the query is semantically similar to Query 1
(cosine ≥ 0.85), so the fresh response is never chunked or embedded again:

```
[mcpkill] CACHE HIT  [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~580t)
```

context7 is called, but **0 tokens reach Claude's context** from the raw dump.
mcpkill re-ranks the stored chunks against the new query and returns the four
most relevant sections — including `revalidatePath` and `revalidateTag`.
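
The hit decision can be pictured as a nearest-neighbour lookup with a cut-off. The sketch below assumes each cache entry is keyed by the embedding of the query that created it; `CacheEntry` and `lookup` are hypothetical names, and the three-dimensional vectors are made up purely for illustration (a real embedding has hundreds of dimensions).

```python
import math
from dataclasses import dataclass


@dataclass
class CacheEntry:
    query_vec: list[float]  # embedding of the query that created the entry
    chunks: list[str]       # chunks stored on the original cache miss


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(query_vec, cache, threshold=0.85):
    """Return the most similar entry if it clears the threshold, else None."""
    best = max(cache, key=lambda e: cosine(query_vec, e.query_vec), default=None)
    if best is not None and cosine(query_vec, best.query_vec) >= threshold:
        return best
    return None


cache = [
    CacheEntry([1.0, 0.0, 0.0], ["server actions chunk"]),
    CacheEntry([0.0, 1.0, 0.0], ["nested layouts chunk"]),
]

hit = lookup([0.9, 0.1, 0.3], cache)   # close to the first entry
miss = lookup([0.5, 0.5, 0.7], cache)  # not close enough to either entry
print(hit is cache[0], miss is None)
```

Raising `threshold` makes the second kind of outcome more common: fewer hits, but less risk of serving chunks from an unrelated topic.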

---

### Query 3 — different topic, new miss

**Prompt:** *"How does the Next.js App Router handle nested layouts?"*

New topic. Cache miss. context7 returns another large dump (~6 200 tokens):

```
[mcpkill] CACHE MISS [6 180t] — chunking 31 450 bytes
[mcpkill] → returning 4/4 chunks (~540t)
```

---

### Query 4 — hit on the new topic

**Prompt:** *"Can I have a layout per route group in Next.js?"*

Semantically close to Query 3:

```
[mcpkill] CACHE HIT  [38 chunks] original=6180t
[mcpkill] → returning 4/4 chunks (~490t)
```

---

### Query 5 — partial overlap

**Prompt:** *"How do I call a Server Action from a client component?"*

Overlaps with Query 1 (Server Actions) but focuses on a different angle.
Cosine similarity is 0.87 — above the 0.85 threshold:

```
[mcpkill] CACHE HIT  [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~620t)
```

The `'use client'` + `startTransition` chunks surface to the top because
they score highest against the new query embedding.

---

## End-of-session stats

```bash
mcpkill stats
```

```
┌─────────────────────────────────────────┐
│           mcpkill cache stats            │
├─────────────────────────────────────────┤
│  Cache entries    2                      │
│  Stored chunks    85                     │
│  Cache hits       3   (60%)              │
├─────────────────────────────────────────┤
│  Tokens (original)  22 860               │
│  Tokens (returned)   2 850               │
│  Tokens saved      ~20 010  (88%)        │
├─────────────────────────────────────────┤
│  DB size          0.14 MB                │
└─────────────────────────────────────────┘
```

5 queries, 88% token reduction. On a paid plan where context tokens cost
money, that's roughly **8× cheaper** for documentation lookups
(22 860 ÷ 2 850 ≈ 8).
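
The headline figures can be re-derived from the stats table. The snippet below uses only the numbers shown there; the variable names are illustrative.

```python
original_tokens = 22_860  # "Tokens (original)" from the stats table
returned_tokens = 2_850   # "Tokens (returned)" from the stats table
hits, queries = 3, 5      # "Cache hits" and the number of prompts in the session

saved = original_tokens - returned_tokens
reduction = saved / original_tokens
ratio = original_tokens / returned_tokens
hit_rate = hits / queries

print(f"saved={saved} reduction={reduction:.1%} ratio={ratio:.1f}x hit_rate={hit_rate:.0%}")
```

With these inputs the script reports 20 010 tokens saved, a reduction of 87.5% (rounded to 88% in the table), a ratio of about 8×, and a 60% hit rate.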

---

## What happened under the hood

```
Query 1 (cold)
  Claude ──► mcpkill ──► context7
                └─ MISS: chunk(8340t) → embed 47 chunks → store → return top-4 (610t)

Query 2 (warm, same topic)
  Claude ──► mcpkill ──► context7
                └─ HIT (cosine 0.91): re-rank 47 stored chunks → return top-4 (580t)
                   context7 response discarded before token counting

Query 5 (warm, related topic)
  Claude ──► mcpkill ──► context7
                └─ HIT (cosine 0.87): same 47 chunks, different top-4 (620t)
                   chunk about startTransition now ranks #1
```

The embedding model runs locally (~5 ms per query, all-MiniLM-L6-v2).
No data leaves your machine beyond the normal MCP call to context7.

---

## Tuning for your workflow

| Situation | Flag to adjust |
|-----------|---------------|
| Getting chunks that are too broad | `--max-chunks 2` |
| Cache hits on unrelated queries | `--threshold 0.92` |
| Stale docs (fast-moving library) | `--ttl-days 1` |
| Large codebase, many MCP servers | `--max-db-mb 500` |
| Debugging what gets filtered | `--dry-run --verbose` |

```bash
# Example: stricter cache, more chunks, weekly TTL
mcpkill --threshold 0.90 --max-chunks 6 --ttl-days 7 -- npx -y @upstash/context7-mcp
```

Or persist in `~/.mcpkill.toml` so every session picks it up automatically:

```toml
threshold  = 0.90
max_chunks = 6
ttl_days   = 7
```
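
To make the effect of `ttl_days` concrete, here is one plausible reading of it as an age check on each cache entry. How mcpkill actually expires entries is not described here, so treat `is_fresh` and the timestamps as hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(stored_at: datetime, now: datetime, ttl_days: int) -> bool:
    """True while a cache entry is younger than ttl_days (assumed semantics)."""
    return now - stored_at < timedelta(days=ttl_days)


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
stored_at = datetime(2025, 1, 8, tzinfo=timezone.utc)  # entry is two days old

print(is_fresh(stored_at, now, ttl_days=7))  # weekly TTL: still served from cache
print(is_fresh(stored_at, now, ttl_days=1))  # daily TTL: treated as stale
```

Under this reading, a short TTL trades extra cache misses for fresher documentation, which is why the table suggests `--ttl-days 1` for fast-moving libraries.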

---

## Works with any MCP server

mcpkill wraps any stdio MCP server, not just context7. For example, in
`.claude/settings.json`:

```json
{
  "mcpServers": {
    "context7": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@upstash/context7-mcp"]
    },
    "filesystem": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user/project"]
    },
    "github": {
      "command": "mcpkill",
      "args": ["--max-chunks", "6", "--", "npx", "-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```
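
The pattern in that config is mechanical: the original command and its arguments move behind `--`, and any mcpkill flags go in front. A small helper can apply it to an existing server entry. `wrap_with_mcpkill` is a hypothetical name for illustration, not something shipped with mcpkill.

```python
import json
from typing import Optional


def wrap_with_mcpkill(server: dict, flags: Optional[list] = None) -> dict:
    """Return a copy of an MCP server entry that launches through mcpkill."""
    wrapped = dict(server)
    # The original command line becomes everything after the "--" separator.
    original_cmd = [server["command"], *server.get("args", [])]
    wrapped["command"] = "mcpkill"
    wrapped["args"] = [*(flags or []), "--", *original_cmd]
    return wrapped


github = {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]}
wrapped = wrap_with_mcpkill(github, flags=["--max-chunks", "6"])
print(json.dumps(wrapped, indent=2))
```

Because the helper copies the entry before modifying it, other keys such as `env` are carried over unchanged.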

Servers with small, targeted responses (e.g. a single-row database lookup)
will rarely benefit from caching. Servers that return large blobs of text —
documentation, file trees, search results — see the biggest savings.