# Use Case — Building a Next.js App with context7
A typical one-hour development session using the context7 MCP server
(documentation lookup) **without** and **with** mcpkill.
---
## The scenario
You are building a Next.js 16 app. You query context7 repeatedly throughout
the session to look up docs on Server Actions, caching, and routing. Each
context7 response is a large Markdown dump — typically **7 000–9 000 tokens**.
---
## Setup (one-time, ~30 s)
```bash
# Register context7 with Claude Code, routed through mcpkill
mcpkill install context7 -- npx -y @upstash/context7-mcp
```
```
[mcpkill] Step 1/2 — warming up embedding model …
[mcpkill] ✓ Model ready (~/.fastembed_cache)
[mcpkill] Step 2/2 — registering 'context7' in Claude Code (scope: user) …
[mcpkill] ✓ Done! Restart Claude Code (or open a new chat) to activate.
Verify with: claude mcp list
```
That's it. Claude now sees `context7` exactly as before — mcpkill is
invisible in the middle.
---
## Session walkthrough
### Query 1 — cold start
**Prompt:** *"How do Server Actions work in Next.js 16?"*
context7 returns a 47-section Markdown file covering the entire Server Actions
API. mcpkill intercepts the response:
```
[mcpkill] CACHE MISS [8 340t] — chunking 42 108 bytes
[mcpkill] → returning 4/4 chunks (~610t)
```
Claude receives **610 tokens** of the most relevant sections instead of 8 340.
The full response is chunked, embedded, and stored in `~/.mcpkill.db`.
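The miss path above (chunk, embed, return the top four) can be sketched in a few lines. This is an illustration only, not mcpkill's actual code: the function names are hypothetical, the splitting rule (one chunk per Markdown heading) is an assumption, and a bag-of-words counter stands in for the real embedding model so the example runs without dependencies.

```python
import math
import re
from collections import Counter


def split_sections(markdown: str) -> list[str]:
    """Split a Markdown document into chunks, one per heading-led section."""
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [p.strip() for p in parts if p.strip()]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks that score highest against the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


doc = (
    "# Intro\nwelcome text\n"
    "## Revalidation\ncall revalidatePath after a mutation\n"
    "## Forms\nuse form actions\n"
)
chunks = split_sections(doc)
best = top_k("how to revalidate after mutation", chunks, k=1)
```

With a real embedding model the ranking would reflect meaning rather than shared words, but the shape of the pipeline stays the same.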
---
### Query 2 — semantic cache hit
**Prompt:** *"Show me how to revalidate data after a Server Action."*
Same tool call to context7, same large Markdown response — but this time
mcpkill recognises the query is semantically similar to Query 1 (cosine ≥ 0.85).
It never forwards the response to the embedder:
```
[mcpkill] CACHE HIT [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~580t)
```
context7 is still called, but its raw response is discarded, so **0 tokens of
the new dump reach Claude's context**. mcpkill re-ranks the stored chunks
against the new query and returns the four most relevant sections, including
`revalidatePath` and `revalidateTag`.
---
### Query 3 — different topic, new miss
**Prompt:** *"How does the Next.js App Router handle nested layouts?"*
New topic. Cache miss. context7 returns another large dump (~6 200 tokens):
```
[mcpkill] CACHE MISS [6 180t] — chunking 31 450 bytes
[mcpkill] → returning 4/4 chunks (~540t)
```
---
### Query 4 — hit on the new topic
**Prompt:** *"Can I have a layout per route group in Next.js?"*
Semantically close to Query 3:
```
[mcpkill] CACHE HIT [38 chunks] original=6180t
[mcpkill] → returning 4/4 chunks (~490t)
```
---
### Query 5 — partial overlap
**Prompt:** *"How do I call a Server Action from a client component?"*
Overlaps with Query 1 (Server Actions) but focuses on a different angle.
Cosine similarity is 0.87 — above the 0.85 threshold:
```
[mcpkill] CACHE HIT [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~620t)
```
The `'use client'` + `startTransition` chunks surface to the top because
they score highest against the new query embedding.
---
## End-of-session stats
```bash
mcpkill stats
```
```
┌─────────────────────────────────────────┐
│ mcpkill cache stats │
├─────────────────────────────────────────┤
│ Cache entries 2 │
│ Stored chunks 85 │
│ Cache hits 3 (60%) │
├─────────────────────────────────────────┤
│ Tokens (original) 22 860 │
│ Tokens (returned) 2 850 │
│ Tokens saved ~20 010 (88%) │
├─────────────────────────────────────────┤
│ DB size 0.14 MB │
└─────────────────────────────────────────┘
```
5 queries, 88% token reduction. On a paid plan where context tokens cost
money, that's roughly **8× cheaper** for documentation lookups
(22 860 ÷ 2 850 ≈ 8).
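The saved-token figures in the stats box can be re-derived from the two totals it reports. The helper below is a hypothetical convenience for checking that arithmetic, not part of mcpkill.

```python
def savings(original_tokens: int, returned_tokens: int) -> tuple[int, int]:
    """Return (tokens saved, percentage saved rounded to a whole number)."""
    saved = original_tokens - returned_tokens
    percent = round(100 * saved / original_tokens)
    return saved, percent


# Totals copied from the `mcpkill stats` output above.
saved, percent = savings(22_860, 2_850)
print(f"saved {saved} tokens ({percent}%)")
```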
---
## What happened under the hood
```
Query 1 (cold)
Claude ──► mcpkill ──► context7
│
└─ MISS: chunk(8340t) → embed 47 chunks → store → return top-4 (610t)
Query 2 (warm, same topic)
Claude ──► mcpkill ──► context7
│
└─ HIT (cosine 0.91): re-rank 47 stored chunks → return top-4 (580t)
context7 response discarded before token counting
Query 5 (warm, related topic)
Claude ──► mcpkill ──► context7
│
└─ HIT (cosine 0.87): same 47 chunks, different top-4 (620t)
chunk about startTransition now ranks #1
```
The embedding model runs locally (~5 ms per query, all-MiniLM-L6-v2).
No data leaves your machine beyond the normal MCP call to context7.
---
## Tuning for your workflow
| Symptom | Flag to try |
|---|---|
| Getting chunks that are too broad | `--max-chunks 2` |
| Cache hits on unrelated queries | `--threshold 0.92` |
| Stale docs (fast-moving library) | `--ttl-days 1` |
| Large codebase, many MCP servers | `--max-db-mb 500` |
| Debugging what gets filtered | `--dry-run --verbose` |
```bash
# Example: stricter cache, more chunks, weekly TTL
mcpkill --threshold 0.90 --max-chunks 6 --ttl-days 7 -- npx -y @upstash/context7-mcp
```
Or persist in `~/.mcpkill.toml` so every session picks it up automatically:
```toml
threshold = 0.90
max_chunks = 6
ttl_days = 7
```
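To see what raising the threshold would do, the sketch below replays the two similarities reported in this session (0.91 for Query 2, 0.87 for Query 5) against different settings. It assumes a hit means similarity greater than or equal to the threshold; `hits_at` is a hypothetical helper for illustration.

```python
def hits_at(threshold: float, similarities: dict[str, float]) -> dict[str, bool]:
    """Map each query to True (cache hit) or False (miss) at this threshold."""
    return {name: sim >= threshold for name, sim in similarities.items()}


# Similarities taken from the walkthrough above.
session = {"Query 2": 0.91, "Query 5": 0.87}

for threshold in (0.85, 0.90, 0.92):
    outcome = hits_at(threshold, session)
    summary = ", ".join(f"{q}: {'HIT' if h else 'MISS'}" for q, h in outcome.items())
    print(f"threshold {threshold:.2f} -> {summary}")
```

At 0.90, Query 5 would become a miss and be chunked and embedded again; at 0.92, both would miss. A stricter threshold trades fewer false hits for more cold starts.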
---
## Works with any MCP server
mcpkill wraps any stdio MCP server, not just context7. For example, in
`.claude/settings.json`:
```json
{
"mcpServers": {
"context7": {
"command": "mcpkill",
"args": ["--", "npx", "-y", "@upstash/context7-mcp"]
},
"filesystem": {
"command": "mcpkill",
"args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user/project"]
},
"github": {
"command": "mcpkill",
"args": ["--max-chunks", "6", "--", "npx", "-y", "@modelcontextprotocol/server-github"]
}
}
}
```
Servers with small, targeted responses (e.g. a single-row database lookup)
will rarely benefit from caching. Servers that return large blobs of text —
documentation, file trees, search results — see the biggest savings.