webshift-mcp
Denoised web search MCP server — single static binary, zero runtime dependencies.
webshift-mcp exposes three MCP tools over stdio:
| Tool | Description |
|---|---|
webshift_query |
Full search pipeline: search + fetch + clean + rerank + (optional) LLM summarize |
webshift_fetch |
Fetch and clean a single URL |
webshift_onboarding |
Returns a JSON guide for the agent (budgets, backends, tips) |
Built on top of the webshift library.
Installation
The binary is called mcp-webshift.
Quick start
1. Set up a search backend
The easiest option is SearXNG — free, self-hosted, no API key:
No Docker? Use a cloud backend — see Search backends.
2. Configure your MCP client
That's it. Your AI agent now has web search with clean, budget-controlled text output.
For client-specific setup (Claude Desktop, Claude Code, Zed, Cursor, Windsurf, VS Code, Gemini CLI) see docs/integrations/.
MCP tool parameters
webshift_query
| Parameter | Type | Default | Description |
|---|---|---|---|
queries |
string or list | required | Search query or list of queries |
num_results |
integer | 5 | Results per query |
lang |
string | none | Language filter (e.g. "en") |
backend |
string | config default | Override search backend for this call |
webshift_fetch
| Parameter | Type | Description |
|---|---|---|
url |
string | URL to fetch and clean |
Configuration
Resolution order (highest priority first):
- CLI args —
--default-backend,--brave-api-key, etc. - Environment variables —
WEBSHIFT_*prefix - Config file —
webshift.toml(current dir, then~/webshift.toml) - Built-in defaults
Config file (webshift.toml)
[]
= 32000 # total char budget across all sources
= 8000 # per-page char cap
= 20 # hard cap on results per call
= 1 # streaming cap per page (MB)
= 8 # seconds
= 5
= "auto" # "auto" | "on" | "off"
[]
= "searxng"
[]
= "http://localhost:8080"
[]
= "BSA-..."
[]
= "tvly-..."
[]
= "..."
[]
= "..."
= "google"
[]
= "..."
= "..."
[]
= "..."
= "en-US"
[]
= false
= "http://localhost:11434/v1"
= "gemma3:27b"
= true
= true
For the full configuration reference (all TOML keys, env vars, CLI args) see docs/CONFIGURATION.md.
Key CLI args
Key environment variables
WEBSHIFT_DEFAULT_BACKEND=searxng
WEBSHIFT_SEARXNG_URL=http://localhost:8080
WEBSHIFT_BRAVE_API_KEY=BSA-xxx
WEBSHIFT_LLM_ENABLED=true
WEBSHIFT_LLM_BASE_URL=http://localhost:11434/v1
WEBSHIFT_LLM_MODEL=gemma3:27b
Search backends
| Backend | Auth | Notes |
|---|---|---|
| SearXNG | none | Self-hosted, free. Default: http://localhost:8080 |
| Brave | API key | Free tier. brave.com/search/api |
| Tavily | API key | AI-oriented. tavily.com |
| Exa | API key | Neural search. exa.ai |
| SerpAPI | API key | Multi-engine proxy. serpapi.com |
| API key + CX | Custom Search. Free: 100 req/day. programmablesearchengine.google.com | |
| Bing | API key | Web Search API. Free: 1,000 req/month. azure.microsoft.com |
| HTTP | configurable | Generic REST backend — TOML-only config, no code required |
LLM features (optional)
All opt-in — disabled by default. Works with any OpenAI-compatible API (OpenAI, Ollama, vLLM, LM Studio):
[]
= true
= "http://localhost:11434/v1"
= "gemma3:27b"
| Feature | What it does |
|---|---|
| Query expansion | Single query → N complementary search variants |
| Summarization | Markdown report with inline [1] [2] citations |
| LLM reranking | Tier-2 reranking on top of deterministic BM25 |
Anti-flooding protections
Always active:
| Protection | Description |
|---|---|
max_download_mb |
Streaming cap — never buffers full response |
max_result_length |
Hard cap on characters per cleaned page |
max_query_budget |
Total character budget across all sources |
max_total_results |
Hard cap on results per call |
| Binary filter | .pdf, .zip, .exe, etc. filtered before any network request |
| Unicode sterilization | BiDi control chars, zero-width chars removed |
Links
- GitHub Repository — Source code and issues
- webshift library — Rust library crate
- Configuration Reference
- Use Cases & Examples
- IDE & Agent Integration Guides
- MCP Protocol
License
MIT License — see LICENSE for details.