# ai-summary

Web search & summarization CLI for AI coding agents. Reduces token consumption by compressing web content through local LLMs or Gemini before feeding it to Claude Code (or any LLM-powered tool).
## How It Works

```
┌──────────────┐     ┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Web Search  │────▶│  Fetch   │────▶│ Readability │────▶│ LLM Summary  │
│ Gemini/DDG/  │     │  Pages   │     │   Extract   │     │ Local/Remote │
│    Brave     │     │          │     │             │     │              │
└──────────────┘     └──────────┘     └─────────────┘     └──────────────┘
                                                                 │
                                                                 ▼
                                                      ┌───────────────────┐
                                                      │ Compressed output │
                                                      │ (60-98% smaller)  │
                                                      └───────────────────┘
```
Instead of sending raw 50K+ page content to Claude, ai-summary returns a focused 1-4K summary — saving tokens and money.
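As a back-of-envelope illustration (treating both figures as character counts and using the rough ~4-characters-per-token heuristic; the exact ratio depends on the tokenizer):

```sh
# Rough token-savings estimate; 4 chars/token is a heuristic, not exact
raw_chars=50000                           # a typical large page
summary_chars=4000                        # upper end of the summary range
raw_tokens=$(( raw_chars / 4 ))           # ~12500 tokens
summary_tokens=$(( summary_chars / 4 ))   # ~1000 tokens
pct_saved=$(( (raw_tokens - summary_tokens) * 100 / raw_tokens ))
echo "saved ~${pct_saved}% of input tokens"
```

That lands at the upper end of the 60-98% range quoted above.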
## Features

- **Search + Summarize** — Gemini (Google Search grounding), DuckDuckGo, or Brave Search
- **Fetch + Summarize** — fetch any URL, extract article content via readability, summarize with an LLM
- **Stdin Summarize** — pipe any text through for compression
- **Fast Compress** — no-LLM text extraction for instant compression (used by hooks)
- **JS-heavy Pages** — agent-browser and Cloudflare Browser Rendering support
- **Pipe-friendly** — `cat urls.txt | ai-summary fetch`, `--json` output, standard exit codes
- **Claude Code Hooks** — PostToolUse hooks auto-compress WebFetch, WebSearch, and test output
- **Rich Statistics** — time-period breakdown, ROI tracking, per-mode analysis
- **Multiple LLM Backends** — opencode (free), oMLX (local), OpenAI, Groq, DeepSeek, or any OpenAI-compatible API
## Installation

```sh
# From crates.io
cargo install ai-summary

# Or build from source
cargo install --path .
```

Pre-built binaries for macOS (Apple Silicon / Intel) and Linux are available on GitHub Releases.

Requirements: Rust 1.70+ and a summarization backend (the opencode CLI is recommended — free).
## Quick Start

```sh
# Generate config file
ai-summary config

# Search (uses Gemini CLI > Gemini API > DDG > Brave)
ai-summary "rust async traits stabilization"

# Fetch URLs and summarize
ai-summary fetch https://example.com -p "summarize the key points"

# Fetch from stdin
cat urls.txt | ai-summary fetch -p "summarize the key points"

# Compress piped text (no LLM, instant)
cat build-output.txt | ai-summary compress

# JSON output (for scripting)
ai-summary "rust async traits stabilization" --json | jq .

# Check token savings (with time periods and ROI)
ai-summary stats
```
## Configuration

Config file: `~/.ai-summary/config.toml`, auto-created with `ai-summary config` (run that command to see the exact keys; the limit key names below are inferred):

```toml
# LLM backend — local oMLX (recommended for Apple Silicon)
api_url = "http://127.0.0.1:8000"
api_key = ""                      # Leave empty for oMLX auto-detection
model = "Qwen3.5-9B-MLX-4bit"

# Search provider — Gemini + Google Search (recommended)
gemini_api_key = ""               # Free: https://aistudio.google.com/apikey
gemini_model = "gemini-2.0-flash"

# Brave Search fallback (free: https://brave.com/search/api/)
brave_api_key = ""

max_pages = 3                     # pages fetched per search (--deep raises this to 5)
max_summary_chars = 4000
max_tokens = 1024
```

Search priority: Gemini CLI > Gemini API > DuckDuckGo > Brave.

Environment variables: `GEMINI_API_KEY`, `BRAVE_API_KEY`, `AI_SUMMARY_API_URL`, `AI_SUMMARY_API_KEY`, `AI_SUMMARY_MODEL`.
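Conceptually, the search-provider fallback works like this (a sketch, not the actual implementation; the `gemini` binary name is an assumption):

```sh
# Illustrative sketch of the provider priority order
pick_search_backend() {
  if command -v gemini >/dev/null 2>&1; then
    echo "gemini-cli"               # Gemini CLI installed: highest priority
  elif [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "gemini-api"               # no CLI, but an API key is configured
  else
    echo "duckduckgo"               # keyless DDG; Brave is tried only if DDG fails
  fi
}
```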
## Claude Code Integration

### PostToolUse Hooks

Three hooks auto-compress Claude Code tool responses:

| Hook | Matcher | Behavior |
|---|---|---|
| `postwebfetch.sh` | WebFetch | First fetch: compress (skips if <10% savings). Second fetch of the same URL: pass through |
| `postwebsearch.sh` | WebSearch | Compress long search results (skips if <10% savings) |
| `postbash.sh` | Bash | Summarize passing test output with structured totals (`cargo test`, `npm test`, `pytest`, etc.) |
Add to ~/.claude/settings.json:
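For example (the hook script locations are assumptions — point the commands at wherever the scripts are installed on your system):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "WebFetch",
        "hooks": [{ "type": "command", "command": "~/.ai-summary/hooks/postwebfetch.sh" }]
      },
      {
        "matcher": "WebSearch",
        "hooks": [{ "type": "command", "command": "~/.ai-summary/hooks/postwebsearch.sh" }]
      },
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "~/.ai-summary/hooks/postbash.sh" }]
      }
    ]
  }
}
```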
Requires jq and ai-summary in PATH.
Note: PostToolUse hooks only run for successful commands (exit 0). Failed test output passes through directly to Claude.
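Conceptually, each hook is a small filter over the tool's output — roughly like this simplified sketch (not the shipped script; the 2000-character threshold is illustrative, and the real hooks parse Claude Code's JSON envelope with `jq` first):

```sh
# Simplified hook logic: pass short output through, compress long output
compress_if_long() {
  local text
  text=$(cat)                          # hook receives the tool output on stdin
  if command -v ai-summary >/dev/null 2>&1 && [ "${#text}" -gt 2000 ]; then
    printf '%s\n' "$text" | ai-summary compress -m 2000
  else
    printf '%s\n' "$text"              # short (or ai-summary missing): pass through
  fi
}
```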
## Subcommands

| Command | Description |
|---|---|
| `ai-summary <query>` | Search the web and summarize results |
| `ai-summary fetch <urls> -p <prompt>` | Fetch URLs and summarize |
| `ai-summary sum <prompt>` | Summarize stdin text via LLM |
| `ai-summary compress -m <chars>` | Fast text compression (no LLM) |
| `ai-summary crawl <url> -p <prompt>` | Crawl a website via Cloudflare Browser Rendering |
| `ai-summary stats` | Show token savings statistics |
| `ai-summary reset-stats` | Reset statistics |
| `ai-summary config` | Show or create the config file |
## Flags

| Flag | Description |
|---|---|
| `--deep` | Fetch more pages (5 instead of 3) |
| `--raw` | Skip summarization, return raw content |
| `--json` | Structured JSON output (for scripting/piping) |
| `--browser` | Use agent-browser for JS-heavy pages |
| `--cf` | Use Cloudflare Browser Rendering |
| `--api-url` | Override the API endpoint |
| `--api-key` | Override the API key |
| `--model` | Override the model name |
## Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | User error (bad args, no input) |
| 2 | API/network error (no results, fetch failed) |
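In scripts, the codes let you distinguish retryable failures from usage errors — for example (a sketch; the retry-once policy is just an illustration):

```sh
# Retry only on API/network errors (exit 2); never on user errors (exit 1)
search_with_retry() {
  ai-summary "$1" --json && return 0
  local status=$?
  if [ "$status" -eq 2 ]; then
    ai-summary "$1" --json           # one retry for transient failures
  else
    return "$status"                 # bad args won't improve on retry
  fi
}
```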
## Statistics

Sample `ai-summary stats` output:

```
ai-summary Token Savings
════════════════════════════════════════════════════════════
Metric          Today     7 days    30 days   All Time
────────────────────────────────────────────────────────────
Queries         8         17        17        21
Pages fetched   8         17        17        17
Tokens saved    14.4K     29.2K     29.2K     31.3K
Cost saved      $0.04     $0.09     $0.09     $0.09
Compression     84%       84%       84%       76%
────────────────────────────────────────────────────────────
ROI: $0.011 LLM cost -> $0.09 Claude cost saved (9x return)

By Mode (hooks: 0, manual: 17)
────────────────────────────────────────────────────────────────────
 #  Mode         Count   Saved   Avg%    Time    Impact
────────────────────────────────────────────────────────────────────
 1. gemini-cli   7       21.0K   85.6%   65.1s   ██████████
 2. fetch        6       3.6K    70.9%   16.0s   █░░░░░░░░░
 3. gemini       1       3.3K    93.3%   4.1s    █░░░░░░░░░
 4. stdin        3       1.3K    77.3%   12.0s   ░░░░░░░░░░
────────────────────────────────────────────────────────────────────
```
## License

MIT