# ai-summary

Web search & summarization CLI for AI coding agents. Reduces token consumption by compressing web content through local LLMs or Gemini before feeding it to Claude Code (or any LLM-powered tool).
## How It Works

```
┌──────────────┐     ┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Web Search  │────▶│  Fetch   │────▶│ Readability │────▶│ LLM Summary  │
│ Gemini/DDG/  │     │  Pages   │     │   Extract   │     │ Local/Remote │
│    Brave     │     │          │     │             │     │              │
└──────────────┘     └──────────┘     └─────────────┘     └──────────────┘
                                                                 │
                                                                 ▼
                                                       ┌───────────────────┐
                                                       │ Compressed output │
                                                       │ (60-98% smaller)  │
                                                       └───────────────────┘
```
Instead of sending a raw 50K+ token page to Claude, ai-summary returns a focused 1-4K token summary — saving tokens and money.
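The savings above can be put in rough numbers. A back-of-envelope sketch (the per-million-token price below is illustrative, not a quoted rate):

```shell
# One query: ~50K-token raw page vs. a ~2K-token summary.
# $3 per million input tokens is an illustrative Claude-class rate.
raw_tokens=50000
summary_tokens=2000
saved=$((raw_tokens - summary_tokens))
echo "tokens saved: $saved"
awk -v s="$saved" 'BEGIN { printf "cost saved: $%.3f per query\n", s * 3 / 1000000 }'
```

At hundreds of queries per day, this is where the ROI numbers in the Statistics section come from.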
## Features

- Search + Summarize — Gemini (Google Search grounding), DuckDuckGo, or Brave Search
- Fetch + Summarize — Fetch any URL, extract article content via readability, summarize with LLM
- Stdin Summarize — Pipe any text through for compression
- Fast Compress — No-LLM text extraction for instant compression
- GitHub Code Search — Search code and read files from GitHub repos via the `gh` CLI + LLM summarization
- Repo Summarize — Pack remote GitHub repos with repomix and summarize via LLM
- Test Output Compression — The `wrap` subcommand compresses passing test output (cargo test, npm test, pytest, etc.)
- JS-heavy Pages — agent-browser and Cloudflare Browser Rendering support
- Pipe-friendly — `cat urls.txt | ai-summary fetch`, `--json` output, standard exit codes
- Claude Code Hook — PreToolUse hook rewrites test commands for real token savings
- Rich Statistics — Time-period breakdown, ROI tracking, per-mode analysis
- Multiple LLM Backends — opencode (free), oMLX (local), OpenAI, Groq, DeepSeek, or any OpenAI-compatible API
## Installation

```sh
# Quick install (recommended) — downloads prebuilt binary
curl -fsSL <install-script-url> | sh

# Or from crates.io
cargo install ai-summary

# Or build from source
git clone <repo-url> && cd ai-summary && cargo install --path .
```

Pre-built binaries for macOS (Apple Silicon / Intel) and Linux are available on GitHub Releases.

Requirements: a summarization backend (opencode CLI recommended — free). Rust 1.70+ if building from source.
## Quick Start

```sh
# Generate config file
ai-summary config

# Search (uses Gemini CLI > Gemini API > DDG > Brave)
ai-summary "rust async traits best practices"

# Fetch URLs and summarize
ai-summary fetch https://example.com/article -p "key points"

# Fetch from stdin
cat urls.txt | ai-summary fetch -p "key points"

# Compress piped text (no LLM, instant)
cat large-output.txt | ai-summary compress -m 2000

# Search GitHub code (requires gh CLI)
ai-summary github "spawn_blocking" -r tokio-rs/tokio

# Read a file from a GitHub repo
ai-summary github tokio-rs/tokio src/lib.rs

# Browse a repo directory
ai-summary github tokio-rs/tokio src/

# Summarize a remote GitHub repo
ai-summary repo tokio-rs/tokio -p "how is the runtime structured"

# Wrap test commands (compress passing output)
ai-summary wrap cargo test

# JSON output (for scripting)
ai-summary "rust error handling" --json | jq .

# Check token savings
ai-summary stats
```
## Configuration

Config file: `~/.ai-summary/config.toml` (auto-created with `ai-summary config`)

```toml
# LLM backend — local oMLX (recommended for Apple Silicon)
api_url = "http://127.0.0.1:8000"
api_key = ""                        # Leave empty for oMLX auto-detection
model   = "Qwen3.5-9B-MLX-4bit"

# Search provider — Gemini + Google Search (recommended)
gemini_api_key = ""                 # Free: https://aistudio.google.com/apikey
gemini_model   = "gemini-2.0-flash"

# Brave Search fallback (free: https://brave.com/search/api/)
brave_api_key = ""

# Limits (key names reconstructed — run `ai-summary config` for the canonical names)
max_results = 3
max_chars   = 4000
max_tokens  = 1024
```

Search priority: Gemini CLI > Gemini API > DuckDuckGo > Brave

Environment variables: `GEMINI_API_KEY`, `BRAVE_API_KEY`, `AI_SUMMARY_API_URL`, `AI_SUMMARY_API_KEY`, `AI_SUMMARY_MODEL`.
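Assuming the usual precedence (environment over config file), a per-session override might look like:

```shell
# Point this shell session at a local oMLX server without editing
# config.toml. URL and model name are the examples from the config above.
export AI_SUMMARY_API_URL="http://127.0.0.1:8000"
export AI_SUMMARY_API_KEY=""      # empty: oMLX auto-detection
export AI_SUMMARY_MODEL="Qwen3.5-9B-MLX-4bit"
```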
## Claude Code Integration

### One-command setup

```sh
ai-summary init
```

This installs three things:

- Prompt injection into `~/.claude/CLAUDE.md` — Claude and all subagents use `ai-summary` instead of built-in WebSearch/WebFetch
- Bash hook — rewrites test commands to run through `ai-summary wrap` for real token savings
- WebFetch/WebSearch hooks — on first use per session, denies and reminds Claude to use `ai-summary`; subsequent calls pass through silently
```
Without hook:                                With hook:

Claude ──cargo test──▶ shell ──▶ cargo       Claude ──cargo test──▶ hook ──▶ ai-summary wrap
  ▲                            │               ▲                                  │
  │     ~3000 tokens (raw)     │               │          ~15 tokens              │ run + filter
  └────────────────────────────┘               └──────────────────────────────────┘
```
Supported test commands: cargo test, cargo nextest, npm test, npx vitest, npx jest, yarn test, pytest, go test, mix test, dotnet test, make test.
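The hook's rewrite step boils down to a prefix match on the command; a minimal illustrative stand-in (the real hook parses Claude Code's JSON payload with `jq`, which is omitted here):

```shell
# Prepend `ai-summary wrap` to recognized test commands, leaving
# everything else untouched. The command list is abbreviated.
rewrite() {
  case "$1" in
    "cargo test"*|"cargo nextest"*|"npm test"*|pytest*|"go test"*)
      echo "ai-summary wrap $1" ;;
    *)
      echo "$1" ;;
  esac
}

rewrite "cargo test --workspace"   # → ai-summary wrap cargo test --workspace
rewrite "ls -la"                   # → ls -la (not a test command, passes through)
```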
### Tee mode

When a wrapped command fails, the full raw output is saved to `/tmp/ai-summary-tee/` so the AI can read it if the summary isn't enough:

```
TESTS FAILED: 9 passed, 1 failed, 0 ignored.
test bar ... FAILED
[ai-summary] Full output saved to: /tmp/ai-summary-tee/1710000000_cargo_test.log
```

Requires `jq` and `ai-summary` in PATH.
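The tee behavior is a standard shell pattern; a minimal sketch with ordinary tools (not ai-summary's actual implementation — the inner command stands in for `cargo test` etc.):

```shell
# Capture full output to a log; print a short line on success,
# point at the saved log on failure.
log=$(mktemp /tmp/ai-summary-demo.XXXXXX)
if { echo "running 10 tests"; echo "test result: ok"; } >"$log" 2>&1; then
  result="PASSED ($(wc -l < "$log" | tr -d ' ') lines suppressed)"
else
  result="FAILED — full output saved to: $log"
fi
echo "$result"
rm -f "$log"
```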
## Subcommands

| Command | Description |
|---|---|
| `ai-summary <query>` | Search the web and summarize results |
| `ai-summary fetch <urls> -p <prompt>` | Fetch URLs and summarize |
| `ai-summary sum <prompt>` | Summarize stdin text via LLM |
| `ai-summary compress -m <chars>` | Fast text compression (no LLM) |
| `ai-summary wrap <command>` | Run command, compress passing test output |
| `ai-summary github <query> [-r repo] [-l lang]` | Search GitHub code via gh CLI |
| `ai-summary github <owner/repo> [path]` | Read file or browse directory from GitHub repo |
| `ai-summary repo <owner/repo> -p <prompt>` | Pack remote repo with repomix and summarize |
| `ai-summary crawl <url> -p <prompt>` | Crawl website via Cloudflare Browser Rendering |
| `ai-summary init` | Install Claude Code integration (prompt + hook) |
| `ai-summary stats` | Show token savings statistics |
| `ai-summary reset-stats` | Reset statistics |
| `ai-summary config` | Show or create config file |
## Flags

| Flag | Description |
|---|---|
| `--deep` | Fetch more pages (5 instead of 3) |
| `--raw` | Skip summarization, return raw content |
| `--json` | Structured JSON output (for scripting/piping) |
| `--browser` | Use agent-browser for JS-heavy pages |
| `--cf` | Use Cloudflare Browser Rendering |
| `--api-url` | Override API endpoint |
| `--api-key` | Override API key |
| `--model` | Override model name |
## Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | User error (bad args, no input) |
| 2 | API/network error (no results, fetch failed) |
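The distinct codes make fallback logic easy to script. A sketch with the binary mocked as a shell function (`ai_summary`) so it runs standalone — swap in the real `ai-summary` in practice:

```shell
# Mock the binary failing with the API/network code (2),
# then branch on the documented exit codes.
ai_summary() { echo "fetch failed" >&2; return 2; }

status=0
ai_summary fetch "https://example.com" -p "key points" >/dev/null 2>&1 || status=$?

case $status in
  0) action="use-summary" ;;
  1) action="fix-arguments" ;;
  2) action="fallback-to-raw" ;;
esac
echo "$action"   # → fallback-to-raw
```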
## Statistics

```
ai-summary Token Savings
════════════════════════════════════════════════════════════
Metric            Today     7 days    30 days   All Time
────────────────────────────────────────────────────────────
Queries               8         17         17         21
Pages fetched         8         17         17         17
Tokens saved      14.4K      29.2K      29.2K      31.3K
Cost saved        $0.04      $0.09      $0.09      $0.09
Compression         84%        84%        84%        76%
────────────────────────────────────────────────────────────
ROI: $0.011 LLM cost -> $0.09 Claude cost saved (9x return)
```
## License
MIT