ai-summary 2.7.0

Web search with local LLM summarization — token saver for Claude Code

Web search & summarization CLI for AI coding agents. Reduces token consumption by compressing web content through local LLMs or Gemini before feeding it to Claude Code (or any LLM-powered tool).

How It Works

┌──────────────┐     ┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Web Search  │────▶│  Fetch   │────▶│ Readability │────▶│  LLM Summary │
│ Gemini/DDG/  │     │  Pages   │     │   Extract   │     │ Local/Remote │
│   Brave      │     │          │     │             │     │              │
└──────────────┘     └──────────┘     └─────────────┘     └──────────────┘
                                                                 │
                                                                 ▼
                                                       ┌───────────────────┐
                                                       │ Compressed output │
                                                       │ (60-98% smaller)  │
                                                       └───────────────────┘

Instead of sending 50K+ tokens of raw page content to Claude, ai-summary returns a focused 1-4K-token summary — saving tokens and money.
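
The arithmetic behind that claim is easy to sketch. A back-of-the-envelope calculation, assuming a 50K-token page, a 2K-token summary, and an illustrative $3 per million input tokens (a placeholder rate, not a quoted Claude price):

```shell
# Token/cost savings sketch; the $3/Mtok rate is an assumption for illustration.
raw_tokens=50000
summary_tokens=2000
awk -v r="$raw_tokens" -v s="$summary_tokens" 'BEGIN {
  saved = r - s
  printf "tokens saved: %d (%.0f%% smaller)\n", saved, 100 * saved / r
  printf "cost saved at $3/Mtok: $%.3f\n", saved * 3 / 1000000
}'
# → tokens saved: 48000 (96% smaller)
# → cost saved at $3/Mtok: $0.144
```

At a few queries a day, the savings compound — which is what the stats command tracks.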

Features

  • Search + Summarize — Gemini (Google Search grounding), DuckDuckGo, or Brave Search
  • Fetch + Summarize — Fetch any URL, extract article content via readability, summarize with LLM
  • Stdin Summarize — Pipe any text through for compression
  • Fast Compress — No-LLM text extraction for instant compression
  • GitHub Code Search — Search code and read files from GitHub repos via gh CLI + LLM summarization
  • Repo Summarize — Pack remote GitHub repos with repomix and summarize via LLM
  • Test Output Compression — wrap subcommand compresses passing test output (cargo test, npm test, pytest, etc.)
  • JS-heavy Pages — agent-browser and Cloudflare Browser Rendering support
  • Pipe-friendly — cat urls.txt | ai-summary fetch, --json output, standard exit codes
  • Claude Code Hook — PreToolUse hook rewrites test commands for real token savings
  • Rich Statistics — Time-period breakdown, ROI tracking, per-mode analysis
  • Multiple LLM Backends — opencode (free), oMLX (local), OpenAI, Groq, DeepSeek, or any OpenAI-compatible API

Installation

# Quick install (recommended) — downloads prebuilt binary
curl -fsSL https://ai-summary.agent-tools.org/install.sh | sh

# Or from crates.io
cargo install ai-summary

# Or build from source
git clone https://github.com/agent-tools-org/ai-summary.git
cd ai-summary
cargo install --path .

Pre-built binaries for macOS (Apple Silicon / Intel) and Linux are available on GitHub Releases.

Requirements: a summarization backend (opencode CLI recommended — free). Rust 1.70+ if building from source.

Quick Start

# Generate config file
ai-summary config

# Search (uses Gemini CLI > Gemini API > DDG > Brave)
ai-summary "what is the latest Rust version"

# Fetch URLs and summarize
ai-summary fetch https://example.com/article -p "what are the key points"

# Fetch from stdin
cat urls.txt | ai-summary fetch -p "summarize each"

# Compress piped text (no LLM, instant)
echo "large text..." | ai-summary compress -m 4000

# Search GitHub code (requires gh CLI)
ai-summary github "error handling" -r tokio-rs/tokio -l rust

# Read a file from a GitHub repo
ai-summary github owner/repo src/main.rs -p "explain this"

# Browse a repo directory
ai-summary github owner/repo src/

# Summarize a remote GitHub repo
ai-summary repo user/repo -p "explain the architecture"

# Wrap test commands (compress passing output)
ai-summary wrap cargo test

# JSON output (for scripting)
ai-summary --json "query" | jq '.summary'

# Check token savings
ai-summary stats

Configuration

Config file: ~/.ai-summary/config.toml (auto-created with ai-summary config)

# LLM backend — local oMLX (recommended for Apple Silicon)
api_url = "http://127.0.0.1:8000"
api_key = ""  # Leave empty for oMLX auto-detection
model = "Qwen3.5-9B-MLX-4bit"

# Search provider — Gemini + Google Search (recommended)
gemini_api_key = ""  # Free: https://aistudio.google.com/apikey
gemini_model = "gemini-2.0-flash"

# Brave Search fallback (free: https://brave.com/search/api/)
brave_api_key = ""

max_pages = 3
max_page_chars = 4000
max_summary_tokens = 1024
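
The same three keys can point at any OpenAI-compatible endpoint instead of a local oMLX server. A sketch for a hosted backend — the API key is a placeholder, and the exact base URL/path your provider (and this tool) expects may differ:

```toml
# Remote OpenAI-compatible backend — substitute your own values
api_url = "https://api.groq.com/openai/v1"
api_key = "gsk_..."                # placeholder; your provider's API key
model = "llama-3.3-70b-versatile"  # any model your backend serves
```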

Search priority: Gemini CLI > Gemini API > DuckDuckGo > Brave

Environment variables: GEMINI_API_KEY, BRAVE_API_KEY, AI_SUMMARY_API_URL, AI_SUMMARY_API_KEY, AI_SUMMARY_MODEL.

Claude Code Integration

One-command setup

ai-summary init                # Install prompt injection + PreToolUse hook
ai-summary init --with-repomix # Also install repomix (for repo command)
ai-summary init --uninstall    # Remove everything

This installs three things:

  1. Prompt injection into ~/.claude/CLAUDE.md — Claude and all subagents use ai-summary instead of built-in WebSearch/WebFetch
  2. Bash hook — rewrites test commands to run through ai-summary wrap for real token savings
  3. WebFetch/WebSearch hooks — on the first use per session, the hook denies the call and reminds Claude to use ai-summary; subsequent calls pass through silently

Without hook:                          With hook:

Claude ──cargo test──▶ shell ──▶ cargo  Claude ──cargo test──▶ hook ──▶ ai-summary wrap
  ▲                              │        ▲                           │
  │     ~3000 tokens (raw)       │        │     ~15 tokens            │  run + filter
  └──────────────────────────────┘        └───────────────────────────┘

Supported test commands: cargo test, cargo nextest, npm test, npx vitest, npx jest, yarn test, pytest, go test, mix test, dotnet test, make test.

Tee mode

When a wrapped command fails, the full raw output is saved to /tmp/ai-summary-tee/ so the AI can read it if the summary isn't enough:

TESTS FAILED: 9 passed, 1 failed, 0 ignored.
test bar ... FAILED
[ai-summary] Full output saved to: /tmp/ai-summary-tee/1710000000_cargo_test.log
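
When the summary isn't enough, the newest saved log can be pulled up directly. A sketch assuming the /tmp/ai-summary-tee layout shown above:

```shell
# Print the most recently saved raw test output, if any exists
latest=$(ls -t /tmp/ai-summary-tee/*.log 2>/dev/null | head -n 1)
if [ -n "$latest" ]; then cat "$latest"; else echo "no saved output"; fi
```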

Requires jq and ai-summary in PATH.

Subcommands

Command                                          Description
ai-summary <query>                               Search the web and summarize results
ai-summary fetch <urls> -p <prompt>              Fetch URLs and summarize
ai-summary sum <prompt>                          Summarize stdin text via LLM
ai-summary compress -m <chars>                   Fast text compression (no LLM)
ai-summary wrap <command>                        Run command, compress passing test output
ai-summary github <query> [-r repo] [-l lang]    Search GitHub code via gh CLI
ai-summary github <owner/repo> [path]            Read file or browse directory from GitHub repo
ai-summary repo <owner/repo> -p <prompt>         Pack remote repo with repomix and summarize
ai-summary crawl <url> -p <prompt>               Crawl website via Cloudflare Browser Rendering
ai-summary init                                  Install Claude Code integration (prompt + hook)
ai-summary stats                                 Show token savings statistics
ai-summary reset-stats                           Reset statistics
ai-summary config                                Show or create config file

Flags

Flag        Description
--deep      Fetch more pages (5 instead of 3)
--raw       Skip summarization, return raw content
--json      Structured JSON output (for scripting/piping)
--browser   Use agent-browser for JS-heavy pages
--cf        Use Cloudflare Browser Rendering
--api-url   Override API endpoint
--api-key   Override API key
--model     Override model name

Exit Codes

Code   Meaning
0      Success
1      User error (bad args, no input)
2      API/network error (no results, fetch failed)
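
These exit codes make the CLI easy to script around. A minimal sketch of the branching — mock_ai_summary below is a stand-in so the control flow can run without the real binary; replace it with an actual ai-summary invocation:

```shell
# mock_ai_summary stands in for the real binary; here it simulates
# an API/network failure (exit code 2).
mock_ai_summary() { return 2; }

if mock_ai_summary --json "latest Rust release"; then
  echo "got a summary"
else
  rc=$?
  if [ "$rc" -eq 2 ]; then
    echo "API/network error (exit 2): retry or fall back to a raw fetch"
  else
    echo "user error (exit $rc): fix the invocation"
  fi
fi
# → API/network error (exit 2): retry or fall back to a raw fetch
```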

Statistics

ai-summary Token Savings
════════════════════════════════════════════════════════════

Metric               Today   7 days  30 days   All Time
────────────────────────────────────────────────────────────
Queries                  8       17       17         21
Pages fetched            8       17       17         17
Tokens saved         14.4K    29.2K    29.2K      31.3K
Cost saved           $0.04    $0.09    $0.09      $0.09
Compression            84%      84%      84%        76%
────────────────────────────────────────────────────────────

ROI: $0.011 LLM cost -> $0.09 Claude cost saved (9x return)

License

MIT