sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.
The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.
| Read | Without sqz | With sqz |
|---|---|---|
| File read #1 | 2,000 tokens | ~800 tokens (compressed) |
| File read #2 | 2,000 tokens | ~13 tokens (dedup ref) |
| File read #3 | 2,000 tokens | ~13 tokens (dedup ref) |
| Total | 6,000 tokens | ~826 tokens (86% saved) |
Token Savings
Single-command compression (measured via cargo test -p sqz-engine benchmarks):
| Content | Before | After | Saved |
|---|---|---|---|
| Repeated log lines | 148 | 62 | 58% |
| Large JSON array | 259 | 142 | 45% |
| JSON API response | 64 | 53 | 17% |
| Git diff | 61 | 54 | 11% |
| Prose/docs | 124 | 121 | 2% |
| Stack trace (safe mode) | 82 | 82 | 0% |
Session-level savings (with dedup cache across repeated reads):
| Scenario | Without sqz | With sqz | Saved |
|---|---|---|---|
| Same file read 5x | 10,000 | 852 | 91% |
| Same JSON response 3x | 192 | 79 | 59% |
| Test-fix-test cycle (3 runs) | 15,000 | 5,186 | 65% |
The dedup cache is where the real savings live. Single-command compression ranges from 0% (safe mode) to 58% depending on content, while each repeated read drops to about 13 tokens.
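The session-level figures follow from simple arithmetic: the first read costs its compressed size, and each repeat costs one reference. The sketch below shows that calculation in Rust. The function names are hypothetical and not part of sqz; the 13-token reference cost is the figure quoted in this README.

```rust
/// Tokens sent for `reads` reads of the same content, assuming only the
/// first read carries the compressed payload and every repeat costs
/// `ref_tokens` (hypothetical helper, not an sqz API).
fn session_tokens(first_read: u64, ref_tokens: u64, reads: u64) -> u64 {
    if reads == 0 {
        return 0;
    }
    first_read + ref_tokens * (reads - 1)
}

/// Percentage saved relative to sending the raw content on every read.
fn percent_saved(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    100.0 * (1.0 - after as f64 / before as f64)
}
```

With the three-read example, `session_tokens(800, 13, 3)` gives 826 and `percent_saved(6000, 826)` rounds to 86. The JSON row works the same way: 53 + 13 + 13 = 79 tokens against 192.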
Install
Then run sqz init.
That's it. Shell hooks installed, AI tool hooks configured.
How It Works
sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.
Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)
What gets compressed:
- Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
- JSON — strips nulls, compact encoding
- Logs — collapses repeated lines
- Test output — shows failures only
What doesn't get compressed:
- Stack traces, error messages, secrets — routed to safe mode (0% compression)
- Your prompts and the AI's responses — controlled by the AI tool, not sqz
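To make the "collapses repeated lines" rule for logs concrete, here is a minimal Rust sketch of one way to do it. It is an illustration under assumptions, not sqz's implementation: the function names, the choice to collapse only consecutive duplicates, and the `(xN)` suffix are all hypothetical.

```rust
/// Collapse runs of identical consecutive lines into one line plus a count.
/// ASSUMPTION: only adjacent duplicates are merged; sqz may do more.
fn collapse_repeats(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    // Current run: the line text and how many times it has repeated so far.
    let mut run: Option<(&str, usize)> = None;

    for line in input.lines() {
        if let Some((prev, count)) = run {
            if prev == line {
                run = Some((prev, count + 1));
                continue;
            }
            out.push(render(prev, count));
        }
        run = Some((line, 1));
    }
    // Flush the final run, if any.
    if let Some((prev, count)) = run {
        out.push(render(prev, count));
    }
    out.join("\n")
}

/// Render a run; the "(xN)" marker is an illustrative format only.
fn render(line: &str, count: usize) -> String {
    if count > 1 {
        format!("{} (x{})", line, count)
    } else {
        line.to_string()
    }
}
```

Because the count is kept, the model can still see how often a message occurred even though the duplicate lines are gone.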
Supported Tools
| Tool | Integration | Setup |
|---|---|---|
| Claude Code | PreToolUse hook (transparent) | sqz init |
| Cursor | PreToolUse hook (transparent) | sqz init |
| Windsurf | PreToolUse hook (transparent) | sqz init |
| Cline | PreToolUse hook (transparent) | sqz init |
| Gemini CLI | BeforeTool hook (transparent) | sqz init |
| OpenCode | TypeScript plugin (transparent) | sqz init |
| VS Code | Extension | Install from Marketplace |
| JetBrains | Plugin | Install from Marketplace |
| Chrome | Browser extension | ChatGPT, Claude.ai, Gemini, Grok, Perplexity |
| Firefox | Browser extension | Same sites |
CLI
Track Your Savings
How Compression Works
- Per-command formatters — git status → compact summary, cargo test → failures only, docker ps → name/image/status table
- Structural summaries — code files compressed to imports + function signatures + call graph (~70% reduction). The model sees the architecture, not implementation noise.
- Dedup cache — SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
- JSON pipeline — strip nulls → project out debug fields → flatten → collapse arrays → TOON encoding (lossless compact format)
- Safe mode — stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
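The dedup cache idea from the list above can be sketched in a few lines of Rust. This is a simplified illustration, not sqz's code. The README says sqz uses SHA-256 and persists the cache across sessions; to stay within the standard library, this sketch substitutes `DefaultHasher` (not cryptographic) and keeps state in memory only. The `DedupCache` and `Emit` types and the `[ref:...]` format are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// What the cache hands back for a piece of content.
#[derive(Debug, PartialEq)]
enum Emit {
    /// First time this content is seen: send it in full.
    Full(String),
    /// Seen before: send a short reference instead.
    Ref(String),
}

/// In-memory dedup cache keyed by a content hash.
/// ASSUMPTION: stands in for SHA-256 plus persistent storage.
#[derive(Default)]
struct DedupCache {
    seen: HashSet<u64>,
}

impl DedupCache {
    fn emit(&mut self, content: &str) -> Emit {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        let key = hasher.finish();

        // `insert` returns false when the key was already present.
        if self.seen.insert(key) {
            Emit::Full(content.to_string())
        } else {
            Emit::Ref(format!("[ref:{:016x}]", key))
        }
    }
}
```

Keying on a hash of the content rather than on the file path means an edited file is sent again in full, while an unchanged file costs only the reference.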
For the full technical details, see docs/.
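As a rough illustration of the entropy analysis that safe mode relies on, the Rust sketch below computes Shannon entropy per character and flags long, high-entropy tokens as possible secrets. It covers only that one signal. The function names, the 20-byte minimum length, and the 4.0-bit threshold are assumptions chosen for the example, not values taken from sqz.

```rust
use std::collections::HashMap;

/// Shannon entropy of a string, in bits per character.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Flag a token as a possible secret.
/// ASSUMPTION: the length (in bytes) and entropy thresholds are illustrative.
fn looks_like_secret(token: &str) -> bool {
    token.len() >= 20 && shannon_entropy(token) > 4.0
}
```

A run of one repeated character has zero entropy, while random-looking keys spread their characters evenly and score high, so content containing them would be passed through uncompressed.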
Configuration
# ~/.sqz/presets/default.toml
[]
= "default"
= "1.0"
[]
= true
= 3
[]
= true
[]
= 0.70
= 200000
Privacy
- Zero telemetry — no data transmitted, no crash reports
- Fully offline — works in air-gapped environments
- All processing local
Development
License
Elastic License 2.0 (ELv2) — use, fork, modify freely. Two restrictions: no competing hosted service, no removing license notices.