sqz-cli 0.6.0

Universal LLM context compressor — squeeze tokens from prompts, code, JSON, logs, and conversations

sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.

The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.

Without sqz:                    With sqz:

File read #1:  2,000 tokens     File read #1:  ~800 tokens (compressed)
File read #2:  2,000 tokens     File read #2:  ~13 tokens  (dedup ref)
File read #3:  2,000 tokens     File read #3:  ~13 tokens  (dedup ref)
───────────────────────         ───────────────────────
Total:         6,000 tokens     Total:         ~826 tokens (86% saved)
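
The totals above are simple arithmetic: one compressed read plus a fixed-size reference for each repeat. A minimal Rust sketch of that accounting follows. The function names session_tokens and percent_saved are illustrative and not part of sqz; the inputs are the figures from the comparison.

```rust
/// Tokens sent for `reads` reads of one file when every repeat is replaced
/// by a fixed-size reference. All inputs are illustrative parameters.
fn session_tokens(reads: u32, compressed_first: u32, ref_tokens: u32) -> u32 {
    if reads == 0 {
        return 0;
    }
    compressed_first + (reads - 1) * ref_tokens
}

/// Percentage saved relative to sending the raw content every time.
fn percent_saved(raw_total: u32, sent_total: u32) -> f64 {
    if raw_total == 0 {
        return 0.0;
    }
    100.0 * (1.0 - sent_total as f64 / raw_total as f64)
}

fn main() {
    let raw = 3 * 2_000; // three uncompressed reads
    let sent = session_tokens(3, 800, 13); // 800 + 13 + 13
    // prints "6000 -> 826 tokens (86% saved)"
    println!("{} -> {} tokens ({:.0}% saved)", raw, sent, percent_saved(raw, sent));
}
```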

Token Savings

Single-command compression (measured via cargo test -p sqz-engine benchmarks):

Content                    Before (tokens)   After (tokens)   Saved
Repeated log lines         148               62               58%
Large JSON array           259               142              45%
JSON API response          64                53               17%
Git diff                   61                54               12%
Prose/docs                 124               121              2%
Stack trace (safe mode)    82                82               0%

Session-level savings (with dedup cache across repeated reads):

Scenario                        Without sqz (tokens)   With sqz (tokens)   Saved
Same file read 5x               10,000                 826                 92%
Same JSON response 3x           192                    79                  59%
Test-fix-test cycle (3 runs)    15,000                 5,186               65%

The dedup cache is where the real savings live. Single-command compression ranges from 0% (safe mode) to 58% depending on content, while each repeated read drops to a 13-token reference.
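
To make the dedup idea concrete, here is a minimal sketch of a content-addressed cache in Rust. It is not sqz's implementation. The "How Compression Works" section says the real cache keys on a SHA-256 content hash and persists across sessions; to stay dependency-free, this sketch uses the standard library's DefaultHasher as a stand-in and keeps everything in memory. The DedupCache type, the Emitted enum, and the "[ref:...]" label format are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// What a read turns into: full content the first time, a short reference afterwards.
#[derive(Debug, PartialEq)]
enum Emitted {
    Full(String),
    Ref(String),
}

/// Hypothetical in-memory cache; the real one is described as persistent.
struct DedupCache {
    seen: HashMap<u64, String>, // content hash -> reference label
}

impl DedupCache {
    fn new() -> Self {
        DedupCache { seen: HashMap::new() }
    }

    /// Stand-in hash. A real cache would use a collision-resistant hash such as SHA-256.
    fn key(content: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        hasher.finish()
    }

    /// Return the full content on first sight and a short reference on every repeat.
    fn emit(&mut self, content: &str) -> Emitted {
        let key = Self::key(content);
        if let Some(label) = self.seen.get(&key) {
            return Emitted::Ref(label.clone());
        }
        let label = format!("[ref:{:016x}]", key);
        self.seen.insert(key, label);
        Emitted::Full(content.to_string())
    }
}

fn main() {
    let mut cache = DedupCache::new();
    let file = "fn main() { println!(\"hello\"); }";
    for i in 1..=3 {
        match cache.emit(file) {
            Emitted::Full(text) => println!("read #{}: full content ({} bytes)", i, text.len()),
            Emitted::Ref(label) => println!("read #{}: {}", i, label),
        }
    }
}
```

Keying on a hash of the content rather than on the file path means an unchanged file is recognized no matter which command read it, and an edited file is automatically treated as new content.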

Install

cargo install sqz-cli

Then:

sqz init

That's it. Shell hooks installed, AI tool hooks configured.

How It Works

sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.

Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)

What gets compressed:

  • Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
  • JSON — strips nulls, compact encoding
  • Logs — collapses repeated lines
  • Test output — shows failures only

What doesn't get compressed:

  • Stack traces, error messages, secrets — routed to safe mode (0% compression)
  • Your prompts and the AI's responses — controlled by the AI tool, not sqz
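
As an illustration of the "collapses repeated lines" item above, the sketch below folds each run of consecutive identical lines into one line with a repeat count. This is an assumed approach for illustration, not sqz's actual log formatter; the collapse_repeats name and the "(xN)" marker are made up.

```rust
/// Collapse runs of consecutive identical lines into one line plus a repeat count.
fn collapse_repeats(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut lines = input.lines().peekable();
    while let Some(line) = lines.next() {
        // Count how many times this exact line repeats back to back.
        let mut count = 1usize;
        while lines.peek() == Some(&line) {
            lines.next();
            count += 1;
        }
        if count > 1 {
            out.push(format!("{} (x{})", line, count));
        } else {
            out.push(line.to_string());
        }
    }
    out.join("\n")
}

fn main() {
    let log = "connecting...\nretry\nretry\nretry\nconnected";
    // prints three lines: "connecting...", "retry (x3)", "connected"
    println!("{}", collapse_repeats(log));
}
```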

Supported Tools

Tool          Integration                        Setup
Claude Code   PreToolUse hook (transparent)      sqz init
Cursor        PreToolUse hook (transparent)      sqz init
Windsurf      PreToolUse hook (transparent)      sqz init
Cline         PreToolUse hook (transparent)      sqz init
Gemini CLI    BeforeTool hook (transparent)      sqz init
OpenCode      TypeScript plugin (transparent)    sqz init
VS Code       Extension                          Install from Marketplace
JetBrains     Plugin                             Install from Marketplace
Chrome        Browser extension                  ChatGPT, Claude.ai, Gemini, Grok, Perplexity
Firefox       Browser extension                  Same sites

CLI

sqz init              # Install hooks
sqz compress <text>   # Compress (or pipe from stdin)
sqz gain              # Show daily token savings
sqz stats             # Cumulative report
sqz discover          # Find missed savings
sqz resume            # Re-inject session context after compaction
sqz hook claude       # Process a PreToolUse hook
sqz proxy --port 8080 # API proxy (compresses full request payloads)

Track Your Savings

$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
  04-13 │█████                         │ 2329 saved
  04-14 │                              │ 0 saved
  04-15 │██████████████████████████████│ 12954 saved
  04-16 │████████████                  │ 5532 saved
──────────────────────────────────────────────────
  Total: 1178 compressions, 19214 tokens saved

How Compression Works

  1. Per-command formatters: git status → compact summary, cargo test → failures only, docker ps → name/image/status table
  2. Dedup cache — SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
  3. JSON pipeline — strip nulls → flatten → collapse arrays → TOON encoding (lossless compact format)
  4. Safe mode — stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
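
The list above mentions entropy analysis for safe-mode detection without giving details. One common way to do this kind of check is Shannon entropy over the bytes of a token: random-looking strings such as API keys score high, ordinary words score low. The sketch below shows that general technique only. The looks_like_secret rule, its 20-character minimum, and the threshold are assumptions, not values taken from sqz.

```rust
use std::collections::HashMap;

/// Shannon entropy of a string, in bits per byte.
fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for b in s.bytes() {
        *counts.entry(b).or_insert(0) += 1;
    }
    let len = s.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical rule: a long token with high entropy may be a secret
/// and should be passed through untouched.
fn looks_like_secret(token: &str, threshold: f64) -> bool {
    token.len() >= 20 && shannon_entropy(token) > threshold
}

fn main() {
    let threshold = 3.5; // assumed cut-off, in bits per byte
    for token in ["Zx9Qw3Lp7Rt1Vb5Nm8Kc2Hd4", "aaaaaaaaaaaaaaaaaaaaaaaa", "status"] {
        println!(
            "{:<26} entropy={:.2} secret-like={}",
            token,
            shannon_entropy(token),
            looks_like_secret(token, threshold)
        );
    }
}
```

A real detector would likely combine entropy with pattern checks (key prefixes, stack-frame shapes), since entropy alone cannot recognize a stack trace.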

For the full technical details, see docs/.

Configuration

# ~/.sqz/presets/default.toml
[preset]
name = "default"
version = "1.0"

[compression.condense]
enabled = true
max_repeated_lines = 3

[compression.strip_nulls]
enabled = true

[budget]
warning_threshold = 0.70
default_window_size = 200000
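
The preset does not say how the two [budget] keys interact. A plausible reading, and it is only an assumption, is that warning_threshold is a fraction of default_window_size, so a warning would fire once usage reaches 0.70 × 200000 = 140000 tokens. A minimal Rust sketch of that interpretation, with a hypothetical Budget struct mirroring the two keys:

```rust
/// Mirrors the two [budget] keys from the preset; the struct itself is hypothetical.
struct Budget {
    warning_threshold: f64,   // assumed to be a fraction of the window
    default_window_size: u64, // context window size in tokens
}

impl Budget {
    /// Token count at which a warning would fire under the assumed interpretation.
    fn warning_at(&self) -> u64 {
        (self.warning_threshold * self.default_window_size as f64).round() as u64
    }

    /// True once usage reaches the warning point.
    fn should_warn(&self, used_tokens: u64) -> bool {
        used_tokens >= self.warning_at()
    }
}

fn main() {
    let budget = Budget { warning_threshold: 0.70, default_window_size: 200_000 };
    println!("warn at {} tokens", budget.warning_at()); // prints "warn at 140000 tokens"
    println!("warn at 150000 used? {}", budget.should_warn(150_000)); // prints "... true"
}
```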

Privacy

  • Zero telemetry — no data transmitted, no crash reports
  • Fully offline — works in air-gapped environments
  • All processing local

Development

git clone https://github.com/ojuschugh1/sqz.git
cd sqz
cargo test --workspace
cargo build --release

License

Elastic License 2.0 (ELv2) — use, fork, modify freely. Two restrictions: no competing hosted service, no removing license notices.

Links