sqz-cli 0.7.0


sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.

The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.

Without sqz:                    With sqz:

File read #1:  2,000 tokens     File read #1:  ~800 tokens (compressed)
File read #2:  2,000 tokens     File read #2:  ~13 tokens  (dedup ref)
File read #3:  2,000 tokens     File read #3:  ~13 tokens  (dedup ref)
───────────────────────         ───────────────────────
Total:         6,000 tokens     Total:         ~826 tokens (86% saved)
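The totals above follow from simple arithmetic: the first read pays the compressed cost, and each repeat pays only the reference. A minimal sketch of that calculation (the function names are illustrative and not part of sqz):

```python
def session_tokens(raw_tokens, compressed_tokens, reads, ref_tokens=13):
    """Return (without_sqz, with_sqz) token totals for `reads` identical reads."""
    if reads < 1:
        raise ValueError("reads must be at least 1")
    without = raw_tokens * reads
    # The first read pays the compressed cost; every repeat pays only the reference.
    with_dedup = compressed_tokens + ref_tokens * (reads - 1)
    return without, with_dedup


def percent_saved(before, after):
    """Percentage of tokens saved, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))


without, with_dedup = session_tokens(2000, 800, reads=3)
print(without, with_dedup, percent_saved(without, with_dedup))  # 6000 826 86
```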

Token Savings

Single-command compression (measured via cargo test -p sqz-engine benchmarks):

Content	Before (tokens)	After (tokens)	Saved
Repeated log lines	148	62	58%
Large JSON array	259	142	45%
JSON API response	64	53	17%
Git diff	61	54	12%
Prose/docs	124	121	2%
Stack trace (safe mode)	82	82	0%

Session-level savings (with dedup cache across repeated reads):

Scenario	Without sqz (tokens)	With sqz (tokens)	Saved
Same file read 5x	10,000	826	92%
Same JSON response 3x	192	79	59%
Test-fix-test cycle (3 runs)	15,000	5,186	65%

The dedup cache is where the real savings live. Single-command compression ranges from 2% to 58% depending on content, and content routed to safe mode passes through at 0%. Repeated reads drop to about 13 tokens each.

Install

cargo install sqz-cli

Then:

sqz init

That's it. Shell hooks installed, AI tool hooks configured.

How It Works

sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.

Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)
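To make the rewrite step concrete, here is a rough sketch of what a hook of this kind could do. It assumes the hook receives a payload with `tool_name` and `tool_input.command` fields and that wrapping the command in a pipe to `sqz compress` is enough; both the payload shape and the rewrite rule are assumptions for illustration, not a description of sqz's actual hook.

```python
import json


def rewrite_command(payload):
    """Pipe a Bash tool command through `sqz compress`; leave other tools untouched."""
    if payload.get("tool_name") != "Bash":
        return payload
    command = payload.get("tool_input", {}).get("command", "")
    if not command or "sqz compress" in command:
        return payload  # nothing to wrap, or already wrapped
    rewritten = dict(payload)
    # Parentheses keep compound commands (a && b) inside the pipe.
    rewritten["tool_input"] = {
        **payload["tool_input"],
        "command": f"({command}) | sqz compress",
    }
    return rewritten


event = {"tool_name": "Bash", "tool_input": {"command": "git status"}}
print(json.dumps(rewrite_command(event)["tool_input"]))
# {"command": "(git status) | sqz compress"}
```

Note that a plain pipe like this reports the exit status of the last command in the pipeline, so a real hook would need to preserve the original command's exit code some other way.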

What gets compressed:

Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
JSON — strips nulls, compact encoding
Logs — collapses repeated lines
Test output — shows failures only
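As an illustration of the log case, collapsing repeated lines can be sketched in a few lines. The `max_repeated_lines` parameter borrows its name from the preset key shown under Configuration; the exact semantics and the marker text are assumptions.

```python
from itertools import groupby


def collapse_repeats(lines, max_repeated_lines=3):
    """Keep at most `max_repeated_lines` copies of each run of identical lines."""
    out = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        kept = min(count, max_repeated_lines)
        out.extend([line] * kept)
        if count > kept:
            # Hypothetical marker; the real output format may differ.
            out.append(f"[... {count - kept} more identical lines]")
    return out


log = ["retrying connection"] * 5 + ["connected"]
for line in collapse_repeats(log):
    print(line)
```

Only consecutive duplicates are collapsed here, so interleaved lines keep their original order.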

What doesn't get compressed:

Stack traces, error messages, secrets — routed to safe mode (0% compression)
Your prompts and the AI's responses — controlled by the AI tool, not sqz
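The compression overview further down mentions entropy analysis as the way safe-mode content is detected. A minimal sketch of that idea for secret-like tokens, using Shannon entropy over characters (the length and entropy thresholds are assumptions, not sqz's actual values):

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Bits per character of the empirical character distribution."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token, min_length=20, threshold=4.0):
    """Flag long, high-entropy tokens so they can bypass compression untouched."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold


print(looks_like_secret("hello world hello world"))  # False
print(looks_like_secret("A1b2C3d4E5f6G7h8I9j0K"))    # True
```

Ordinary prose reuses a small set of characters and scores low, while API keys and random tokens spread across many characters and score high.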

Supported Tools

Tool	Integration	Setup
Claude Code	PreToolUse hook (transparent)	`sqz init`
Cursor	PreToolUse hook (transparent)	`sqz init`
Windsurf	PreToolUse hook (transparent)	`sqz init`
Cline	PreToolUse hook (transparent)	`sqz init`
Gemini CLI	BeforeTool hook (transparent)	`sqz init`
OpenCode	TypeScript plugin (transparent)	`sqz init`
VS Code	Extension	Install from Marketplace
JetBrains	Plugin	Install from Marketplace
Chrome	Browser extension	Works on ChatGPT, Claude.ai, Gemini, Grok, Perplexity
Firefox	Browser extension	Same sites as Chrome

CLI

sqz init              # Install hooks
sqz compress <text>   # Compress (or pipe from stdin)
sqz compact           # Evict stale context to free tokens
sqz gain              # Show daily token savings
sqz stats             # Cumulative report
sqz discover          # Find missed savings
sqz resume            # Re-inject session context after compaction
sqz hook claude       # Process a PreToolUse hook
sqz proxy --port 8080 # API proxy (compresses full request payloads)

Track Your Savings

$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
  04-13 │█████                         │ 2329 saved
  04-14 │                              │ 0 saved
  04-15 │██████████████████████████████│ 12954 saved
  04-16 │████████████                  │ 5532 saved
──────────────────────────────────────────────────
  Total: 1178 compressions, 19214 tokens saved

How Compression Works

Per-command formatters — git status → compact summary, cargo test → failures only, docker ps → name/image/status table
Structural summaries — code files compressed to imports + function signatures + call graph (~70% reduction). The model sees the architecture, not implementation noise.
Dedup cache — SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
JSON pipeline — strip nulls → project out debug fields → flatten → collapse arrays → TOON encoding (lossless compact format)
Safe mode — stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
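The dedup cache in the list above can be sketched as a content-addressed lookup. This version is in-memory only (the real cache is described as persistent across sessions), and the reference format is hypothetical.

```python
import hashlib


class DedupCache:
    """In-memory sketch: the first read returns content, repeats return a short ref."""

    def __init__(self):
        self._seen = set()

    def process(self, content, compress=lambda s: s):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._seen:
            # Hypothetical reference format; sqz's actual reference may differ.
            return f"[sqz-ref:{digest[:12]}]"
        self._seen.add(digest)
        return compress(content)


cache = DedupCache()
source = "fn main() { println!(\"hi\"); }"
print(cache.process(source))  # first read: full (compressed) content
print(cache.process(source))  # repeat read: short reference
```

Hashing the content rather than the file path means an edited file is sent again in full, while an unchanged file read from a different command still deduplicates.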

For the full technical details, see docs/.
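As a rough illustration of the first JSON pipeline stages listed above (strip nulls, project out debug fields, flatten), here is a sketch. The debug field names are made up, and compact JSON stands in for TOON encoding, which is not reproduced here.

```python
import json


def strip_nulls(value):
    """Recursively drop object keys whose value is null."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


def flatten(obj, prefix=""):
    """Turn nested objects into a single level with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def compact(obj, debug_fields=("debug", "trace_id")):
    """Strip nulls, drop assumed debug fields, flatten, and encode compactly."""
    cleaned = strip_nulls(obj)
    projected = {k: v for k, v in cleaned.items() if k not in debug_fields}
    return json.dumps(flatten(projected), separators=(",", ":"))


response = {"user": {"id": 7, "nickname": None}, "debug": {"ms": 12}, "tags": ["a"]}
print(compact(response))  # {"user.id":7,"tags":["a"]}
```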

Configuration

# ~/.sqz/presets/default.toml
[preset]
name = "default"
version = "1.0"

[compression.condense]
enabled = true
max_repeated_lines = 3

[compression.strip_nulls]
enabled = true

[budget]
warning_threshold = 0.70
default_window_size = 200000
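One plausible reading of the `[budget]` values is that a warning fires once usage reaches 70% of a 200,000-token window, i.e. at 140,000 tokens. A sketch under that assumption (the function and its return values are illustrative, not sqz's API):

```python
def budget_status(used_tokens, window_size=200_000, warning_threshold=0.70):
    """Return "warn" once usage reaches the threshold fraction of the window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    fraction = used_tokens / window_size
    return "warn" if fraction >= warning_threshold else "ok"


print(budget_status(100_000))  # ok
print(budget_status(140_000))  # warn
```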

Privacy

Zero telemetry — no data transmitted, no crash reports
Fully offline — works in air-gapped environments
All processing local

Development

git clone https://github.com/ojuschugh1/sqz.git
cd sqz
cargo test --workspace
cargo build --release

License

Elastic License 2.0 (ELv2): use, fork, and modify freely. The license's limitations: no offering sqz to others as a hosted or managed service, no circumventing license-key functionality, and no removing license notices.