vision-squeezer 0.1.7

VisionSqueezer

LLM-native image optimization middleware & MCP server. Reduces vision model token consumption by preprocessing images into tile-boundary-aligned, padding-free formats.

Works with any agent or editor that speaks MCP — Claude, GPT, Gemini, Codex, or your own.

If you send raw images to an LLM, you are leaking tokens. Modern vision models do not care about your file size (MB/KB); they only care about pixel dimensions, but each provider calculates costs completely differently. vision-squeezer simulates these algorithms to find the mathematical minimum size that drops your token usage without losing visual context.

1. Claude (Area-Based)

As of 2026 (Claude 3.5 / 4.5+), Anthropic uses an area-based formula: Tokens ≈ (Width × Height) / 750. Every single pixel of solid background or padding costs you tokens.

  • The Fix: vision-squeezer aggressively crops padding (removing solid-color borders). A 1025×1025 screenshot shrinks just enough to drop from 1,400 tokens to 1,024 tokens (26.8% savings).
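
The arithmetic is easy to sanity-check. Here is a minimal TypeScript sketch of the area-based estimate (an illustration of the formula above, not Anthropic's or VisionSqueezer's actual code; the cropped 960×800 size is a made-up example):

// Area-based billing: tokens ≈ (W × H) / 750, every padding pixel included.
function claudeTokens(width: number, height: number): number {
  return Math.floor((width * height) / 750);
}

claudeTokens(1025, 1025); // 1400 (the raw screenshot)
claudeTokens(960, 800);   // 1024 after cropping the solid borders (illustrative size)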

2. GPT-4.5 / GPT-4o (Tiling & Short-side Scaling)

OpenAI scales your image to fit inside a 2048px box, then rescales it again so the shortest side is exactly 768px. Finally, it chops the image into a grid of 512×512 tiles. Each tile costs 170 tokens.

  • The Fix: If your image's shortest side ends up being 769px, OpenAI will spill over into an entirely new row of 512×512 tiles, doubling your cost. vision-squeezer simulates this exact math and snaps the image down by a few pixels so it fits perfectly into the minimum number of tiles.
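
Here is that tiling math as a TypeScript sketch, following the description above (downscale-only at each step; an approximation for intuition, not OpenAI's published implementation):

// GPT-4o-style estimate: fit into a 2048×2048 box, scale the short side to 768px,
// then bill 85 base tokens plus 170 per 512×512 tile.
function gpt4oTokens(width: number, height: number): number {
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  const scale = Math.min(1, 768 / Math.min(w, h));
  w *= scale;
  h *= scale;
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

gpt4oTokens(2400, 1670); // 1105, i.e. 6 tiles (matches the case study below)
gpt4oTokens(2048, 1536); // 765, i.e. 4 tiles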

3. Gemini 2.0 / 3.0 (Massive Tiles)

Gemini uses a massive 768×768 tile system (if the image is > 384px). Each tile is a flat 258 tokens.

  • The Fix: An 800×600 image already spills over the single-tile boundary and gets billed as a multi-tile grid (1,032 tokens). vision-squeezer snaps it down slightly to fit inside a single 768×768 tile, dropping the cost to 258 tokens (75% savings).
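
And the Gemini side, using the simplified tile model described above (Google's real crop-size selection is more involved, so treat this purely as an approximation):

// Gemini-style estimate: small images are a flat 258 tokens, larger ones are
// billed 258 tokens per 768×768 tile.
function geminiTokens(width: number, height: number): number {
  if (width <= 384 && height <= 384) return 258;
  return Math.ceil(width / 768) * Math.ceil(height / 768) * 258;
}

geminiTokens(2400, 1670); // 3096, i.e. 12 tiles (matches the case study below)
geminiTokens(2048, 1536); // 1548, i.e. 6 tiles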

Pipeline

Input image
  → crop_padding               remove solid-color borders
  → calculate_optimal_dims     snap to tile boundary (always down)
  → [enforce_max_tiles]        optional tile budget cap
  → resize_exact               Lanczos3
  → [binarize]                 OCR mode only: Otsu threshold
  → JPEG/WebP encode           configurable quality & format
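
For intuition, here is a TypeScript sketch of the two geometric steps, solid-border detection and downward tile snapping. The function names and the snapping threshold are illustrative only, not the crate's internal API:

// Count how many rows at the top of a grayscale image are "background", i.e. every
// pixel is within `tolerance` of the corner pixel (the bg_tolerance knob).
function solidRowsFromTop(rows: number[][], tolerance: number): number {
  const bg = rows[0][0];
  let count = 0;
  for (const row of rows) {
    if (!row.every((p) => Math.abs(p - bg) <= tolerance)) break;
    count++;
  }
  return count;
}

// Snap a dimension down to the nearest tile multiple when the overhang is small,
// so a 769px side becomes 768px instead of billing a whole extra row of tiles.
// The 10% overhang threshold is a made-up illustration.
function snapDown(dim: number, tile: number): number {
  const rem = dim % tile;
  return rem > 0 && rem <= tile * 0.1 ? dim - rem : dim;
}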

🦀 Why Rust?

VisionSqueezer is a performance-critical middleware. We chose Rust for three uncompromising reasons:

  • Invisible Latency: Image processing should never be the bottleneck. Rust ensures that snapping, cropping, and encoding happen in milliseconds, making the optimization layer truly invisible to the developer's workflow.
  • Minimal Footprint: As an MCP server running in the background of your IDE, VisionSqueezer is designed to be ultra-lightweight, consuming near-zero CPU and RAM when idle.
  • Wasm-Ready: Rust’s first-class support for WebAssembly allows us to bring the same high-performance optimization to the browser and the Edge (Cloudflare Workers), enabling client-side squeezing before the image even hits the network.

Case Study 1: Standard Image (istanbul.jpg)

To demonstrate the impact on standard images, here is the run on a 2400×1670 image (4 MP, 0.5 MB) across the three scenarios:

Example 1: Agnostic Optimization (Default)

When no target model is specified, Squeezer reduces the file size and mathematically optimizes boundaries to be generally efficient across all models.

vision-squeezer data/istanbul.jpg
Input:  2400×1670  (0.5 MB)
Output: 2048×1536  (0.3 MB, JPG q75)
File:   28.6% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude           5344     4194     1150 (21.5%)
GPT-4o           1105      765      340 (30.8%)
GPT-5            1536     1536        0 (0.0%)
Gemini           3096     1548     1548 (50.0%)
────────────────────────────────────────────────────────────

Example 2: Model-Targeted Optimization (GPT-4o)

If you tell Squeezer the target model, it reverses the model's exact internal calculation (e.g. GPT-4.5's 768px short-side scaling algorithm) and mathematically shrinks the image just enough to fit the absolute minimum tile grid.

vision-squeezer data/istanbul.jpg --model gpt4o
Input:  2400×1670  (0.5 MB)
Output: 2399×1200  (0.3 MB, JPG q75)
File:   33.6% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude           5344     3838     1506 (28.2%)
GPT-4o           1105     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           3096     2064     1032 (33.3%)
────────────────────────────────────────────────────────────

Notice how targeting gpt4o fits the image exactly into a 6-tile boundary (2399×1200), calculated backwards from OpenAI's short-side scaling algorithm. It maximizes resolution right up to the point where an extra tile would be billed.

Example 3: Model-Targeted Optimization (Claude)

Since Claude uses an area-based calculation (W × H / 750), Squeezer primarily focuses on aggressively cropping solid-color borders and padding to shrink the pixel area without drastically downscaling the core visual detail.

vision-squeezer data/istanbul.jpg --model claude
Input:  2400×1670  (0.5 MB)
Output: 2304×1536  (0.4 MB, JPG q75)
File:   21.3% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude           5344     4718      626 (11.7%)
GPT-4o           1105     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           3096     1548     1548 (50.0%)
────────────────────────────────────────────────────────────

Claude benefits tremendously from even minor dimension reductions. By snapping the width and height slightly downwards, we immediately shaved off over 600 tokens while preserving the massive 2304×1536 resolution.

Case Study 2: 12-Megapixel High-Res Image (istanbul2.jpg)

To demonstrate the impact on massive images, here is the run on a 4096×3072 image (12 MP, 2.2 MB) across the three scenarios:

Example 1: Agnostic Optimization

vision-squeezer data/istanbul2.jpg
Input:  4096×3072  (2.2 MB)
Output: 3584×2560  (1.3 MB, JPG q75)
File:   39.6% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude          16777    12233     4544 (27.1%)
GPT-4o            765     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           6192     5160     1032 (16.7%)
────────────────────────────────────────────────────────────

(Notice the OpenAI Aspect Ratio Anomaly: Squeezer removed heavy letterboxing (padding) from this image. By removing the padding, the image became "wider". Because OpenAI's API forces the new short side to 768px, the wide aspect ratio pushed the long side into a 3rd tile grid column! This is a fascinating edge case where cropping padding mathematically INCREASES your GPT-4o token cost. If you specifically use --model gpt4o on this image, Squeezer will detect this paradox and use a different grid constraint).

Example 2: Model-Targeted Optimization (GPT-4o)

vision-squeezer data/istanbul2.jpg --model gpt4o
Input:  4096×3072  (2.2 MB)
Output: 4095×2048  (1.2 MB, JPG q75)
File:   43.2% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude          16777    11182     5595 (33.3%)
GPT-4o            765     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           6192     4644     1548 (25.0%)
────────────────────────────────────────────────────────────

(By explicitly targeting gpt4o, Squeezer optimizes the boundaries such that the new aspect ratio is safely contained. While GPT-4o still bills for the 6-tile layout due to the image's inherent width, Squeezer shrinks the file footprint by 43% without sacrificing high-resolution details.)

Example 3: Model-Targeted Optimization (Claude)

vision-squeezer data/istanbul2.jpg --model claude
Input:  4096×3072  (2.2 MB)
Output: 3840×2816  (1.5 MB, JPG q75)
File:   31.5% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude          16777    14417     2360 (14.1%)
GPT-4o            765     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           6192     5160     1032 (16.7%)
────────────────────────────────────────────────────────────

(Claude's area-based formula again allows massive token savings simply by trimming to the 3840×2816 boundary, preventing you from paying for over 2,300 tokens of pure padding while retaining 10+ megapixels of fidelity).

💡 FAQ: Wait, why did targeting gpt4o save 33% of Claude tokens, but targeting claude only save 14%? Because of the quality vs. aggression trade-off. OpenAI enforces a strict maximum internal resolution (2048px), so when you target gpt4o, Squeezer must aggressively squash the 4096px image down to meet OpenAI's constraints (4095×2048). That large loss in total pixel area also translates to a huge token drop for Claude. Claude, however, has no such maximum. When you explicitly target claude, Squeezer knows it doesn't need to sacrifice resolution: it keeps the massive 3840×2816 size to preserve ultra-fine detail and trims only the minimum padding, giving you the most cost-efficient version possible without degrading the image.


Install

Claude Code (one-liner)

claude mcp add vision-squeezer -- npx -y vision-squeezer

Claude Desktop

Add to ~/.config/claude/claude_desktop_config.json:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Cursor

cursor --add-mcp '{"name":"vision-squeezer","type":"stdio","command":"npx","args":["-y","vision-squeezer"]}'

Or add to .cursor/mcp.json:

{
  "servers": {
    "vision-squeezer": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

VS Code Copilot

Add to .vscode/mcp.json:

{
  "servers": {
    "vision-squeezer": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

JetBrains (IntelliJ, WebStorm, PyCharm)

Open Tools → GitHub Copilot → Model Context Protocol (MCP) → Configure, then add:

{
  "servers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Gemini CLI

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Codex CLI

codex mcp add vision-squeezer -- npx -y vision-squeezer

Or add to ~/.codex/config.toml:

[mcp_servers.vision-squeezer]
command = "npx"
args = ["-y", "vision-squeezer"]

Qwen Code

qwen mcp add vision-squeezer -- npx -y vision-squeezer

Zed

Add to ~/.config/zed/settings.json:

{
  "context_servers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Kiro

Add to .kiro/settings/mcp.json (workspace) or ~/.kiro/settings/mcp.json (global):

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Antigravity

MCP-only, no hooks needed. Configure via the Antigravity MCP settings:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Manual install (Rust binary)

# From crates.io
cargo install vision-squeezer

# Or from source
git clone https://github.com/eralpozcan/vision-squeezer && cd vision-squeezer
make install   # builds → ~/.local/bin/

Then use the binary path directly in any config above instead of npx:

{ "command": "vision-squeezer-mcp" }

Tip: Run npx -y vision-squeezer --setup to print ready configs with auto-detected paths.


CLI Usage

vision-squeezer path/to/image.jpg \
  --mode auto|ocr|standard \ # default: auto (detects text/grayscale)
  --format jpeg|webp \     # default: jpeg
  --quality 85 \           # output quality 1-100 (default: 75)
  --tile-size 256 \        # patch size in px (default: 512)
  --no-crop \              # disable padding removal
  --bg-tolerance 25 \      # background detection 0-255 (default: 15)
  --model claude|gpt4o|gpt5|gemini \  # model-aware resizing
  --max-tiles 20           # hard cap on tile count

MCP Tool: optimize_image

Argument        Type                                      Required  Default
image_base64    string                                    yes       —
mode            "auto" | "standard" | "ocr"               no        "auto"
output_format   "jpeg" | "webp"                           no        "jpeg"
quality         integer 1–100                             no        75
tile_size       integer                                   no        512
crop            boolean                                   no        true
bg_tolerance    integer 0–255                             no        15
max_tiles       integer                                   no        —
target_model    "claude" | "gpt4o" | "gpt5" | "gemini"    no        —

Response:

{
  "optimized_base64": "...",
  "savings_report": {
    "tiles_before": 48,
    "tiles_after": 35,
    "tiles_saved": 13,
    "token_reduction_pct": "27.1",
    "size_reduction_pct": "58.9"
  }
}
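
On the client side the result can be typed like this (field names copied from the example response above; a convenience typing, not an official schema):

type OptimizeImageResult = {
  optimized_base64: string;
  savings_report: {
    tiles_before: number;
    tiles_after: number;
    tiles_saved: number;
    token_reduction_pct: string; // e.g. "27.1"
    size_reduction_pct: string;  // e.g. "58.9"
  };
};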

Config Reference

Parameter       Type          Default   Description
quality         u8 (1–100)    75        JPEG/WebP output quality
tile_size       u32           512       Model patch size (512 = Claude/GPT, 256 = Gemini)
crop            bool          true      Remove solid-color padding borders
bg_tolerance    u8 (0–255)    15        Max channel delta for background detection
output_format   jpeg/webp     jpeg      Output encoding. WebP is ~30-50% smaller
max_tiles       u32           —         Hard cap on tile count (progressive downscale)
target_model    string        —         Model-aware: claude, gpt4o, gpt5, gemini

Supported Models

Model                 Tile Size   Pre-scaling                              Token Formula
Claude 3.5/4.5/4.7    N/A         None                                     Tokens ≈ (W × H) / 750
GPT-4o / GPT-4.5      512×512     Fit 2048px → scale short side to 768px   85 + tiles × 170
GPT-5 / 5.5           512×512     Fit 6000px / 10.24M px                   min(85 + tiles × 170, 1536)
Gemini 2.0/3.0        768×768     > 384×384 → fit 4096px                   258 per tile (flat 258 if small)

Benchmark / Savings

Real-world token consumption before and after vision-squeezer (using standard photos and screenshots without --max-tiles). Calculations use updated 2026 billing formulas.

Original Size                 Model         Tokens Before   Tokens After   Saved
1025 × 1025 (Screenshot)      Claude 4.5+   1,400           1,024          26.8%
                              GPT-4.5       425             255            40.0%
                              Gemini 2.0+   1,032           258            75.0%
4032 × 3024 (Phone Camera)    Claude 4.5+   16,257          12,232         24.8%
                              GPT-4.5       2,125           1,745          17.9%
                              Gemini 2.0+   6,192           4,128          33.3%
800 × 600 (Web Image)         Claude 4.5+   640             341            46.7%
                              GPT-4.5       255             255            0.0%
                              Gemini 2.0+   1,032           258            75.0%

(Note: GPT-5's high limits mean it rarely requires tiling optimization unless the image exceeds 6000px, but vision-squeezer will still crop padding and compress the file size dramatically).

Advanced Features

Persistence & Analytics

VisionSqueezer tracks every optimization locally in ~/.vision-squeezer/stats.db.

vision-squeezer stats
── VisionSqueezer Analytics ────────────────────────────────
Total Optimizations: 42
Total Tokens Saved:  842,500
Total Bytes Saved:   156.40 MB
Estimated USD Saved: $2.11
────────────────────────────────────────────────────────────

Shell Hook & Claude Code Skills

Add to .zshrc / .bashrc:

eval "$(vision-squeezer setup-hook)"

This installs two things at once:

  • squeeze alias — optimize and capture the output path in one step:
    img_path=$(squeeze data/logo.png --model gemini)
    
  • Claude Code skills — written to ~/.claude/skills/ automatically on first run

Available Skills

Skill           Trigger          What it does
vision-stats    /vision-stats    Show cumulative token & byte savings — reads local stats.db, zero MCP overhead
vision-doctor   /vision-doctor   Check installed version vs latest npm release, show update command if outdated

Example output:

/vision-stats
── VisionSqueezer Analytics ────────────────────────────────
Total Optimizations: 42
Total Tokens Saved:  842,500
Total Bytes Saved:   156.40 MB
Estimated USD Saved: $2.11
────────────────────────────────────────────────────────────

/vision-doctor
## VisionSqueezer Doctor
- [x] Binary found: /usr/local/bin/vision-squeezer
- [x] Installed version: 0.1.1
- [x] Latest version (npm): 0.1.1
- [x] Status: Up to date

Marketplace install (alternative to setup-hook): Add to ~/.claude/settings.json:

{
  "extraKnownMarketplaces": {
    "vision-squeezer": {
      "source": { "source": "github", "repo": "eralpozcan/vision-squeezer" }
    }
  }
}

Then run /plugins add vision-stats@vision-squeezer or /plugins add vision-doctor@vision-squeezer in Claude Code.

Sandbox Mode: "Think in Code"

Execute atomic operations locally before the image ever reaches an LLM. The agent decides how to process the image; you pay zero tokens for the intermediate steps.

CLI:

vision-squeezer screenshot.png --ops '[{"op":"crop","x":10,"y":20,"width":500,"height":500},{"op":"binarize"}]'

MCP tool — sandbox_execute:

Argument        Type     Description
image_base64    string   Input image
operations      array    Ordered list of ops to apply

Supported ops: crop, grayscale, binarize, resize, contrast, brightness.
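
For illustration, the operations array can be modeled as a discriminated union. Only the crop fields appear in the CLI example above; the parameter names for resize, contrast, and brightness are assumptions:

type SandboxOp =
  | { op: "crop"; x: number; y: number; width: number; height: number }
  | { op: "grayscale" }
  | { op: "binarize" }
  | { op: "resize"; width: number; height: number }   // field names assumed
  | { op: "contrast"; value: number }                 // field name assumed
  | { op: "brightness"; value: number };              // field name assumed

const ops: SandboxOp[] = [
  { op: "crop", x: 10, y: 20, width: 500, height: 500 },
  { op: "grayscale" },
  { op: "binarize" },
];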

Crawler Integration

Optimize images on the fly in any Playwright/Puppeteer scraping pipeline — zero API token waste on raw screenshots.

await page.route('**/*.{png,jpg}', async (route) => {
  const response = await route.fetch();
  const body = await response.body();
  const optimized = await squeeze(body); // your helper that pipes the bytes through vision-squeezer
  await route.fulfill({ response, body: optimized }); // keep original status/headers, swap the body
});

Contributing

PRs welcome. See CONTRIBUTING.md for setup and guidelines. Please follow our Code of Conduct.

MCP Installation

# Register the MCP server with Claude Code (see the Install section above for other editors)
claude mcp add vision-squeezer -- npx -y vision-squeezer

Setup & Compilation

cargo build --release

License

Elastic License 2.0 (ELv2) — See LICENSE for details.