servo-fetch 0.4.0

A browser engine in a binary. Fetch, render, and extract web content powered by Servo.

servo-fetch embeds the Servo browser engine into a single binary. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content.

servo-fetch "https://example.com"                        # Clean Markdown
servo-fetch "https://example.com" --screenshot page.png  # PNG screenshot, no GPU needed
servo-fetch "https://example.com" --js "document.title"  # Run JS in the page
servo-fetch URL1 URL2 URL3                               # Parallel batch fetch

Why servo-fetch

  • Zero dependencies — single binary, no Chrome, no Docker, no API key
  • Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
  • Layout-aware extraction — strips navbars, sidebars, footers by actual rendered position, not HTML guessing
  • Parallel batch fetch — multiple URLs fetched concurrently, results stream as each completes
  • Screenshots without GPU — software renderer captures PNG/full-page screenshots anywhere
  • Accessibility tree — AccessKit integration with roles, names, and bounding boxes

Install

curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh | sh

Alternatively, download a prebuilt binary from GitHub Releases, or install with Cargo (requires Rust 1.86.0+):

cargo binstall servo-fetch   # prebuilt binary
cargo install servo-fetch    # build from source

Platform notes

The Linux binary dynamically links against system libraries. Install them with:

# Debian/Ubuntu
sudo apt install -y libegl1 libfontconfig1 libfreetype6

# Fedora
sudo dnf install -y mesa-libEGL fontconfig freetype

# Arch
sudo pacman -S --needed mesa fontconfig freetype2

servo-fetch needs a working OpenGL ES context, so on headless servers (SSH/container) run it under a virtual display:

xvfb-run --auto-servernum servo-fetch "https://example.com"
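
When servo-fetch is launched from a script, the virtual-display wrapper can be applied conditionally. The following is a minimal sketch, not part of servo-fetch: the helper name `headless_argv` is hypothetical, and treating an empty `DISPLAY` variable as "headless" is an assumed heuristic.

```python
import os
import shutil
from typing import List, Mapping, Optional, Sequence


def headless_argv(
    argv: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
    xvfb_path: Optional[str] = None,
) -> List[str]:
    """Prefix argv with xvfb-run when no X display appears to be available."""
    env = os.environ if env is None else env
    # Assumption: a missing or empty DISPLAY means there is no X server.
    if env.get("DISPLAY"):
        return list(argv)
    # Allow the caller to pass an explicit path; otherwise look it up on PATH.
    xvfb = xvfb_path or shutil.which("xvfb-run")
    if xvfb is None:
        raise RuntimeError("no display found and xvfb-run is not installed")
    return [xvfb, "--auto-servernum", *argv]
```

The returned list can be passed directly to `subprocess.run`.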

Windows releases ship as a .zip containing servo-fetch.exe alongside libEGL.dll and libGLESv2.dll — keep them in the same directory. Download from Releases, extract, and put the folder on your PATH.

On other platforms there are no runtime dependencies. The release binary is ready to run.

Usage

Examples

# Readable Markdown (default)
servo-fetch "https://example.com"

# Structured JSON
servo-fetch "https://example.com" --json

# Multiple URLs in parallel (Markdown with separators)
servo-fetch "https://a.com" "https://b.com" "https://c.com"

# Multiple URLs as NDJSON (one compact JSON per line)
servo-fetch "https://a.com" "https://b.com" --json

# Screenshot — rendered to PNG without GPU
servo-fetch "https://example.com" --screenshot page.png

# Full-page screenshot (captures the entire scrollable page)
servo-fetch "https://example.com" --screenshot page.png --full-page

# Execute JavaScript in the page context
servo-fetch "https://example.com" --js "document.title"

# Extract a specific section by CSS selector
servo-fetch "https://example.com" --selector "article"

# Raw HTML or plain text (bypasses Readability)
servo-fetch "https://example.com" --raw html
servo-fetch "https://example.com" --raw text

# PDF text extraction (auto-detected via Content-Type)
servo-fetch "https://example.com/report.pdf"

Options

Flag                  Description
--json                Output as structured JSON (NDJSON when multiple URLs)
--screenshot <FILE>   Save a PNG screenshot (single URL only)
--full-page           Capture the full scrollable page (requires --screenshot)
--js <EXPR>           Execute JavaScript and print the result (single URL only)
--selector <CSS>      Extract a specific section by CSS selector
--raw <MODE>          Output raw html or plain text (single URL only)
-t, --timeout <SECS>  Page load timeout (default: 30)
--settle <MS>         Extra wait after load event for SPAs (default: 0, max: 10000)
--help                Show help
--version             Show version
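
The constraints in this table (single-URL flags, --full-page requiring --screenshot, the --settle ceiling) can be checked before the process is spawned. The sketch below is an illustration only: `build_argv` is a hypothetical helper, it covers just the flags listed above, and servo-fetch itself may enforce further rules that are not documented here.

```python
from typing import List, Optional, Sequence


def build_argv(
    urls: Sequence[str],
    json_output: bool = False,
    screenshot: Optional[str] = None,
    full_page: bool = False,
    js: Optional[str] = None,
    raw: Optional[str] = None,
    timeout: int = 30,
    settle_ms: int = 0,
) -> List[str]:
    """Validate the documented flag constraints and return a command line."""
    if not urls:
        raise ValueError("at least one URL is required")
    # Flags documented as "single URL only".
    single_only = {"--screenshot": screenshot, "--js": js, "--raw": raw}
    for flag, value in single_only.items():
        if value is not None and len(urls) > 1:
            raise ValueError(f"{flag} accepts a single URL only")
    if full_page and screenshot is None:
        raise ValueError("--full-page requires --screenshot")
    if raw is not None and raw not in ("html", "text"):
        raise ValueError("--raw must be 'html' or 'text'")
    if not 0 <= settle_ms <= 10000:
        raise ValueError("--settle must be between 0 and 10000 ms")

    argv = ["servo-fetch", *urls]
    if json_output:
        argv.append("--json")
    if screenshot is not None:
        argv += ["--screenshot", screenshot]
    if full_page:
        argv.append("--full-page")
    if js is not None:
        argv += ["--js", js]
    if raw is not None:
        argv += ["--raw", raw]
    if timeout != 30:  # 30 is the documented default, so it is omitted
        argv += ["--timeout", str(timeout)]
    if settle_ms:
        argv += ["--settle", str(settle_ms)]
    return argv
```

Failing early in the caller avoids starting the engine for a command line that would be rejected anyway.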

When multiple URLs are given, they are fetched in parallel. Results stream to stdout in completion order — Markdown with --- URL --- separators by default, or NDJSON with --json.
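
Because NDJSON results arrive one per line in completion order, a consumer can process each page as soon as its line is written instead of waiting for the whole batch. A minimal sketch of such a reader follows; `iter_ndjson` is a hypothetical name, and the sample lines are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded object per non-empty line, in arrival order."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Illustrative input: two results that finished in the order b, a.
sample = [
    '{"title": "B", "content": "<p>b</p>", "text_content": "b"}\n',
    '{"title": "A", "content": "<p>a</p>", "text_content": "a"}\n',
]
titles = [page["title"] for page in iter_ndjson(sample)]
```

In practice the `lines` argument could be the `stdout` of a `subprocess.Popen` started with `text=True`. Since output order follows completion rather than argument order, callers should not assume the first line belongs to the first URL.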

JSON output

--json returns an object with these fields:

Field         Type     Description
title         string   Page title
content       string   Raw HTML extracted by Readability
text_content  string   Readable text (Markdown)
byline        string?  Author or byline
excerpt       string?  Short excerpt or description
lang          string?  Document language (e.g. "en")
url           string?  Canonical URL

Fields marked ? are omitted when not detected.
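
Since optional fields are omitted rather than set to null, a consumer should read them with a default instead of indexing directly. One way to model the object is sketched below; the `Page` class and `parse_page` function are hypothetical names, and the field list simply mirrors the table above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    # Always present.
    title: str
    content: str
    text_content: str
    # Omitted from the JSON when not detected.
    byline: Optional[str] = None
    excerpt: Optional[str] = None
    lang: Optional[str] = None
    url: Optional[str] = None


def parse_page(raw: str) -> Page:
    """Decode one --json object, mapping absent optional fields to None."""
    data = json.loads(raw)
    return Page(
        title=data["title"],
        content=data["content"],
        text_content=data["text_content"],
        byline=data.get("byline"),
        excerpt=data.get("excerpt"),
        lang=data.get("lang"),
        url=data.get("url"),
    )
```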

MCP server

servo-fetch includes a built-in MCP server with four tools — fetch, batch_fetch, screenshot, and execute_js — over stdio or Streamable HTTP.

{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}

For Streamable HTTP transport:

servo-fetch mcp --port 8080

Tools

fetch

Fetch a URL and extract readable content. Navbars, sidebars, and footers are stripped automatically using CSS layout analysis.

Parameter    Type     Description
url          string   URL to fetch (http/https only)
format       string?  markdown (default), json, html, text, or accessibility_tree
max_length   number?  Max characters to return (default 5000)
start_index  number?  Character offset for pagination (default 0)
timeout      number?  Page load timeout in seconds (default 30)
settle_ms    number?  Extra wait in ms after load event for SPAs (default 0, max 10000)
selector     string?  CSS selector to extract a specific section
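
The max_length and start_index parameters allow a long page to be read in successive windows. The sketch below shows one possible client-side loop. The request shape is the standard MCP `tools/call` JSON-RPC message; everything else is an assumption: `call_fetch` is a hypothetical callable that sends the request over your transport and returns the text of the result, and treating a chunk shorter than max_length as the last one is a heuristic, not documented server behavior.

```python
from typing import Any, Callable, Dict, Iterator


def fetch_request(
    request_id: int, url: str, start_index: int = 0, max_length: int = 5000
) -> Dict[str, Any]:
    """Build an MCP tools/call request for the fetch tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "fetch",
            "arguments": {
                "url": url,
                "start_index": start_index,
                "max_length": max_length,
            },
        },
    }


def paginate(
    call_fetch: Callable[[Dict[str, Any]], str],
    url: str,
    max_length: int = 5000,
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield successive chunks until a short (or empty) chunk is returned."""
    start = 0
    for request_id in range(1, max_pages + 1):
        chunk = call_fetch(fetch_request(request_id, url, start, max_length))
        if chunk:
            yield chunk
        # Assumed end-of-content signal: fewer characters than requested.
        if len(chunk) < max_length:
            return
        start += len(chunk)
```

The `max_pages` cap is a safety bound so that a server that always returns full chunks cannot make the loop run forever.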

batch_fetch

Fetch multiple URLs in parallel. Results are returned as separate content entries in completion order.

Parameter   Type      Description
urls        string[]  URLs to fetch (http/https only, max 20)
format      string?   markdown (default) or json
max_length  number?   Max characters per URL result (default 5000)
timeout     number?   Page load timeout in seconds per URL (default 30)
settle_ms   number?   Extra wait in ms after load event (default 0, max 10000)
selector    string?   CSS selector to extract a specific section
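
With a limit of 20 URLs per call, a longer list has to be split across several batch_fetch requests. A minimal sketch of that splitting follows; `batch_requests` is a hypothetical helper, and the request envelope is the standard MCP `tools/call` message rather than anything specific to servo-fetch.

```python
from typing import Any, Dict, Iterator, Sequence

MAX_BATCH = 20  # documented per-call limit for the urls parameter


def batch_requests(
    urls: Sequence[str], fmt: str = "markdown", first_id: int = 1
) -> Iterator[Dict[str, Any]]:
    """Yield one batch_fetch request per group of at most MAX_BATCH URLs."""
    for n, offset in enumerate(range(0, len(urls), MAX_BATCH)):
        yield {
            "jsonrpc": "2.0",
            "id": first_id + n,
            "method": "tools/call",
            "params": {
                "name": "batch_fetch",
                "arguments": {
                    "urls": list(urls[offset:offset + MAX_BATCH]),
                    "format": fmt,
                },
            },
        }
```

For example, 45 URLs produce three requests carrying 20, 20, and 5 URLs.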

screenshot

Capture a PNG screenshot using Servo's software renderer — no GPU required.

Parameter  Type      Description
url        string    URL to capture (http/https only)
full_page  boolean?  Capture the full scrollable page (default false)
timeout    number?   Page load timeout in seconds (default 30)
settle_ms  number?   Extra wait in ms after load event (default 0, max 10000)

execute_js

Evaluate a JavaScript expression in a loaded page. Console messages are appended to the result.

Parameter   Type     Description
url         string   URL to load before executing JS
expression  string   JavaScript expression to evaluate
timeout     number?  Page load timeout in seconds (default 30)
settle_ms   number?  Extra wait in ms after load event (default 0, max 10000)

Agent Skills

servo-fetch ships with an Agent Skills package for AI coding agents. Install with npx skills:

npx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch

Security

servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection (CVE-2021-42574). See SECURITY.md for details.
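
To make these protections concrete, the sketch below shows what checks of this kind can look like. It is an independent illustration built on the Python standard library, not servo-fetch's implementation (see SECURITY.md for that); all function names are hypothetical, and a production check would also have to validate the addresses a hostname resolves to.

```python
import ipaddress
import re
from urllib.parse import urlsplit, urlunsplit


def is_blocked_address(host: str) -> bool:
    """True when host is an IP literal outside the globally routable space."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; DNS results need a separate check
    return (not ip.is_global) or ip.is_multicast


def strip_credentials(url: str) -> str:
    """Drop any user:password@ part from the URL authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets .hostname removes
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# CSI escape sequences (colors, cursor movement), then remaining control
# characters and Unicode bidirectional overrides. Tab and newline are kept.
_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
_CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f\u202a-\u202e\u2066-\u2069]")


def sanitize_output(text: str) -> str:
    """Remove terminal escape sequences and bidi control characters."""
    return _CONTROL.sub("", _CSI.sub("", text))
```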

Limitations

  • Servo's web compatibility is improving monthly but does not yet match Chromium. Some SPAs with complex client-side rendering may not fully render.
  • Best results on documentation, blogs, news sites, and server-rendered pages.
  • Sites behind login walls or CAPTCHAs are not supported.

Contributing

See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.

License

MIT