servo-fetch embeds the Servo browser engine into a single, lightweight binary. It executes JavaScript via SpiderMonkey, computes CSS layout with Servo's parallel engine, captures screenshots with a software renderer, and extracts clean content — as a CLI tool or an MCP server for AI agents.
Install
Install a prebuilt binary from GitHub Releases, or build with Cargo (requires Rust 1.86.0+).
Platform notes
The Linux binary dynamically links against system libraries. Install them through your distribution's package manager (instructions cover Debian/Ubuntu, Fedora, and Arch).
servo-fetch needs a working OpenGL ES context, so on headless servers (SSH sessions or containers) run it under a virtual display.
Windows releases ship as a .zip containing servo-fetch.exe alongside libEGL.dll and libGLESv2.dll — keep them in the same directory. Download from Releases, extract, and put the folder on your PATH.
No runtime dependencies. The release binary is ready to run.
Usage
Examples
- Readable Markdown (default, no flag needed)
- Structured JSON: `--json`
- Screenshot, rendered to PNG without a GPU: `--screenshot <FILE>`
- Execute JavaScript in the page context: `--js <EXPR>`
- Extract a specific section by CSS selector: `--selector <CSS>`
- Raw HTML or plain text, bypassing Readability: `--raw <MODE>`
- PDF text extraction (auto-detected via Content-Type, no flag needed)
Options
| Flag | Description |
|---|---|
| `--json` | Output as structured JSON |
| `--screenshot <FILE>` | Save a PNG screenshot |
| `--js <EXPR>` | Execute JavaScript and print the result |
| `--selector <CSS>` | Extract a specific section by CSS selector |
| `--raw <MODE>` | Output raw HTML or plain text (bypasses Readability) |
| `-t, --timeout <SECS>` | Page load timeout in seconds (default: 30) |
| `--help` | Show help |
| `--version` | Show version |
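When calling servo-fetch from a script, it can help to assemble the command line in one place. The following is a minimal sketch, not part of servo-fetch: `build_argv` is a hypothetical helper, and it assumes the URL is passed as a positional argument after the flags.

```python
from typing import Optional


def build_argv(
    url: str,
    *,
    json_output: bool = False,
    screenshot: Optional[str] = None,
    js: Optional[str] = None,
    selector: Optional[str] = None,
    raw: Optional[str] = None,
    timeout: Optional[int] = None,
) -> list[str]:
    """Assemble a servo-fetch command line from the documented flags."""
    argv = ["servo-fetch"]
    if json_output:
        argv.append("--json")
    if screenshot is not None:
        argv += ["--screenshot", screenshot]
    if js is not None:
        argv += ["--js", js]
    if selector is not None:
        argv += ["--selector", selector]
    if raw is not None:
        argv += ["--raw", raw]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    # Assumption: the URL is a positional argument placed after the flags.
    argv.append(url)
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` so that no shell quoting is involved.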
JSON output
--json returns an object with these fields:
| Field | Type | Description |
|---|---|---|
| `title` | string | Page title |
| `content` | string | Raw HTML extracted by Readability |
| `text_content` | string | Readable text (Markdown) |
| `byline` | string? | Author or byline |
| `excerpt` | string? | Short excerpt or description |
| `lang` | string? | Document language (e.g. "en") |
| `url` | string? | Canonical URL |
Fields marked ? are omitted when not detected.
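A consumer of the `--json` output needs to tolerate the omitted optional fields. The sketch below shows one way to do that in Python; the `Page` class, `parse_page` function, and the sample string are illustrative assumptions, not real servo-fetch output.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Page:
    # Always present according to the field table.
    title: str
    content: str
    text_content: str
    # Marked "?" in the table: omitted when not detected.
    byline: Optional[str] = None
    excerpt: Optional[str] = None
    lang: Optional[str] = None
    url: Optional[str] = None


def parse_page(raw: str) -> Page:
    """Parse --json output, ignoring keys this sketch does not know about."""
    data = json.loads(raw)
    known = {f.name for f in fields(Page)}
    return Page(**{k: v for k, v in data.items() if k in known})


# Hand-written sample for illustration only.
sample = '{"title": "Hello", "content": "<p>Hi</p>", "text_content": "Hi", "lang": "en"}'
page = parse_page(sample)
```

Defaulting the optional fields to `None` means callers can test `page.byline is None` instead of catching a `KeyError`.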
Why servo-fetch
Servo is a real browser engine. Written in Rust by the Servo project, Servo executes JavaScript via SpiderMonkey and computes CSS layout with a parallel engine. servo-fetch embeds this engine so you get browser-grade rendering without a browser runtime.
CSS layout strips navigation noise. Most extraction tools guess page structure from HTML tags. servo-fetch calls getComputedStyle() and getBoundingClientRect() inside the engine to detect fixed navbars, sidebars, and footers — then removes them before extraction. Common cookie banners and newsletter popups are also stripped via injected user stylesheets.
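To make the idea of layout-based detection concrete, here is a rough sketch of how computed position and bounding-box data could be turned into a navbar/sidebar/footer label. This is not servo-fetch's actual rule set: the `ElementInfo` type, the `classify` function, and every threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementInfo:
    position: str  # computed `position` value, e.g. "fixed" or "static"
    top: float     # bounding rect, in CSS pixels relative to the viewport
    left: float
    width: float
    height: float


def classify(el: ElementInfo, viewport_w: float, viewport_h: float) -> Optional[str]:
    """Return "navbar", "footer", "sidebar", or None for ordinary content."""
    # Only elements pinned to the viewport are candidates in this sketch.
    if el.position not in ("fixed", "sticky"):
        return None
    wide = el.width >= 0.9 * viewport_w   # hypothetical threshold
    tall = el.height >= 0.5 * viewport_h  # hypothetical threshold
    if wide and not tall and el.top <= 1:
        return "navbar"
    if wide and not tall and el.top + el.height >= viewport_h - 1:
        return "footer"
    if tall and not wide:
        return "sidebar"
    return None
```

The point of the sketch is that these decisions depend on resolved layout, which plain HTML tag inspection cannot provide.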
Accessibility tree with bounding boxes. servo-fetch can return the page's accessibility tree via Servo's AccessKit integration. Each node includes its role, name, and bounding box — combining semantic structure with visual layout in a single output. Use format: "accessibility_tree" in the MCP fetch tool.
Main content via Readability. After CSS-based structure removal, Mozilla's Readability algorithm extracts the main article. PDF URLs are auto-detected via Content-Type and extracted directly without the Servo engine.
MCP server
servo-fetch includes a built-in MCP server with three tools — fetch, screenshot, and execute_js — over stdio or Streamable HTTP.
For Streamable HTTP transport:
The fetch tool accepts these parameters:
| Parameter | Type | Description |
|---|---|---|
| `url` | string | URL to fetch (http/https only) |
| `format` | string? | `markdown` (default), `json`, `html`, `text`, or `accessibility_tree` |
| `max_length` | number? | Max characters to return (default 5000) |
| `start_index` | number? | Character offset for pagination (default 0) |
| `timeout` | number? | Page load timeout in seconds (default 30) |
| `selector` | string? | CSS selector to extract a specific section |
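The `max_length` and `start_index` parameters suggest a simple paging loop on the client side. The sketch below builds a standard MCP `tools/call` request for the fetch tool and pages through a long document. `fetch_request` and `paginate` are hypothetical helpers, and the loop assumes the server returns at most `max_length` characters per call and that a shorter chunk marks the end; if the server appends a truncation notice to the text, the stop condition would need adjusting.

```python
import json
from typing import Callable, Iterator


def fetch_request(request_id: int, url: str, **arguments) -> str:
    """Build a JSON-RPC 2.0 `tools/call` message for the fetch tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch", "arguments": {"url": url, **arguments}},
    })


def paginate(call_fetch: Callable[[int, int], str], max_length: int = 5000) -> Iterator[str]:
    """Yield chunks, advancing start_index until a short chunk arrives.

    `call_fetch(start_index, max_length)` is supplied by the caller and is
    expected to send the request and return the text of the response.
    """
    start = 0
    while True:
        chunk = call_fetch(start, max_length)
        if chunk:
            yield chunk
        if len(chunk) < max_length:
            break
        start += len(chunk)
```

Keeping the transport behind `call_fetch` lets the same loop work over stdio or Streamable HTTP.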
Agent Skills
servo-fetch ships with an Agent Skills package for AI coding agents. Install with npx skills:
Security
servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection (CVE-2021-42574). See SECURITY.md for details.
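For readers who want to apply similar checks in their own tooling, the sketch below approximates three of these measures with the Python standard library. It is not servo-fetch's implementation: the function names are hypothetical, the address check relies on the `ipaddress` module's classification rather than the exact RFC 6890 registry, and the sanitizer simply drops control characters (including ESC) and the bidirectional override characters associated with CVE-2021-42574.

```python
import ipaddress
import re
from urllib.parse import urlsplit, urlunsplit


def is_blocked_ip(host: str) -> bool:
    """Return True for addresses that should not be fetched (approximation)."""
    ip = ipaddress.ip_address(host)
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def strip_credentials(url: str) -> str:
    """Remove any user:password@ prefix from the URL's authority."""
    parts = urlsplit(url)
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Control characters except tab and newline, plus Unicode bidi controls.
_UNSAFE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f\u202a-\u202e\u2066-\u2069]")


def sanitize(text: str) -> str:
    """Drop characters that could alter how a terminal displays the output."""
    return _UNSAFE.sub("", text)
```

A hostname should be resolved first and every resulting address checked, since a public-looking name can point at an internal address.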
Limitations
- Best suited for documentation, blogs, and SSR sites
- Some SPAs with complex client-side rendering may not fully render
- Servo's web compatibility is still maturing, so pages that fail today may render correctly in a later release
Contributing
See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.