servo-fetch embeds the Servo browser engine into a single binary. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content.
## Why servo-fetch
- Zero dependencies — single binary, no Chrome, no Docker, no API key
- Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
- Layout-aware extraction — strips navbars, sidebars, footers by actual rendered position, not HTML guessing
- Parallel batch fetch — multiple URLs fetched concurrently, results stream as each completes
- Site crawling — BFS link traversal with robots.txt, same-site scope, and rate limiting
- Screenshots without GPU — software renderer captures PNG/full-page screenshots anywhere
- Accessibility tree — AccessKit integration with roles, names, and bounding boxes
## Performance
Parallel fetch — 4 URLs, JS executed, full CSS rendering:
| Tool | Peak Memory | Time |
|---|---|---|
| servo-fetch | 114 MB | 1.5s |
| Playwright | 502 MB | 3.3s |
| Puppeteer | 1065 MB | 4.3s |
Same rendering capabilities, 4–9× less memory, 2–3× faster. Methodology →
## Install
Download a prebuilt binary from GitHub Releases, or install with Cargo (requires Rust 1.86.0+):
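Assuming the crate is published under the binary's name (an assumption; check crates.io for the actual name):

```sh
# Builds from source; needs Rust 1.86.0 or newer
cargo install servo-fetch
```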
### Platform notes
The Linux binary dynamically links against a handful of system libraries; install them with your distribution's package manager.
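As an illustrative sketch only, with package names assumed from typical Servo runtime dependencies rather than taken from this project's docs (verify against the release notes):

```sh
# Debian/Ubuntu (illustrative package names)
sudo apt install libssl3 libgstreamer1.0-0 libgstreamer-plugins-base1.0-0 fontconfig
# Fedora (illustrative package names)
sudo dnf install openssl-libs gstreamer1 gstreamer1-plugins-base fontconfig
# Arch (illustrative package names)
sudo pacman -S openssl gstreamer gst-plugins-base fontconfig
```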
servo-fetch needs a working OpenGL ES context, so on headless servers (SSH/container) run it under a virtual display.
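One way to do that, assuming Xvfb is available (the `xvfb-run` wrapper ships with it on most distributions):

```sh
# xvfb-run starts a throwaway virtual X display for the wrapped command
xvfb-run -a servo-fetch https://example.com
```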
Windows releases ship as a .zip containing servo-fetch.exe alongside libEGL.dll and libGLESv2.dll — keep them in the same directory. Download from Releases, extract, and put the folder on your PATH.
Beyond the platform libraries noted above, there are no runtime dependencies; the release binary is ready to run.
## Usage

### Examples
```sh
# Readable Markdown (default)
servo-fetch https://example.com
# Structured JSON
servo-fetch --json https://example.com
# Multiple URLs in parallel (Markdown with separators)
servo-fetch https://example.com https://example.org
# Multiple URLs as NDJSON (one compact JSON per line)
servo-fetch --json https://example.com https://example.org
# Screenshot — rendered to PNG without GPU
servo-fetch --screenshot page.png https://example.com
# Full-page screenshot (captures the entire scrollable page)
servo-fetch --screenshot page.png --full-page https://example.com
# Execute JavaScript in the page context
servo-fetch --js "document.title" https://example.com
# Extract a specific section by CSS selector
servo-fetch --selector "main article" https://example.com
# Raw HTML or plain text (bypasses Readability)
servo-fetch --raw html https://example.com
# PDF text extraction (auto-detected via Content-Type)
servo-fetch https://example.com/whitepaper.pdf
# Crawl a site by following links (BFS, respects robots.txt)
servo-fetch crawl https://example.com
# Crawl with path filtering
servo-fetch crawl --include "/docs/**" https://example.com
```
### Options
| Flag | Description |
|---|---|
| `--json` | Output as structured JSON (NDJSON when multiple URLs) |
| `--screenshot <FILE>` | Save a PNG screenshot (single URL only) |
| `--full-page` | Capture the full scrollable page (requires `--screenshot`) |
| `--js <EXPR>` | Execute JavaScript and print the result (single URL only) |
| `--selector <CSS>` | Extract a specific section by CSS selector |
| `--raw <MODE>` | Output raw `html` or plain `text` (single URL only) |
| `-t, --timeout <SECS>` | Page load timeout (default: 30) |
| `--settle <MS>` | Extra wait after load event for SPAs (default: 0, max: 10000) |
| `--help` | Show help |
| `--version` | Show version |
When multiple URLs are given, they are fetched in parallel. Results stream to stdout in completion order — Markdown with `--- URL ---` separators by default, or NDJSON with `--json`.
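For example, a two-URL fetch might stream back like this (page content truncated; shape illustrative, separator format as described above):

```text
--- https://example.com ---
# Example Domain
...
--- https://example.org ---
# Example Domain
...
```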
### Crawl subcommand

`servo-fetch crawl <URL>` follows links within the same site using BFS. Output is always NDJSON (one JSON object per page).
| Flag | Description |
|---|---|
| `--limit <N>` | Maximum pages to crawl (default: 50) |
| `--max-depth <N>` | Maximum link depth from seed URL (default: 3) |
| `--include <GLOB>` | URL path patterns to include (e.g. `"/docs/**"`) |
| `--exclude <GLOB>` | URL path patterns to exclude |
| `--json` | Output content as JSON instead of Markdown per page |
| `--selector <CSS>` | Extract a specific section per page |
| `-t, --timeout <SECS>` | Per-page timeout (default: 30) |
| `--settle <MS>` | Extra wait after load event per page |
Crawl respects robots.txt (RFC 9309) and enforces a minimum 500ms interval between requests.
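As a sketch, a small filtered crawl and the kind of NDJSON it emits (field names follow the JSON output table below; values illustrative):

```sh
servo-fetch crawl --limit 3 --include "/docs/**" https://example.com
# one JSON object per crawled page, e.g.:
# {"title":"Intro","text_content":"...","url":"https://example.com/docs/intro"}
```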
### JSON output

`--json` returns an object with these fields:
| Field | Type | Description |
|---|---|---|
| `title` | string | Page title |
| `content` | string | Raw HTML extracted by Readability |
| `text_content` | string | Readable text (Markdown) |
| `byline` | string? | Author or byline |
| `excerpt` | string? | Short excerpt or description |
| `lang` | string? | Document language (e.g. `"en"`) |
| `url` | string? | Canonical URL |
Fields marked ? are omitted when not detected.
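Put together, a `--json` response might look like this (values illustrative; optional fields appear only when detected):

```json
{
  "title": "Example Domain",
  "content": "<div><p>This domain is for use in illustrative examples.</p></div>",
  "text_content": "This domain is for use in illustrative examples.",
  "lang": "en",
  "url": "https://example.com/"
}
```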
## MCP server

servo-fetch includes a built-in MCP server with five tools — `fetch`, `batch_fetch`, `crawl`, `screenshot`, and `execute_js` — over stdio or Streamable HTTP.
Streamable HTTP can be used in place of stdio; see `--help` for how to enable it.
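A typical MCP client registration over stdio might look like the following. The `mcp` subcommand here is hypothetical (the exact invocation is not documented in this section; check `servo-fetch --help`):

```jsonc
{
  "mcpServers": {
    "servo-fetch": {
      // hypothetical invocation; verify the actual subcommand or flag
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}
```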
### Tools
#### fetch

| Parameter | Type | Description |
|---|---|---|
| `url` | string | URL to fetch (http/https only) |
| `format` | string? | `markdown` (default), `json`, `html`, `text`, or `accessibility_tree` |
| `max_length` | number? | Max characters to return (default 5000) |
| `start_index` | number? | Character offset for pagination (default 0) |
| `timeout` | number? | Page load timeout in seconds (default 30) |
| `settle_ms` | number? | Extra wait in ms after load event for SPAs (default 0, max 10000) |
| `selector` | string? | CSS selector to extract a specific section |
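For reference, this is the JSON-RPC shape of a `tools/call` request to the `fetch` tool (framing per the MCP spec; argument values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch",
    "arguments": { "url": "https://example.com", "format": "markdown", "max_length": 2000 }
  }
}
```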
#### batch_fetch

| Parameter | Type | Description |
|---|---|---|
| `urls` | string[] | URLs to fetch (http/https only, max 20) |
| `format` | string? | `markdown` (default) or `json` |
| `max_length` | number? | Max characters per URL result (default 5000) |
| `timeout` | number? | Page load timeout in seconds per URL (default 30) |
| `settle_ms` | number? | Extra wait in ms after load event (default 0, max 10000) |
| `selector` | string? | CSS selector to extract a specific section |
#### crawl

| Parameter | Type | Description |
|---|---|---|
| `url` | string | Starting URL (http/https only) |
| `limit` | number? | Maximum pages to crawl (default 20, max 500) |
| `max_depth` | number? | Maximum link depth from seed (default 3, max 10) |
| `format` | string? | `markdown` (default) or `json` |
| `include_glob` | string[]? | URL path patterns to include |
| `exclude_glob` | string[]? | URL path patterns to exclude |
| `max_length` | number? | Max characters per page result (default 5000) |
| `timeout` | number? | Page load timeout in seconds per page (default 30) |
| `settle_ms` | number? | Extra wait in ms after load event (default 0, max 10000) |
| `selector` | string? | CSS selector to extract a specific section per page |
Follows same-site links only. Respects robots.txt. Results stream as each page completes.
#### screenshot

| Parameter | Type | Description |
|---|---|---|
| `url` | string | URL to capture (http/https only) |
| `full_page` | boolean? | Capture the full scrollable page (default false) |
| `timeout` | number? | Page load timeout in seconds (default 30) |
| `settle_ms` | number? | Extra wait in ms after load event (default 0, max 10000) |
#### execute_js

| Parameter | Type | Description |
|---|---|---|
| `url` | string | URL to load before executing JS |
| `expression` | string | JavaScript expression to evaluate |
| `timeout` | number? | Page load timeout in seconds (default 30) |
| `settle_ms` | number? | Extra wait in ms after load event (default 0, max 10000) |
## Agent Skills

servo-fetch ships with an Agent Skills package for AI coding agents; install it with `npx skills`.
## Security
servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection and Unicode bidirectional-override attacks (CVE-2021-42574). See SECURITY.md for details.
## Limitations
- Servo's web compatibility is improving monthly but does not yet match Chromium. Some SPAs with complex client-side rendering may not fully render.
- Best results on documentation, blogs, news sites, and server-rendered pages.
- Sites behind login walls or CAPTCHAs are not supported.
## Contributing
See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.