servo-fetch-cli 0.7.1

A browser engine in a binary — fetch, render, and extract web content powered by Servo.
servo-fetch-cli-0.7.1 is not a library.

servo-fetch-cli

crates.io

A browser engine in a binary — fetch, render, and extract web content powered by Servo.

For programmatic use in Rust, see the servo-fetch library crate.

Install

Pre-built binaries (recommended)

curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh | sh

Cargo

cargo binstall servo-fetch-cli   # prebuilt binary via cargo-binstall
cargo install servo-fetch-cli    # build from source (requires Rust 1.86.0+)

Usage

Extract content

servo-fetch "https://example.com"              # Readable Markdown (default)
servo-fetch "https://example.com" --json       # Structured JSON
servo-fetch "https://example.com" --raw html   # Raw rendered HTML
servo-fetch "https://example.com" --raw text   # Plain text (innerText)

Batch fetch

servo-fetch URL1 URL2 URL3                     # Parallel fetch, Markdown output
servo-fetch URL1 URL2 --json                   # Parallel fetch, NDJSON output

Screenshots

servo-fetch "https://example.com" --screenshot page.png
servo-fetch "https://example.com" --screenshot full.png --full-page

JavaScript execution

servo-fetch "https://example.com" --js "document.title"
servo-fetch "https://example.com" --js "document.querySelectorAll('h2').length"

CSS selector extraction

servo-fetch "https://example.com" --selector "article"
servo-fetch "https://example.com" --selector ".main-content" --json

Crawl a site

servo-fetch crawl "https://docs.example.com" --limit 20
servo-fetch crawl "https://docs.example.com" --include "/docs/**" --exclude "/docs/archive/**"
servo-fetch crawl "https://docs.example.com" --json --max-depth 5

SPA / dynamic content

servo-fetch "https://spa.example.com" --settle 3000       # Wait 3s after load for hydration
servo-fetch "https://spa.example.com" -t 60 --settle 5000 # 60s timeout + 5s settle

MCP server

servo-fetch mcp                # stdio transport (for AI agents)
servo-fetch mcp --port 8080    # Streamable HTTP transport

Options

Flag Description
--json Structured JSON output (NDJSON for multiple URLs)
--screenshot <FILE> Save PNG screenshot
--full-page Capture full scrollable page (requires --screenshot)
--js <EXPR> Execute JavaScript and print result
--selector <CSS> Extract specific section by CSS selector
--raw html|text Raw HTML or plain text output
-t, --timeout <SECS> Page load timeout in seconds (default: 30)
--settle <MS> Extra wait after load event in ms (default: 0, max: 10000)
--user-agent <UA> Override the User-Agent string
-v, --verbose Increase log verbosity (-v info, -vv debug, -vvv trace)
-q, --quiet Suppress all logs except errors

JSON output

--json returns an object with these fields:

Field Type Description
title string Page title
content string Raw HTML extracted by Readability
text_content string Readable text (Markdown)
byline string Author or byline (omitted if not detected)
excerpt string Short excerpt or description (omitted if not detected)
lang string Document language (omitted if not detected)
url string Canonical URL (omitted if not detected)

Crawl subcommand

servo-fetch crawl <URL> follows same-site links using BFS. Respects robots.txt (RFC 9309) with a minimum 500ms interval.

Flag Description
--limit <N> Maximum pages to crawl (default: 50)
--max-depth <N> Maximum link depth (default: 3)
--include <GLOB> URL path patterns to include
--exclude <GLOB> URL path patterns to exclude
--json Output content as JSON per page
--selector <CSS> Extract specific section per page
--user-agent <UA> Override the User-Agent string
-t, --timeout <SECS> Page load timeout in seconds per page (default: 30)
--settle <MS> Extra wait after load event in ms per page (default: 0, max: 10000)

Logging

Diagnostic messages go to stderr; stdout is reserved for data output so pipes stay clean.

servo-fetch -v "https://example.com"                       # info and above
servo-fetch -vv "https://example.com"                      # debug
servo-fetch -vvv "https://example.com"                     # trace
servo-fetch -q "https://example.com"                       # errors only
RUST_LOG="servo_fetch=debug" servo-fetch "https://..."     # fine-grained override
RUST_LOG="servo_fetch=trace,servo=debug" servo-fetch "..." # include Servo internals

RUST_LOG uses tracing-subscriber's directive syntax and always wins over CLI flags.

Environment Variables

Variable Description
SERVO_FETCH_USER_AGENT Default User-Agent string (overridden by --user-agent)
RUST_LOG Fine-grained log filter (overrides -v/-q)

MCP Server

Built-in Model Context Protocol server over stdio or Streamable HTTP.

{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}

Streamable HTTP: servo-fetch mcp --port 8080

Parameter Type Description
url string URL to fetch (http/https only)
format string? markdown (default), json, html, text, or accessibility_tree
max_length number? Max characters to return (default 5000)
start_index number? Character offset for pagination (default 0)
timeout number? Page load timeout in seconds (default 30)
settle_ms number? Extra wait in ms after load event (default 0, max 10000)
selector string? CSS selector to extract a specific section
Parameter Type Description
urls string[] URLs to fetch (http/https only, max 20)
format string? markdown (default) or json
max_length number? Max characters per URL result (default 5000)
timeout number? Page load timeout in seconds per URL (default 30)
settle_ms number? Extra wait in ms after load event (default 0, max 10000)
selector string? CSS selector to extract a specific section
Parameter Type Description
url string Starting URL (http/https only)
limit number? Maximum pages to crawl (default 20, max 500)
max_depth number? Maximum link depth from seed (default 3, max 10)
format string? markdown (default) or json
include_glob string[]? URL path patterns to include
exclude_glob string[]? URL path patterns to exclude
max_length number? Max characters per page result (default 5000)
timeout number? Page load timeout in seconds per page (default 30)
settle_ms number? Extra wait in ms after load event (default 0, max 10000)
selector string? CSS selector to extract a specific section per page
Parameter Type Description
url string URL to capture (http/https only)
full_page boolean? Capture the full scrollable page (default false)
timeout number? Page load timeout in seconds (default 30)
settle_ms number? Extra wait in ms after load event (default 0, max 10000)
Parameter Type Description
url string URL to load before executing JS
expression string JavaScript expression to evaluate
timeout number? Page load timeout in seconds (default 30)
settle_ms number? Extra wait in ms after load event (default 0, max 10000)