servo-fetch

Fetch, render, and extract web content with an embedded Servo browser engine. No Chrome, no containers, no external processes.

Looking for the CLI? See servo-fetch-cli.

Features

Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
Layout-aware extraction — strips navbars, sidebars, footers by rendered position
Sync API — no async runtime required; wrap with spawn_blocking for async contexts
PDF auto-detection — URLs returning PDF are automatically extracted as text
Typed errors — Error::Timeout, Error::InvalidUrl, etc. for match-based retry logic
SSRF protection — blocks private IPs, reserved ranges, and metadata endpoints

Quick Start

let md = servo_fetch::markdown("https://example.com")?;

Examples

Fetch with options

use servo_fetch::{fetch, FetchOptions};
use std::time::Duration;

let page = fetch(
    FetchOptions::new("https://spa-site.com")
        .timeout(Duration::from_secs(60))
        .settle(Duration::from_millis(3000)),
)?;
println!("{}", page.html);
let md = page.markdown()?;

Screenshot

use servo_fetch::{fetch, FetchOptions};

let page = fetch(FetchOptions::screenshot("https://example.com", true))?;
std::fs::write("page.png", page.screenshot_png().unwrap())?;

JavaScript execution

use servo_fetch::{fetch, FetchOptions};

let page = fetch(FetchOptions::javascript("https://example.com", "document.title"))?;
println!("{}", page.js_result.unwrap());

Crawl a site

use servo_fetch::{crawl_each, CrawlOptions};

crawl_each(
    CrawlOptions::new("https://docs.example.com")
        .limit(100)
        .include(&["/docs/**"]),
    |result| println!("{}: {:?}", result.url, result.status),
)?;

Error handling

use servo_fetch::{fetch, FetchOptions, Error};

match fetch(FetchOptions::new(url)) {
    Ok(page) => { /* ... */ }
    Err(e) if e.is_timeout() => { /* retry */ }
    Err(Error::AddressNotAllowed(_)) => { /* skip */ }
    Err(e) => return Err(e.into()),
}

From async contexts

let page = tokio::task::spawn_blocking(|| {
    servo_fetch::fetch(servo_fetch::FetchOptions::new("https://example.com"))
}).await??;

Feature Flags

Flag	Default	Description
`mcp`	off	MCP server support

API Overview

Function	Description
`markdown(url)`	Fetch → readable Markdown
`extract_json(url)`	Fetch → structured JSON
`text(url)`	Fetch → plain text (`innerText`)
`fetch(opts)`	Fetch with full options → `Page`
`crawl(opts)`	Crawl site → `Vec<CrawlResult>`
`crawl_each(opts, cb)`	Crawl site, streaming results

See docs.rs for the full API reference.

License

MIT

servo-fetch 0.6.0