servo-fetch 0.6.0

Fetch, render, and extract web content with an embedded Servo browser engine. No Chrome, no containers, no external processes.

Looking for the CLI? See servo-fetch-cli.

Features

  • Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
  • Layout-aware extraction — strips navbars, sidebars, footers by rendered position
  • Sync API — no async runtime required; wrap with spawn_blocking for async contexts
  • PDF auto-detection — URLs returning PDF are automatically extracted as text
  • Typed errors — Error::Timeout, Error::InvalidUrl, etc. for match-based retry logic
  • SSRF protection — blocks private IPs, reserved ranges, and metadata endpoints
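To make the SSRF bullet concrete, here is a minimal sketch of the kind of address check such a filter typically performs, using only the standard library. It is not servo-fetch's actual implementation: the helper names (is_blocked_ip, is_blocked_v4, is_blocked_v6) and the exact set of ranges are assumptions chosen for illustration.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper (not part of servo-fetch): returns true when `ip`
/// is in a range an SSRF filter would typically refuse to connect to.
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        // Treat IPv4-mapped IPv6 (::ffff:a.b.c.d) like the embedded IPv4 address.
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4),
            None => is_blocked_v6(v6),
        },
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16, includes 169.254.169.254 (cloud metadata)
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
        || ip.is_documentation()
        || (o[0] == 100 && (o[1] & 0xc0) == 64) // 100.64/10 shared address space
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()                    // ::1
        || ip.is_unspecified()          // ::
        || (first & 0xfe00) == 0xfc00   // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80   // fe80::/10 link-local
}
```

A real filter also has to apply this check to every address a hostname resolves to, and again after each redirect. Validating only the URL string is not enough.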

Quick Start

let md = servo_fetch::markdown("https://example.com")?;

Examples

Fetch with options

use servo_fetch::{fetch, FetchOptions};
use std::time::Duration;

let page = fetch(
    FetchOptions::new("https://spa-site.com")
        .timeout(Duration::from_secs(60))
        .settle(Duration::from_millis(3000)),
)?;
println!("{}", page.html);
let md = page.markdown()?;

Screenshot

use servo_fetch::{fetch, FetchOptions};

let page = fetch(FetchOptions::screenshot("https://example.com", true))?;
std::fs::write("page.png", page.screenshot_png().unwrap())?;

JavaScript execution

use servo_fetch::{fetch, FetchOptions};

let page = fetch(FetchOptions::javascript("https://example.com", "document.title"))?;
println!("{}", page.js_result.unwrap());

Crawl a site

use servo_fetch::{crawl_each, CrawlOptions};

crawl_each(
    CrawlOptions::new("https://docs.example.com")
        .limit(100)
        .include(&["/docs/**"]),
    |result| println!("{}: {:?}", result.url, result.status),
)?;

Error handling

use servo_fetch::{fetch, FetchOptions, Error};

match fetch(FetchOptions::new(url)) {
    Ok(page) => { /* ... */ }
    Err(e) if e.is_timeout() => { /* retry */ }
    Err(Error::AddressNotAllowed(_)) => { /* skip */ }
    Err(e) => return Err(e.into()),
}
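The retry arm above is left as a comment. One way it might be filled in is a small backoff loop, sketched here with the standard library only. FetchError is a hypothetical stand-in for the crate's error type so the example compiles on its own, and fetch_with_retry is an illustrative helper, not a servo-fetch function.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for the crate's error type, used only so this
/// sketch is self-contained.
#[derive(Debug, PartialEq)]
enum FetchError {
    Timeout,
    AddressNotAllowed,
}

impl FetchError {
    fn is_timeout(&self) -> bool {
        matches!(self, FetchError::Timeout)
    }
}

/// Runs `op` up to `max_attempts` times (always at least once), retrying
/// only on timeouts and doubling the delay between attempts.
fn fetch_with_retry<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, FetchError>,
) -> Result<T, FetchError> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry timeouts while attempts remain: base, 2*base, 4*base, ...
            Err(e) if e.is_timeout() && attempt < max_attempts => {
                sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
            // Non-retryable errors, or the final timeout, go back to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

In real code the closure would wrap the actual fetch call, and the match on is_timeout() would use the crate's own error type. Errors such as a blocked address are returned immediately, since retrying them cannot succeed.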

From async contexts

let page = tokio::task::spawn_blocking(|| {
    servo_fetch::fetch(servo_fetch::FetchOptions::new("https://example.com"))
}).await??;

Feature Flags

Flag    Default    Description
mcp     off        MCP server support

API Overview

Function                Description
markdown(url)           Fetch → readable Markdown
extract_json(url)       Fetch → structured JSON
text(url)               Fetch → plain text (innerText)
fetch(opts)             Fetch with full options → Page
crawl(opts)             Crawl site → Vec<CrawlResult>
crawl_each(opts, cb)    Crawl site, streaming results

See docs.rs for the full API reference.

License

MIT