servo-fetch
Fetch, render, and extract web content as Markdown, JSON, or screenshots with an embedded Servo browser engine. No Chromium, no containers, no external processes.
Looking for the CLI? See servo-fetch-cli.
Features
- Real JS execution: SpiderMonkey runs JavaScript, and a parallel CSS engine computes layout
- Layout-aware extraction: strips navbars, sidebars, and footers by rendered position
- Schema-driven JSON: a declarative CSS-selector schema pulls structured data, no LLM
- Sync API: no async runtime required; wrap with spawn_blocking for async contexts
- PDF auto-detection: URLs returning PDF are automatically extracted as text
- Typed errors: Error::Timeout, Error::InvalidUrl, etc. for match-based retry logic
- SSRF protection: blocks private IPs, reserved ranges, and metadata endpoints
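To make the SSRF bullet concrete, here is a minimal sketch of the kind of address check it describes, using only the standard library. The function names and the exact set of ranges are assumptions for illustration; this is not servo-fetch's actual rule set.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Returns true if `ip` should be refused as a fetch target.
/// Illustrative only: the ranges below are an assumed, non-exhaustive list.
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are judged as IPv4.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_v4(v4);
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let [a, b, _, _] = ip.octets();
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16, includes the 169.254.169.254 metadata address
        || ip.is_documentation()
        || a == 0              // 0.0.0.0/8
        || a >= 240            // 240/4 reserved, includes broadcast
        || (a == 100 && (b & 0xc0) == 64) // 100.64/10 shared address space
}
```

A check like this only helps if it runs on the address the connection actually uses, that is, after DNS resolution and again on every redirect.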
Quick Start
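// Placeholder URL; markdown(url) fetches the page and returns readable Markdown.
let md = servo_fetch::markdown("https://example.com")?;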
Examples
Fetch with options
use ;
use std::time::Duration;
let page = fetch?;
println!;
let md = page.markdown()?;
Screenshot
use ;
let page = fetch?;
write?;
JavaScript execution
use ;
let page = fetch?;
println!;
Crawl a site
use ;
crawl_each?;
Schema-driven JSON extraction
use ;
use ExtractSchema;
// Load a schema from a file...
let product_schema = from_path?;
// ...or from an inline string.
let product_schema = from_json?;
let page = fetch?;
if let Some = &page.extracted
Field type values: text, attribute, html, inner_html, nested_list.
An empty selector ("") reads from the matched element itself, which is useful
inside nested_list to grab each item's own text or attribute. To build a
schema programmatically, see ExtractSchema::builder().
Error handling
use ;
match fetch
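As a sketch of the match-based retry logic that typed errors make possible, the helper below retries only on a timeout and gives up immediately on anything else. The Error enum here is a hypothetical stand-in (only the variant names Timeout and InvalidUrl come from this README; their real shapes may differ), and retry_on_timeout is not part of the crate.

```rust
use std::time::Duration;

// Hypothetical stand-in for the crate's error type; real variants may carry
// different data.
#[derive(Debug, PartialEq)]
enum Error {
    Timeout,
    InvalidUrl(String),
}

/// Calls `op` at least once and at most `max_attempts` times, retrying only
/// when it fails with `Error::Timeout`. `op` receives the 1-based attempt number.
fn retry_on_timeout<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Transient failure: back off exponentially while attempts remain.
            Err(Error::Timeout) if attempt < max_attempts => {
                std::thread::sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
            // Permanent failure (e.g. InvalidUrl) or attempts exhausted.
            Err(err) => return Err(err),
        }
    }
}
```

In real use the closure would wrap a fetch call; retrying an invalid URL can never succeed, which is why only the timeout arm loops.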
From async contexts
let page = spawn_blocking.await??;
Environment Variables
| Variable | Description |
|---|---|
| SERVO_FETCH_USER_AGENT | Default User-Agent string (overridden by .user_agent()) |
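The precedence in the table (an explicit .user_agent() value wins over the environment variable) can be sketched as below. The function names, the fallback string, and the choice to treat a blank variable as unset are all assumptions for illustration, not the crate's documented behavior.

```rust
use std::env;

// Hypothetical fallback; the crate's real built-in default is not stated here.
const FALLBACK_USER_AGENT: &str = "example-agent/0.0";

/// Picks the User-Agent: explicit value first, then the environment value,
/// then the fallback. A blank environment value is treated as unset (assumption).
fn resolve_user_agent(explicit: Option<&str>, from_env: Option<&str>) -> String {
    explicit
        .or(from_env.filter(|ua| !ua.trim().is_empty()))
        .unwrap_or(FALLBACK_USER_AGENT)
        .to_string()
}

/// Thin wrapper that reads SERVO_FETCH_USER_AGENT from the process environment.
fn user_agent_from_process_env(explicit: Option<&str>) -> String {
    let from_env = env::var("SERVO_FETCH_USER_AGENT").ok();
    resolve_user_agent(explicit, from_env.as_deref())
}
```

Keeping the precedence logic in a pure function makes it testable without mutating process-wide environment state.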
API Overview
| Function | Description |
|---|---|
| markdown(url) | Fetch → readable Markdown |
| extract_json(url) | Fetch → structured JSON |
| text(url) | Fetch → plain text (innerText) |
| fetch(opts) | Fetch with full options → Page |
| crawl(opts) | Crawl site → Vec<CrawlResult> |
| crawl_each(opts, cb) | Crawl site, streaming results |
| map(opts) | Discover URLs via sitemaps → Vec<MappedUrl> |
| schema.extract_from(html) | Apply a CSS-selector schema to HTML → serde_json::Value |
See docs.rs for the full API reference and examples/ for complete runnable programs.
License
MIT OR Apache-2.0