servo-fetch
Fetch, render, and extract web content as Markdown, JSON, or screenshots with an embedded Servo browser engine. No Chromium, no containers, no external processes.
Looking for the CLI? See servo-fetch-cli.
Features
- Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
- Layout- and visibility-aware extraction — strips navbars/footers by rendered position, plus cookie banners, modals, and CSS-hidden content
- Schema-driven JSON — declarative CSS-selector schema pulls structured data, no LLM
- Async-first API — top-level functions are async; sync mirror in [blocking] submodule
- PDF auto-detection — URLs returning PDF are automatically extracted as text
- Typed errors — Error::Timeout, Error::InvalidUrl, etc. for match-based retry logic
- SSRF protection — blocks private IPs, reserved ranges, and metadata endpoints
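
To make the SSRF bullet concrete, here is a minimal sketch of the kind of address check such protection might perform before connecting. It uses only the Rust standard library. The function names `is_blocked_target` and `is_blocked_v4` are hypothetical, and the rule set is an assumption for illustration; it is not servo-fetch's actual implementation and may be narrower or broader than what the crate blocks.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Returns true if `ip` looks like an internal target that a fetcher
/// should refuse. Illustrative rule set only (assumption, not the crate's).
pub fn is_blocked_target(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are checked as IPv4,
            // otherwise they would bypass the IPv4 rules.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_v4(v4);
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// IPv4 rules: private ranges, loopback, link-local (which contains the
/// common cloud metadata address 169.254.169.254), and other reserved ranges.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
        || ip.is_documentation()
}
```

A check like this is only effective when applied to the resolved address of every request, including redirects, since a public hostname can resolve to a private IP.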
Quick Start
let md = servo_fetch::markdown(url).await?;
For synchronous code, use the [blocking] submodule:
let md = servo_fetch::blocking::markdown(url)?;
Examples
Fetch with options
use ;
use std::time::Duration;
let page = fetch.await?;
println!;
let md = page.markdown?;
Screenshot
use ;
let page = fetch.await?;
write?;
JavaScript execution
use ;
let page = fetch.await?;
println!;
Crawl a site
use ;
crawl_each.await?;
Schema-driven JSON extraction
use ;
use ExtractSchema;
// Load a schema from a file...
let product_schema = from_path?;
// ...or from an inline string.
let product_schema = from_json?;
let page = fetch.await?;
if let Some = &page.extracted
Field type values: text, attribute, html, inner_html, nested_list.
An empty selector ("") reads from the matched element itself — useful
inside nested_list to grab each item's own text or attribute. For
programmatic construction, see [ExtractSchema::builder()].
Error handling
use ;
match fetch.await
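
The match-based retry logic that typed errors enable can be sketched as follows. This is a simplified, synchronous illustration: the `Error` enum here is a hypothetical stand-in that contains only the two variants named in this README, and `retry_on_timeout` is an assumed helper, not part of servo-fetch. The real error type has more variants, and an async version would await a timer instead of sleeping the thread.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the crate's error type (assumption):
/// only the two variants mentioned in the README are modeled.
#[derive(Debug, PartialEq)]
pub enum Error {
    Timeout,
    InvalidUrl,
}

/// Calls `op` at least once and at most `max_attempts` times,
/// retrying only when it fails with `Error::Timeout`.
pub fn retry_on_timeout<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // A timeout may be transient: wait a little longer each time.
            Err(Error::Timeout) if attempt < max_attempts => {
                std::thread::sleep(base_delay * attempt);
                attempt += 1;
            }
            // Other errors (for example an invalid URL) will not improve
            // on retry, and an exhausted timeout is returned as-is.
            Err(other) => return Err(other),
        }
    }
}
```

Matching on the variant keeps the retry decision explicit: only failures that can plausibly succeed on a second attempt are retried.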
Client
use servo_fetch::Client;
use std::time::Duration;
let client = builder
.timeout
.user_agent
.build;
let p1 = client.fetch.await?;
let p2 = client.fetch.await?;
Sync mode
use servo_fetch::blocking;
let md = blocking::markdown(url)?;
let client = builder.timeout.build;
let page = client.fetch?;
Environment Variables
| Variable | Description |
|---|---|
| SERVO_FETCH_USER_AGENT | Default User-Agent string (overridden by .user_agent()) |
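
The precedence described in the table (an explicit .user_agent() value wins over the environment variable) can be sketched with the standard library alone. The helper names and the `FALLBACK_USER_AGENT` value below are hypothetical; how servo-fetch resolves the setting internally, and what its built-in default is, are not stated here.

```rust
use std::env;

/// Hypothetical built-in default used when nothing else is set (assumption).
const FALLBACK_USER_AGENT: &str = "example-agent/0.0";

/// Picks the User-Agent by precedence:
/// explicit builder value, then environment value, then the fallback.
pub fn resolve_user_agent(explicit: Option<&str>, from_env: Option<&str>) -> String {
    explicit
        .or(from_env)
        .unwrap_or(FALLBACK_USER_AGENT)
        .to_string()
}

/// Same resolution, reading SERVO_FETCH_USER_AGENT from the process environment.
pub fn resolve_user_agent_from_env(explicit: Option<&str>) -> String {
    let env_value = env::var("SERVO_FETCH_USER_AGENT").ok();
    resolve_user_agent(explicit, env_value.as_deref())
}
```

Separating the pure precedence function from the environment lookup keeps the rule easy to test without mutating process state.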
API Overview
Every function is available in both async (top-level) and sync (blocking::*) form.
| Function | Returns |
|---|---|
| markdown(url) | Readable Markdown |
| text(url) | Plain text (innerText) |
| extract_json(url) | Structured JSON |
| fetch(opts) | Page |
| crawl(opts) | Vec<CrawlResult> |
| crawl_each(opts, cb) | Streaming results via callback |
| map(opts) | Vec<MappedUrl> (URL discovery) |
| Client / ClientBuilder | Reusable client with defaults |
See docs.rs for the full API reference and examples/ for complete runnable programs.
License
MIT OR Apache-2.0