servo-fetch
Fetch, render, and extract web content as Markdown, JSON, or screenshots with an embedded Servo browser engine. No Chromium, no containers, no external processes.
Looking for the CLI? See servo-fetch-cli.
Features
- Real JS execution: SpiderMonkey runs JavaScript, and a parallel CSS engine computes layout
- Layout-aware extraction: strips navbars, sidebars, and footers by rendered position
- Sync API: no async runtime required; wrap with `spawn_blocking` for async contexts
- PDF auto-detection: URLs returning PDF are automatically extracted as text
- Typed errors: `Error::Timeout`, `Error::InvalidUrl`, etc. for match-based retry logic
- SSRF protection: blocks private IPs, reserved ranges, and metadata endpoints
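To make the SSRF bullet concrete, here is a minimal sketch of the kind of address check such protection usually involves. It uses only the standard library; the helper names (`is_blocked_target`, `is_blocked_v4`, `is_blocked_v6`) and the exact rule set are assumptions for illustration, not servo-fetch's actual implementation.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: returns true when `ip` should be refused as a fetch target.
/// Illustrative policy only; the crate's real rules may differ.
fn is_blocked_target(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are judged by the IPv4 rules,
            // otherwise they would slip past them.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_v4(v4);
            }
            is_blocked_v6(v6)
        }
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16, which contains 169.254.169.254
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
        || ip.is_documentation()
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}
```

A check like this is only effective when applied to the address a hostname actually resolves to, not to the hostname string itself.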
Quick Start
let md = servo_fetch::markdown(url)?;
Examples
Fetch with options
use servo_fetch::fetch;
use std::time::Duration;
let page = fetch(opts)?;
println!;
let md = page.markdown()?;
Screenshot
use servo_fetch::fetch;
let page = fetch(opts)?;
write?;
JavaScript execution
use servo_fetch::fetch;
let page = fetch(opts)?;
println!;
Crawl a site
use servo_fetch::crawl_each;
crawl_each(opts, cb)?;
Error handling
use servo_fetch::{fetch, Error};
match fetch(opts)
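The match-based retry logic that typed errors enable could look roughly like the sketch below. The `Error` enum here is a stand-in defined locally (only `Timeout` and `InvalidUrl` are named in this README; `Other` is hypothetical), and `retry_on_timeout` is an illustrative helper, not part of the crate. The closure stands in for a call such as `fetch(opts)`.

```rust
use std::time::Duration;

/// Stand-in for the crate's error type. Only `Timeout` and `InvalidUrl`
/// appear in the README; `Other` is a hypothetical catch-all.
#[derive(Debug, PartialEq)]
enum Error {
    Timeout,
    InvalidUrl,
    Other(String),
}

/// Calls `op` at least once and retries only on `Error::Timeout`,
/// up to `max_attempts` calls in total, doubling the delay each time.
fn retry_on_timeout<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Transient failure: back off and try again while attempts remain.
            Err(Error::Timeout) if attempt < max_attempts => {
                std::thread::sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
            // Permanent failure (e.g. InvalidUrl) or attempts exhausted.
            Err(err) => return Err(err),
        }
    }
}
```

Matching on the variant keeps permanent failures such as an invalid URL from being retried pointlessly.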
From async contexts
let page = spawn_blocking(|| fetch(opts)).await??;
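The double `?` reflects two layers of failure: the offloaded task may not complete (for example, it panics), and the blocking call inside it may return its own error. The sketch below reproduces that shape with `std::thread` in place of an async runtime's `spawn_blocking`, so it runs without extra dependencies. `run_blocking` and `RunError` are hypothetical names, and the `String` error is a stand-in for the crate's error type.

```rust
use std::thread;

/// Hypothetical error type separating the two failure layers.
#[derive(Debug, PartialEq)]
enum RunError {
    /// The worker thread panicked before producing a result.
    TaskFailed,
    /// The blocking call itself returned an error.
    Fetch(String),
}

/// Runs a blocking job on a separate thread and flattens both layers.
/// `job` stands in for a closure such as `|| fetch(opts)`.
fn run_blocking<T: Send + 'static>(
    job: impl FnOnce() -> Result<T, String> + Send + 'static,
) -> Result<T, RunError> {
    let handle = thread::spawn(job);
    // Outer layer: did the thread finish without panicking?
    let inner = handle.join().map_err(|_| RunError::TaskFailed)?;
    // Inner layer: did the job itself succeed?
    inner.map_err(RunError::Fetch)
}
```

In an async runtime the join step is awaited instead of blocking the caller, but the nested `Result` is handled the same way.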
Environment Variables
| Variable | Description |
|---|---|
| `SERVO_FETCH_USER_AGENT` | Default User-Agent string (overridden by `.user_agent()`) |
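The precedence described in the table can be sketched as follows: an explicitly set value wins, then the environment variable, then a built-in default. The function names and the fallback string are assumptions made for this illustration; the README does not state what the crate uses when neither is set.

```rust
use std::env;

/// Hypothetical fallback, used only in this sketch.
const FALLBACK_USER_AGENT: &str = "example-agent/0.0";

/// Pure precedence rule: explicit value, then environment value, then fallback.
fn resolve_user_agent(explicit: Option<&str>, env_value: Option<&str>) -> String {
    explicit
        .or(env_value)
        .unwrap_or(FALLBACK_USER_AGENT)
        .to_string()
}

/// Reads `SERVO_FETCH_USER_AGENT` and applies the rule above.
/// `explicit` stands in for a value passed to `.user_agent()`.
fn user_agent_from_env(explicit: Option<&str>) -> String {
    let env_value = env::var("SERVO_FETCH_USER_AGENT").ok();
    resolve_user_agent(explicit, env_value.as_deref())
}
```

Keeping the precedence rule separate from the environment lookup makes it testable without mutating process state.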
API Overview
| Function | Description |
|---|---|
| `markdown(url)` | Fetch → readable Markdown |
| `extract_json(url)` | Fetch → structured JSON |
| `text(url)` | Fetch → plain text (innerText) |
| `fetch(opts)` | Fetch with full options → `Page` |
| `crawl(opts)` | Crawl site → `Vec<CrawlResult>` |
| `crawl_each(opts, cb)` | Crawl site, streaming results |
| `map(opts)` | Discover URLs via sitemaps → `Vec<MappedUrl>` |
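The difference between the collecting form (`crawl`) and the streaming form (`crawl_each`) can be illustrated with a toy model. Everything below is a stand-in: `CrawlResultSketch`, `crawl_each_sketch`, and `crawl_sketch` are hypothetical names, the "crawl" simply walks a list of seed strings, and the real signatures may differ.

```rust
/// Stand-in for a crawl result; the single field is hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct CrawlResultSketch {
    url: String,
}

/// Streaming form: hands each result to `on_result` as soon as it is ready,
/// so the caller can process or discard it without buffering the whole crawl.
fn crawl_each_sketch(seeds: &[&str], mut on_result: impl FnMut(CrawlResultSketch)) {
    for url in seeds {
        on_result(CrawlResultSketch { url: url.to_string() });
    }
}

/// Collecting form, expressed in terms of the streaming one.
fn crawl_sketch(seeds: &[&str]) -> Vec<CrawlResultSketch> {
    let mut results = Vec::new();
    crawl_each_sketch(seeds, |result| results.push(result));
    results
}
```

The streaming form suits large sites where results should be written out incrementally; the collecting form is simpler when the full set fits comfortably in memory.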
See docs.rs for the full API reference and examples/ for complete runnable programs.
License
MIT OR Apache-2.0