# crw-renderer

HTTP and headless-browser rendering engine for the CRW web scraper.
## Overview

`crw-renderer` fetches web pages via plain HTTP and optionally re-renders them through a CDP-based headless browser when SPA content is detected.
- **`FallbackRenderer`** — Composite renderer: tries HTTP first, falls back to JS rendering when the page looks like a SPA shell
- **`HttpFetcher`** — Fast `reqwest`-based HTTP fetcher with stealth headers, gzip/brotli decompression, and proxy support
- **SPA detection** — Heuristic analysis of the HTML response (empty body, framework markers like `__NEXT_DATA__`, `ng-app`, `nuxt`) to auto-detect pages that need JS rendering
- **CDP rendering** — Chrome DevTools Protocol support for LightPanda, Playwright, and Chrome (requires the `cdp` feature)
- **Stealth mode** — User-Agent rotation from a built-in Chrome/Firefox/Safari pool and browser-like header injection
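The SPA-detection heuristic above can be sketched in plain Rust. This is a hypothetical illustration of the idea, not the crate's actual implementation — the function name `looks_like_spa`, the 20-character visible-text threshold, and the `id="root"` check are assumptions:

```rust
// Hypothetical sketch of the SPA-detection heuristic — not the crate's real code.
fn looks_like_spa(html: &str) -> bool {
    // Framework markers that indicate a client-rendered shell.
    const MARKERS: [&str; 3] = ["__NEXT_DATA__", "ng-app", "nuxt"];
    if MARKERS.iter().any(|m| html.contains(m)) {
        return true;
    }

    // An almost-empty <body> that only mounts a root element is another signal.
    let body = html
        .split("<body>")
        .nth(1)
        .and_then(|rest| rest.split("</body>").next())
        .unwrap_or("");

    // Rough count of visible text between tags (assumed threshold: 20 chars).
    let text_len: usize = body
        .split('<')
        .filter_map(|chunk| chunk.split('>').nth(1))
        .map(|text| text.trim().len())
        .sum();

    text_len < 20 && body.contains("id=\"root\"")
}

fn main() {
    assert!(looks_like_spa(r#"<html><body><div id="root"></div></body></html>"#));
    assert!(!looks_like_spa(
        "<html><body><h1>Hello</h1><p>Plenty of visible text here.</p></body></html>"
    ));
}
```

The real detector may weigh more signals (script-to-text ratio, meta tags), but the shape — marker scan plus near-empty body check — matches the behavior described above.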
## Installation
With CDP (headless browser) support:
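Assuming the crate is consumed as `crw-renderer` from the CRW workspace, a `Cargo.toml` entry might look like the following (the version number is a placeholder):

```toml
[dependencies]
# Plain HTTP fetching only:
crw-renderer = "0.1"

# Or, with CDP (headless browser) support:
# crw-renderer = { version = "0.1", features = ["cdp"] }
```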
## Feature flags
| Flag | Default | Description |
|---|---|---|
| `cdp` | off | Enables CDP WebSocket rendering via `tokio-tungstenite` (LightPanda, Playwright, Chrome) |
## Usage

### Basic HTTP fetching
```rust
use crw_renderer::{FallbackRenderer, RendererConfig, StealthConfig};
use std::collections::HashMap;

// Signatures reconstructed for illustration — see the crate docs for the exact API.
async fn fetch_page() -> Result<(), Box<dyn std::error::Error>> {
    let renderer = FallbackRenderer::new(RendererConfig::default(), StealthConfig::default());
    let result = renderer.fetch("https://example.com", &HashMap::new()).await?;
    println!("fetched {} bytes", result.html.len());
    Ok(())
}
```
### Smart mode (auto-detect SPAs)

When a JS renderer is configured, `FallbackRenderer` automatically detects SPA shells and re-renders with a headless browser:
```rust
use crw_renderer::{FallbackRenderer, RendererConfig, StealthConfig};
use std::collections::HashMap;

// Config fields were elided in the original; defaults shown for illustration.
let config = RendererConfig::default();
let stealth = StealthConfig::default();
let renderer = FallbackRenderer::new(config, stealth);

// Auto mode: HTTP first, JS rendering if SPA detected
let result = renderer.fetch("https://example.com", &HashMap::new()).await?;
```
### SPA detection
Use the detector directly to check if HTML needs JS rendering:
```rust
use crw_renderer::needs_js_rendering; // exact import path may differ — see the crate docs

let spa_html = r#"<html><body><div id="root"></div><script src="/app.js"></script></body></html>"#;
assert!(needs_js_rendering(spa_html));

let static_html = r#"<html><body><h1>Hello</h1><p>This is a static page with content.</p></body></html>"#;
assert!(!needs_js_rendering(static_html));
```
### Stealth mode
Enable User-Agent rotation and browser-like headers to reduce bot detection:
```rust
use crw_renderer::{FallbackRenderer, RendererConfig, StealthConfig};

// The original showed specific stealth fields; defaults shown for illustration.
let stealth = StealthConfig::default();
let renderer = FallbackRenderer::new(RendererConfig::default(), stealth);
```
### Health check
```rust
// Assumes `renderer` from the examples above; the result shape is illustrative.
let health = renderer.check_health().await;
for (name, status) in &health {
    println!("{name}: {status:?}");
}
```
## Part of CRW
This crate is part of the CRW workspace — a fast, lightweight, Firecrawl-compatible web scraper built in Rust.
| Crate | Description |
|---|---|
| `crw-core` | Core types, config, and error handling |
| `crw-renderer` | HTTP + CDP browser rendering engine (this crate) |
| `crw-extract` | HTML → markdown/plaintext extraction |
| `crw-crawl` | Async BFS crawler with robots.txt & sitemap support |
| `crw-server` | Firecrawl-compatible API server |
| `crw-cli` | Standalone CLI (`crw` binary) |
| `crw-mcp` | MCP stdio proxy binary |
## License
AGPL-3.0 — see LICENSE.