Expand description
HTTP and headless-browser rendering engine for the CRW web scraper.
Provides a FallbackRenderer that fetches pages via plain HTTP and optionally
re-renders them through a CDP-based headless browser when SPA content is detected.
http_only— Simple HTTP fetcher usingreqwestdetector— Heuristic SPA shell detection (empty body, framework markers)cdp— Chrome DevTools Protocol renderer (LightPanda, Playwright, Chrome) (requirescdpfeature)traits—PageFetchertrait for pluggable backends
§Feature flags
| Flag | Description |
|---|---|
cdp | Enables CDP WebSocket rendering via tokio-tungstenite |
§Example
use crw_core::config::RendererConfig;
use crw_renderer::FallbackRenderer;
use std::collections::HashMap;
use crw_core::config::StealthConfig;
let config = RendererConfig::default();
let stealth = StealthConfig::default();
let renderer = FallbackRenderer::new(&config, "crw/0.1", None, &stealth)?;
let deadline = crw_core::Deadline::from_request_ms(8000);
let result = renderer.fetch("https://example.com", &HashMap::new(), None, None, None, deadline).await?;
println!("status: {}", result.status_code);Modules§
- blocklist
- Resource blocklist for CDP
Fetch.requestPausedinterception. - breaker
- Sliding-window circuit breaker with multi-probe half-open and
linear-back-off cooldown. See
plans/breaker-cascade-fix.mdIter 3. - detector
- host_
limiter - Process-wide per-host rate-limiter and concurrency cap.
- http_
only - preference
- Per-host renderer preference learning.
- traits
Structs§
- Fallback
Renderer - Composite renderer that tries multiple backends in order.
- Screenshot
Req - Per-request screenshot capture parameters. Carried via a task-local rather
than the
PageFetcher::fetchsignature (mirrorsREQUEST_PROXY) so the trait + its ~30 call sites stay untouched.Some⇒ capture a PNG via CDPPage.captureScreenshotafter the wait window;None⇒ no screenshot.
Statics§
- REQUEST_
COUNTRY - Per-request country code (ISO 3166-1 alpha-2, lowercase) for the
chrome_proxy tier’s CDP auth pump. Set by
FallbackRenderer::fetchwhen aScrapeRequest.countryis present; read incdp.rswhile composing DataImpulse credentials. Task-local so child tasks spawned by the pool inherit it without trait-signature churn. - REQUEST_
PROXY - Resolved proxy entry for the current request, picked from the active
rotator by host. Set by the scrape/crawl entry points (via
FallbackRenderer::pick_proxy); read incdp.rsto drive the per-request ChromeproxyServer(a fresh proxied browser context) and theFetch.authRequiredpump.None= no proxy → existing behaviour. - REQUEST_
SCREENSHOT - Resolved screenshot request for the current scrape. Set by the
scrape/crawl entry point ([
crw_crawl::single::scrape_url]) whenformatscontainsScreenshot; read incdp.rsto drivePage.captureScreenshotand inFallbackRenderer::fetchto force the vanilla-Chrome CDP path.None= no screenshot → existing behaviour.
Functions§
- current_
screenshot_ req - The resolved screenshot params for the current task, if any.
- screenshot_
requested - Whether a screenshot was requested for the current task (reads the
REQUEST_SCREENSHOTtask-local).falsewhen unset / outside a scope.