Available on crate feature `parallel_backends` only.
Parallel crawl backends: race alternative engines alongside the primary crawl.
Structs
- BackendBytesGuard - RAII guard that decrements `BACKEND_BYTES_IN_FLIGHT` on drop.
- BackendResponse - The result of a backend page fetch, carrying quality metadata.
- BackendResult - Wrapper returned by backend futures. It always carries the backend index so that failures can be tracked for auto-disable.
- BackendTracker - Tracks per-backend performance across a crawl session.
- ProxyRotator - Round-robin proxy address selector for parallel backends.
- ValidationResult - The result of a custom quality validation.
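The two mechanisms named in this list, an RAII guard that decrements a counter on drop and a round-robin address selector, can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `BYTES_IN_FLIGHT`, `BytesGuard`, `RoundRobin`, and every method on them are hypothetical stand-ins, and the real `BackendBytesGuard` and `ProxyRotator` may differ in fields, signatures, and memory ordering.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the crate's in-flight byte counter.
static BYTES_IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

/// Illustrative RAII guard: adds `len` to the counter when created and
/// subtracts it again when dropped, even on early return or panic unwind.
pub struct BytesGuard {
    len: usize,
}

impl BytesGuard {
    pub fn acquire(len: usize) -> Self {
        BYTES_IN_FLIGHT.fetch_add(len, Ordering::Relaxed);
        BytesGuard { len }
    }
}

impl Drop for BytesGuard {
    fn drop(&mut self) {
        BYTES_IN_FLIGHT.fetch_sub(self.len, Ordering::Relaxed);
    }
}

/// Illustrative round-robin selector over a fixed list of proxy addresses.
pub struct RoundRobin {
    addrs: Vec<String>,
    cursor: AtomicUsize,
}

impl RoundRobin {
    pub fn new(addrs: Vec<String>) -> Self {
        RoundRobin { addrs, cursor: AtomicUsize::new(0) }
    }

    /// Returns the next address, wrapping around at the end of the list.
    /// Returns `None` when no addresses were configured.
    pub fn next(&self) -> Option<&str> {
        if self.addrs.is_empty() {
            return None;
        }
        let i = self.cursor.fetch_add(1, Ordering::Relaxed) % self.addrs.len();
        Some(self.addrs[i].as_str())
    }
}
```

Tying the decrement to `Drop` means the counter cannot leak when a backend future is cancelled mid-race, which is presumably why a guard type is used here rather than manual bookkeeping.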
Functions
- backend_source_name - Return a human-readable backend source name for the given config entry.
- build_backend_futures - Build alternative backend futures for a given URL from config.
- fetch_cdp (feature: chrome) - Fetch a page via a remote CDP endpoint (any CDP-speaking browser).
- fetch_webdriver (feature: webdriver) - Fetch a page via a remote WebDriver endpoint (Servo, custom, or any WebDriver-speaking browser).
- html_quality_score - Score an HTML response for quality (0–100). Higher is better.
- html_quality_score_validated - Score an HTML response with both the built-in scorer and an optional custom validator. Returns the final clamped score (0–100).
- is_binary_content_type - Returns `true` for `Content-Type` values where HTML quality racing is pointless: binary resources (images, fonts, video, archives, etc.) will be identical across all backends.
- race_backends - Race the primary crawl against alternative backend futures.
- resolve_protocol - Resolve the protocol for a backend endpoint. Falls back to engine defaults.
- should_skip_backend_for_url - Returns `true` when the URL extension indicates a binary asset or matches a user-supplied skip extension. Backends should not be spawned for these.
- tag_page_source - Set the `backend_source` field on a page (feature-gated).
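The two skip checks described for `is_binary_content_type` and `should_skip_backend_for_url` might look roughly like the sketch below. It rests on assumptions: the function names `looks_binary_content_type` and `has_skipped_extension`, the media-type prefixes, and the extension list are all invented for illustration, and the crate's actual lists and signatures are not shown on this page.

```rust
/// Hypothetical check: true for media types where every backend would
/// return the same bytes, so racing them adds nothing.
pub fn looks_binary_content_type(content_type: &str) -> bool {
    // Drop parameters such as "; charset=utf-8" and normalise case.
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    // Assumed lists, chosen only for the example.
    const PREFIXES: [&str; 4] = ["image/", "font/", "video/", "audio/"];
    const EXACT: [&str; 3] = ["application/zip", "application/pdf", "application/octet-stream"];
    PREFIXES.iter().any(|p| mime.starts_with(p)) || EXACT.iter().any(|e| mime.as_str() == *e)
}

/// Hypothetical check: true when the last path segment of `url` ends in a
/// built-in binary extension or in one of the caller's `extra` extensions.
pub fn has_skipped_extension(url: &str, extra: &[&str]) -> bool {
    // Ignore the query string and fragment.
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    // Strip the scheme so the host name is not mistaken for a file name.
    let without_scheme = path.split_once("://").map_or(path, |(_, rest)| rest);
    let segment = match without_scheme.split_once('/') {
        Some((_, p)) => p.rsplit('/').next().unwrap_or(""),
        None => return false, // host only, no path to inspect
    };
    let ext = match segment.rsplit_once('.') {
        Some((_, e)) if !e.is_empty() => e.to_ascii_lowercase(),
        _ => return false,
    };
    // Assumed built-in list, chosen only for the example.
    const BUILT_IN: [&str; 6] = ["png", "jpg", "gif", "woff2", "mp4", "zip"];
    BUILT_IN.iter().any(|e| *e == ext.as_str())
        || extra.iter().any(|e| e.eq_ignore_ascii_case(&ext))
}
```

The URL check can run before any request is made, while the `Content-Type` check needs at least the response headers, so a caller would typically apply them in that order.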
Type Aliases
- QualityValidator - User-supplied quality validator. Called after the built-in scorer for every backend response. Receives the raw HTML bytes, status code, URL, and the backend source name (“primary”, “cdp”, “servo”, “custom”).
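To make the validator contract concrete, here is one way a callback with those four inputs could be combined with a built-in score and clamped to 0–100. Everything in it is an assumption for illustration: the `Validator` alias, the `Verdict` struct with its `score_delta` field, and the `final_score` helper are hypothetical, and the crate's real `QualityValidator` signature and `ValidationResult` fields are not shown on this page.

```rust
use std::sync::Arc;

/// Hypothetical verdict type; stands in for the crate's `ValidationResult`.
pub struct Verdict {
    /// Signed adjustment applied to the built-in score (assumed semantics).
    pub score_delta: i32,
}

/// Hypothetical alias: (raw HTML bytes, status code, URL, backend source name).
pub type Validator = Arc<dyn Fn(&[u8], u16, &str, &str) -> Verdict + Send + Sync>;

/// Applies the optional validator after the built-in score and clamps to 0..=100.
pub fn final_score(
    base: u16,
    validator: Option<&Validator>,
    html: &[u8],
    status: u16,
    url: &str,
    source: &str,
) -> u16 {
    let delta = validator.map_or(0, |v| v(html, status, url, source).score_delta);
    (i32::from(base) + delta).clamp(0, 100) as u16
}

/// Example validator: penalise empty bodies that came from the "cdp" backend.
pub fn penalize_empty_cdp() -> Validator {
    Arc::new(|html: &[u8], _status: u16, _url: &str, source: &str| {
        let delta = if source == "cdp" && html.is_empty() { -50 } else { 0 };
        Verdict { score_delta: delta }
    })
}
```

Passing the backend source name lets a validator apply rules to a single engine, for example being stricter with one backend that is known to return empty shells for a given site.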