Module parallel_backends

Available on crate feature parallel_backends only.

Parallel crawl backends — race alternative engines alongside the primary crawl.

Structs§

BackendBytesGuard
RAII guard that decrements BACKEND_BYTES_IN_FLIGHT on drop.
BackendResponse
The result of a backend page fetch, carrying quality metadata.
BackendResult
Wrapper returned by backend futures — always carries the backend index so that failures can be tracked for auto-disable.
BackendTracker
Tracks per-backend performance across a crawl session.
ProxyRotator
Round-robin proxy address selector for parallel backends.
ValidationResult
The result of a custom quality validation.
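
The RAII guard and round-robin selector described above follow two common patterns. The sketch below illustrates both using only the standard library. Every name in it (SKETCH_BYTES_IN_FLIGHT, SketchBytesGuard, SketchProxyRotator, acquire, next_proxy) is hypothetical and is not the crate's actual API; the real BackendBytesGuard and ProxyRotator may be constructed and used differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the crate's in-flight byte counter.
static SKETCH_BYTES_IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

/// RAII guard (assumed shape): adds `bytes` to the counter when created
/// and subtracts the same amount when dropped.
pub struct SketchBytesGuard {
    bytes: usize,
}

impl SketchBytesGuard {
    pub fn acquire(bytes: usize) -> Self {
        SKETCH_BYTES_IN_FLIGHT.fetch_add(bytes, Ordering::Relaxed);
        Self { bytes }
    }
}

impl Drop for SketchBytesGuard {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns and panics.
        SKETCH_BYTES_IN_FLIGHT.fetch_sub(self.bytes, Ordering::Relaxed);
    }
}

/// Current value of the hypothetical counter.
pub fn bytes_in_flight() -> usize {
    SKETCH_BYTES_IN_FLIGHT.load(Ordering::Relaxed)
}

/// Round-robin selector (assumed shape): hands out proxy addresses in
/// order and wraps around at the end of the list.
pub struct SketchProxyRotator {
    proxies: Vec<String>,
    next: AtomicUsize,
}

impl SketchProxyRotator {
    pub fn new(proxies: Vec<String>) -> Self {
        Self { proxies, next: AtomicUsize::new(0) }
    }

    /// Returns `None` when no proxies are configured.
    pub fn next_proxy(&self) -> Option<&str> {
        if self.proxies.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len();
        Some(self.proxies[i].as_str())
    }
}
```

Tying the decrement to Drop means the counter cannot leak when a backend future is cancelled after losing a race, and an atomic index lets concurrent backends share one rotator without a lock.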

Functions§

backend_source_name
Return a human-readable backend source name for the given config entry.
build_backend_futures
Build alternative backend futures for a given URL from config.
fetch_cdp (crate feature chrome)
Fetch a page via a remote CDP endpoint (any CDP-speaking browser).
fetch_webdriver (crate feature webdriver)
Fetch a page via a remote WebDriver endpoint (Servo, custom, or any WebDriver-speaking browser).
html_quality_score
Score an HTML response for quality (0–100). Higher is better.
html_quality_score_validated
Score an HTML response with both the built-in scorer and an optional custom validator. Returns the final clamped score (0–100).
is_binary_content_type
Returns true for Content-Type values where HTML quality racing is pointless — binary resources (images, fonts, video, archives, etc.) will be identical across all backends.
race_backends
Race the primary crawl against alternative backend futures.
resolve_protocol
Resolve the protocol for a backend endpoint. Falls back to engine defaults.
should_skip_backend_for_url
Returns true when the URL extension indicates a binary asset or matches a user-supplied skip extension. Backends should not be spawned for these.
tag_page_source
Set the backend_source field on a page (feature-gated).
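
The two skip checks listed above (is_binary_content_type and should_skip_backend_for_url) can be pictured with a minimal sketch. The function names, parameter shapes, and the MIME and extension lists below are assumptions made for illustration; the crate's own lists and signatures are not shown on this page and may differ.

```rust
/// Hypothetical check: true when the Content-Type names a binary resource
/// that every backend would return identically.
pub fn sketch_is_binary_content_type(content_type: &str) -> bool {
    // Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    // Illustrative lists only.
    const PREFIXES: [&str; 4] = ["image/", "font/", "video/", "audio/"];
    const EXACT: [&str; 4] = [
        "application/zip",
        "application/gzip",
        "application/pdf",
        "application/octet-stream",
    ];
    PREFIXES.iter().any(|p| mime.starts_with(p))
        || EXACT.iter().any(|e| *e == mime.as_str())
}

/// Hypothetical check: true when the URL path ends in a binary extension
/// or in one of the caller-supplied extensions (with or without a leading dot).
pub fn sketch_should_skip_url(url: &str, extra_skip: &[&str]) -> bool {
    // Drop the query string and fragment.
    let no_query = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop the scheme, then the host, so "example.com" is never read as ".com".
    let after_scheme = no_query
        .split_once("://")
        .map(|(_, rest)| rest)
        .unwrap_or(no_query);
    let path = match after_scheme.split_once('/') {
        Some((_, p)) => p,
        None => return false, // no path component at all
    };
    let last_segment = path.rsplit('/').next().unwrap_or("");
    let ext = match last_segment.rsplit_once('.') {
        Some((_, e)) if !e.is_empty() => e.to_ascii_lowercase(),
        _ => return false,
    };
    // Illustrative list only.
    const BINARY: [&str; 8] = ["png", "jpg", "jpeg", "gif", "woff2", "mp4", "zip", "pdf"];
    BINARY.iter().any(|e| *e == ext.as_str())
        || extra_skip
            .iter()
            .any(|e| e.trim_start_matches('.').eq_ignore_ascii_case(&ext))
}
```

The URL check can run before any backend future is built, whereas the Content-Type check is only possible once a response header is available.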

Type Aliases§

QualityValidator
User-supplied quality validator. Called after the built-in scorer for every backend response. Receives the raw HTML bytes, status code, URL, and the backend source name (“primary”, “cdp”, “servo”, “custom”).
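
As a rough illustration of how a custom validator might combine with the built-in scorer and the 0–100 clamp, consider the sketch below. All of it is assumed: SketchValidation, SketchValidator, the score_adjust field, the u16 status code, and the toy scoring rules are hypothetical stand-ins, not the real ValidationResult, QualityValidator, or html_quality_score_validated definitions.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `ValidationResult`.
pub struct SketchValidation {
    /// Signed adjustment applied on top of the built-in score (assumed field).
    pub score_adjust: i32,
}

/// Hypothetical validator alias. Arguments: raw HTML bytes, status code,
/// URL, and backend source name.
pub type SketchValidator =
    Arc<dyn Fn(&[u8], u16, &str, &str) -> SketchValidation + Send + Sync>;

/// Toy built-in scorer, for illustration only.
pub fn sketch_builtin_score(html: &[u8], status: u16) -> i32 {
    let mut score = 0;
    if status == 200 {
        score += 50;
    }
    if !html.is_empty() {
        score += 20;
    }
    if html.windows(5).any(|w| w.eq_ignore_ascii_case(b"<body")) {
        score += 30;
    }
    score
}

/// Runs the built-in scorer first, then the optional validator, and
/// clamps the combined result to 0..=100.
pub fn sketch_score_validated(
    html: &[u8],
    status: u16,
    url: &str,
    source: &str,
    validator: Option<&SketchValidator>,
) -> u16 {
    let mut score = sketch_builtin_score(html, status);
    if let Some(v) = validator {
        score += v(html, status, url, source).score_adjust;
    }
    score.clamp(0, 100) as u16
}
```

A caller could pass a validator that returns a large negative adjustment for pages it recognizes as block or challenge pages; the clamp keeps the final score within range regardless of the adjustment size.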