pub struct BrowserConfig {
pub max_pages: u32,
pub headless: bool,
pub disable_resources: bool,
pub network_idle: bool,
pub load_dom: bool,
pub wait_selector: Option<String>,
pub wait_selector_state: WaitState,
pub cookies: Vec<CookieParam>,
pub google_search: bool,
pub wait_ms: u64,
pub timezone_id: Option<String>,
pub proxy: Option<ProxyConfig>,
pub proxy_rotator: Option<ProxyRotator>,
pub extra_headers: HashMap<String, String>,
pub timeout_ms: f64,
pub init_script: Option<String>,
pub user_data_dir: Option<String>,
pub locale: Option<String>,
pub real_chrome: bool,
pub cdp_url: Option<String>,
pub useragent: Option<String>,
pub extra_flags: Vec<String>,
pub blocked_domains: HashSet<String>,
pub block_ads: bool,
pub retries: u32,
pub retry_delay_secs: f64,
pub capture_xhr: Option<String>,
pub executable_path: Option<String>,
pub dns_over_https: bool,
pub selector_config: HashMap<String, Value>,
pub page_setup: Option<PageCallback>,
pub page_action: Option<PageCallback>,
}
Browser session configuration – the central struct that controls how the Playwright browser is launched and how pages are navigated.
This mirrors the Python PlaywrightConfig from the original scrapling library.
Every field has a default value (see Default), so you only need to set the
fields relevant to your use case. Calling validate yourself before passing
the config to a session is optional, because sessions call it automatically
during construction.
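As a rough sketch of the "set only what you need" pattern, the snippet below uses Rust's struct update syntax against a small stand-in type. BrowserConfigSketch is hypothetical and carries only three of the documented fields; the default for wait_selector is assumed to be None, and the "#results" selector is purely illustrative.

```rust
// Stand-in for illustration only: a three-field subset of the documented
// struct, not the crate's real definition.
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserConfigSketch {
    pub max_pages: u32,
    pub headless: bool,
    pub wait_selector: Option<String>,
}

impl Default for BrowserConfigSketch {
    fn default() -> Self {
        // max_pages and headless follow the documented defaults;
        // wait_selector = None is an assumption.
        Self {
            max_pages: 1,
            headless: true,
            wait_selector: None,
        }
    }
}

/// Override only the fields that matter; the rest come from `Default`.
pub fn debug_config() -> BrowserConfigSketch {
    BrowserConfigSketch {
        headless: false,
        wait_selector: Some("#results".to_string()),
        ..Default::default()
    }
}
```

With the real type, the same shape should apply: name the fields you care about and finish the expression with ..Default::default().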
Fields
max_pages: u32
Maximum number of concurrent browser pages in the pool.
Must be between 1 and 50 inclusive. Higher values allow more parallel fetches
but consume more memory. Defaults to 1.
headless: bool
Whether to launch the browser in headless mode.
Set to false when debugging to see the browser window. Defaults to true.
disable_resources: bool
Block heavyweight resource types (images, fonts, stylesheets) when true.
This significantly speeds up page loads when you only need the HTML/DOM.
The exact list of blocked types is defined in [constants::EXTRA_RESOURCES].
Defaults to false.
network_idle: bool
Wait for the network-idle event after navigation.
Useful for SPAs that fetch data after the initial document load, but slows
down fetches on pages with persistent connections (e.g. WebSocket heartbeats).
Defaults to false.
load_dom: bool
Wait for the DOMContentLoaded event after navigation.
This is faster than network_idle and sufficient for most server-rendered
pages. Defaults to true.
wait_selector: Option<String>
Optional CSS selector to wait for before returning the page content.
Use this when the data you need is rendered asynchronously by JavaScript and
you know a specific element that signals the content is ready.
wait_selector_state: WaitState
Required state of the wait selector before proceeding.
For example, WaitState::Visible waits until the element is both present
and visible on screen. Defaults to WaitState::Attached.
cookies: Vec<CookieParam>
Cookies to inject into the browser context before navigation.
Useful for authenticated scraping – set session cookies here to skip login flows.
google_search: bool
Prepend a Google search navigation to warm the browser session.
Some bot-detection systems check the browser’s navigation history; visiting
Google first can make the session appear more natural. Defaults to true.
wait_ms: u64
Extra delay in milliseconds to sleep after page load stabilisation.
Use this as a last resort when wait_selector and network_idle are not
enough. Defaults to 0 (no extra delay).
timezone_id: Option<String>
IANA timezone identifier to emulate in the browser context (e.g. "America/New_York").
Setting this makes the browser’s Intl APIs and Date objects report the
chosen timezone, which can help avoid location-based bot detection.
proxy: Option<ProxyConfig>
Static proxy server configuration.
Mutually exclusive with proxy_rotator – set one or the other, not both.
proxy_rotator: Option<ProxyRotator>
Rotating proxy provider that supplies a fresh proxy per request.
Mutually exclusive with proxy – set one or the other, not both.
Useful when you need a different IP for each fetch to avoid rate limits.
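To make "a fresh proxy per request" concrete, here is a hypothetical round-robin rotator. The crate's ProxyRotator API is not shown in this documentation, so the type name, the method names, and the round-robin strategy itself are all assumptions made for illustration.

```rust
/// Hypothetical round-robin rotator; the crate's `ProxyRotator` may expose a
/// different API and a different selection strategy.
pub struct RoundRobinProxies {
    proxies: Vec<String>,
    next: usize,
}

impl RoundRobinProxies {
    /// Returns `None` for an empty list, since there is nothing to rotate.
    pub fn new(proxies: Vec<String>) -> Option<Self> {
        if proxies.is_empty() {
            None
        } else {
            Some(Self { proxies, next: 0 })
        }
    }

    /// Hands out proxies in order, wrapping back to the first after the last.
    pub fn next_proxy(&mut self) -> &str {
        let idx = self.next;
        self.next = (self.next + 1) % self.proxies.len();
        &self.proxies[idx]
    }
}
```

A session following this pattern would call next_proxy once per fetch and build the request's browser context around the returned address.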
extra_headers: HashMap<String, String>
Additional HTTP headers sent with every request.
These are applied via Playwright’s set_extra_http_headers and will override
headers of the same name that the browser would normally send.
timeout_ms: f64
Navigation and action timeout in milliseconds.
Applies to page.goto(), selector waits, and other timed operations.
Defaults to 30_000.0 (30 seconds).
init_script: Option<String>
Path to a JavaScript file evaluated in every new page context.
The script runs before any page code, making it ideal for overriding
navigator properties or injecting polyfills. The file must exist on disk.
user_data_dir: Option<String>
Path to a persistent user-data directory for the browser profile.
When set, the browser stores cookies, local storage, and cache across
sessions, which can help maintain login state between runs.
locale: Option<String>
Locale string (e.g. "en-US") to emulate in the browser context.
Affects navigator.language, Accept-Language headers, and date/number
formatting in JavaScript.
real_chrome: bool
Launch with the system-installed Chrome instead of bundled Chromium.
The system Chrome may have a different fingerprint than Chromium and may
pass more bot-detection checks. Defaults to false.
cdp_url: Option<String>
WebSocket URL for connecting to an existing Chrome DevTools Protocol endpoint.
Must start with ws:// or wss://. When set, the session attaches to a
running browser instead of launching a new one.
useragent: Option<String>
Custom User-Agent string to set on the browser context.
When None, the browser uses its built-in default user agent.
extra_flags: Vec<String>
Extra command-line flags passed to the browser process.
These are appended after the default and stealth flags. Harmful
automation-revealing flags are automatically filtered out.
blocked_domains: HashSet<String>
Set of domain names whose requests will be blocked.
Blocking is suffix-based: adding "ads.example.com" also blocks
"sub.ads.example.com". See [intercept::is_domain_blocked] for details.
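The suffix rule can be sketched as follows. This is not the crate's intercept::is_domain_blocked, whose implementation is not shown here; is_domain_blocked_sketch is a hypothetical helper that assumes matching happens on whole DNS labels and that entries in the set are stored in lowercase.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of suffix-based matching; the real
/// `intercept::is_domain_blocked` may differ in its details.
pub fn is_domain_blocked_sketch(host: &str, blocked: &HashSet<String>) -> bool {
    // Assumption: blocklist entries are lowercase, so normalise the host.
    let host = host.to_ascii_lowercase();
    // Check the host itself, then each parent domain obtained by dropping
    // the leftmost label ("sub.ads.example.com" -> "ads.example.com" -> ...).
    let mut candidate: &str = &host;
    loop {
        if blocked.contains(candidate) {
            return true;
        }
        match candidate.find('.') {
            Some(idx) => candidate = &candidate[idx + 1..],
            None => return false,
        }
    }
}
```

Walking label by label means "badads.example.com" is not caught by an "ads.example.com" entry, which a plain ends_with string check would get wrong.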
block_ads: bool
Merge the built-in ad-domain blocklist into blocked_domains when true.
The blocklist contains roughly 3,500 known ad and tracker domains sourced
from Peter Lowe’s list. Defaults to false.
retries: u32
Number of retry attempts for each fetch operation.
Must be between 1 and 10 inclusive. On failure, the session waits
retry_delay_secs between attempts. Defaults to 3.
retry_delay_secs: f64
Delay in seconds between retry attempts.
Applies when a fetch fails and there are retries remaining.
Defaults to 1.0.
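How retries and retry_delay_secs interact can be sketched with a small blocking loop. The helper fetch_with_retries is hypothetical, and it assumes that retries counts total attempts (consistent with the documented minimum of 1); the real session is presumably async and may count differently.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry helper. Assumes `retries` is the total number of
/// attempts and that `retry_delay_secs` is finite and non-negative
/// (`Duration::from_secs_f64` panics otherwise).
pub fn fetch_with_retries<T, E>(
    retries: u32,
    retry_delay_secs: f64,
    mut attempt: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if n >= retries => return Err(err),
            // Attempts remain: wait, then try again.
            Err(_) => {
                sleep(Duration::from_secs_f64(retry_delay_secs));
                n += 1;
            }
        }
    }
}
```

The closure receives the 1-based attempt number, which is handy for logging; the delay is only paid between attempts, never after the final failure.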
capture_xhr: Option<String>
URL pattern to capture matching XHR/fetch responses.
When set, the session intercepts network responses whose URL matches this
pattern and includes them in the response. Useful for extracting API data
that the page fetches via AJAX.
executable_path: Option<String>
Path to a custom browser executable.
Use this to point at a specific Chrome/Chromium binary instead of the one
bundled with Playwright. The file must exist on disk.
dns_over_https: bool
Enable DNS-over-HTTPS via Cloudflare’s resolver.
Adds the --dns-over-https-templates Chromium flag pointing at Cloudflare’s
1.1.1.1 DNS endpoint, encrypting DNS queries from the browser process.
Defaults to false.
selector_config: HashMap<String, Value>
Arbitrary key-value configuration forwarded to the selector engine.
This map is passed through to scrapling’s selector/parsing layer and can
control how CSS selectors and smart matching behave.
page_setup: Option<PageCallback>
Async callback invoked on each page immediately after creation.
Use this to perform custom setup like adding request interceptors, injecting
scripts, or configuring page-level settings before navigation begins.
page_action: Option<PageCallback>
Async callback invoked on each page after navigation completes.
Use this to perform post-navigation actions like clicking buttons, filling
forms, or scrolling to trigger lazy-loaded content before the HTML is captured.
Implementations
impl BrowserConfig
pub fn validate(&mut self) -> Result<()>
Validate configuration invariants and populate derived fields.
This method checks that numeric fields are within acceptable ranges, that
mutually exclusive options are not both set, that file paths exist on disk,
and that the CDP URL (if any) uses a WebSocket scheme. When block_ads is
true, it also merges the built-in ad-domain list into blocked_domains.
You do not usually need to call this yourself – [DynamicSession::new] and
[StealthySession::new] call it automatically during construction.
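A few of the checks described above can be sketched against a simplified stand-in. ConfigSketch and validate_sketch are hypothetical: the real method takes &mut self, returns the crate's own Result type, also verifies file paths, and merges the ad-domain list, none of which is reproduced here. The proxy fields are reduced to plain strings for illustration.

```rust
// Simplified stand-in; the real ProxyConfig, ProxyRotator, and error type
// are not shown in this documentation.
#[derive(Debug, Clone)]
pub struct ConfigSketch {
    pub max_pages: u32,
    pub retries: u32,
    pub proxy: Option<String>,
    pub proxy_rotator: Option<Vec<String>>,
    pub cdp_url: Option<String>,
}

/// Checks the documented ranges, proxy exclusivity, and CDP URL scheme.
pub fn validate_sketch(cfg: &ConfigSketch) -> Result<(), String> {
    if !(1..=50).contains(&cfg.max_pages) {
        return Err(format!("max_pages must be 1..=50, got {}", cfg.max_pages));
    }
    if !(1..=10).contains(&cfg.retries) {
        return Err(format!("retries must be 1..=10, got {}", cfg.retries));
    }
    if cfg.proxy.is_some() && cfg.proxy_rotator.is_some() {
        return Err("proxy and proxy_rotator are mutually exclusive".to_string());
    }
    if let Some(url) = &cfg.cdp_url {
        if !(url.starts_with("ws://") || url.starts_with("wss://")) {
            return Err(format!("cdp_url must start with ws:// or wss://, got {url}"));
        }
    }
    Ok(())
}
```

Returning on the first failed check keeps the error message specific; a real implementation might instead collect every violation before reporting.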
pub fn has_proxy_rotator(&self) -> bool
Returns true if a rotating proxy provider is configured.
When a rotator is present the session creates a fresh browser context per
request so each navigation can use a different proxy address.