BrowserConfig

Struct BrowserConfig 

pub struct BrowserConfig {
    pub max_pages: u32,
    pub headless: bool,
    pub disable_resources: bool,
    pub network_idle: bool,
    pub load_dom: bool,
    pub wait_selector: Option<String>,
    pub wait_selector_state: WaitState,
    pub cookies: Vec<CookieParam>,
    pub google_search: bool,
    pub wait_ms: u64,
    pub timezone_id: Option<String>,
    pub proxy: Option<ProxyConfig>,
    pub proxy_rotator: Option<ProxyRotator>,
    pub extra_headers: HashMap<String, String>,
    pub timeout_ms: f64,
    pub init_script: Option<String>,
    pub user_data_dir: Option<String>,
    pub locale: Option<String>,
    pub real_chrome: bool,
    pub cdp_url: Option<String>,
    pub useragent: Option<String>,
    pub extra_flags: Vec<String>,
    pub blocked_domains: HashSet<String>,
    pub block_ads: bool,
    pub retries: u32,
    pub retry_delay_secs: f64,
    pub capture_xhr: Option<String>,
    pub executable_path: Option<String>,
    pub dns_over_https: bool,
    pub selector_config: HashMap<String, Value>,
    pub page_setup: Option<PageCallback>,
    pub page_action: Option<PageCallback>,
}
Browser session configuration – the central struct that controls how the Playwright browser is launched and how pages are navigated.

This mirrors the Python PlaywrightConfig from the original scrapling library. Every field has a default value (see Default), so you only need to set the fields relevant to your use case. You can call validate yourself to catch configuration errors early, but this is optional: sessions call it automatically during construction.
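As a rough illustration of the "set only what you need" pattern, the sketch below uses Rust's struct update syntax on a trimmed-down stand-in. `MiniConfig` and `spa_config` are hypothetical names introduced for this example; the stand-in carries only four of the documented fields with their documented defaults, and the crate's real import path is not shown on this page.

```rust
// Hypothetical stand-in for BrowserConfig with a handful of its documented
// fields. The defaults below are the ones stated in the field docs.
#[derive(Debug, Clone, PartialEq)]
struct MiniConfig {
    max_pages: u32,
    headless: bool,
    network_idle: bool,
    wait_selector: Option<String>,
}

impl Default for MiniConfig {
    fn default() -> Self {
        Self {
            max_pages: 1,
            headless: true,
            network_idle: false,
            wait_selector: None,
        }
    }
}

// Override two fields and take every other value from Default.
fn spa_config() -> MiniConfig {
    MiniConfig {
        network_idle: true,
        wait_selector: Some("#app".to_string()),
        ..Default::default()
    }
}

fn main() {
    let config = spa_config();
    println!("{config:?}");
}
```

With the real struct the same shape should apply: name the fields you care about and finish the literal with `..Default::default()`.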

Fields§

§max_pages: u32

Maximum number of concurrent browser pages in the pool. Must be between 1 and 50 inclusive. Higher values allow more parallel fetches but consume more memory. Defaults to 1.

§headless: bool

Whether to launch the browser in headless mode. Set to false when debugging to see the browser window. Defaults to true.

§disable_resources: bool

Block heavyweight resource types (images, fonts, stylesheets) when true. This significantly speeds up page loads when you only need the HTML/DOM. The exact list of blocked types is defined in constants::EXTRA_RESOURCES. Defaults to false.

§network_idle: bool

Wait for the network-idle event after navigation. Useful for SPAs that fetch data after the initial document load, but slows down fetches on pages with persistent connections (e.g. WebSocket heartbeats). Defaults to false.

§load_dom: bool

Wait for the DOMContentLoaded event after navigation. This is faster than network_idle and sufficient for most server-rendered pages. Defaults to true.

§wait_selector: Option<String>

Optional CSS selector to wait for before returning the page content. Use this when the data you need is rendered asynchronously by JavaScript and you know a specific element that signals the content is ready.

§wait_selector_state: WaitState

Required state of the wait selector before proceeding. For example, WaitState::Visible waits until the element is both present and visible on screen. Defaults to WaitState::Attached.

§cookies: Vec<CookieParam>

Cookies to inject into the browser context before navigation. Useful for authenticated scraping – set session cookies here to skip login flows.

§google_search: bool

Prepend a Google search navigation to warm the browser session. Some bot-detection systems check the browser’s navigation history; visiting Google first can make the session appear more natural. Defaults to true.

§wait_ms: u64

Extra delay in milliseconds to sleep after page load stabilisation. Use this as a last resort when wait_selector and network_idle are not enough. Defaults to 0 (no extra delay).

§timezone_id: Option<String>

IANA timezone identifier to emulate in the browser context (e.g. "America/New_York"). Setting this makes the browser’s Intl APIs and Date objects report the chosen timezone, which can help avoid location-based bot detection.

§proxy: Option<ProxyConfig>

Static proxy server configuration. Mutually exclusive with proxy_rotator – set one or the other, not both.

§proxy_rotator: Option<ProxyRotator>

Rotating proxy provider that supplies a fresh proxy per request. Mutually exclusive with proxy – set one or the other, not both. Useful when you need a different IP for each fetch to avoid rate limits.

§extra_headers: HashMap<String, String>

Additional HTTP headers sent with every request. These are applied via Playwright’s set_extra_http_headers and will override headers of the same name that the browser would normally send.

§timeout_ms: f64

Navigation and action timeout in milliseconds. Applies to page.goto(), selector waits, and other timed operations. Defaults to 30_000.0 (30 seconds).
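Because timeout_ms is stored as an f64 number of milliseconds, code that needs a std::time::Duration has to convert it. The helper below is a minimal sketch of one way to do that; `timeout_duration` is a hypothetical name, not part of this crate.

```rust
use std::time::Duration;

// Hypothetical helper: convert a millisecond value stored as f64 (like
// timeout_ms) into a Duration. Negative, non-finite, or overflowing inputs
// yield None instead of panicking.
fn timeout_duration(timeout_ms: f64) -> Option<Duration> {
    Duration::try_from_secs_f64(timeout_ms / 1000.0).ok()
}

fn main() {
    // The documented default of 30_000.0 ms corresponds to 30 seconds.
    println!("{:?}", timeout_duration(30_000.0));
}
```

`Duration::try_from_secs_f64` is used rather than `Duration::from_secs_f64` because the latter panics on values a Duration cannot represent.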

§init_script: Option<String>

Path to a JavaScript file evaluated in every new page context. The script runs before any page code, making it ideal for overriding navigator properties or injecting polyfills. The file must exist on disk.

§user_data_dir: Option<String>

Path to a persistent user-data directory for the browser profile. When set, the browser stores cookies, local storage, and cache across sessions, which can help maintain login state between runs.

§locale: Option<String>

Locale string (e.g. "en-US") to emulate in the browser context. Affects navigator.language, Accept-Language headers, and date/number formatting in JavaScript.

§real_chrome: bool

Launch with the system-installed Chrome instead of bundled Chromium. The system Chrome may have a different fingerprint than Chromium and may pass more bot-detection checks. Defaults to false.

§cdp_url: Option<String>

WebSocket URL for connecting to an existing Chrome DevTools Protocol endpoint. Must start with ws:// or wss://. When set, the session attaches to a running browser instead of launching a new one.

§useragent: Option<String>

Custom User-Agent string to set on the browser context. When None, the browser uses its built-in default user agent.

§extra_flags: Vec<String>

Extra command-line flags passed to the browser process. These are appended after the default and stealth flags. Harmful automation-revealing flags are automatically filtered out.
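The append-then-filter behaviour described above could look roughly like the following. This is only a sketch: `merge_flags` and `DENIED_FLAGS` are hypothetical, and the deny list shown is illustrative, since the crate's actual list of filtered flags does not appear on this page.

```rust
// Illustrative deny list only; the crate's real list is not shown here.
const DENIED_FLAGS: [&str; 2] = ["--enable-automation", "--disable-extensions"];

// Append user flags after the defaults, skipping any flag whose name
// (the part before an optional "=value") is on the deny list.
fn merge_flags(defaults: &[&str], extra_flags: &[String]) -> Vec<String> {
    let mut merged: Vec<String> = defaults.iter().map(|f| f.to_string()).collect();
    for flag in extra_flags {
        let name = flag.split('=').next().unwrap_or(flag.as_str());
        if !DENIED_FLAGS.iter().any(|denied| *denied == name) {
            merged.push(flag.clone());
        }
    }
    merged
}

fn main() {
    let extra = vec![
        "--enable-automation".to_string(),
        "--window-size=800,600".to_string(),
    ];
    println!("{:?}", merge_flags(&["--no-first-run"], &extra));
}
```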

§blocked_domains: HashSet<String>

Set of domain names whose requests will be blocked. Blocking is suffix-based: adding "ads.example.com" also blocks "sub.ads.example.com". See intercept::is_domain_blocked for details.
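To make the suffix rule concrete, here is a minimal sketch of one way such a check can be written. `is_blocked` is a hypothetical function, and two of its behaviours are assumptions rather than documented facts: it matches only on label boundaries (so "badads.example.com" is not caught by "ads.example.com") and it lowercases the host first. The real intercept::is_domain_blocked may differ.

```rust
use std::collections::HashSet;

// A host is treated as blocked when it equals a blocked domain or is a
// subdomain of one. Assumes entries in `blocked` are already lowercase.
fn is_blocked(host: &str, blocked: &HashSet<String>) -> bool {
    let host = host.to_ascii_lowercase();
    let mut candidate = host.as_str();
    loop {
        if blocked.contains(candidate) {
            return true;
        }
        // Drop the leftmost label and try the parent domain.
        match candidate.find('.') {
            Some(dot) => candidate = &candidate[dot + 1..],
            None => return false,
        }
    }
}

fn main() {
    let blocked: HashSet<String> = ["ads.example.com".to_string()].into_iter().collect();
    println!("{}", is_blocked("sub.ads.example.com", &blocked));
}
```

Walking up the label chain keeps each lookup a plain hash-set probe, so the cost grows with the number of labels in the host rather than with the size of the blocklist.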

§block_ads: bool

Merge the built-in ad-domain blocklist into blocked_domains when true. The blocklist contains roughly 3,500 known ad and tracker domains sourced from Peter Lowe’s list. Defaults to false.

§retries: u32

Number of retry attempts for each fetch operation. Must be between 1 and 10 inclusive. On failure, the session waits retry_delay_secs between attempts. Defaults to 3.

§retry_delay_secs: f64

Delay in seconds between retry attempts. Applies when a fetch fails and there are retries remaining. Defaults to 1.0.
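The interaction between retries and retry_delay_secs can be sketched as a plain loop. `fetch_with_retries` is a hypothetical helper, it is synchronous for brevity, and it treats retries as the total number of attempts, which is an assumption: this page does not say whether the first attempt counts.

```rust
use std::thread::sleep;
use std::time::Duration;

// Run `op` up to `retries` times, sleeping `retry_delay_secs` between failed
// attempts. Assumes the delay is finite and non-negative; otherwise
// Duration::from_secs_f64 panics.
fn fetch_with_retries<T, E>(
    retries: u32,
    retry_delay_secs: f64,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // No attempts left: surface the last error.
            Err(err) if attempt >= retries => return Err(err),
            Err(_) => {
                sleep(Duration::from_secs_f64(retry_delay_secs));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated fetch that fails on the first attempt only.
    let result: Result<&str, &str> = fetch_with_retries(3, 0.01, |attempt| {
        if attempt < 2 { Err("timeout") } else { Ok("<html></html>") }
    });
    println!("{result:?}");
}
```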

§capture_xhr: Option<String>

URL pattern to capture matching XHR/fetch responses. When set, the session intercepts network responses whose URL matches this pattern and includes them in the response. Useful for extracting API data that the page fetches via AJAX.

§executable_path: Option<String>

Path to a custom browser executable. Use this to point at a specific Chrome/Chromium binary instead of the one bundled with Playwright. The file must exist on disk.

§dns_over_https: bool

Enable DNS-over-HTTPS via Cloudflare’s resolver. Adds the --dns-over-https-templates Chromium flag pointing at Cloudflare’s 1.1.1.1 DNS endpoint, encrypting DNS queries from the browser process. Defaults to false.

§selector_config: HashMap<String, Value>

Arbitrary key-value configuration forwarded to the selector engine. This map is passed through to scrapling’s selector/parsing layer and can control how CSS selectors and smart matching behave.

§page_setup: Option<PageCallback>

Async callback invoked on each page immediately after creation. Use this to perform custom setup like adding request interceptors, injecting scripts, or configuring page-level settings before navigation begins.

§page_action: Option<PageCallback>

Async callback invoked on each page after navigation completes. Use this to perform post-navigation actions like clicking buttons, filling forms, or scrolling to trigger lazy-loaded content before the HTML is captured.

Implementations§

impl BrowserConfig

pub fn validate(&mut self) -> Result<()>

Validate configuration invariants and populate derived fields.

This method checks that numeric fields are within acceptable ranges, that mutually exclusive options are not both set, that file paths exist on disk, and that the CDP URL (if any) uses a WebSocket scheme. When block_ads is true, it also merges the built-in ad-domain list into blocked_domains.

You do not usually need to call this yourself – DynamicSession::new and StealthySession::new call it automatically during construction.
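The checks listed above can be pictured with a simplified stand-in. Everything in this sketch is an assumption made for illustration: `MiniConfig` carries only a subset of the fields, the proxy types are reduced to strings because ProxyConfig and ProxyRotator are not shown on this page, the error type is a plain String, and `AD_DOMAINS` holds two placeholder entries instead of the real built-in list.

```rust
use std::collections::HashSet;
use std::path::Path;

// Hypothetical stand-in with a subset of the documented fields.
#[derive(Default)]
struct MiniConfig {
    max_pages: u32,
    retries: u32,
    proxy: Option<String>,
    proxy_rotator: Option<Vec<String>>,
    cdp_url: Option<String>,
    init_script: Option<String>,
    block_ads: bool,
    blocked_domains: HashSet<String>,
}

// Placeholder entries; the real list is described as roughly 3,500 domains.
const AD_DOMAINS: [&str; 2] = ["ads.example.com", "tracker.example.net"];

impl MiniConfig {
    // Check the documented invariants, then populate derived fields.
    fn validate(&mut self) -> Result<(), String> {
        if !(1..=50).contains(&self.max_pages) {
            return Err("max_pages must be between 1 and 50".into());
        }
        if !(1..=10).contains(&self.retries) {
            return Err("retries must be between 1 and 10".into());
        }
        if self.proxy.is_some() && self.proxy_rotator.is_some() {
            return Err("proxy and proxy_rotator are mutually exclusive".into());
        }
        if let Some(url) = &self.cdp_url {
            if !(url.starts_with("ws://") || url.starts_with("wss://")) {
                return Err("cdp_url must start with ws:// or wss://".into());
            }
        }
        if let Some(path) = &self.init_script {
            if !Path::new(path).is_file() {
                return Err(format!("init_script not found: {path}"));
            }
        }
        // Derived field: merge the ad blocklist only after all checks pass.
        if self.block_ads {
            self.blocked_domains
                .extend(AD_DOMAINS.iter().map(|d| d.to_string()));
        }
        Ok(())
    }
}

fn main() {
    let mut config = MiniConfig {
        max_pages: 4,
        retries: 3,
        block_ads: true,
        ..Default::default()
    };
    println!("{:?}", config.validate());
}
```

Taking `&mut self` is what lets a single call both check invariants and fill in derived state, which matches the signature shown above.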

pub fn has_proxy_rotator(&self) -> bool

Returns true if a rotating proxy provider is configured. When a rotator is present the session creates a fresh browser context per request so each navigation can use a different proxy address.
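For intuition about what "a different proxy address per navigation" means in practice, the sketch below shows a round-robin rotator. It is purely illustrative: `RoundRobin` and `next_proxy` are hypothetical names, and the real ProxyRotator's selection strategy and API are not described on this page.

```rust
// Hypothetical round-robin rotator over a fixed list of proxy URLs.
struct RoundRobin {
    proxies: Vec<String>,
    next: usize,
}

impl RoundRobin {
    fn new(proxies: Vec<String>) -> Self {
        Self { proxies, next: 0 }
    }

    // Return the next proxy URL, wrapping around at the end of the list.
    // Returns None when no proxies are configured.
    fn next_proxy(&mut self) -> Option<&str> {
        if self.proxies.is_empty() {
            return None;
        }
        let proxy = self.proxies[self.next].as_str();
        self.next = (self.next + 1) % self.proxies.len();
        Some(proxy)
    }
}

fn main() {
    let mut rotator = RoundRobin::new(vec![
        "http://proxy-a.example:8080".to_string(),
        "http://proxy-b.example:8080".to_string(),
    ]);
    // One proxy per simulated request, as a fresh context would receive.
    for request in 1..=3 {
        println!("request {request}: {:?}", rotator.next_proxy());
    }
}
```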

pub fn is_cdp(&self) -> bool

Returns true if the session will connect via Chrome DevTools Protocol. CDP mode attaches to a running browser rather than launching a new process, which is useful for connecting to remote or containerised browsers.

Trait Implementations§

impl Debug for BrowserConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for BrowserConfig

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.