Struct BrowserConfig

pub struct BrowserConfig {
    pub max_pages: u32,
    pub headless: bool,
    pub disable_resources: bool,
    pub network_idle: bool,
    pub load_dom: bool,
    pub wait_selector: Option<String>,
    pub wait_selector_state: WaitState,
    pub cookies: Vec<CookieParam>,
    pub google_search: bool,
    pub wait_ms: u64,
    pub timezone_id: Option<String>,
    pub proxy: Option<ProxyConfig>,
    pub proxy_rotator: Option<ProxyRotator>,
    pub extra_headers: HashMap<String, String>,
    pub timeout_ms: f64,
    pub init_script: Option<String>,
    pub user_data_dir: Option<String>,
    pub locale: Option<String>,
    pub real_chrome: bool,
    pub cdp_url: Option<String>,
    pub useragent: Option<String>,
    pub extra_flags: Vec<String>,
    pub blocked_domains: HashSet<String>,
    pub block_ads: bool,
    pub retries: u32,
    pub retry_delay_secs: f64,
    pub capture_xhr: Option<String>,
    pub executable_path: Option<String>,
    pub dns_over_https: bool,
    pub selector_config: HashMap<String, Value>,
    pub page_setup: Option<PageCallback>,
    pub page_action: Option<PageCallback>,
}

Browser session configuration – the central struct that controls how the Playwright browser is launched and how pages are navigated.

This mirrors the Python PlaywrightConfig from the original scrapling library. Every field has a default value (see Default), so you only need to set the fields relevant to your use case. Sessions call validate automatically during construction; call it yourself beforehand if you want to catch invalid settings before a session is created.
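Because every field has a default, a config can be built with struct-update syntax. The sketch below uses a hypothetical stand-in type, BrowserConfigSketch, that carries only a handful of the documented fields and their documented defaults; it is not the crate's own definition, and the "#app" selector is an invented example value.

```rust
// Stand-in for illustration only: a small subset of the documented fields,
// using the default values stated in the field descriptions.
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserConfigSketch {
    pub max_pages: u32,
    pub headless: bool,
    pub network_idle: bool,
    pub wait_selector: Option<String>,
    pub timeout_ms: f64,
    pub retries: u32,
}

impl Default for BrowserConfigSketch {
    fn default() -> Self {
        Self {
            max_pages: 1,
            headless: true,
            network_idle: false,
            wait_selector: None,
            timeout_ms: 30_000.0,
            retries: 3,
        }
    }
}

/// Override only the fields that matter for a JavaScript-heavy page and
/// take everything else from `Default`.
pub fn spa_config() -> BrowserConfigSketch {
    BrowserConfigSketch {
        network_idle: true,
        wait_selector: Some("#app".to_string()), // hypothetical selector
        ..Default::default()
    }
}
```

With the real struct the same pattern should apply: name the fields you care about and finish with ..Default::default().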

Fields§

§max_pages: u32

Maximum number of concurrent browser pages in the pool. Must be between 1 and 50 inclusive. Higher values allow more parallel fetches but consume more memory. Defaults to 1.
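The documented bound can be expressed as a simple range check. This is a minimal sketch of the kind of test validate might perform; the function name check_max_pages and the String error type are assumptions made for the example, not the crate's API.

```rust
/// Accept `max_pages` only when it lies in the documented range 1..=50.
/// Hypothetical helper; the crate's `validate` may report errors differently.
pub fn check_max_pages(max_pages: u32) -> Result<u32, String> {
    if (1..=50).contains(&max_pages) {
        Ok(max_pages)
    } else {
        Err(format!(
            "max_pages must be between 1 and 50 inclusive, got {}",
            max_pages
        ))
    }
}
```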

§headless: bool

Whether to launch the browser in headless mode. Set to false when debugging to see the browser window. Defaults to true.

§disable_resources: bool

Block heavyweight resource types (images, fonts, stylesheets) when true. This significantly speeds up page loads when you only need the HTML/DOM. The exact list of blocked types is defined in constants::EXTRA_RESOURCES. Defaults to false.

§network_idle: bool

Wait for the network-idle event after navigation. Useful for SPAs that fetch data after the initial document load, but slows down fetches on pages with persistent connections (e.g. WebSocket heartbeats). Defaults to false.

§load_dom: bool

Wait for the DOMContentLoaded event after navigation. This is faster than network_idle and sufficient for most server-rendered pages. Defaults to true.

§wait_selector: Option<String>

Optional CSS selector to wait for before returning the page content. Use this when the data you need is rendered asynchronously by JavaScript and you know a specific element that signals the content is ready.

§wait_selector_state: WaitState

Required state of the wait selector before proceeding. For example, WaitState::Visible waits until the element is both present and visible on screen. Defaults to WaitState::Attached.

§cookies: Vec<CookieParam>

Cookies to inject into the browser context before navigation. Useful for authenticated scraping – set session cookies here to skip login flows.

§google_search: bool

Prepend a Google search navigation to warm the browser session. Some bot-detection systems check the browser’s navigation history; visiting Google first can make the session appear more natural. Defaults to true.

§wait_ms: u64

Extra delay in milliseconds to sleep after page load stabilisation. Use this as a last resort when wait_selector and network_idle are not enough. Defaults to 0 (no extra delay).

§timezone_id: Option<String>

IANA timezone identifier to emulate in the browser context (e.g. "America/New_York"). Setting this makes the browser’s Intl APIs and Date objects report the chosen timezone, which can help avoid location-based bot detection.

§proxy: Option<ProxyConfig>

Static proxy server configuration. Mutually exclusive with proxy_rotator – set one or the other, not both.

§proxy_rotator: Option<ProxyRotator>

Rotating proxy provider that supplies a fresh proxy per request. Mutually exclusive with proxy – set one or the other, not both. Useful when you need a different IP for each fetch to avoid rate limits.
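The mutual-exclusion rule for proxy and proxy_rotator amounts to rejecting a config in which both are Some. The sketch below shows that check in isolation. ProxyConfig and ProxyRotator here are hypothetical stand-ins with invented fields, since their real definitions are not part of this page, and check_proxy_settings is an illustrative name.

```rust
// Hypothetical stand-ins; the real types live elsewhere in the crate.
pub struct ProxyConfig {
    pub server: String,
}

pub struct ProxyRotator {
    pub servers: Vec<String>,
}

/// Return an error when both a static proxy and a rotator are configured.
pub fn check_proxy_settings(
    proxy: Option<&ProxyConfig>,
    rotator: Option<&ProxyRotator>,
) -> Result<(), String> {
    if proxy.is_some() && rotator.is_some() {
        return Err("proxy and proxy_rotator are mutually exclusive".to_string());
    }
    Ok(())
}
```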

§extra_headers: HashMap<String, String>

Additional HTTP headers sent with every request. These are applied via Playwright’s set_extra_http_headers and will override headers of the same name that the browser would normally send.

§timeout_ms: f64

Navigation and action timeout in milliseconds. Applies to page.goto(), selector waits, and other timed operations. Defaults to 30_000.0 (30 seconds).
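Since timeout_ms is an f64 rather than an integer, code that needs a std::time::Duration has to convert it and decide what to do with negative or non-finite values. The following is one possible conversion, assuming such values should be rejected; timeout_from_ms is a hypothetical helper, not something this page documents.

```rust
use std::time::Duration;

/// Convert a millisecond value stored as `f64` into a `Duration`.
/// Returns `None` for negative, NaN, infinite, or overflowing input.
pub fn timeout_from_ms(timeout_ms: f64) -> Option<Duration> {
    Duration::try_from_secs_f64(timeout_ms / 1000.0).ok()
}
```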

§init_script: Option<String>

Path to a JavaScript file evaluated in every new page context. The script runs before any page code, making it ideal for overriding navigator properties or injecting polyfills. The file must exist on disk.

§user_data_dir: Option<String>

Path to a persistent user-data directory for the browser profile. When set, the browser stores cookies, local storage, and cache across sessions, which can help maintain login state between runs.

§locale: Option<String>

Locale string (e.g. "en-US") to emulate in the browser context. Affects navigator.language, Accept-Language headers, and date/number formatting in JavaScript.

§real_chrome: bool

Launch with the system-installed Chrome instead of bundled Chromium. The system Chrome may have a different fingerprint than Chromium and may pass more bot-detection checks. Defaults to false.

§cdp_url: Option<String>

WebSocket URL for connecting to an existing Chrome DevTools Protocol endpoint. Must start with ws:// or wss://. When set, the session attaches to a running browser instead of launching a new one.
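The scheme requirement on cdp_url can be checked with a prefix test. This is a minimal sketch under the assumption that only the prefix is inspected; is_valid_cdp_url is a hypothetical name and the crate's own validation may be stricter.

```rust
/// True when `url` starts with one of the two WebSocket schemes the
/// documentation allows for `cdp_url`.
pub fn is_valid_cdp_url(url: &str) -> bool {
    url.starts_with("ws://") || url.starts_with("wss://")
}
```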

§useragent: Option<String>

Custom User-Agent string to set on the browser context. When None, the browser uses its built-in default user agent.

§extra_flags: Vec<String>

Extra command-line flags passed to the browser process. These are appended after the default and stealth flags. Harmful automation-revealing flags are automatically filtered out.

§blocked_domains: HashSet<String>

Set of domain names whose requests will be blocked. Blocking is suffix-based: adding "ads.example.com" also blocks "sub.ads.example.com". See intercept::is_domain_blocked for details.
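One way to implement suffix-based blocking is to test the host and then each of its parent domains against the set. The sketch below is not the body of intercept::is_domain_blocked. It assumes that matching happens on label boundaries (so "badads.example.com" is not caught by "ads.example.com"), that blocklist entries are stored in lowercase, and that hosts are compared case-insensitively; the name is_blocked is hypothetical.

```rust
use std::collections::HashSet;

/// Return true when `host`, or any parent domain of it, is in `blocked`.
/// Entries in `blocked` are assumed to be lowercase.
pub fn is_blocked(host: &str, blocked: &HashSet<String>) -> bool {
    // Normalise: drop a trailing dot and ignore ASCII case.
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let mut candidate = host.as_str();
    loop {
        if blocked.contains(candidate) {
            return true;
        }
        // Strip the leftmost label: "sub.ads.example.com" -> "ads.example.com".
        match candidate.find('.') {
            Some(idx) => candidate = &candidate[idx + 1..],
            None => return false,
        }
    }
}
```

Walking label by label keeps each lookup a hash probe and avoids the false positives a plain ends_with test would produce.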

§block_ads: bool

Merge the built-in ad-domain blocklist into blocked_domains when true. The blocklist contains roughly 3,500 known ad and tracker domains sourced from Peter Lowe’s list. Defaults to false.
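The merge described above can be pictured as a set union that only happens when block_ads is true. In this sketch AD_DOMAINS_SAMPLE is a two-entry placeholder standing in for the built-in list, and effective_blocked_domains is a hypothetical helper; whether the crate merges in place or builds a new set is not stated here.

```rust
use std::collections::HashSet;

// Placeholder entries for illustration; the real built-in list is far larger.
const AD_DOMAINS_SAMPLE: &[&str] = &["ads.example.com", "tracker.example.net"];

/// Combine user-supplied domains with the sample blocklist when
/// `block_ads` is enabled; otherwise return the user set unchanged.
pub fn effective_blocked_domains(user: &HashSet<String>, block_ads: bool) -> HashSet<String> {
    let mut merged = user.clone();
    if block_ads {
        merged.extend(AD_DOMAINS_SAMPLE.iter().map(|d| d.to_string()));
    }
    merged
}
```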

§retries: u32

Number of retry attempts for each fetch operation. Must be between 1 and 10 inclusive. On failure, the session waits retry_delay_secs between attempts. Defaults to 3.

§retry_delay_secs: f64

Delay in seconds between retry attempts. Applies when a fetch fails and there are retries remaining. Defaults to 1.0.
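The interaction between retries and retry_delay_secs can be sketched as a loop that sleeps only between failed attempts. This example is synchronous and generic for simplicity, whereas a real session is presumably async. It also assumes that retries counts total attempts (consistent with the documented minimum of 1) and treats a negative or non-finite delay as zero. The name fetch_with_retries is hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `attempt` up to `retries` times, sleeping `retry_delay_secs`
/// between failures. The closure receives the 1-based attempt number.
pub fn fetch_with_retries<T, E, F>(
    retries: u32,
    retry_delay_secs: f64,
    mut attempt: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    // Invalid delays (negative, NaN, infinite) fall back to no delay.
    let delay = Duration::try_from_secs_f64(retry_delay_secs).unwrap_or(Duration::ZERO);
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if n >= retries => return Err(err),
            Err(_) => {
                sleep(delay);
                n += 1;
            }
        }
    }
}
```

No sleep follows the final failure, so the total added delay is at most (retries - 1) times retry_delay_secs.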

§capture_xhr: Option<String>

URL pattern to capture matching XHR/fetch responses. When set, the session intercepts network responses whose URL matches this pattern and includes them in the response. Useful for extracting API data that the page fetches via AJAX.

§executable_path: Option<String>

Path to a custom browser executable. Use this to point at a specific Chrome/Chromium binary instead of the one bundled with Playwright. The file must exist on disk.

§dns_over_https: bool

Enable DNS-over-HTTPS via Cloudflare’s resolver. Adds the --dns-over-https-templates Chromium flag pointing at Cloudflare’s 1.1.1.1 DNS endpoint, encrypting DNS queries from the browser process. Defaults to false.

§selector_config: HashMap<String, Value>

Arbitrary key-value configuration forwarded to the selector engine. This map is passed through to scrapling’s selector/parsing layer and can control how CSS selectors and smart matching behave.

§page_setup: Option<PageCallback>

Async callback invoked on each page immediately after creation. Use this to perform custom setup like adding request interceptors, injecting scripts, or configuring page-level settings before navigation begins.

§page_action: Option<PageCallback>

Async callback invoked on each page after navigation completes. Use this to perform post-navigation actions like clicking buttons, filling forms, or scrolling to trigger lazy-loaded content before the HTML is captured.

Implementations§

impl BrowserConfig

pub fn validate(&mut self) -> Result<()>

pub fn has_proxy_rotator(&self) -> bool

pub fn is_cdp(&self) -> bool

Trait Implementations§

impl Debug for BrowserConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for BrowserConfig

fn default() -> Self

Auto Trait Implementations§

impl !Freeze for BrowserConfig

impl !RefUnwindSafe for BrowserConfig

impl Send for BrowserConfig

impl Sync for BrowserConfig

impl Unpin for BrowserConfig

impl UnsafeUnpin for BrowserConfig

impl !UnwindSafe for BrowserConfig

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> Same for T

type Output = T

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>
