Crate scrapling_browser

Browser automation crate for the scrapling-rs web scraping framework.

This crate provides high-level browser automation built on top of Playwright, giving you two session types for fetching fully rendered web pages:

  • DynamicSession – a standard Playwright-driven browser that executes JavaScript, waits for network activity to settle, and returns the final DOM. Use this when the target site does not employ bot detection.

  • StealthySession – extends DynamicSession with anti-detection measures such as WebRTC leak prevention, canvas fingerprint noise, automation-flag removal, and an automatic Cloudflare Turnstile solver. Use this when sites actively block headless browsers.
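Because Rust has no inheritance, "extends" most plausibly means that a stealth session keeps everything a dynamic session does and layers extra behaviour on top. The following is a hypothetical sketch of that layering at the launch-argument level. It is not the crate's code. The names `chromium_args`, `BASE_ARGS` and `STEALTH_ARGS` and the exact flag lists are assumptions, because the real lists live in `constants.rs`. The flags shown are, however, standard Chromium switches.

```rust
/// Hypothetical base flags shared by every session (the real list in
/// `constants.rs` may differ).
const BASE_ARGS: &[&str] = &["--no-first-run", "--no-default-browser-check"];

/// Hypothetical stealth-only flags. Disabling the `AutomationControlled`
/// Blink feature hides the `navigator.webdriver` automation signal.
const STEALTH_ARGS: &[&str] = &["--disable-blink-features=AutomationControlled"];

/// Builds the Chromium argument list: a stealth session reuses every base
/// flag and appends its own, mirroring StealthySession building on
/// DynamicSession.
fn chromium_args(stealth: bool) -> Vec<String> {
    let mut args: Vec<String> = BASE_ARGS.iter().map(|s| s.to_string()).collect();
    if stealth {
        args.extend(STEALTH_ARGS.iter().map(|s| s.to_string()));
    }
    args
}
```

Keeping the stealth flags additive means any fix to the base set automatically reaches stealth sessions too.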

Architecture overview

                 ┌──────────────┐
                 │  Your code   │
                 └──────┬───────┘
                        │ .fetch(url)
      ┌─────────────────┴──────────────────┐
      │  DynamicSession / StealthySession  │  (fetcher.rs)
      └─────────────────┬──────────────────┘
                        │
      ┌─────────────────┼─────────────────┐
      ▼                 ▼                 ▼
  engine.rs        intercept.rs      page_pool.rs
 (launch opts)   (request blocking)  (page tracking)
      │                 │
      ▼                 ▼
  constants.rs     ad_domains.rs
 (CLI flags)     (blocklist data)
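The intercept.rs → ad_domains.rs branch of the diagram decides whether each outgoing request is aborted. The following is a minimal sketch under stated assumptions. `should_block`, `is_blocklisted_host` and the `block_ads` switch are hypothetical names. The blocked resource types are an assumed subset of Playwright's resource-type names (as returned by `request.resourceType()` in its JavaScript API), and the blocklist is matched on the domain itself and any of its subdomains.

```rust
use std::collections::HashSet;

/// Assumed resource types dropped when `disable_resources` is on; the crate's
/// actual list in `constants.rs` may differ.
const BLOCKED_RESOURCE_TYPES: &[&str] = &["font", "image", "media", "stylesheet"];

/// Returns true if `host` is a blocklisted domain or a subdomain of one.
fn is_blocklisted_host(host: &str, blocklist: &HashSet<&str>) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    // Walk "a.b.example.com" -> "b.example.com" -> "example.com" -> "com".
    let mut candidate = host.as_str();
    loop {
        if blocklist.contains(candidate) {
            return true;
        }
        match candidate.find('.') {
            Some(idx) => candidate = &candidate[idx + 1..],
            None => return false,
        }
    }
}

/// Hypothetical decision function: abort on an unwanted resource type or on an
/// ad/tracking host.
fn should_block(
    host: &str,
    resource_type: &str,
    disable_resources: bool,
    block_ads: bool,
    blocklist: &HashSet<&str>,
) -> bool {
    (disable_resources && BLOCKED_RESOURCE_TYPES.contains(&resource_type))
        || (block_ads && is_blocklisted_host(host, blocklist))
}
```

Walking parent domains keeps lookups to one hash probe per label instead of scanning the whole blocklist for every request.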

Configuration starts with BrowserConfig (or StealthConfig for stealth sessions). Per-request overrides are expressed via FetchParams, which are merged with the session-level config into ResolvedFetchParams before each navigation.
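The merge step can be pictured as "per-request value if present, otherwise the session default". The following is a simplified sketch of that rule using stand-in types: `SessionDefaults`, `Overrides`, `Resolved`, `resolve` and all field names are hypothetical. The real BrowserConfig, FetchParams and ResolvedFetchParams have their own fields.

```rust
use std::time::Duration;

/// Stand-in for session-level config (hypothetical fields).
#[derive(Debug, Clone)]
struct SessionDefaults {
    timeout: Duration,
    disable_resources: bool,
    wait_selector: Option<String>,
}

/// Stand-in for per-request overrides: `None` means "inherit from the session".
#[derive(Debug, Default)]
struct Overrides {
    timeout: Option<Duration>,
    disable_resources: Option<bool>,
    wait_selector: Option<String>,
}

/// Fully resolved values used for a single navigation.
#[derive(Debug, PartialEq)]
struct Resolved {
    timeout: Duration,
    disable_resources: bool,
    wait_selector: Option<String>,
}

fn resolve(defaults: &SessionDefaults, overrides: Overrides) -> Resolved {
    Resolved {
        timeout: overrides.timeout.unwrap_or(defaults.timeout),
        disable_resources: overrides
            .disable_resources
            .unwrap_or(defaults.disable_resources),
        // An override selector replaces the default; otherwise keep the session's.
        wait_selector: overrides
            .wait_selector
            .or_else(|| defaults.wait_selector.clone()),
    }
}
```

Resolving into a separate struct with no `Option`s means the navigation code never has to re-check which layer a value came from.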

After navigation completes, the response_factory module extracts the page’s HTML, status code, headers, and cookies into a unified scrapling_fetch::Response that the rest of the scrapling pipeline can parse and query.
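To make the shape of that conversion concrete, the following sketch collapses raw page data into one response value. Both structs are simplified stand-ins. `PageSnapshot` and `build_response` are hypothetical, and this `Response` is not the real definition of `scrapling_fetch::Response`. Lower-casing header names is an assumed design choice, justified by HTTP header names being case-insensitive.

```rust
use std::collections::HashMap;

/// Hypothetical raw data captured from a page after navigation.
struct PageSnapshot {
    html: String,
    status: u16,
    headers: Vec<(String, String)>,
    cookies: Vec<(String, String)>,
}

/// Simplified stand-in for `scrapling_fetch::Response`.
#[derive(Debug)]
struct Response {
    status: u16,
    headers: HashMap<String, String>,
    cookies: HashMap<String, String>,
    body: String,
}

fn build_response(snap: PageSnapshot) -> Response {
    Response {
        status: snap.status,
        // Header names are case-insensitive, so normalize them to lowercase.
        headers: snap
            .headers
            .into_iter()
            .map(|(name, value)| (name.to_ascii_lowercase(), value))
            .collect(),
        // A later cookie with the same name overwrites an earlier one.
        cookies: snap.cookies.into_iter().collect(),
        body: snap.html,
    }
}
```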

Quick example

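use scrapling_browser::{BrowserConfig, DynamicSession};

// `.await` and `?` need an async function returning the crate's Result alias.
async fn example() -> scrapling_browser::Result<()> {
    // Session-level configuration: run headless and enable resource-type blocking.
    let config = BrowserConfig {
        headless: true,
        disable_resources: true,
        ..Default::default()
    };

    // Create the session, then launch the browser.
    let mut session = DynamicSession::new(config)?;
    session.start().await?;

    // The second argument takes optional per-request overrides; `None` keeps the session config.
    let response = session.fetch("https://example.com", None).await?;
    println!("status: {}", response.status);

    session.close().await?;
    Ok(())
}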

Re-exports

pub use config::BrowserConfig;
pub use config::CookieParam;
pub use config::FetchParams;
pub use config::ProxyConfig;
pub use config::ResolvedFetchParams;
pub use config::StealthConfig;
pub use config::WaitState;
pub use error::BrowserError;
pub use error::Result;
pub use fetcher::DynamicSession;
pub use fetcher::StealthySession;
pub use page_pool::PagePool;
pub use page_pool::PageState;
pub use page_pool::PoolStats;

Modules

ad_domains
Built-in blocklist of known advertising and tracking domains.
config
Configuration types for browser automation sessions.
constants
Chromium launch arguments and resource-type constants.
engine
Low-level engine helpers for launching the Playwright driver and building Chromium launch options.
error
Error types for the scrapling-browser crate.
fetcher
High-level browser session types that drive page fetching.
intercept
Request interception logic for blocking unwanted network requests.
page_pool
Thread-safe browser page pool for tracking concurrent page usage.
response_factory
Factory for building scrapling_fetch::Response objects from Playwright pages.
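The page_pool module is described as a thread-safe pool for tracking concurrent page usage. The following is a minimal sketch of that idea, assuming a shared `Arc<Mutex<HashMap>>` keyed by page id. `PageTracker`, `PageStatus` and `UsageStats` are hypothetical stand-ins for PagePool, PageState and PoolStats, and their variants and fields are not the crate's.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical page states; the real `PageState` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageStatus {
    Ready,
    Busy,
    Closed,
}

/// Snapshot of pool usage (stand-in for `PoolStats`).
#[derive(Debug, Default, PartialEq, Eq)]
struct UsageStats {
    total: usize,
    ready: usize,
    busy: usize,
}

/// Cloning shares the same map, so every task sees one consistent view.
#[derive(Clone, Default)]
struct PageTracker {
    pages: Arc<Mutex<HashMap<u64, PageStatus>>>,
}

impl PageTracker {
    /// Records a state change; closed pages are dropped from the map.
    fn set_status(&self, page_id: u64, status: PageStatus) {
        let mut pages = self.pages.lock().expect("page map mutex poisoned");
        match status {
            PageStatus::Closed => {
                pages.remove(&page_id);
            }
            other => {
                pages.insert(page_id, other);
            }
        }
    }

    /// Counts pages under a single lock so the totals are consistent.
    fn stats(&self) -> UsageStats {
        let pages = self.pages.lock().expect("page map mutex poisoned");
        let busy = pages.values().filter(|s| **s == PageStatus::Busy).count();
        UsageStats { total: pages.len(), ready: pages.len() - busy, busy }
    }
}
```

Taking the lock once inside `stats` avoids reporting a total that disagrees with the busy count when other threads update the pool concurrently.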