scrapling-browser 0.1.0

Browser automation with anti-detection for scrapling

Browser automation crate for the scrapling-rs web scraping framework.

This crate provides high-level browser automation built on top of Playwright, giving you two session types for fetching fully-rendered web pages:

  • [DynamicSession] -- a standard Playwright-driven browser that executes JavaScript, waits for network activity to settle, and returns the final DOM. Use this when the target site does not employ bot detection.

  • [StealthySession] -- extends DynamicSession with anti-detection measures such as WebRTC leak prevention, canvas fingerprint noise, automation-flag removal, and an automatic Cloudflare Turnstile solver. Use this when sites actively block headless browsers.

Architecture overview

                 ┌──────────────┐
                 │  Your code   │
                 └──────┬───────┘
                        │ .fetch(url)
      ┌─────────────────┴──────────────────┐
      │  DynamicSession / StealthySession  │  (fetcher.rs)
      └─────────────────┬──────────────────┘
                        │
      ┌─────────────────┼─────────────────┐
      ▼                 ▼                  ▼
  engine.rs        intercept.rs      page_pool.rs
 (launch opts)   (request blocking)  (page tracking)
      │                 │
      ▼                 ▼
  constants.rs     ad_domains.rs
 (CLI flags)     (blocklist data)
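
To make the request-blocking path in the diagram more concrete, here is a minimal sketch of how a host could be checked against a domain blocklist. It is an illustration only: the function name `is_blocked`, the `HashSet`-based blocklist, and the parent-domain matching rule are assumptions, not the actual contents of intercept.rs or ad_domains.rs.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of scrapling-browser's documented API).
/// Returns true when `host`, or any parent domain of it, is in the blocklist.
fn is_blocked(host: &str, blocklist: &HashSet<&str>) -> bool {
    // Normalize: drop a trailing dot and compare case-insensitively.
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let mut candidate = host.as_str();
    loop {
        if blocklist.contains(candidate) {
            return true;
        }
        // Strip the leftmost label and retry with the parent domain.
        match candidate.find('.') {
            Some(idx) => candidate = &candidate[idx + 1..],
            None => return false,
        }
    }
}
```

Matching on whole labels means a blocklist entry such as `doubleclick.net` covers `stats.g.doubleclick.net` but not an unrelated host like `notdoubleclick.net`.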

Configuration starts with [BrowserConfig] (or [StealthConfig] for stealth sessions). Per-request overrides are expressed via [FetchParams], which are merged with the session-level config into [ResolvedFetchParams] before each navigation.
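
The merge step described above can be pictured with a small, self-contained sketch. The types and fields below (`SessionDefaults`, `Overrides`, `Resolved`, `timeout_ms`, `wait_selector`) are hypothetical stand-ins chosen for illustration; the real [BrowserConfig], [FetchParams], and [ResolvedFetchParams] may be shaped differently. The assumed rule is that a per-request value wins when present and the session-level value is used otherwise.

```rust
// Hypothetical stand-ins for illustration only; the real BrowserConfig,
// FetchParams, and ResolvedFetchParams may have different fields.
#[derive(Debug, Clone, PartialEq)]
struct SessionDefaults {
    timeout_ms: u64,
    disable_resources: bool,
    wait_selector: Option<String>,
}

// Every field is optional: `None` means "no override for this request".
#[derive(Debug, Clone, Default)]
struct Overrides {
    timeout_ms: Option<u64>,
    disable_resources: Option<bool>,
    wait_selector: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Resolved {
    timeout_ms: u64,
    disable_resources: bool,
    wait_selector: Option<String>,
}

/// Per-request values win; anything left unset falls back to the session default.
fn resolve(defaults: &SessionDefaults, overrides: Option<&Overrides>) -> Resolved {
    let o = overrides.cloned().unwrap_or_default();
    Resolved {
        timeout_ms: o.timeout_ms.unwrap_or(defaults.timeout_ms),
        disable_resources: o.disable_resources.unwrap_or(defaults.disable_resources),
        wait_selector: o.wait_selector.or_else(|| defaults.wait_selector.clone()),
    }
}
```

Resolving once before each navigation keeps the rest of the fetch path free of `Option` checks: downstream code only ever sees concrete values.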

After navigation completes, the [response_factory] module extracts the page's HTML, status code, headers, and cookies into a unified [scrapling_fetch::Response] that the rest of the scrapling pipeline can parse and query.

Quick example

use scrapling_browser::{BrowserConfig, DynamicSession};

async fn run() -> scrapling_browser::Result<()> {
    // Session-level settings; fields not listed keep their defaults.
    let config = BrowserConfig {
        headless: true,
        disable_resources: true,
        ..Default::default()
    };

    // Create the session, then start it before fetching.
    let mut session = DynamicSession::new(config)?;
    session.start().await?;

    let response = session.fetch("https://example.com", None).await?;
    println!("status: {}", response.status);

    // Close the session once all fetches are done.
    session.close().await?;
    Ok(())
}