Browser automation crate for the scrapling-rs web scraping framework.
This crate provides high-level browser automation built on top of Playwright, giving you two session types for fetching fully-rendered web pages:
- DynamicSession: a standard Playwright-driven browser that executes JavaScript, waits for network activity to settle, and returns the final DOM. Use this when the target site does not employ bot-detection.
- StealthySession: extends DynamicSession with anti-detection measures such as WebRTC leak prevention, canvas fingerprint noise, automation-flag removal, and an automatic Cloudflare Turnstile solver. Use this when sites actively block headless browsers.
§Architecture overview
                  ┌──────────────┐
                  │  Your code   │
                  └──────┬───────┘
                         │  .fetch(url)
          ┌──────────────┴──────────────┐
          │ DynamicSession / StealthySession │   (fetcher.rs)
          └──────────────┬──────────────┘
                         │
       ┌─────────────────┼─────────────────┐
       ▼                 ▼                 ▼
   engine.rs        intercept.rs      page_pool.rs
 (launch opts)  (request blocking)  (page tracking)
       │                 │
       ▼                 ▼
  constants.rs     ad_domains.rs
  (CLI flags)     (blocklist data)

Configuration starts with BrowserConfig (or StealthConfig for stealth sessions).
Per-request overrides are expressed via FetchParams, which are merged with the
session-level config into ResolvedFetchParams before each navigation.
After navigation completes, the response_factory module extracts the page’s HTML,
status code, headers, and cookies into a unified scrapling_fetch::Response that the
rest of the scrapling pipeline can parse and query.
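The merge of per-request overrides into session-level defaults can be pictured with a small, self-contained sketch. The types and fields below (SessionDefaults, RequestOverrides, Resolved, timeout_ms, wait_selector) are hypothetical stand-ins invented for illustration; the real BrowserConfig, FetchParams, and ResolvedFetchParams may have different fields and merge rules. The sketch assumes the common pattern in which every override is optional and falls back to the session value when absent.

```rust
// Hypothetical stand-ins: these are NOT the crate's real types.
#[derive(Debug, Clone)]
struct SessionDefaults {
    timeout_ms: u64,
    disable_resources: bool,
    wait_selector: Option<String>,
}

// Every field is optional: `None` means "use the session default".
#[derive(Debug, Default, Clone)]
struct RequestOverrides {
    timeout_ms: Option<u64>,
    disable_resources: Option<bool>,
    wait_selector: Option<String>,
}

// Fully resolved values used for a single navigation.
#[derive(Debug, PartialEq)]
struct Resolved {
    timeout_ms: u64,
    disable_resources: bool,
    wait_selector: Option<String>,
}

// Prefer the per-request value, otherwise fall back to the session default.
fn resolve(defaults: &SessionDefaults, overrides: Option<&RequestOverrides>) -> Resolved {
    let o = overrides.cloned().unwrap_or_default();
    Resolved {
        timeout_ms: o.timeout_ms.unwrap_or(defaults.timeout_ms),
        disable_resources: o.disable_resources.unwrap_or(defaults.disable_resources),
        wait_selector: o.wait_selector.or_else(|| defaults.wait_selector.clone()),
    }
}

fn main() {
    let defaults = SessionDefaults {
        timeout_ms: 30_000,
        disable_resources: true,
        wait_selector: None,
    };
    // Only the timeout is overridden for this request.
    let overrides = RequestOverrides {
        timeout_ms: Some(5_000),
        ..Default::default()
    };
    let resolved = resolve(&defaults, Some(&overrides));
    println!("{resolved:?}");
}
```

Passing None for the overrides in this sketch yields the session defaults unchanged, which mirrors the None argument to fetch in the quick example.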
§Quick example
use scrapling_browser::{BrowserConfig, DynamicSession};

let config = BrowserConfig {
    headless: true,
    disable_resources: true,
    ..Default::default()
};
let mut session = DynamicSession::new(config)?;
session.start().await?;
let response = session.fetch("https://example.com", None).await?;
println!("status: {}", response.status);
session.close().await?;
Re-exports§
pub use config::BrowserConfig;
pub use config::CookieParam;
pub use config::FetchParams;
pub use config::ProxyConfig;
pub use config::ResolvedFetchParams;
pub use config::StealthConfig;
pub use config::WaitState;
pub use error::BrowserError;
pub use error::Result;
pub use fetcher::DynamicSession;
pub use fetcher::StealthySession;
pub use page_pool::PagePool;
pub use page_pool::PageState;
pub use page_pool::PoolStats;
Modules§
- ad_domains: Built-in blocklist of known advertising and tracking domains.
- config: Configuration types for browser automation sessions.
- constants: Chromium launch arguments and resource-type constants.
- engine: Low-level engine helpers for launching the Playwright driver and building Chromium launch options.
- error: Error types for the scrapling-browser crate.
- fetcher: High-level browser session types that drive page fetching.
- intercept: Request interception logic for blocking unwanted network requests.
- page_pool: Thread-safe browser page pool for tracking concurrent page usage.
- response_factory: Factory for building scrapling_fetch::Response objects from Playwright pages.
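
To make the relationship between the intercept and ad_domains modules more concrete, here is a minimal sketch of how a request's host might be checked against a domain blocklist. It is not the crate's implementation: the function name is_blocked, the sample entries, and the subdomain-matching rule are all assumptions made for illustration, and the real modules may match requests differently (for example by resource type or full URL).

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns true when `host` equals a blocked domain
/// or is a subdomain of one. Matching is done on whole labels, so
/// "notads.example" does not match "ads.example".
fn is_blocked(host: &str, blocklist: &HashSet<&str>) -> bool {
    // Normalize: drop a trailing dot and compare case-insensitively.
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let mut candidate = host.as_str();
    loop {
        if blocklist.contains(candidate) {
            return true;
        }
        // Strip the leftmost label and retry with the parent domain.
        match candidate.split_once('.') {
            Some((_, parent)) => candidate = parent,
            None => return false,
        }
    }
}

fn main() {
    // Sample entries only; not taken from the crate's built-in list.
    let blocklist: HashSet<&str> = ["ads.example", "tracker.example"].into_iter().collect();
    for host in ["cdn.ads.example", "example.com"] {
        println!("{host}: blocked = {}", is_blocked(host, &blocklist));
    }
}
```

Walking up the label hierarchy keeps each lookup to a handful of hash probes regardless of blocklist size, which matters when every network request issued by a page passes through the check.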