Struct SpiderPage 

pub struct SpiderPage { /* private fields */ }
Browser tab abstraction with full automation API.

Wraps a ProtocolAdapter and exposes high-level navigation, content extraction, click/input/scroll primitives, wait helpers, and viewport control. The adapter can be swapped atomically via set_adapter during browser rotation without dropping in-flight references.

Implementations

impl SpiderPage

pub fn new(adapter: ProtocolAdapter) -> Self

Create a new SpiderPage wrapping the given protocol adapter.

pub fn from_arc(adapter: Arc<ProtocolAdapter>) -> Self

Create a new SpiderPage from an already-Arc-wrapped adapter.

pub async fn goto(&self, url: &str) -> Result<()>

Navigate to a URL and wait for load.

pub async fn goto_fast(&self, url: &str) -> Result<()>

Navigate without waiting for full page load (5 s max wait). Use with content_with_early_return for SPAs that never fire loadEventFired.

pub async fn goto_dom(&self, url: &str) -> Result<()>

Navigate and return as soon as DOMContentLoaded fires (3 s max). Fastest option – the DOM shell is ready but subresources may still load. Pair with content_with_early_return or content_with_network_idle for best results.

pub async fn go_back(&self) -> Result<()>

Go back in browser history.

pub async fn go_forward(&self) -> Result<()>

Go forward in browser history.

pub async fn reload(&self) -> Result<()>

Reload the page.

pub async fn content(&self, wait_ms: u64, min_length: usize) -> Result<String>

Get the full page HTML, ensuring the page is ready first.

Waits for network idle + DOM stability, then checks content quality. If the content seems incomplete (too short or looks like a loading state), it performs incremental waits with exponential backoff before returning.

  • wait_ms – Max time to wait for readiness, in milliseconds (default 8000). Pass 0 to skip readiness checks and return immediately.
  • min_length – Minimum content length to consider “good” (default 1000).
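
The incremental waits described above can be pictured with a small sketch. It assumes a doubling backoff that starts from a hypothetical initial step and is capped so the waits never add up to more than wait_ms; the step sizes that content() actually uses are not stated in this page, so backoff_steps and initial_ms are illustrative names only.

```rust
/// Build a list of wait steps (in ms) that double each time and never
/// sum to more than `wait_ms`. `initial_ms` is a hypothetical starting
/// step, not a documented value.
fn backoff_steps(wait_ms: u64, initial_ms: u64) -> Vec<u64> {
    let mut steps = Vec::new();
    let mut remaining = wait_ms;
    let mut step = initial_ms.max(1); // avoid a zero-length step
    while remaining > 0 {
        // Never wait longer than the budget that is left.
        let wait = step.min(remaining);
        steps.push(wait);
        remaining -= wait;
        step = step.saturating_mul(2);
    }
    steps
}
```

With a budget of 0 the schedule is empty, which matches the documented behaviour of skipping readiness checks when wait_ms is 0.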

pub async fn raw_content(&self) -> Result<String>

Get the raw page HTML without any readiness waiting. Use this when you need immediate access or have already waited.

pub async fn content_with_early_return( &self, max_wait_ms: u64, min_content_length: usize, poll_interval_ms: u64, ) -> Result<String>

Poll for content with early return – for SPAs that never fire loadEventFired.

Instead of waiting for a full page load event, this polls for HTML content at regular intervals and returns as soon as sufficient content is available. Useful for timeout retries where the page loads data asynchronously.

  • max_wait_ms – Max time to poll, in milliseconds (default 15000, i.e. 15 s).
  • min_content_length – Minimum HTML length to accept (default 500).
  • poll_interval_ms – Interval between polls, in milliseconds (default 2000, i.e. 2 s).
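
The polling idea can be sketched in a few lines. This is a synchronous, standard-library-only illustration rather than the crate's async implementation: poll_with_early_return and its fetch closure are hypothetical, and returning the last HTML seen when the deadline passes is an assumption about timeout behaviour.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `fetch` until it yields at least `min_content_length` bytes or
/// `max_wait_ms` elapses. Returns the last HTML seen either way
/// (an assumption; the real method may report a timeout differently).
fn poll_with_early_return<F>(
    mut fetch: F,
    max_wait_ms: u64,
    min_content_length: usize,
    poll_interval_ms: u64,
) -> String
where
    F: FnMut() -> String,
{
    let deadline = Instant::now() + Duration::from_millis(max_wait_ms);
    loop {
        let html = fetch();
        // Early return: enough content, or no time left.
        if html.len() >= min_content_length || Instant::now() >= deadline {
            return html;
        }
        // Sleep for the poll interval, but never past the deadline.
        let left = deadline.saturating_duration_since(Instant::now());
        sleep(left.min(Duration::from_millis(poll_interval_ms)));
    }
}
```

The loop never waits for a load event, which is why this style suits pages that keep fetching data after the initial document arrives.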

pub async fn content_with_network_idle( &self, max_wait_ms: u64, min_content_length: usize, interstitial_budget_ms: u64, ) -> Result<String>

Get content using network idle detection + polling hybrid approach.

Best for heavy SPAs: uses PerformanceObserver + MutationObserver to detect when the page stops loading, combined with content-length thresholds.

Strategy:

  1. Wait for readyState=interactive (DOM parsed)
  2. Start network+DOM idle monitoring (400 ms silence threshold)
  3. Poll HTML length – return early if sufficient + idle
  4. Interstitial detection with configurable wait budget

  • max_wait_ms – Max total time to wait, in milliseconds (default 20000, i.e. 20 s).
  • min_content_length – Minimum HTML length to accept (default 1000).
  • interstitial_budget_ms – Max time to wait for interstitials to resolve, in milliseconds (default 16000, i.e. 16 s; use 30000 for retries).
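
Steps 2 and 3 of the strategy combine a silence threshold with a content-length threshold. The sketch below models that decision with plain timestamps; IdleTracker and should_return_early are hypothetical names, and in the real method the activity signals come from in-page observers rather than from Rust code.

```rust
/// Silence threshold from the strategy above (400 ms).
const IDLE_SILENCE_MS: u64 = 400;

/// Tracks the last observed network or DOM activity (hypothetical type).
struct IdleTracker {
    last_activity_ms: u64,
}

impl IdleTracker {
    fn new(now_ms: u64) -> Self {
        Self { last_activity_ms: now_ms }
    }

    /// Record a resource load or DOM mutation seen at `now_ms`.
    fn record_activity(&mut self, now_ms: u64) {
        self.last_activity_ms = now_ms;
    }

    /// Idle once nothing has happened for the silence threshold.
    fn is_idle(&self, now_ms: u64) -> bool {
        now_ms.saturating_sub(self.last_activity_ms) >= IDLE_SILENCE_MS
    }
}

/// Return early only when the HTML is long enough AND the page is idle.
fn should_return_early(
    html_len: usize,
    min_content_length: usize,
    tracker: &IdleTracker,
    now_ms: u64,
) -> bool {
    html_len >= min_content_length && tracker.is_idle(now_ms)
}
```

Requiring both conditions avoids returning a long but still-changing page, and also avoids returning a quiet page that has not rendered its content yet.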

pub async fn title(&self) -> Result<String>

Get the page title.

pub async fn url(&self) -> Result<String>

Get the current page URL.

pub async fn screenshot(&self) -> Result<String>

Capture a screenshot as a base64-encoded PNG.

pub async fn evaluate(&self, expression: &str) -> Result<Value>

Evaluate arbitrary JavaScript and return the result.

pub async fn click(&self, selector: &str) -> Result<()>

Click an element by CSS selector.

pub async fn click_at(&self, x: f64, y: f64) -> Result<()>

Click at specific viewport coordinates.

pub async fn dblclick(&self, selector: &str) -> Result<()>

Double-click an element by CSS selector.

pub async fn right_click(&self, selector: &str) -> Result<()>

Right-click an element by CSS selector.

pub async fn click_and_hold(&self, selector: &str, hold_ms: u64) -> Result<()>

Click and hold an element for a duration.

Useful for long-press interactions, drag initiation, and mobile-style gestures.

  • selector – CSS selector of the element.
  • hold_ms – Duration in milliseconds to hold (default 1000).

pub async fn click_and_hold_at( &self, x: f64, y: f64, hold_ms: u64, ) -> Result<()>

Click and hold at specific viewport coordinates for a duration.

  • x – X coordinate (CSS pixels).
  • y – Y coordinate (CSS pixels).
  • hold_ms – Duration in milliseconds to hold (default 1000).

pub async fn click_all(&self, selector: &str) -> Result<()>

Click all elements matching a selector.

pub async fn fill(&self, selector: &str, value: &str) -> Result<()>

Fill a form field – focus, clear existing value, type new value.

pub async fn type_text(&self, value: &str) -> Result<()>

Type text into the currently focused element.

pub async fn press(&self, key: &str) -> Result<()>

Press a named key (e.g. “Enter”, “Tab”, “Escape”).

pub async fn clear(&self, selector: &str) -> Result<()>

Clear an input field.

pub async fn select(&self, selector: &str, value: &str) -> Result<()>

Select an option in a <select> element.

pub async fn focus(&self, selector: &str) -> Result<()>

Focus an element.

pub async fn blur(&self, selector: &str) -> Result<()>

Blur (unfocus) an element.

pub async fn hover(&self, selector: &str) -> Result<()>

Hover over an element.

pub async fn drag(&self, from_selector: &str, to_selector: &str) -> Result<()>

Drag from one element to another.

pub async fn scroll_y(&self, pixels: i64) -> Result<()>

Scroll vertically by pixels (positive = down).

pub async fn scroll_x(&self, pixels: i64) -> Result<()>

Scroll horizontally by pixels (positive = right).

pub async fn scroll_to(&self, selector: &str) -> Result<()>

Scroll an element into view.

pub async fn scroll_to_point(&self, x: f64, y: f64) -> Result<()>

Scroll to absolute page coordinates.

pub async fn wait_for_selector( &self, selector: &str, timeout_ms: u64, ) -> Result<()>

Wait for a CSS selector to appear in the DOM.

pub async fn wait_for_navigation(&self, timeout_ms: u64) -> Result<()>

Wait for a navigation or page load to complete (implemented as a simple delay).

pub async fn wait_for_ready(&self, timeout_ms: u64) -> Result<()>

Wait until the page is fully loaded and DOM is stable.

Checks:

  1. document.readyState === 'complete'
  2. DOM content length stabilizes (no changes for 500 ms)

Use after goto() for SPAs and dynamic pages to ensure all content is rendered before extracting HTML.
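
The second check, a content length that stops changing for 500 ms, can be illustrated over a list of samples. This is a sketch only: is_length_stable and the (timestamp, length) sample format are hypothetical, and the real check runs against the live page rather than a recorded slice.

```rust
/// Stability window from the check above (500 ms).
const STABLE_WINDOW_MS: u64 = 500;

/// Given (timestamp_ms, content_length) samples in chronological order,
/// report whether the length has stayed unchanged for at least the
/// stability window ending at the last sample.
fn is_length_stable(samples: &[(u64, usize)]) -> bool {
    let (last_ts, last_len) = match samples.last() {
        Some(&sample) => sample,
        None => return false, // nothing observed yet
    };
    // Walk backwards to the earliest consecutive sample with the final length.
    let mut since = last_ts;
    for &(ts, len) in samples.iter().rev() {
        if len != last_len {
            break;
        }
        since = ts;
    }
    last_ts.saturating_sub(since) >= STABLE_WINDOW_MS
}
```

Only the trailing run of equal lengths counts, so a page that briefly returns to an earlier length is not mistaken for a stable one.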

pub async fn wait_for_content( &self, min_length: usize, timeout_ms: u64, ) -> Result<()>

Wait until page content exceeds a minimum length. Useful for SPAs where content loads asynchronously.

pub async fn wait_for_network_idle(&self, timeout_ms: u64) -> Result<()>

Wait for network idle + DOM stability (cross-platform).

Uses the Performance/Resource Timing API and MutationObserver (works in both Chrome/CDP and Firefox/BiDi) to detect when:

  1. document.readyState === 'complete'
  2. No new network resources loading (PerformanceObserver)
  3. DOM mutations have settled

This is more comprehensive than wait_for_ready – it also catches lazy-loaded images, XHR/fetch requests, and script-injected content.

pub async fn set_viewport( &self, width: u32, height: u32, device_scale_factor: f64, mobile: bool, ) -> Result<()>

Set the viewport dimensions.

pub async fn query_selector(&self, selector: &str) -> Result<Option<String>>

Query a single element and return its outer HTML.

pub async fn query_selector_all(&self, selector: &str) -> Result<Vec<String>>

Query all matching elements and return their outer HTML.

pub async fn text_content(&self, selector: &str) -> Result<Option<String>>

Get text content of an element.

pub async fn extract_fields( &self, fields: &[(&str, FieldSelector<'_>)], ) -> Result<HashMap<String, Option<String>>>

Extract multiple fields from the page in a single evaluate call.

Each entry maps a key name to a FieldSelector. Returns a map of key → value (or None if the element was not found).

Example
use std::collections::HashMap;
use spider_browser::page::FieldSelector;

let data = page.extract_fields(&[
    ("title", "#productTitle".into()),
    ("price", ".a-price .a-offscreen".into()),
    ("image", FieldSelector::Attr {
        selector: "#main-image",
        attribute: "src",
    }),
]).await?;
println!("{:?}", data.get("title"));
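
Two details of the example may be worth spelling out: why a bare string can be passed with .into(), and how to read the Option values in the returned map. The sketch below uses a hypothetical stand-in enum, DemoSelector, rather than the crate's FieldSelector; only the Attr shape is taken from the example, while the Text variant name and the field_or helper are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for FieldSelector. Only the `Attr` shape comes
/// from the example above; the `Text` variant name is assumed.
#[derive(Debug, PartialEq)]
enum DemoSelector<'a> {
    Text(&'a str),
    Attr { selector: &'a str, attribute: &'a str },
}

/// A `From<&str>` impl like this is what lets a string literal be
/// written as `"...".into()` where a selector is expected.
impl<'a> From<&'a str> for DemoSelector<'a> {
    fn from(selector: &'a str) -> Self {
        DemoSelector::Text(selector)
    }
}

/// Read a field from a map shaped like the one extract_fields returns,
/// treating "key missing" and "element not found" the same way.
fn field_or<'m>(
    data: &'m HashMap<String, Option<String>>,
    key: &str,
    fallback: &'m str,
) -> &'m str {
    data.get(key).and_then(|v| v.as_deref()).unwrap_or(fallback)
}
```

Collapsing both "absent" cases into one fallback keeps call sites short; code that needs to tell them apart can match on data.get(key) directly.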

pub fn route_message(&self, data: &str)

Route an incoming WebSocket message to the underlying protocol session.

pub fn destroy(&self)

Clean up protocol resources.

pub fn set_adapter(&self, adapter: ProtocolAdapter)

Replace the adapter (used during browser switching).

Atomically swaps the underlying ProtocolAdapter so that inflight operations on the old adapter can finish while new operations use the replacement.
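
One way to get these semantics with the standard library alone is to keep the adapter in an Arc behind a lock, as sketched here. Swappable is a hypothetical type with a generic parameter standing in for the adapter; the crate's private fields are not shown on this page, and it may well use a different mechanism such as a lock-free swap.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical holder that mirrors the swap semantics described above.
/// `A` stands in for the adapter type.
struct Swappable<A> {
    current: RwLock<Arc<A>>,
}

impl<A> Swappable<A> {
    fn new(adapter: A) -> Self {
        Self { current: RwLock::new(Arc::new(adapter)) }
    }

    /// Clone the Arc so the caller keeps its adapter alive even if a
    /// swap happens while the operation is still running.
    fn get(&self) -> Arc<A> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Replace the adapter; existing clones of the old Arc stay valid
    /// and the old adapter is dropped once the last clone goes away.
    fn set(&self, adapter: A) {
        *self.current.write().unwrap() = Arc::new(adapter);
    }
}
```

Because each operation works on its own Arc clone, the lock is held only for the duration of the pointer copy, not for the whole operation.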

pub fn set_adapter_arc(&self, adapter: Arc<ProtocolAdapter>)

Replace the adapter with an already-Arc-wrapped instance.

pub fn is_interstitial_content(html: &str) -> bool

Detect challenge interstitials that may auto-resolve (e.g. Cloudflare “Just a moment…”).

These pages show briefly before redirecting to the real content.
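
A content check of this kind is typically a string heuristic. The sketch below is not the crate's detection logic: the marker list is hypothetical apart from "Just a moment", which is quoted above, and should_keep_waiting shows how such a check could be combined with an interstitial budget like the one content_with_network_idle accepts.

```rust
/// Hypothetical marker strings; only "just a moment" comes from the
/// description above.
const INTERSTITIAL_MARKERS: [&str; 2] = ["just a moment", "checking your browser"];

/// Case-insensitive check for any known challenge marker.
fn looks_like_interstitial(html: &str) -> bool {
    let lower = html.to_lowercase();
    INTERSTITIAL_MARKERS.iter().any(|m| lower.contains(*m))
}

/// Keep waiting only while the page still looks like a challenge and
/// the interstitial budget has not been spent.
fn should_keep_waiting(html: &str, elapsed_ms: u64, interstitial_budget_ms: u64) -> bool {
    looks_like_interstitial(html) && elapsed_ms < interstitial_budget_ms
}
```

Since is_interstitial_content takes only a &str and no &self, it can be called as SpiderPage::is_interstitial_content(&html) on HTML obtained from any source.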

pub fn is_rate_limit_content(html: &str) -> bool

Detect site-level rate limiting in page content.

Rotating the browser provides a fresh profile, which bypasses per-session rate limits.

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.