Crate scrapling

§scrapling

A fast, adaptive web scraping toolkit for Rust — a feature-for-feature port of the Python scrapling library.

§Crate overview

This is the core crate. It provides:

  • TextHandler / TextHandlers — enriched string types with regex extraction, HTML entity decoding, whitespace cleaning, and JSON parsing. Every method that transforms a string returns a new TextHandler so the enriched type is preserved through chains of operations.

  • AttributesHandler — a read-only map of HTML element attributes whose values are TextHandlers, giving callers regex and cleaning methods directly on attribute values.

  • Error / Result — a structured error enum covering parsing, selector, encoding, regex, JSON, URL, and (optionally) storage failures.

  • utils — low-level text cleaning helpers (clean_spaces, clean_whitespace, flatten) used internally and available for downstream crates.

  • selector — HTML parsing, CSS selection with ::text/::attr() pseudo-elements, DOM navigation, and selector generation.

  • translator — CSS-to-XPath translation with pseudo-element support and LRU caching.

  • storage — persistent element storage trait with a SQLite backend for adaptive element relocation.

  • adaptive — structural similarity scoring and element relocation engine (12-factor scoring algorithm).
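The "every transforming method returns the enriched type" idea from the TextHandler bullet can be sketched with a plain newtype. This is a minimal, std-only illustration: `MiniText`, `clean`, and `decode_entities` are hypothetical names invented for this sketch and are not the crate's API, and the entity table is a tiny illustrative subset.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for an enriched string type (NOT scrapling's TextHandler).
#[derive(Debug, Clone, PartialEq, Eq)]
struct MiniText(String);

impl MiniText {
    fn new(s: impl Into<String>) -> Self {
        MiniText(s.into())
    }

    /// Collapse runs of whitespace into single spaces and trim both ends.
    /// Returns `MiniText`, not `String`, so further methods can be chained.
    fn clean(&self) -> MiniText {
        MiniText(self.0.split_whitespace().collect::<Vec<_>>().join(" "))
    }

    /// Decode a handful of common HTML entities (illustrative subset only).
    fn decode_entities(&self) -> MiniText {
        // `&amp;` is replaced last so that "&amp;lt;" decodes to "&lt;", not "<".
        MiniText(
            self.0
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"),
        )
    }
}

/// Deref to `str` so the wrapper can be read wherever a string slice is expected.
impl Deref for MiniText {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

fn main() {
    let raw = MiniText::new("  Fish &amp;\n\t Chips   ");
    // Each step returns a MiniText, so the chain never falls back to a bare String.
    let tidy = raw.clean().decode_entities();
    assert_eq!(&*tidy, "Fish & Chips");
    println!("{}", &*tidy);
}
```

Because each method returns the wrapper rather than `String`, a caller never has to re-wrap intermediate results, which is the property the bullet describes.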

§Feature flags

Flag      Default   What it enables
storage   yes       SQLite-backed persistent element storage via rusqlite.
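One common way a Cargo feature can make an error category optional is a `cfg`-gated enum variant. The sketch below assumes that pattern purely for illustration. `SketchError`, `SketchResult`, its variants, and both helper functions are hypothetical, and the crate's real `Error` type may be laid out differently.

```rust
use std::fmt;

/// Hypothetical error enum; variant names are illustrative, not scrapling's.
#[derive(Debug)]
enum SketchError {
    Parse(String),
    Selector(String),
    /// Only compiled in when the `storage` feature is enabled.
    #[cfg(feature = "storage")]
    Storage(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Parse(msg) => write!(f, "parse error: {msg}"),
            SketchError::Selector(msg) => write!(f, "selector error: {msg}"),
            // The match arm must carry the same gate as the variant.
            #[cfg(feature = "storage")]
            SketchError::Storage(msg) => write!(f, "storage error: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Result alias in the usual `crate::Result<T>` style.
type SketchResult<T> = Result<T, SketchError>;

/// Reject selectors that are empty or whitespace-only.
fn require_non_empty(selector: &str) -> SketchResult<&str> {
    if selector.trim().is_empty() {
        Err(SketchError::Selector("empty selector".to_owned()))
    } else {
        Ok(selector)
    }
}

/// Map a std parsing failure into the structured error type.
fn parse_quantity(raw: &str) -> SketchResult<u32> {
    raw.trim()
        .parse::<u32>()
        .map_err(|e| SketchError::Parse(format!("{raw:?}: {e}")))
}

fn main() {
    assert!(require_non_empty("div.price").is_ok());
    let err = require_non_empty("   ").unwrap_err();
    assert_eq!(err.to_string(), "selector error: empty selector");
    assert_eq!(parse_quantity(" 42 ").unwrap(), 42);
    assert!(parse_quantity("forty-two").is_err());
}
```

With the feature disabled, the gated variant and its match arm are compiled out together, so the enum stays exhaustive under either configuration.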

§Quick start

use scrapling::{TextHandler, TextHandlers, AttributesHandler};

// TextHandler wraps a String with extra powers
let price = TextHandler::new("Item costs $42.99 today");
let matches = price.re(r"\$(\d+\.\d+)", false, false, true).unwrap();
assert_eq!(matches[0].as_ref(), "42.99");

// AttributesHandler gives read-only access to element attributes
let attrs = AttributesHandler::new([
    ("class".to_owned(), "price-tag".to_owned()),
    ("data-currency".to_owned(), "USD".to_owned()),
]);
assert_eq!(attrs["class"].as_ref(), "price-tag");

§Re-exports

pub use attributes::AttributesHandler;
pub use error::Error;
pub use error::Result;
pub use selector::ParseOptions;
pub use text::TextHandler;
pub use text::TextHandlers;

§Modules

adaptive
Adaptive element relocation via structural similarity scoring.
attributes
Read-only HTML element attribute map.
error
Structured error types for the scrapling core crate.
selector
HTML element selection and DOM traversal.
shell
Shell and conversion utilities.
storage
Persistent element storage for adaptive selection.
text
Enriched string types for web scraping.
translator
CSS-to-XPath translation with ::text and ::attr() pseudo-element support.
utils
Low-level text cleaning and collection helpers.
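To make the translator entry more concrete, here is a rough, std-only sketch of CSS-to-XPath translation with `::text` and `::attr()` support plus result caching. It handles only `tag`, `tag.class`, and `tag#id`, and it substitutes an unbounded `HashMap` memo for the LRU cache the crate describes. `SketchTranslator`, `translate`, and the other names are hypothetical and not taken from the crate; the XPath shapes follow the convention used by Python's cssselect and parsel.

```rust
use std::collections::HashMap;

/// Hypothetical translator sketch; not the crate's translator API.
struct SketchTranslator {
    cache: HashMap<String, String>, // unbounded memo, standing in for an LRU cache
    misses: usize,                  // number of lookups that missed the cache
}

impl SketchTranslator {
    fn new() -> Self {
        SketchTranslator { cache: HashMap::new(), misses: 0 }
    }

    /// Translate `css`, reusing a cached result when one exists.
    fn to_xpath(&mut self, css: &str) -> Option<String> {
        if let Some(hit) = self.cache.get(css) {
            return Some(hit.clone());
        }
        self.misses += 1;
        let xpath = translate(css)?;
        self.cache.insert(css.to_owned(), xpath.clone());
        Some(xpath)
    }
}

/// Accept only simple identifiers so nothing unexpected is spliced into the XPath.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// An omitted tag name (as in ".intro") means "any element".
fn tag_or_any(tag: &str) -> Option<&str> {
    if tag.is_empty() {
        Some("*")
    } else if is_ident(tag) {
        Some(tag)
    } else {
        None
    }
}

fn translate(css: &str) -> Option<String> {
    // Split off an optional pseudo-element such as `::text` or `::attr(href)`.
    let (base, pseudo) = match css.split_once("::") {
        Some((b, p)) => (b, Some(p)),
        None => (css, None),
    };
    let mut xpath = if let Some((tag, id)) = base.split_once('#') {
        if !is_ident(id) {
            return None;
        }
        format!("descendant-or-self::{}[@id = '{}']", tag_or_any(tag)?, id)
    } else if let Some((tag, class)) = base.split_once('.') {
        if !is_ident(class) {
            return None;
        }
        format!(
            "descendant-or-self::{}[contains(concat(' ', normalize-space(@class), ' '), ' {} ')]",
            tag_or_any(tag)?,
            class
        )
    } else {
        format!("descendant-or-self::{}", tag_or_any(base)?)
    };
    match pseudo {
        None => {}
        Some("text") => xpath.push_str("/text()"),
        Some(p) => {
            let name = p.strip_prefix("attr(")?.strip_suffix(')')?;
            if !is_ident(name) {
                return None;
            }
            xpath.push_str("/@");
            xpath.push_str(name);
        }
    }
    Some(xpath)
}

fn main() {
    let mut t = SketchTranslator::new();
    assert_eq!(t.to_xpath("a::attr(href)").as_deref(), Some("descendant-or-self::a/@href"));
    assert_eq!(t.to_xpath("div#main").as_deref(), Some("descendant-or-self::div[@id = 'main']"));
    // The repeated selector is served from the cache, so the miss count stays at 2.
    let _ = t.to_xpath("a::attr(href)");
    assert_eq!(t.misses, 2);
}
```

A production translator would need a real selector grammar (combinators, attribute selectors, pseudo-classes) and a bounded eviction policy. The memo here only shows why caching pays off when the same selector strings are translated repeatedly.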