scrapling
A fast, adaptive web scraping toolkit for Rust — a feature-for-feature port of the Python scrapling library.
Crate overview
This is the core crate. It provides:
- [`TextHandler`] / [`TextHandlers`]: enriched string types with regex extraction, HTML entity decoding, whitespace cleaning, and JSON parsing. Every method that transforms a string returns a new `TextHandler`, so the enriched type is preserved through chains of operations.
- [`AttributesHandler`]: a read-only map of HTML element attributes whose values are `TextHandler`s, giving callers regex and cleaning methods directly on attribute values.
- [`Error`] / [`Result`]: a structured error enum covering parsing, selector, encoding, regex, JSON, URL, and (optionally) storage failures.
- [`utils`]: low-level text cleaning helpers (`clean_spaces`, `clean_whitespace`, `flatten`) used internally and available for downstream crates.
- [`selector`]: HTML parsing, CSS selection with `::text` / `::attr()` pseudo-elements, DOM navigation, and selector generation.
- [`translator`]: CSS-to-XPath translation with pseudo-element support and LRU caching.
- [`storage`]: persistent element storage trait with a SQLite backend for adaptive element relocation.
- [`adaptive`]: structural similarity scoring and element relocation engine (12-factor scoring algorithm).
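The "every transforming method returns the enriched type" idea from the first item can be illustrated with a small, standard-library-only sketch. `RichText`, `clean`, and `after` below are hypothetical names invented for this illustration; they are not this crate's `TextHandler` API, and the real type may be implemented quite differently.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for an enriched string type.
/// This is NOT the crate's `TextHandler`; it only illustrates the pattern.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RichText(String);

impl RichText {
    pub fn new(s: impl Into<String>) -> Self {
        RichText(s.into())
    }

    /// Collapse runs of whitespace into single spaces and trim both ends.
    /// Returns a `RichText`, not a plain `String`, so calls can be chained.
    pub fn clean(&self) -> RichText {
        RichText(self.0.split_whitespace().collect::<Vec<_>>().join(" "))
    }

    /// Keep only the text after the first occurrence of `marker`.
    /// If `marker` is absent, the text is returned unchanged.
    pub fn after(&self, marker: &str) -> RichText {
        match self.0.find(marker) {
            Some(idx) => RichText(self.0[idx + marker.len()..].to_string()),
            None => self.clone(),
        }
    }
}

/// Deref to `str` so ordinary string methods (`len`, `starts_with`, ...) still work.
impl Deref for RichText {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}
```

Because `clean` and `after` both return `RichText`, a chain such as `RichText::new(raw).clean().after("$")` never falls back to a bare `String`, while `Deref` keeps the value usable anywhere a `&str` is expected.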
Feature flags
| Flag | Default | What it enables |
|---|---|---|
| `storage` | yes | SQLite-backed persistent element storage via `rusqlite`. |
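Since the overview describes storage failures as an optional part of the error enum, one plausible way a Cargo feature can gate such a variant is sketched below. This is an assumption about the general technique, not a copy of the crate's code: `SketchError`, `SketchResult`, `require_selector`, and all variant names are hypothetical.

```rust
use std::fmt;

/// Hypothetical error enum; variant names are illustrative, not the crate's.
#[derive(Debug)]
pub enum SketchError {
    Parse(String),
    Selector(String),
    /// Compiled only when the `storage` feature is enabled.
    #[cfg(feature = "storage")]
    Storage(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Parse(msg) => write!(f, "parse error: {msg}"),
            SketchError::Selector(msg) => write!(f, "selector error: {msg}"),
            // The match arm is gated with the same condition as the variant.
            #[cfg(feature = "storage")]
            SketchError::Storage(msg) => write!(f, "storage error: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical alias mirroring the usual `Result<T, Error>` convention.
pub type SketchResult<T> = Result<T, SketchError>;

/// Hypothetical helper: reject blank selectors with a structured error.
pub fn require_selector(sel: &str) -> SketchResult<&str> {
    if sel.trim().is_empty() {
        Err(SketchError::Selector("empty selector".to_string()))
    } else {
        Ok(sel)
    }
}
```

With this layout, building without the feature removes both the variant and its match arm, so downstream code that never touches storage does not need to handle a storage error.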
Quick start
use ;
// TextHandler wraps a String with extra powers
let price = new;
let matches = price.re.unwrap;
assert_eq!;
// AttributesHandler gives read-only access to element attributes
let attrs = new;
assert_eq!;