//! # scrapling
//!
//! A fast, adaptive web scraping toolkit for Rust — a feature-for-feature port
//! of the Python [scrapling](https://github.com/D4Vinci/Scrapling) library.
//!
//! ## Crate overview
//!
//! This is the **core** crate. It provides:
//!
//! - **[`TextHandler`]** / **[`TextHandlers`]** — enriched string types with
//! regex extraction, HTML entity decoding, whitespace cleaning, and JSON
//! parsing. Every method that transforms a string returns a new `TextHandler`
//! so the enriched type is preserved through chains of operations.
//!
//! - **[`AttributesHandler`]** — a read-only map of HTML element attributes
//! whose values are `TextHandler`s, giving callers regex and cleaning methods
//! directly on attribute values.
//!
//! - **[`Error`]** / **[`Result`]** — a structured error enum covering parsing,
//! selector, encoding, regex, JSON, URL, and (optionally) storage failures.
//!
//! - **[`utils`]** — low-level text cleaning helpers (`clean_spaces`,
//! `clean_whitespace`, `flatten`) used internally and available for
//! downstream crates.
//!
//! - **[`selector`]** — HTML parsing, CSS selection with `::text`/`::attr()`
//! pseudo-elements, DOM navigation, and selector generation.
//!
//! - **[`translator`]** — CSS-to-XPath translation with pseudo-element
//! support and LRU caching.
//!
//! - **[`storage`]** — persistent element storage trait with a SQLite backend
//! for adaptive element relocation.
//!
//! - **[`adaptive`]** — structural similarity scoring and element relocation
//! engine (12-factor scoring algorithm).
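//!
//! As a rough illustration of what "whitespace cleaning" means here, the
//! sketch below collapses every run of whitespace into a single space and
//! trims both ends. `collapse_whitespace` is a hypothetical helper written
//! only with the standard library. It is not the crate's
//! `utils::clean_spaces` or `utils::clean_whitespace`, whose exact rules
//! (for example, how newlines are treated) may differ.
//!
//! ```rust
//! /// Hypothetical helper, not part of this crate's API: collapses runs of
//! /// Unicode whitespace (spaces, tabs, newlines) into one space and trims
//! /// leading and trailing whitespace.
//! fn collapse_whitespace(input: &str) -> String {
//!     // `split_whitespace` never yields empty pieces, so leading, trailing,
//!     // and repeated whitespace all disappear before the words are re-joined.
//!     input.split_whitespace().collect::<Vec<&str>>().join(" ")
//! }
//! ```
//!
//! With this sketch, `collapse_whitespace("  Item\tcosts\n\n  $42.99  ")`
//! returns `"Item costs $42.99"`. Relying on `split_whitespace` keeps the
//! sketch dependency-free and Unicode-aware: it splits on every character
//! for which `char::is_whitespace` is true, without needing `regex`.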
//!
//! ## Feature flags
//!
//! | Flag | Default | What it enables |
//! |------|---------|-----------------|
//! | `storage` | **yes** | SQLite-backed persistent element storage via `rusqlite`. |
//!
//! ## Quick start
//!
//! ```rust
//! use scrapling::{TextHandler, TextHandlers, AttributesHandler};
//!
//! // TextHandler wraps a String and adds regex extraction and cleaning helpers
//! let price = TextHandler::new("Item costs $42.99 today");
//! let matches = price.re(r"\$(\d+\.\d+)", false, false, true).unwrap();
//! assert_eq!(matches[0].as_ref(), "42.99");
//!
//! // AttributesHandler gives read-only access to element attributes
//! let attrs = AttributesHandler::new([
//!     ("class".to_owned(), "price-tag".to_owned()),
//!     ("data-currency".to_owned(), "USD".to_owned()),
//! ]);
//! assert_eq!(attrs["class"].as_ref(), "price-tag");
//! ```
// Re-export primary types at crate root for ergonomic imports.
pub use AttributesHandler;
pub use ;
pub use ParseOptions;
pub use ;