/*!
TagSoup is a small, fast, fairly forgiving HTML-ish parser with zero required dependencies.
It is built for the boringly useful jobs:
- Parse real-world markup without immediately fainting.
- Walk the resulting tree.
- Query it with a compact CSS-style selector API.
- Pull out text, attributes, and spans.

It is not trying to impersonate a browser engine. It just wants to turn messy markup into something workable, quickly.
# Highlights
- Optional `serde` support, enabled by default.
- Preserves source spans for nodes and parse errors.
- Handles raw-text elements like `script` and `style` sensibly.
- Supports `query_selector` and `query_selector_all`.
- Supports tree walking with a small visitor API.
- Tries to recover from malformed markup instead of giving up immediately.
# Examples
```
// Parse an HTML tag soup.
let doc = tagsoup::Document::parse("<div><p id=here>Hello, world!</p></div>");
// Check for parsing errors.
assert!(doc.errors.is_empty());
// Query the document for an element using a CSS selector.
let element = doc.query_selector("#here").unwrap();
assert_eq!(element.text_content(), "Hello, world!");
```
# Querying The Tree
```
let doc = tagsoup::Document::parse(r#"
<article id="main">
<p class="lead">Hello</p>
<p data-kind="feature card">world</p>
</article>
"#);
assert_eq!(doc.query_selector("#main .lead").unwrap().text_content(), "Hello");
assert_eq!(doc.query_selector_all("[data-kind*=feature]").len(), 1);
```
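The `*=` operator in the example above is CSS substring matching, which is why `feature` matches the value `feature card`. The sketch below illustrates that rule next to the whitespace-separated word match `~=`, using only the standard library. The helpers `attr_contains` and `attr_has_word` are hypothetical names for illustration and are not part of TagSoup; how the crate implements matching internally, and whether it supports `~=`, is not shown here.

```rust
/// Hypothetical helper: would `[name*=needle]` match this attribute list?
fn attr_contains(attrs: &[(&str, &str)], name: &str, needle: &str) -> bool {
    // CSS defines `*=` with an empty value as matching nothing.
    !needle.is_empty() && attrs.iter().any(|(k, v)| *k == name && v.contains(needle))
}

/// Hypothetical helper: would `[name~=word]` match this attribute list?
fn attr_has_word(attrs: &[(&str, &str)], name: &str, word: &str) -> bool {
    // CSS defines `~=` as matching nothing when the value is empty
    // or contains whitespace.
    !word.is_empty()
        && !word.contains(|c: char| c.is_ascii_whitespace())
        && attrs
            .iter()
            .any(|(k, v)| *k == name && v.split_ascii_whitespace().any(|w| w == word))
}

fn main() {
    let attrs = [("data-kind", "feature card")];

    // Substring match: any contiguous run of characters is enough.
    assert!(attr_contains(&attrs, "data-kind", "feature"));
    assert!(attr_contains(&attrs, "data-kind", "ture ca"));

    // Word match: only whole whitespace-separated tokens count.
    assert!(attr_has_word(&attrs, "data-kind", "card"));
    assert!(!attr_has_word(&attrs, "data-kind", "ture"));
}
```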
# Notes
- Whitespace is preserved by default.
- Call [`Document::trimmed`] if you want leading and trailing ASCII whitespace removed from text nodes.
- [`Element::text_content`] decodes HTML entities, except inside raw-text elements like `script` and `style`.
- Invalid selectors currently panic in [`Document::query_selector`] and [`Document::query_selector_all`].

This is not a full WHATWG-compliant HTML parser. It is a pragmatic parser for documents that are mostly HTML, occasionally cursed, and still need to be dealt with.
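
To make the whitespace and entity notes above concrete, here is a minimal standard-library sketch of the two behaviours they describe. The functions `trim_ascii_ws` and `decode_basic_entities` are hypothetical stand-ins, not TagSoup APIs. The decoder covers only five named entities as an assumption for brevity. A raw-text element such as `script` would skip the decoding step entirely.

```rust
use std::borrow::Cow;

/// Hypothetical helper: trim leading and trailing ASCII whitespace only.
/// Unlike `str::trim`, this leaves Unicode whitespace such as U+00A0 alone.
fn trim_ascii_ws(text: &str) -> &str {
    text.trim_matches(|c: char| c.is_ascii_whitespace())
}

/// Hypothetical helper: decode a handful of named entities.
/// Borrows the input when there is nothing to decode.
fn decode_basic_entities(text: &str) -> Cow<'_, str> {
    if !text.contains('&') {
        return Cow::Borrowed(text);
    }
    // Assumption: a tiny table for illustration, not the full HTML entity list.
    const TABLE: [(&str, char); 5] = [
        ("&amp;", '&'),
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&quot;", '"'),
        ("&apos;", '\''),
    ];
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match TABLE.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            // Unknown or malformed reference: keep the ampersand literally.
            None => {
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}

fn main() {
    assert_eq!(trim_ascii_ws(" \t\nHello \r\n"), "Hello");
    assert_eq!(trim_ascii_ws("\u{a0}Hello "), "\u{a0}Hello");
    assert_eq!(decode_basic_entities("a &lt; b &amp;&amp; c"), "a < b && c");
    assert_eq!(decode_basic_entities("&bogus; stays"), "&bogus; stays");
}
```

Returning `Cow` keeps the common case (no `&` in the text) allocation-free, which is one plausible reason for a parser to reach for it. Whether TagSoup does so for this purpose is an assumption.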
*/
use HashMap;
use Cow;
use ;
// #[macro_use]
// mod known;
// mod attribute;
// mod tag;
// pub use attribute::*;
// pub use tag::*;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;