§Quickstart
This introduction is heavily based on that of Python’s Beautiful Soup.
We’ll be using the following HTML fragment as an example throughout:
let html = r#"
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="intro">
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie </a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie </a>and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>; and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
"#;
The HTML will be parsed into an arena allocator, so we need to create one first. The lifetimes of all parsed nodes and elements are tied to this arena. Once it goes out of scope, everything is cleaned up.
use htmlite::{NodeArena, Html};
let arena = NodeArena::new();
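To make the ownership model concrete, here is a minimal, std-only sketch of the arena idea. It is not htmlite's implementation: `DemoArena`, `DemoNode`, `alloc`, `get`, and `arena_demo` are hypothetical names invented for illustration, and the sketch hands out indices rather than references. It only shows that the arena owns every node and that dropping the arena drops them all at once.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical node type (not htmlite's). It bumps a shared counter when dropped.
struct DemoNode {
    name: &'static str,
    dropped: Rc<Cell<usize>>,
}

impl Drop for DemoNode {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Hypothetical arena: it owns every node; callers hold indices or short borrows.
struct DemoArena {
    nodes: Vec<DemoNode>,
}

impl DemoArena {
    fn new() -> Self {
        DemoArena { nodes: Vec::new() }
    }

    /// Stores a new node in the arena and returns its index.
    fn alloc(&mut self, name: &'static str, dropped: &Rc<Cell<usize>>) -> usize {
        self.nodes.push(DemoNode { name, dropped: Rc::clone(dropped) });
        self.nodes.len() - 1
    }

    /// Borrows a node; the borrow cannot outlive the arena.
    fn get(&self, id: usize) -> &DemoNode {
        &self.nodes[id]
    }
}

/// Allocates two nodes, reads one name, lets the arena go out of scope,
/// and returns the name together with the number of nodes dropped.
fn arena_demo() -> (&'static str, usize) {
    let dropped = Rc::new(Cell::new(0));
    let name;
    {
        let mut arena = DemoArena::new();
        let head = arena.alloc("head", &dropped);
        let _body = arena.alloc("body", &dropped);
        name = arena.get(head).name;
        // Nothing has been dropped while the arena is alive.
        assert_eq!(dropped.get(), 0);
    } // The arena goes out of scope here, taking both nodes with it.
    (name, dropped.get())
}
```

Calling `arena_demo()` yields the name of the first node and a drop count of 2, which mirrors the statement above that nodes live exactly as long as the arena does.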
Next, we’ll parse the HTML string.
You’ll get back a fragment node which contains the top-level <head> and <body> elements.
It also contains 3 whitespace text nodes: before <head>, between <head> and <body>, and after <body>.
let Html { root, .. } = htmlite::parse(&arena, html).unwrap();
let child_elements = root
    .children()
    .filter_map(htmlite::Node::as_element)
    .collect::<Vec<_>>();
let child_text = root
    .children()
    .filter_map(htmlite::Node::as_text)
    .collect::<Vec<_>>();
assert_eq!(child_elements.len(), 2);
assert_eq!(child_text.len(), 3);
assert_eq!(child_elements[0].name, "head");
assert_eq!(child_elements[1].name, "body");
Here are some ways to navigate around:
let title = root.select("title").next().unwrap();
assert_eq!(title.outer_html(), "<title>The Dormouse's story</title>");
assert_eq!(title.name, "title");
assert_eq!(title.text_content().collect::<String>(), "The Dormouse's story");
assert_eq!(title.parent().and_then(htmlite::Node::as_element).unwrap().name, "head");
let first_p = root.select("p").next().unwrap();
assert_eq!(first_p.outer_html(), r#"<p class="title"><b>The Dormouse's story</b></p>"#);
assert_eq!(&first_p["class"], "title");
One common task is extracting all the URLs found within a page’s <a> tags:
let mut links = Vec::new();
for anchor in root.select("a[href]") {
    links.push(&anchor["href"]);
}
assert_eq!(
    links,
    vec![
        "http://example.com/elsie",
        "http://example.com/lacie",
        "http://example.com/tillie",
    ]
);
Individual nodes within the tree are immutable, but you can create new nodes and detach others, allowing you to manipulate the tree structure.
let new_story = arena.fragment([
    arena.p([("class", "story")], arena.text("`What did they live on?' said Alice, who always took a great interest in questions of eating and drinking.")),
    arena.footer(None, arena.text("The end"))
]);
let old_story = root.select(".story").next().unwrap();
old_story.insert_after(new_story);
old_story.detach();
assert_eq!(
    root.text_content().collect::<Vec<_>>(),
    [
        "\n",
        "\n ",
        "The Dormouse's story",
        "\n",
        "\n",
        "\n ",
        "The Dormouse's story",
        "\n ",
        "\n Once upon a time there were three little sisters; and their names were\n ",
        "Elsie ",
        ",\n ",
        "Lacie ",
        "and\n ",
        "Tillie",
        "; and they lived at the bottom of a well.\n ",
        "\n ",
        "`What did they live on?' said Alice, who always took a great interest in questions of eating and drinking.",
        "The end",
        "\n",
        "\n",
    ],
);
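One common way to combine immutable nodes with a mutable tree structure is to keep the links between nodes in `Cell`s, so they can be rewired through shared references. The sketch below illustrates that idea with a flat sibling chain. It is an assumption about the general technique, not a description of how htmlite is implemented, and `DemoNode`, `texts_from`, and `replace_demo` are hypothetical names; only the method names `insert_after` and `detach` echo the calls used above.

```rust
use std::cell::Cell;

/// Hypothetical node (not htmlite's): the payload is immutable,
/// while the sibling links live in `Cell`s and can change via `&self`.
struct DemoNode<'a> {
    text: &'static str,
    prev: Cell<Option<&'a DemoNode<'a>>>,
    next: Cell<Option<&'a DemoNode<'a>>>,
}

impl<'a> DemoNode<'a> {
    fn new(text: &'static str) -> Self {
        DemoNode { text, prev: Cell::new(None), next: Cell::new(None) }
    }

    /// Links `new` directly after `self` in the sibling chain.
    fn insert_after(&'a self, new: &'a DemoNode<'a>) {
        let old_next = self.next.get();
        new.prev.set(Some(self));
        new.next.set(old_next);
        if let Some(n) = old_next {
            n.prev.set(Some(new));
        }
        self.next.set(Some(new));
    }

    /// Unlinks `self` from its siblings and joins its neighbours together.
    fn detach(&self) {
        let prev = self.prev.take();
        let next = self.next.take();
        if let Some(p) = prev {
            p.next.set(next);
        }
        if let Some(n) = next {
            n.prev.set(prev);
        }
    }
}

/// Collects the text of `first` and every sibling that follows it.
fn texts_from<'a>(first: &'a DemoNode<'a>) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut cur = Some(first);
    while let Some(node) = cur {
        out.push(node.text);
        cur = node.next.get();
    }
    out
}

/// Replaces "old story" with "new story" using insert-then-detach,
/// the same order of operations as the example above.
fn replace_demo() -> Vec<&'static str> {
    let intro = DemoNode::new("intro");
    let old_story = DemoNode::new("old story");
    let new_story = DemoNode::new("new story");
    intro.insert_after(&old_story);
    old_story.insert_after(&new_story);
    old_story.detach();
    texts_from(&intro)
}
```

Inserting the replacement before detaching the old node matters in this sketch: once a node is detached it no longer has neighbours, so there would be nothing to insert after.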
Modules§
Structs§
- Ancestors: See Node::ancestors
- Comment: A comment node
- Descendants: See Node::descendants
- Doctype: A doctype node
- Element: An HTML element like <p> or <div>.
- Following: See Node::following
- Html: An HTML tree.
- Node: Represents a node in an HTML document.
- NodeArena: Storage for parsed HTML nodes.
- Preceding: See Node::preceding
- Selected: See Node::select
- Text: A text node
- TreeConstructionError: An error that occurs while constructing an HTML tree.
Enums§
- NodeKind: Types of nodes that might exist in an HTML document.
Functions§
- parse: Parses the given HTML fragment.