Inspired by the Python library "BeautifulSoup," soup is a layer on top of
html5ever that aims to provide a slightly different API for querying and
manipulating HTML.
Examples (inspired by bs4's docs)
Here is the HTML document we will be using for the rest of the examples:
const THREE_SISTERS: &'static str = r#"
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"#;
First let's try searching for a tag with a specific name:
So we see that .find will give us the first element that matches the query, and we've seen some
of the methods that we can call on the results. But what if we want to retrieve more than one
element with the query? For that, we'll use .find_all:
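To make the difference between "first match" and "every match" concrete, here is a minimal std-only sketch. It does not use soup itself: `Element`, `three_sisters`, `find_first`, and `find_all` are hypothetical stand-ins that model a subset of the example document as a flat list in document order.

```rust
/// Stand-in for a parsed element: a tag name and an optional `id` attribute.
/// (Hypothetical type for illustration; real parsed nodes carry much more.)
#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub name: &'static str,
    pub id: Option<&'static str>,
}

/// A subset of the example document's elements, in document order.
pub fn three_sisters() -> Vec<Element> {
    vec![
        Element { name: "title", id: None },
        Element { name: "p", id: None },
        Element { name: "b", id: None },
        Element { name: "p", id: None },
        Element { name: "a", id: Some("link1") },
        Element { name: "a", id: Some("link2") },
        Element { name: "a", id: Some("link3") },
        Element { name: "p", id: None },
    ]
}

/// First element with the given tag name, or `None` if nothing matches.
pub fn find_first<'a>(elements: &'a [Element], tag: &str) -> Option<&'a Element> {
    elements.iter().find(|e| e.name == tag)
}

/// Lazy iterator over every element with the given tag name.
pub fn find_all<'a>(
    elements: &'a [Element],
    tag: &'a str,
) -> impl Iterator<Item = &'a Element> + 'a {
    elements.iter().filter(move |e| e.name == tag)
}
```

With this model, `find_first(&doc, "a")` stops at the element whose id is `link1`, while `find_all(&doc, "a")` goes on to yield all three links.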
Since .find_all returns an iterator, you can use it with all the methods you would
use with other iterators:
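As a rough illustration of what "all the methods you would use with other iterators" means in practice, the sketch below chains ordinary adaptors over a stand-in iterator. The `HREFS` constant and the `numbered_names` helper are assumptions made for this example; in real code the items would come from the query results rather than a hard-coded list.

```rust
/// Stand-in for the `href` values of the three links in the example document.
pub const HREFS: [&str; 3] = [
    "http://example.com/elsie",
    "http://example.com/lacie",
    "http://example.com/tillie",
];

/// Number each link and keep only its last path segment, using ordinary
/// iterator adaptors (`enumerate`, `filter_map`, `collect`).
pub fn numbered_names<'a>(hrefs: impl Iterator<Item = &'a str>) -> Vec<String> {
    hrefs
        .enumerate()
        .filter_map(|(i, href)| {
            // `rsplit('/')` yields segments from the right; the first is the name.
            href.rsplit('/')
                .next()
                .map(|name| format!("{}: {}", i + 1, name))
        })
        .collect()
}
```

Calling `numbered_names(HREFS.iter().copied())` produces one numbered entry per link. Because the function accepts any iterator of string slices, the same chain would work on whatever a query hands back, provided its items can be mapped to `&str`.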
The top-level structure we've been working with here, soup, implements the same methods
that the query results do, so you can call the same methods on it and it will delegate the
calls to the root node:
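The delegation described above can be pictured with a small std-only model. This is a sketch of the general pattern, not soup's actual implementation: the `Queryable` trait, the `Node` enum, and the `Document` wrapper are hypothetical names chosen for the example.

```rust
/// Hypothetical query interface shared by nodes and the document wrapper.
pub trait Queryable {
    /// Concatenated text of this item and all of its descendants.
    fn text(&self) -> String;
}

/// Toy node: either a text leaf or an element with children.
pub enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

impl Queryable for Node {
    fn text(&self) -> String {
        match self {
            Node::Text(t) => t.clone(),
            // Recurse into the children and join their text in order.
            Node::Element { children, .. } => children.iter().map(|c| c.text()).collect(),
        }
    }
}

/// Toy document wrapper that owns the root node.
pub struct Document {
    pub root: Node,
}

impl Queryable for Document {
    // The wrapper adds no logic of its own; it forwards to the root.
    fn text(&self) -> String {
        self.root.text()
    }
}

/// Builds html > head > title > "The Dormouse's story".
pub fn sample() -> Document {
    let title = Node::Element {
        name: "title".to_string(),
        children: vec![Node::Text("The Dormouse's story".to_string())],
    };
    let head = Node::Element { name: "head".to_string(), children: vec![title] };
    Document {
        root: Node::Element { name: "html".to_string(), children: vec![head] },
    }
}
```

Because both types implement the same trait, code written against `Queryable` does not need to care whether it was handed the whole document or a single node.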
You can use more than just strings to search for results, such as Regex:
use regex::Regex;
Passing true will match everything:
(Passing false, by contrast, never returns any results. If that is useful to you, please let me know.)
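One way to picture how a query can be a string or a boolean is a small trait with one implementation per query type. The sketch below is std-only and illustrative: `Matcher` and `select` are hypothetical names, and a regex-backed implementation is left out because it would need an external crate.

```rust
/// Hypothetical query trait: anything that can decide whether a tag name matches.
pub trait Matcher {
    fn matches(&self, tag: &str) -> bool;
}

/// A string matches only tags with exactly that name.
impl Matcher for &str {
    fn matches(&self, tag: &str) -> bool {
        *self == tag
    }
}

/// `true` matches every tag and `false` matches none.
impl Matcher for bool {
    fn matches(&self, _tag: &str) -> bool {
        *self
    }
}

/// Keep the tag names accepted by the matcher, preserving document order.
pub fn select<'a, M: Matcher>(tags: &[&'a str], query: M) -> Vec<&'a str> {
    tags.iter().copied().filter(|t| query.matches(t)).collect()
}
```

Given the tag names `html`, `head`, `body`, `p`, and `b`, `select(&tags, true)` returns all five, `select(&tags, false)` returns none, and `select(&tags, "p")` returns only `p`.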
So what can you do once you have the result of a query? For one thing, you can traverse the tree in a few different ways. You can ascend the tree:
Or you can descend it:
Or ascend it with an iterator:
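The three traversal styles (one step up, one step down, and repeated steps up through an iterator) can be sketched on a toy tree. This is a std-only model under stated assumptions, not soup's node type: `Tree` and its methods are hypothetical, nodes are identified by index, and the sample tree is simply html > body > p > b.

```rust
/// Toy arena-based tree: each node stores its tag name and its parent's index.
pub struct Tree {
    names: Vec<&'static str>,
    parents: Vec<Option<usize>>,
}

impl Tree {
    /// Builds html(0) > body(1) > p(2) > b(3).
    pub fn sample() -> Tree {
        Tree {
            names: vec!["html", "body", "p", "b"],
            parents: vec![None, Some(0), Some(1), Some(2)],
        }
    }

    /// Tag name of the node at index `node`.
    pub fn name(&self, node: usize) -> &'static str {
        self.names[node]
    }

    /// Ascend one level; the root has no parent.
    pub fn parent(&self, node: usize) -> Option<usize> {
        self.parents[node]
    }

    /// Descend one level: every node whose parent is `node`.
    pub fn children(&self, node: usize) -> impl Iterator<Item = usize> + '_ {
        self.parents
            .iter()
            .enumerate()
            .filter(move |(_, p)| **p == Some(node))
            .map(|(i, _)| i)
    }

    /// Ascend repeatedly: lazily yields parent, grandparent, and so on up to the root.
    pub fn ancestors(&self, node: usize) -> impl Iterator<Item = usize> + '_ {
        std::iter::successors(self.parent(node), move |&n| self.parent(n))
    }
}
```

Starting from the `b` node (index 3), `parent` gives `p`, and `ancestors` yields `p`, `body`, `html` in that order. The iterator form is convenient when you want to search or collect along the whole path instead of unwrapping one `Option` per level.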