Skyscraper - HTML scraping with XPath
Rust library to scrape HTML documents with XPath expressions.
HTML Parsing
Skyscraper has its own HTML parser implementation. The parser outputs a tree structure that can be traversed manually with parent/child relationships.
Example: Simple HTML Parsing
use skyscraper::html;
let html_text = r##"
<html>
    <body>
        <div>Hello world</div>
    </body>
</html>"##;
// `?` assumes an enclosing function that returns a Result.
let document = html::parse(html_text)?;
Example: Traversing Parent/Child Relationships
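use skyscraper::html::{self, DocumentNode};
// Parse the HTML text into a document
let text = r#"<parent><child/><child/></parent>"#;
let document = html::parse(text)?;
// Get the children of the root node
let parent_node: DocumentNode = document.root_node;
let children: Vec<DocumentNode> = parent_node.children(&document).collect();
assert_eq!(2, children.len());
// Get the parent of both child nodes
let parent_of_child0: DocumentNode = children[0].parent(&document).expect("parent of child 0 missing");
let parent_of_child1: DocumentNode = children[1].parent(&document).expect("parent of child 1 missing");
assert_eq!(parent_node, parent_of_child0);
assert_eq!(parent_node, parent_of_child1);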
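To make the parent/child model concrete, the following is a minimal, standard-library-only sketch of a tree in which every node records its parent and its children. It is an illustration of the idea only, not Skyscraper's implementation: `Tree`, `NodeId`, `add`, and `tags_preorder` are hypothetical names invented for this example.

```rust
/// Handle to a node stored inside `Tree` (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

/// One element of the tree with links to its parent and children.
#[derive(Debug)]
struct Node {
    tag: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// Owns every node; a `NodeId` is only meaningful together with its `Tree`.
#[derive(Debug, Default)]
pub struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Adds a node under `parent`, or as a root when `parent` is `None`.
    pub fn add(&mut self, tag: &str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            tag: tag.to_string(),
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p.0].children.push(id);
        }
        id
    }

    /// Tag name of the given node.
    pub fn tag(&self, id: NodeId) -> &str {
        &self.nodes[id.0].tag
    }

    /// Parent of the given node, or `None` for a root.
    pub fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0].parent
    }

    /// Direct children of the given node, in insertion order.
    pub fn children(&self, id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        self.nodes[id.0].children.iter().copied()
    }

    /// Tags of `id` and all of its descendants, in pre-order (document order).
    pub fn tags_preorder(&self, id: NodeId) -> Vec<String> {
        let mut out = vec![self.tag(id).to_string()];
        for child in self.children(id) {
            out.extend(self.tags_preorder(child));
        }
        out
    }
}
```

Because the handles are plain indices, they can be copied and compared freely, but every lookup has to go through the tree that owns the nodes.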
XPath Expressions
Skyscraper can parse XPath expressions and apply them to parsed HTML documents.
use skyscraper::{html, xpath};
// Parse the html text into a document.
let html_text = r##"
<div>
    <div class="foo">
        <span>yes</span>
    </div>
    <div class="bar">
        <span>no</span>
    </div>
</div>
"##;
let document = html::parse(html_text)?;
// Parse and apply the xpath.
let expr = xpath::parse("//div[@class='foo']/span")?;
let results = expr.apply(&document)?;
assert_eq!(1, results.len());
// Get text from the node
let text = results[0].get_text(&document).expect("text missing");
assert_eq!("yes", text);
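For readers new to XPath, it may help to see what an expression such as `//div[@class='foo']/span` selects when written out by hand. The sketch below uses only the standard library and a deliberately tiny element type; `Element`, `descendants_or_self`, and `select_foo_spans` are hypothetical names for this illustration and are not part of Skyscraper. It roughly mirrors the three steps of the expression: visit every element in the document (`//`), keep `div` elements whose `class` is `foo`, then take their direct `span` children.

```rust
/// Minimal element type for illustration (hypothetical, not Skyscraper's).
#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub tag: String,
    pub class: Option<String>,
    pub text: String,
    pub children: Vec<Element>,
}

impl Element {
    /// Convenience constructor that takes string slices.
    pub fn new(tag: &str, class: Option<&str>, text: &str, children: Vec<Element>) -> Self {
        Element {
            tag: tag.to_string(),
            class: class.map(str::to_string),
            text: text.to_string(),
            children,
        }
    }
}

/// Returns `root` followed by all of its descendants, in document order.
pub fn descendants_or_self(root: &Element) -> Vec<&Element> {
    let mut out = vec![root];
    for child in &root.children {
        out.extend(descendants_or_self(child));
    }
    out
}

/// Hand-written counterpart of `//div[@class='foo']/span`.
pub fn select_foo_spans(root: &Element) -> Vec<&Element> {
    descendants_or_self(root)
        .into_iter()
        // Predicate step: div[@class='foo']
        .filter(|e| e.tag == "div" && e.class.as_deref() == Some("foo"))
        // Child step: /span
        .flat_map(|div| div.children.iter())
        .filter(|child| child.tag == "span")
        .collect()
}
```

An XPath engine generalizes this by parsing the expression into steps and evaluating each step against the node set produced by the previous one, so the selection logic does not have to be hard-coded per query.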