A declarative HTML parser library for Rust that works like a deserializer from HTML to structs.
§Example
use h2s::FromHtml;

#[derive(FromHtml, Debug, Eq, PartialEq)]
pub struct Page {
    #[h2s(attr = "lang")]
    lang: String,
    #[h2s(select = "div > h1.blog-title")]
    blog_title: String,
    #[h2s(select = ".articles > div")]
    articles: Vec<Article>,
}

#[derive(FromHtml, Debug, Eq, PartialEq)]
pub struct Article {
    #[h2s(select = "h2 > a")]
    title: String,
    #[h2s(select = "div > span")]
    view_count: usize,
    #[h2s(select = "h2 > a", attr = "href")]
    url: String,
    #[h2s(select = "ul > li")]
    tags: Vec<String>,
    #[h2s(select = "ul > li:nth-child(1)")]
    first_tag: Option<String>,
}

let html = r#"
<html lang="en">
    <body>
        <div>
            <h1 class="blog-title">My tech blog</h1>
            <div class="articles">
                <div>
                    <h2><a href="https://example.com/1">article1</a></h2>
                    <div><span>901</span> Views</div>
                    <ul><li>Tag1</li><li>Tag2</li></ul>
                </div>
                <div>
                    <h2><a href="https://example.com/2">article2</a></h2>
                    <div><span>849</span> Views</div>
                    <ul></ul>
                </div>
                <div>
                    <h2><a href="https://example.com/3">article3</a></h2>
                    <div><span>103</span> Views</div>
                    <ul><li>Tag3</li></ul>
                </div>
            </div>
        </div>
    </body>
</html>
"#;

let page = h2s::parse::<Page>(html).unwrap();
assert_eq!(page, Page {
    lang: "en".to_string(),
    blog_title: "My tech blog".to_string(),
    articles: vec![
        Article {
            title: "article1".to_string(),
            url: "https://example.com/1".to_string(),
            view_count: 901,
            tags: vec!["Tag1".to_string(), "Tag2".to_string()],
            first_tag: Some("Tag1".to_string()),
        },
        Article {
            title: "article2".to_string(),
            url: "https://example.com/2".to_string(),
            view_count: 849,
            tags: vec![],
            first_tag: None,
        },
        Article {
            title: "article3".to_string(),
            url: "https://example.com/3".to_string(),
            view_count: 103,
            tags: vec!["Tag3".to_string()],
            first_tag: Some("Tag3".to_string()),
        },
    ]
});

// When the structure of the input HTML document does not match the expected structure,
// `h2s::parse` returns an error with a detailed reason.
let invalid_html = html.replace(r#"<a href="https://example.com/3">article3</a>"#, "");
let err = h2s::parse::<Page>(invalid_html).unwrap_err();
assert_eq!(
    err.to_string(),
    "articles: [2]: title: mismatched number of selected elements by \"h2 > a\": expected exactly one element, but no elements found"
);
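The error message in the example reads as a path from the root struct down to the failing field: field name, list index, field name, then the reason. The following is a minimal sketch of how such a path could be assembled from nested errors. The `ParseError` type, its variants, and `example_error` are hypothetical names introduced for illustration; they are not part of h2s, whose actual error types may be structured differently.

```rust
use std::fmt;

// Hypothetical error type for illustration only; not an h2s type.
#[derive(Debug)]
enum ParseError {
    /// The failure itself, e.g. a selector matching the wrong number of elements.
    Leaf(String),
    /// A failure that occurred inside the named struct field.
    Field(&'static str, Box<ParseError>),
    /// A failure that occurred inside the element at this index of a list field.
    Index(usize, Box<ParseError>),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            // The innermost reason is printed as-is.
            ParseError::Leaf(reason) => write!(f, "{}", reason),
            // Each wrapper prefixes its own path segment, then delegates inward.
            ParseError::Field(name, inner) => write!(f, "{}: {}", name, inner),
            ParseError::Index(i, inner) => write!(f, "[{}]: {}", i, inner),
        }
    }
}

/// Rebuilds the shape of the error from the example: `articles` -> `[2]` -> `title`.
fn example_error() -> ParseError {
    let reason = "mismatched number of selected elements by \"h2 > a\": \
                  expected exactly one element, but no elements found";
    ParseError::Field(
        "articles",
        Box::new(ParseError::Index(
            2,
            Box::new(ParseError::Field(
                "title",
                Box::new(ParseError::Leaf(reason.to_string())),
            )),
        )),
    )
}
```

Calling `example_error().to_string()` yields the same text as the assertion in the example, which shows why the message pinpoints the third article (index 2) and its `title` field.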
§Supported types
You can use the following types for the fields of the struct to parse.
§Basic types
- String
- Numeric types (usize, i64, NonZeroU32, …)
- And more built-in supported types (List)
- Or any other type, by implementing support for it yourself (Example)
§Container types (where T is a basic type)
- [T;N]
- Option<T>
- Vec<T>
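The field type also determines how many matched elements are acceptable. The example shows that a plain `String` fails when nothing matches, that `Option<String>` becomes `None`, and that `Vec<String>` becomes empty. The sketch below models those rules with the standard library only. The function names are hypothetical and not part of h2s; the error texts other than the "exactly one" case, and the behavior for more than one match on `Option<T>` or a wrong count on `[T; N]`, are assumptions rather than documented h2s behavior.

```rust
use std::convert::TryInto;

// Illustrative model only: these helpers are hypothetical, not h2s APIs.

/// Plain `T` (e.g. `String`): exactly one matched element is required.
fn exactly_one<T>(mut matched: Vec<T>) -> Result<T, String> {
    match matched.len() {
        1 => Ok(matched.remove(0)),
        0 => Err("expected exactly one element, but no elements found".to_string()),
        n => Err(format!("expected exactly one element, but {} elements found", n)),
    }
}

/// `Option<T>`: no match maps to `None`; one match maps to `Some`.
/// Rejecting more than one match is an assumption of this sketch.
fn at_most_one<T>(mut matched: Vec<T>) -> Result<Option<T>, String> {
    match matched.len() {
        0 => Ok(None),
        1 => Ok(Some(matched.remove(0))),
        n => Err(format!("expected at most one element, but {} elements found", n)),
    }
}

/// `[T; N]`: assumed to require exactly `N` matched elements.
fn exactly_n<T, const N: usize>(matched: Vec<T>) -> Result<[T; N], String> {
    let found = matched.len();
    // `Vec<T>` converts into `[T; N]` only when its length is exactly `N`.
    matched
        .try_into()
        .map_err(|_| format!("expected exactly {} elements, but {} elements found", N, found))
}
```

`Vec<T>` needs no helper in this model, since any number of matches (including zero) is accepted as-is. For instance, `at_most_one::<&str>(vec![])` returns `Ok(None)`, mirroring `first_tag: None` for the article with an empty `<ul>`.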
Modules§
- backend
- Lets you select which backend HTML parser library to use, or implement a custom backend yourself.
- display
- All implementations of the Display trait. Producing human-readable strings is a separate concern from HTML parsing, so those implementations are kept apart and gathered here.
- element_selector
- error
- Implementations of std::error::Error
- extraction_method
- field_value
- functor
- html
- macro_utils
- A set of internal utility methods used in the code generated by the FromHtml derive macro. These methods are shorthands that reduce the amount of code in the quote! macro and improve the development experience. If you are just an h2s user, you would not call these methods directly.
- parseable
- Implementations of the FromHtml trait
- transformable
- traversable
- traversable_with_context
Structs§
Enums§
- Never
- Similar to std::convert::Infallible
Traits§
- FromHtml
- A converter from a single HTML element to a single struct
Functions§
- parse
- A shorthand for parsing without specifying a backend HTML parser
- parse_with_backend
- Parses with a specific backend HTML parser