§lithtml
A lightweight and fast HTML/XHTML parser for Rust, designed to handle both full HTML documents and fragments. This parser uses Pest for parsing and is forked from html-parser.
§Features
- Parse html & xhtml (not xml processing instructions)
- Parse html-documents
- Parse html-fragments
- Parse empty documents
- Parse with the same api for both documents and fragments
- Parse custom, non-standard elements: <cat/>, <Cat/> and <C4-t/>
- Removes comments
- Removes dangling elements
- Iterate over all nodes in the DOM tree
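To make the last feature concrete, here is a minimal, std-only sketch of what iterating over every node of a tree in document order can look like. The `Node` enum, `walk`, `element_names`, `texts` and `sample_tree` are hypothetical stand-ins written for this illustration; they are not lithtml's types or its iterator API, which may differ.

```rust
// Hypothetical, simplified stand-in for a parsed tree.
// This is NOT lithtml's `Node`; the real type carries more information.
#[derive(Debug)]
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Returns `root` and all of its descendants in depth-first document order.
fn walk(root: &Node) -> Vec<&Node> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        out.push(node);
        if let Node::Element { children, .. } = node {
            // Push children in reverse so the first child is popped first.
            stack.extend(children.iter().rev());
        }
    }
    out
}

/// Names of all elements, in the order they appear in the tree.
fn element_names(root: &Node) -> Vec<&str> {
    walk(root)
        .into_iter()
        .filter_map(|node| match node {
            Node::Element { name, .. } => Some(name.as_str()),
            Node::Text(_) => None,
        })
        .collect()
}

/// All text nodes, in the order they appear in the tree.
fn texts(root: &Node) -> Vec<&str> {
    walk(root)
        .into_iter()
        .filter_map(|node| match node {
            Node::Text(text) => Some(text.as_str()),
            Node::Element { .. } => None,
        })
        .collect()
}

/// Builds the tree for `<div><h1>Hello world</h1><p>Parsed html</p></div>`.
fn sample_tree() -> Node {
    Node::Element {
        name: "div".to_string(),
        children: vec![
            Node::Element {
                name: "h1".to_string(),
                children: vec![Node::Text("Hello world".to_string())],
            },
            Node::Element {
                name: "p".to_string(),
                children: vec![Node::Text("Parsed html".to_string())],
            },
        ],
    }
}

fn main() {
    let tree = sample_tree();
    for name in element_names(&tree) {
        println!("{}", name);
    }
}
```

An explicit stack is used instead of recursion so that very deep trees do not exhaust the call stack; a real crate would more likely expose this as a lazy iterator than as a collected `Vec`.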
§Examples
Parse html document and print as json & formatted dom
use lithtml::Dom;

fn main() {
    let html = r#"
        <!doctype html>
        <html lang="en">
            <head>
                <meta charset="utf-8">
                <title>Html parser</title>
            </head>
            <body>
                <h1 id="a" class="b c">Hello world</h1>
                </h1> <!-- comments & dangling elements are ignored -->
            </body>
        </html>"#;

    let dom = Dom::parse(html).unwrap();
    println!("{}", dom.to_json_pretty().unwrap());
    println!("{}", dom);
}
Parse html fragment and print as json & formatted fragment
use lithtml::Dom;

fn main() {
    let html = "<div id=cat />";
    let dom = Dom::parse(html).unwrap();
    println!("{}", dom.to_json_pretty().unwrap());
    println!("{}", dom);
}
Print to json
use lithtml::{Dom, Result};

fn main() -> Result<()> {
    let html = "<div id=cat />";
    let json = Dom::parse(html)?.to_json_pretty()?;
    // Print the JSON string built above.
    println!("{}", json);
    Ok(())
}
Create a dom manually and print it
use lithtml::{Dom, Node, Result};

fn main() -> Result<()> {
    let mut dom = Dom::new();
    dom.children.push(Node::new_comment("Welcome to the test"));
    dom.children.push(Node::parse_json(
        r#"{
            "name": "div",
            "variant": "normal",
            "children": [
                {
                    "name": "h1",
                    "variant": "normal",
                    "children": [
                        "Tjena världen!"
                    ]
                },
                {
                    "name": "p",
                    "variant": "normal",
                    "children": [
                        "Tänkte bara informera om att Sverige är bättre än Finland i ishockey."
                    ]
                }
            ]
        }"#
    )?);
    dom.children.append(&mut Node::parse(
        r#"<div>Testing</div><p>Multiple elements from node</p>"#,
    )?);
    println!("{}", dom);
    Ok(())
}
Structs§
- Dom
- The main struct & the result of the parsed html
- Element
- Most of the parsed html nodes are elements, except for text
- FormattingOptions
Enums§
- DomVariant
- Document, DocumentFragment or Empty
- ElementVariant
- Normal: <div></div> or Void: <meta/> and <meta>
- Error
- Node
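The Normal/Void distinction described for ElementVariant can be pictured with a small, std-only sketch. The enum and the `render` helper below are hypothetical and written only for this illustration; they mirror the two variant names from the list above but are not lithtml's definitions or its formatter.

```rust
// Hypothetical mirror of the two variants named above; not lithtml's own enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ElementVariant {
    /// Has an opening and a closing tag, e.g. <div></div>.
    Normal,
    /// Has no closing tag and no children, e.g. <meta/> or <meta>.
    Void,
}

/// Renders a tag by variant. `inner` is ignored for void elements,
/// because a void element cannot contain children.
fn render(name: &str, variant: ElementVariant, inner: &str) -> String {
    match variant {
        ElementVariant::Normal => format!("<{0}>{1}</{0}>", name, inner),
        ElementVariant::Void => format!("<{}/>", name),
    }
}

fn main() {
    println!("{}", render("div", ElementVariant::Normal, "Hello"));
    println!("{}", render("meta", ElementVariant::Void, ""));
}
```

In this sketch both source spellings of a void element, `<meta/>` and `<meta>`, would map to the same `Void` variant, so the rendered form is chosen by the renderer and not by the original input.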