# htmlite
_An HTML manipulation toolkit_
`htmlite` is a lightweight toolkit for parsing, manipulating, and generating HTML.
## Examples
**Parsing an HTML fragment**
```rust
htmlite::parse("<h1>Hello, <i>world!</i></h1>").unwrap();
```
**Selecting elements**
```rust
let html = r#"
<ul>
<li>Foo</li>
<li>Bar</li>
<li>Baz</li>
</ul>
"#;
let root = htmlite::parse(html).unwrap();
for element in htmlite::select("li", root.descendants()) {
assert_eq!(&*element.name(), "li");
}
```
**Accessing element attributes**
```rust
let root = htmlite::parse(r#"<input name="foo" value="bar" readonly>"#).unwrap();
let element = htmlite::select(r#"input[name="foo"]"#, root.descendants()).next().unwrap();
assert_eq!(element.get_attribute("value").as_deref(), Some("bar"));
assert_eq!(element.get_attribute("readonly").as_deref(), Some(""));
```
**Serializing HTML**
```rust
let root = htmlite::parse("<h1>Hello, <i>world!</i></h1>").unwrap();
let h1 = htmlite::select("h1", root.descendants()).next().unwrap();
assert_eq!(h1.html(), "<h1>Hello, <i>world!</i></h1>");
```
**Manipulating the DOM**
```rust
let html = "<html><body>hello<p class=\"hello\">REMOVE ME</p></body></html>";
let root = htmlite::parse(html).unwrap();
for el in htmlite::select(".hello", root.descendants()) {
el.detach();
}
assert_eq!(root.html(), "<html><body>hello</body></html>");
```
**Generating HTML**
```rust
use htmlite::html;
let form = html!(
(form
["method" => "POST"]
(input ["value" => "hello", "type" => "text"])
(button (text "Submit"))
)
);
assert_eq!(form.html(), r#"<form method="POST"><input type="text" value="hello"><button>Submit</button></form>"#);
```
## When should you use this?
This is not a "browser-grade" HTML parser, but it is close!
Specifically, the tokenizer is spec-compliant and passes all the [html5lib tokenizer tests](https://github.com/html5lib/html5lib-tests/tree/master/tokenizer).
As a result, `htmlite` accepts any valid HTML construct, such as numeric and named character references and void elements.
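
For a sense of what handling character references involves, the sketch below decodes numeric references such as `&#72;` and `&#x48;` using only the standard library. It is an illustration, not `htmlite`'s tokenizer, and `decode_numeric_references` is a hypothetical helper. It simplifies the spec: only references terminated by `;` are decoded, named references like `&amp;` are left untouched, and the spec's remapping of code points 0x80 to 0x9F onto Windows-1252 characters is skipped. It does follow the spec in replacing NUL, surrogates and values above 0x10FFFF with U+FFFD.

```rust
/// Decodes the body of one numeric character reference (the text between
/// `&#` and `;`, e.g. "72" or "x48"). Returns None if the body is malformed.
fn decode_numeric_reference(body: &str) -> Option<char> {
    // A leading 'x' or 'X' selects hexadecimal, otherwise the digits are decimal.
    let (digits, radix) = match body.strip_prefix('x').or_else(|| body.strip_prefix('X')) {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    // Saturate rather than overflow on very long digit runs; any value
    // above 0x10FFFF is replaced below anyway.
    let code = digits.chars().fold(0u32, |acc, c| {
        acc.saturating_mul(radix).saturating_add(c.to_digit(radix).unwrap())
    });
    if code == 0 {
        return Some('\u{FFFD}');
    }
    // `char::from_u32` rejects surrogates and values above 0x10FFFF.
    Some(char::from_u32(code).unwrap_or('\u{FFFD}'))
}

/// Replaces every well-formed `&#...;` reference in `text`; anything else is copied verbatim.
fn decode_numeric_references(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("&#") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let decoded = after
            .find(';')
            .and_then(|end| decode_numeric_reference(&after[..end]).map(|c| (c, end)));
        match decoded {
            Some((c, end)) => {
                out.push(c);
                rest = &after[end + 1..];
            }
            None => {
                // Not a valid reference: keep the "&#" and continue scanning after it.
                out.push_str("&#");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    assert_eq!(decode_numeric_references("&#72;&#x69;!"), "Hi!");
    assert_eq!(decode_numeric_references("&#0; &#xD800;"), "\u{FFFD} \u{FFFD}");
    assert_eq!(decode_numeric_references("a &# b &amp;"), "a &# b &amp;");
    println!("{}", decode_numeric_references("&#x1F600;"));
}
```
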
The tree-builder, however, deliberately does not follow the spec.
A spec-compliant tree-builder may restructure your markup for a multitude of reasons: badly nested tags, child elements that don't conform to their parent's content model, missing end tags, etc.
The tree-builder in this library takes a simpler approach: it will parse any well-balanced HTML and output a tree that corresponds to that markup, exactly as written.
So this library works well for parsing the output of HTML-generating tools like static site generators (SSGs) or markdown parsers.
Tools like these don't forget to add end tags :)
On the other hand, parsing random web content is more of a gamble.
For example, many sites rely on the fact that you do not need to close your `<p>` tags.
This library will fail on such markup.
TL;DR: if your HTML looks like well-formed XML when you squint, this library's HTML parser is for you.
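
To make the strict approach concrete, here is a minimal sketch of such a tree-builder, assuming the input has already been tokenized. It keeps a stack of open elements, requires every end tag to close the innermost open element, and never pushes void elements, since they have no end tag. `Token`, `Node` and `build` are hypothetical names chosen for illustration rather than `htmlite`'s internals, and attributes, comments and doctypes are left out.

```rust
/// Hypothetical pre-tokenized input; attributes, comments and doctypes are omitted.
enum Token<'s> {
    StartTag(&'s str),
    EndTag(&'s str),
    Text(&'s str),
}

/// A simple owned tree node.
#[derive(Debug, PartialEq)]
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

/// Void elements per the WHATWG HTML standard: no end tag, no children.
const VOID_ELEMENTS: &[&str] = &[
    "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track",
    "wbr",
];

/// Builds a tree that mirrors the tokens exactly, returning an error on
/// unbalanced markup instead of repairing it.
fn build(tokens: &[Token<'_>]) -> Result<Vec<Node>, String> {
    // Stack of open elements: (name, children collected so far).
    // The bottom entry is a nameless sentinel that collects top-level nodes.
    let mut stack: Vec<(String, Vec<Node>)> = vec![(String::new(), Vec::new())];
    for token in tokens {
        match token {
            Token::Text(text) => {
                stack.last_mut().unwrap().1.push(Node::Text(text.to_string()));
            }
            Token::StartTag(name) if VOID_ELEMENTS.contains(name) => {
                let void = Node::Element { name: name.to_string(), children: Vec::new() };
                stack.last_mut().unwrap().1.push(void);
            }
            Token::StartTag(name) => stack.push((name.to_string(), Vec::new())),
            Token::EndTag(name) => {
                // No implicit closing: the end tag must match the innermost open element.
                if stack.len() == 1 || stack.last().unwrap().0 != *name {
                    return Err(format!("unexpected </{name}>"));
                }
                let (name, children) = stack.pop().unwrap();
                stack.last_mut().unwrap().1.push(Node::Element { name, children });
            }
        }
    }
    if stack.len() > 1 {
        return Err(format!("unclosed <{}>", stack.last().unwrap().0));
    }
    Ok(stack.pop().unwrap().1)
}

fn main() {
    // <p>Hello<br></p>: balanced, and <br> needs no end tag.
    let tree = build(&[
        Token::StartTag("p"),
        Token::Text("Hello"),
        Token::StartTag("br"),
        Token::EndTag("p"),
    ]);
    let expected = Node::Element {
        name: "p".to_string(),
        children: vec![
            Node::Text("Hello".to_string()),
            Node::Element { name: "br".to_string(), children: vec![] },
        ],
    };
    assert_eq!(tree, Ok(vec![expected]));

    // <p>one<p>two relies on an implicit </p>, which this builder refuses.
    let unclosed = build(&[Token::StartTag("p"), Token::Text("one"), Token::StartTag("p"), Token::Text("two")]);
    assert_eq!(unclosed, Err("unclosed <p>".to_string()));
}
```

Because nothing is ever closed implicitly, the unclosed `<p>` case ends in an error instead of a silently restructured tree, which is the trade-off described above.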
## Adjacent crates
[scraper](https://crates.io/crates/scraper): An inspiration for this crate. Uses [html5ever](https://crates.io/crates/html5ever/). You get browser-grade HTML parsing with a browser-grade dependency tree.
[kuchiki](https://crates.io/crates/kuchiki): As far as I understand, this was the predecessor to scraper. It is also built on html5ever, so the same trade-off applies.
[tl](https://crates.io/crates/tl):
A bit too lenient, while also [failing](https://github.com/y21/tl/issues/70) on valid HTML.
Additionally, it does some [weird error recovery](https://github.com/y21/tl/issues/65) that I did not want.
[html5gum](https://crates.io/crates/html5gum): Only tokenizes. I _could_ have used this instead of writing my own tokenizer ... but where is the fun in that?
[lol-html](https://crates.io/crates/lol_html): Very odd API and a bit too dependency-heavy for my liking. It also targets a different use case.
## Thank you
This crate would not be possible without SimonSapin's [rust-forest](https://github.com/SimonSapin/rust-forest) experiment.
The combination of using an [Arena allocator and Cell-wrapped references](https://github.com/SimonSapin/rust-forest/blob/master/arena-tree/lib.rs) is at the root of why this API is as ergonomic as it is.
Brilliant design. Thank you for your work!
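
For readers new to the pattern, the following is a minimal, standard-library-only sketch of it, loosely modelled on rust-forest's arena-tree rather than copied from it or from `htmlite`. A `Vec` that is filled once and never pushed to again stands in for the arena (rust-forest pairs this idea with a real arena allocator), and every link is a `Cell<Option<&Node>>`. Since all nodes share the arena's lifetime, the tree can be rewired through plain `&` references, without `Rc`, `RefCell` or `&mut`. The method names are illustrative, and `descendants` returns names rather than nodes to keep the sketch short.

```rust
use std::cell::Cell;

/// A tree node whose links are Cell-wrapped shared references to other
/// nodes in the same arena, all sharing the arena's lifetime 'a.
struct Node<'a> {
    name: &'static str,
    parent: Cell<Option<&'a Node<'a>>>,
    first_child: Cell<Option<&'a Node<'a>>>,
    last_child: Cell<Option<&'a Node<'a>>>,
    prev_sibling: Cell<Option<&'a Node<'a>>>,
    next_sibling: Cell<Option<&'a Node<'a>>>,
}

impl<'a> Node<'a> {
    fn new(name: &'static str) -> Self {
        Node {
            name,
            parent: Cell::new(None),
            first_child: Cell::new(None),
            last_child: Cell::new(None),
            prev_sibling: Cell::new(None),
            next_sibling: Cell::new(None),
        }
    }

    /// Moves `child` to the end of `self`'s children using only `&` references.
    /// (Appending a node under its own descendant would create a cycle; not checked here.)
    fn append(&'a self, child: &'a Node<'a>) {
        child.detach();
        child.parent.set(Some(self));
        if let Some(last) = self.last_child.get() {
            last.next_sibling.set(Some(child));
            child.prev_sibling.set(Some(last));
        } else {
            self.first_child.set(Some(child));
        }
        self.last_child.set(Some(child));
    }

    /// Unlinks `self` from its parent and siblings; the node stays in the arena.
    fn detach(&self) {
        let parent = self.parent.take();
        let prev = self.prev_sibling.take();
        let next = self.next_sibling.take();
        if let Some(prev) = prev {
            prev.next_sibling.set(next);
        } else if let Some(parent) = parent {
            parent.first_child.set(next);
        }
        if let Some(next) = next {
            next.prev_sibling.set(prev);
        } else if let Some(parent) = parent {
            parent.last_child.set(prev);
        }
    }

    /// Names of all descendants in document (pre-)order, excluding `self`.
    fn descendants(&self) -> Vec<&'static str> {
        let mut names = Vec::new();
        let mut child = self.first_child.get();
        while let Some(node) = child {
            names.push(node.name);
            names.extend(node.descendants());
            child = node.next_sibling.get();
        }
        names
    }
}

fn main() {
    // A Vec that is never pushed to after creation stands in for the arena,
    // so references into it stay valid for its whole lifetime.
    let arena = vec![Node::new("ul"), Node::new("li"), Node::new("li"), Node::new("b")];
    let (ul, li1, li2, b) = (&arena[0], &arena[1], &arena[2], &arena[3]);
    ul.append(li1);
    ul.append(li2);
    li2.append(b);
    assert_eq!(ul.descendants(), ["li", "li", "b"]);

    // Rewiring needs no `&mut`: move <b> from the second <li> to the first.
    li1.append(b);
    assert_eq!(ul.descendants(), ["li", "b", "li"]);

    b.detach();
    assert_eq!(ul.descendants(), ["li", "li"]);
}
```

Because every node outlives every reference to it, methods like `detach` can take `&self` and still mutate the tree's shape, which is what keeps this style of API free of `&mut` juggling.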