htmlite 0.12.0

# htmlite

_An HTML manipulation toolkit_

`htmlite` is a lightweight HTML toolkit for parsing, manipulating, and generating HTML.

## Examples

**Parsing a fragment of HTML**

```rust
use htmlite::NodeArena;
let arena = NodeArena::new();
htmlite::parse(&arena, "<h1>Hello, <i>world!</i></h1>").unwrap();
```

**Selecting elements**

```rust
use htmlite::{NodeArena, Node};
let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let arena = NodeArena::new();
let root = htmlite::parse(&arena, html).unwrap();

for element in root.descendants().select("li") {
    assert_eq!(&*element.name(), "li");
}
```

**Accessing element attributes**

```rust
use htmlite::{NodeArena, Node};
let arena = NodeArena::new();
let root = htmlite::parse(&arena, r#"<input name="foo" value="bar" readonly>"#).unwrap();

let element = root.descendants().select(r#"input[name="foo"]"#).next().unwrap();
assert_eq!(element.attr("value").as_deref(), Some("bar"));
assert_eq!(element.attr("readonly").as_deref(), Some(""));
```

**Serializing HTML and inner HTML**

```rust
use htmlite::NodeArena;
let arena = NodeArena::new();
let root = htmlite::parse(&arena, "<h1>Hello, <i>world!</i></h1>").unwrap();
let h1 = root.descendants().select("h1").next().unwrap();
assert_eq!(h1.html(), "<h1>Hello, <i>world!</i></h1>");
assert_eq!(h1.inner_html(), "Hello, <i>world!</i>");
```

**Manipulating the DOM**

```rust
use htmlite::NodeArena;
let html = "<html><body>hello<p class=\"hello\">REMOVE ME</p></body></html>";
let arena = NodeArena::new();
let root = htmlite::parse(&arena, html).unwrap();
for el in root.descendants().select(".hello") {
    el.detach();
}
assert_eq!(root.html(), "<html><body>hello</body></html>");
```

**Generating HTML**

```rust
let h = htmlite::NodeArena::new();
let form = h.form(
    [("method", "POST")],
    [
        h.input([("value", "hello"), ("type", "text")], None),
        h.button(None, h.text("Submit"))
    ]
);
assert_eq!(form.html(), r#"<form method="POST"><input value="hello" type="text"><button>Submit</button></form>"#);
```

## When should you use this?

This is not a "browser-grade" HTML parser, but it is close!

Specifically, the tokenizer is spec compliant and passes all the [html5lib tokenizer tests](https://github.com/html5lib/html5lib-tests/tree/master/tokenizer).
So `htmlite` will accept any valid HTML "construct", such as numeric and named character references and void elements.

However, the tree-builder does not follow the spec.
This was done on purpose.
A spec-compliant tree-builder may restructure your markup for a multitude of reasons: badly nested tags, child elements that don't conform to the content model of their parent, missing end tags, and so on.
The tree-builder in this library takes a simpler approach: it will parse any well-balanced HTML and output a tree that corresponds to that markup, exactly as written.
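
To make "well-balanced" concrete, here is a rough, standalone sketch of the idea. It is not `htmlite`'s actual tree-builder, and `is_well_balanced` is a hypothetical helper invented for illustration. It assumes no comments, no doctype, and no `>` inside attribute values, and it only checks that every non-void start tag is closed by a matching end tag in the right order.

```rust
/// Void elements never take an end tag (list taken from the HTML standard).
const VOID_ELEMENTS: [&str; 13] = [
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
];

/// Hypothetical helper, NOT part of htmlite: returns true when every
/// non-void start tag in `html` is closed by a matching end tag, in order.
/// Simplifications: comments, doctypes, and `>` inside attribute values
/// are not handled.
fn is_well_balanced(html: &str) -> bool {
    // Stack of currently open element names.
    let mut open: Vec<String> = Vec::new();
    let mut rest = html;

    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        let end = match after.find('>') {
            Some(end) => end,
            None => return false, // a '<' that is never closed
        };
        let tag = &after[..end];
        rest = &after[end + 1..];

        if let Some(name) = tag.strip_prefix('/') {
            // End tag: it must match the most recently opened element.
            let name = name.trim().to_ascii_lowercase();
            if open.pop().as_deref() != Some(name.as_str()) {
                return false;
            }
        } else {
            // Start tag: the name runs up to the first whitespace or '/'.
            let name = tag
                .split(|c: char| c.is_whitespace() || c == '/')
                .next()
                .unwrap_or("")
                .to_ascii_lowercase();
            if name.is_empty() {
                return false;
            }
            // Void elements are never pushed, so they need no end tag.
            if !VOID_ELEMENTS.contains(&name.as_str()) {
                open.push(name);
            }
        }
    }

    // Anything still open at the end is a missing end tag.
    open.is_empty()
}
```

Under these assumptions, `is_well_balanced("<h1>Hello, <i>world!</i></h1>")` is `true`, while `is_well_balanced("<p>one<p>two")` is `false` because neither `<p>` is ever closed.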

So this library will work well when you are parsing the output of HTML-generating tools like static site generators (SSGs) or Markdown parsers.
Tools like these don't forget to add end tags :)

On the other hand, parsing random web content is more of a gamble.
For example, many sites rely on the fact that the end tag of a `<p>` element is optional.
This library will fail on such markup.

TL;DR: If your HTML looks like well-formed XML when you squint, this library's HTML parser is for you.

## Adjacent crates

[scraper](https://crates.io/crates/scraper): An inspiration for this crate. Uses [html5ever](https://crates.io/crates/html5ever/). You get browser-grade HTML parsing with a browser-grade dependency tree.

[kuchiki](https://crates.io/crates/kuchiki): As far as I understand, this was the predecessor to scraper. It also builds on html5ever, so the same caveat applies.

[tl](https://crates.io/crates/tl):
A bit too lenient, while also [failing](https://github.com/y21/tl/issues/70) on valid HTML.
Additionally it does some [weird error recovery](https://github.com/y21/tl/issues/65) that I did not want.

[html5gum](https://crates.io/crates/html5gum): Only tokenizes. I _could_ have used this instead of writing my own tokenizer ... but where is the fun in that?

[lol-html](https://crates.io/crates/lol_html): Very odd API. A bit too dependency-heavy for my liking. Different use case.

## Thank you

This crate would not be possible without SimonSapin's [rust-forest](https://github.com/SimonSapin/rust-forest) experiment.
The combination of using an [Arena allocator and Cell-wrapped references](https://github.com/SimonSapin/rust-forest/blob/master/arena-tree/lib.rs) is at the root of why this API is as ergonomic as it is.
Brilliant design. Thank you for your work!