htmlite

An HTML manipulation toolkit

htmlite is a lightweight HTML toolkit for parsing, manipulating, and generating HTML.

Examples

Parsing a fragment of HTML

use htmlite::NodeArena;
let arena = NodeArena::new();
htmlite::parse(&arena, "<h1>Hello, <i>world!</i></h1>").unwrap();

Selecting elements

use htmlite::{NodeArena, Node};
let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let arena = NodeArena::new();
let root = htmlite::parse(&arena, html).unwrap();

for element in root.descendants().select("li") {
    assert_eq!(&*element.name(), "li");
}

Accessing element attributes

use htmlite::{NodeArena, Node};
let arena = NodeArena::new();
let root = htmlite::parse(&arena, r#"<input name="foo" value="bar" readonly>"#).unwrap();

let element = root.descendants().select(r#"input[name="foo"]"#).next().unwrap();
assert_eq!(element.attr("value").as_deref(), Some("bar"));
assert_eq!(element.attr("readonly").as_deref(), Some(""));

Serializing HTML and inner HTML

use htmlite::NodeArena;
let arena = NodeArena::new();
let root = htmlite::parse(&arena, "<h1>Hello, <i>world!</i></h1>").unwrap();
let h1 = root.descendants().select("h1").next().unwrap();
assert_eq!(h1.html(), "<h1>Hello, <i>world!</i></h1>");
assert_eq!(h1.inner_html(), "Hello, <i>world!</i>");

Manipulating the DOM

use htmlite::NodeArena;
let html = "<html><body>hello<p class=\"hello\">REMOVE ME</p></body></html>";
let arena = NodeArena::new();
let root = htmlite::parse(&arena, html).unwrap();
for el in root.descendants().select(".hello") {
    el.detach();
}
assert_eq!(root.html(), "<html><body>hello</body></html>");

Generating HTML

let h = htmlite::NodeArena::new();
let form = h.form(
    [("method", "POST")],
    [
        h.input([("value", "hello"), ("type", "text")], None),
        h.button(None, h.text("Submit"))
    ]
);
assert_eq!(form.html(), r#"<form method="POST"><input value="hello" type="text"><button>Submit</button></form>"#);

When should you use this?

This is not a "browser-grade" HTML parser, but it is close!

Specifically, the tokenizer is spec compliant and passes all the html5lib tokenizer tests. So htmlite accepts any valid HTML construct, such as numeric and named character references and void elements.
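
To illustrate what a numeric character reference is, here is a minimal sketch that uses only the Rust standard library. It is not htmlite's tokenizer: the function name decode_numeric_reference is hypothetical, and the sketch skips the extra rules of the HTML Standard (for example, remapping code points in the 0x80 to 0x9F range and replacing invalid ones with U+FFFD).

```rust
/// Decodes one numeric character reference such as "&#233;" or "&#xE9;".
/// Hypothetical helper for illustration only; not part of htmlite.
/// Returns None when the input is not of the form `&#<digits>;` or
/// `&#x<hex digits>;`, or when the number is not a valid Unicode scalar value.
fn decode_numeric_reference(reference: &str) -> Option<char> {
    // Strip the "&#" prefix and the ";" terminator.
    let body = reference.strip_prefix("&#")?.strip_suffix(';')?;

    // An "x" or "X" after "&#" switches from decimal to hexadecimal.
    let (digits, radix) = match body.strip_prefix(|c: char| c == 'x' || c == 'X') {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };

    // Reject empty input and anything that is not a digit in the chosen radix.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }

    // Overflowing numbers and surrogate code points both end up as None.
    let code = u32::from_str_radix(digits, radix).ok()?;
    char::from_u32(code)
}
```

With this sketch, decode_numeric_reference("&#233;") and decode_numeric_reference("&#xE9;") both yield Some('é'), while a named reference such as "&amp;" yields None because names need a lookup table rather than arithmetic.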

However, the tree-builder does not follow the spec. This was done on purpose. A spec compliant tree-builder may restructure your markup for a multitude of reasons: badly nested tags, child elements that don't conform to the content model of their parent, missing end tags, and so on. The tree-builder in this library takes a simpler approach: it will parse any well-balanced HTML and output a tree that corresponds to that markup, exactly as written.

So this library works well when you are parsing the output of HTML-generating tools like SSGs or Markdown parsers. Tools like these don't forget to add end tags :)

On the other hand, parsing random web content is more of a gamble. For example, many sites rely on the fact that you do not need to close your <p> tags. This library will fail on such markup.
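
To make "well-balanced" concrete, here is a rough sketch that uses only the Rust standard library. It is not htmlite's tree-builder and says nothing about how htmlite reports errors. The function name is_well_balanced is hypothetical, and the tag scanning is deliberately naive: it assumes no comments, no ">" inside attribute values, and no raw-text elements such as <script>.

```rust
/// Void elements never take an end tag (list from the HTML Standard).
const VOID_ELEMENTS: [&str; 13] = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
];

/// Returns true when every non-void start tag is closed by a matching end
/// tag, in the right order. Hypothetical helper for illustration only.
fn is_well_balanced(html: &str) -> bool {
    let mut open: Vec<String> = Vec::new(); // stack of currently open elements
    let mut rest = html;

    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        let end = match after.find('>') {
            Some(end) => end,
            None => return false, // a tag that never ends
        };
        let tag = &after[..end];
        rest = &after[end + 1..];

        if tag.starts_with('!') {
            continue; // skip declarations such as <!DOCTYPE html>
        }

        if let Some(name) = tag.strip_prefix('/') {
            // End tag: it must match the most recently opened element.
            let name = name.trim().to_ascii_lowercase();
            if open.pop().as_deref() != Some(name.as_str()) {
                return false;
            }
        } else {
            // Start tag: the name runs up to the first whitespace or '/'.
            let name = tag
                .split(|c: char| c.is_ascii_whitespace() || c == '/')
                .next()
                .unwrap_or("")
                .to_ascii_lowercase();
            if name.is_empty() {
                return false;
            }
            if !VOID_ELEMENTS.contains(&name.as_str()) {
                open.push(name);
            }
        }
    }

    // Anything still open was never closed.
    open.is_empty()
}
```

Under these assumptions, "<h1>Hello, <i>world!</i></h1>" and a lone <input name="foo"> are balanced, while "<p>one<p>two" is not, because both <p> elements are still open when the input ends.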

TL;DR: If your HTML looks like well-formed XML when you squint, this library's HTML parser is for you.

Adjacent crates

scraper: An inspiration for this crate. Uses html5ever. You get browser-grade HTML parsing with a browser-grade dependency tree.

kuchiki: As far as I understand, this was the predecessor to scraper. It also uses html5ever, so the same trade-off applies.

tl: A bit too lenient, while also failing on valid HTML. Additionally, it does some weird error recovery that I did not want.

html5gum: Only tokenizes. I could have used this instead of writing my own tokenizer ... but where is the fun in that?

lol-html: Very odd API. A bit too dependency-heavy for my liking. Different use case.

Thank you

This crate would not be possible without SimonSapin's rust-forest experiment. The combination of an arena allocator and Cell-wrapped references is at the root of why this API is as ergonomic as it is. Brilliant design. Thank you for your work!

htmlite 0.12.0
