Crate tl[−][src]
Expand description
tl
tl is a very fast, zero-copy HTML parser written in pure Rust.
Design
Due to the zero-copy nature of parsers, the string must be kept alive for the entire lifetime of the parser/dom.
If this is not acceptable or simply not possible in your case, you can call tl::parse_owned().
This goes through the same steps as tl::parse() but returns an OwnedVDom instead of a VDom.
The difference is that OwnedVDom carefully creates a self-referential struct in which it stores the input string, so you can keep the OwnedVDom as long as you want and move it around as much as you want.
InlineVec and InlineHashMap
InlineVec is a wrapper around an array/vec capable of storing a small number of elements on the stack, and if it becomes too large it will move all elements to the heap.
This library makes use of this concept in multiple places. “Children” of nodes (subnodes) are stored in an InlineVec, because it turns out that very often HTML tags don’t actually have that many direct subnodes. This has the advantage that most of the time, HTML tags are mostly entirely allocated on the stack.
InlineHashMap is very similar to InlineVec, except that for a small collection, it actually stores the (Key, Value) pair in a regular stack-allocated array, and traverses this array when doing a key lookup without doing any sort of hashing. Turns out that hashing keys only pays off for larger collections. If the length is small, hashing may actually be slower than just going through a few elements.
Bytes
Some functions return a Bytes struct, which is an internal struct that is used to borrow a part of the input string.
This is mainly used over a raw &[u8] for its Debug implementation.
Usage
The main function is tl::parse(). It accepts an HTML source code string and parses it. It is important to note that tl currently silently ignores tags that are invalid, sort of like browsers do. Sometimes, this means that large chunks of the HTML document do not appear in the resulting AST, although in the future this will likely be customizable, in case you need explicit error checking.
Finding an element by its id attribute and printing the inner text:
fn main() {
let input = r#"<p id="text">Hello</p>"#;
let dom = tl::parse(input, tl::ParserOptions::default());
let parser = dom.parser();
let element = dom.get_element_by_id("text")
.expect("Failed to find element")
.get(parser)
.unwrap();
println!("Inner text: {}", element.inner_text(parser));
}Iterating over the subnodes of an HTML document:
fn main() {
let input = r#"<div><img src="cool-image.png" /></div>"#;
let dom = tl::parse(input, tl::ParserOptions::default());
let img = dom.nodes()
.iter()
.find(|node| {
node.as_tag().map_or(false, |tag| tag.name() == &"img".into())
});
println!("{:?}", img);
}Usage
Add tl to your dependencies.
[dependencies]
tl = "0.3.0"Modules
Inline data structures
Structs
Stores all attributes of an HTML tag, as well as additional metadata such as id and class
A wrapper around a DST-slice
Represents a single HTML element
An external handle to a HTML node, originally obtained from a Parser
The main HTML parser
Options for the HTML Parser
VDom represents a Document Object Model
A RAII guarded version of VDom
Enums
HTML Version (<!DOCTYPE>)
An HTML Node
Traits
A trait for converting a type into Bytes
Functions
Parses the given input string
Parses the given input string and returns an owned, RAII guarded DOM