Dompa
A lightweight, zero-dependency HTML5 document parser for Rust. Dompa takes an HTML string as input, parses it into a node tree, and provides an API for querying, manipulating, and serializing the node tree back to HTML.
Installation
Add Dompa to your Cargo.toml:
[dependencies]
dompa = "1.0.0"
Usage
Basic usage looks like this:
use dompa;
Node Types
Dompa defines four node types that together represent the contents of an HTML document:
BlockNode
Represents standard HTML elements that can contain children:
You can create a BlockNode explicitly, writing out the full structure:
use ;
use std::collections::HashMap;
let block_node = Block;
Or you can use a shorthand helper, like so:
let block_node = block;
If you don't need to set attributes, an even shorter helper is available:
let block_node = simple_block;
Example with content:
use Node;
use std::collections::HashMap;
// Create a div with text content
let div = simple_block;
VoidNode
Represents self-closing HTML elements that cannot have children. Dompa automatically treats the following tags as void nodes:
!doctype, area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
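As an illustration of the rule above, here is a minimal sketch of how a tag name could be checked against that list. The names `VOID_TAGS` and `is_void_tag` are hypothetical and not part of Dompa's API, and the case-insensitive comparison is an assumption of this sketch.

```rust
/// The tags listed above as void elements.
const VOID_TAGS: [&str; 14] = [
    "!doctype", "area", "base", "br", "col", "embed", "hr",
    "img", "input", "link", "meta", "source", "track", "wbr",
];

/// Returns true when `tag` names a void element.
/// Hypothetical helper: lowercasing before comparing is an assumption.
fn is_void_tag(tag: &str) -> bool {
    let lower = tag.to_ascii_lowercase();
    VOID_TAGS.iter().any(|&t| t == lower.as_str())
}
```

With this sketch, `is_void_tag("br")` is true and `is_void_tag("div")` is false, which mirrors the split between VoidNode and BlockNode.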
You can create a VoidNode explicitly, writing out the full structure:
use ;
use std::collections::HashMap;
let void_node = Void;
Or you can use a shorthand helper, like so:
let void_node = void;
If you don't need to set attributes, an even shorter helper is available:
let void_node = simple_void;
Example with attributes:
use ;
use std::collections::HashMap;
// Create an img element with attributes
let mut attrs = HashMap::new();
attrs.insert;
attrs.insert;
let img = void;
TextNode
Represents plain text content inside HTML elements:
You can create a TextNode explicitly, writing out the full structure:
use ;
let text_node = Text;
Or you can use a shorthand helper, like so:
use Node;
let text_node = text;
FragmentNode
A special node type that allows grouping multiple nodes without creating a parent element:
You can create a FragmentNode explicitly, writing out the full structure:
use ;
let fragment_node = Fragment;
Or you can use a shorthand helper, like so:
use Node;
let fragment_node = fragment;
Essentially, a FragmentNode is a node whose children take its place in the tree.
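To make the replacement behavior concrete, here is a minimal sketch using a simplified stand-in type. `FragNode` and `flatten_fragments` are hypothetical names for illustration only; they are not Dompa's types, and Dompa may resolve fragments at a different point (for example, during serialization).

```rust
/// Simplified stand-in for a node tree (not Dompa's actual `Node`).
#[derive(Debug, Clone, PartialEq)]
enum FragNode {
    Block { name: String, children: Vec<FragNode> },
    Text(String),
    Fragment(Vec<FragNode>),
}

/// Replaces every fragment with its own children, recursively.
fn flatten_fragments(nodes: Vec<FragNode>) -> Vec<FragNode> {
    let mut out = Vec::new();
    for node in nodes {
        match node {
            // A fragment disappears; its children are spliced in where it stood.
            FragNode::Fragment(children) => out.extend(flatten_fragments(children)),
            // Blocks are kept, but their children are flattened too.
            FragNode::Block { name, children } => out.push(FragNode::Block {
                name,
                children: flatten_fragments(children),
            }),
            // Text nodes pass through unchanged.
            other => out.push(other),
        }
    }
    out
}
```

In this model, a `ul` block whose only child is a fragment holding two text nodes ends up with those two text nodes as direct children.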
HTML Parsing and Manipulation
nodes function
The nodes function parses an HTML string into a node tree:
let html = Stringfrom;
let nodes = nodes;
traverse function
The traverse function allows you to manipulate the node tree by applying a callback function to each node:
let html = Stringfrom;
let nodes = nodes;
// Update the title text
let updated_nodes = traverse;
The callback function must return an Option<Node>:
- If you return None for a node, it will be removed from the tree.
- If you return Some(node), that node will be kept in the tree, whether it's the original or a replacement.
- For nodes you don't want to modify, you must return the original node wrapped in Some() (typically Some(node.clone())).
- For nodes you want to modify, return a new or updated node wrapped in Some().
Note that the callback function is called for every node in the tree, so you need to handle all cases. In most cases, you'll have specific patterns you want to match and transform, and then a catch-all case that returns the original node.
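The callback contract described above can be sketched with a simplified model. Everything here (`SketchNode`, `traverse_sketch`, `rename_title`) is hypothetical and only mirrors the documented semantics; Dompa's real signature and visiting order may differ. This sketch assumes the callback runs on a node before its children are visited.

```rust
/// Simplified stand-in for a node tree (not Dompa's actual `Node`).
#[derive(Debug, Clone, PartialEq)]
enum SketchNode {
    Block { name: String, children: Vec<SketchNode> },
    Text(String),
}

/// Applies `callback` to every node: `None` drops the node,
/// `Some(node)` keeps the returned node (original or replacement).
fn traverse_sketch<F>(nodes: &[SketchNode], callback: &F) -> Vec<SketchNode>
where
    F: Fn(&SketchNode) -> Option<SketchNode>,
{
    nodes
        .iter()
        .filter_map(|node| callback(node))
        .map(|node| match node {
            // Recurse into the children of whatever the callback kept.
            SketchNode::Block { name, children } => SketchNode::Block {
                name,
                children: traverse_sketch(&children, callback),
            },
            other => other,
        })
        .collect()
}

/// Example callback: specific patterns first, then a catch-all.
fn rename_title(node: &SketchNode) -> Option<SketchNode> {
    match node {
        // Replace a specific text node.
        SketchNode::Text(text) if text.as_str() == "Old title" => {
            Some(SketchNode::Text("New title".to_string()))
        }
        // Remove every script block.
        SketchNode::Block { name, .. } if name.as_str() == "script" => None,
        // Catch-all: keep the original node unchanged.
        _ => Some(node.clone()),
    }
}
```

Calling `traverse_sketch(&tree, &rename_title)` returns a new tree and leaves the input untouched. The catch-all arm is what keeps unrelated nodes in the result.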
to_html function
The to_html function serializes the node tree back to an HTML string:
let html = to_html;
Because attributes are stored in a HashMap, their order is not guaranteed to match the order in your source HTML. To keep the output predictable, Dompa sorts attributes alphabetically when serializing.
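The idea of sorting for stable output can be sketched as follows. `render_attributes` is a hypothetical function, not Dompa's serializer; it handles only string-valued attributes and skips escaping to keep the example short.

```rust
use std::collections::HashMap;

/// Renders attributes as `key="value"` pairs, sorted by key so that
/// the output does not depend on HashMap iteration order.
/// Hypothetical sketch: no escaping, string values only.
fn render_attributes(attrs: &HashMap<String, String>) -> String {
    let mut pairs: Vec<(&String, &String)> = attrs.iter().collect();
    // Sort alphabetically by attribute name.
    pairs.sort_by(|a, b| a.0.cmp(b.0));
    pairs
        .iter()
        .map(|(key, value)| format!("{}=\"{}\"", key, value))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Whatever order `src` and `alt` were inserted in, this sketch always emits `alt` before `src`.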
Working with Attributes
Attributes are stored in a HashMap with string keys. Attribute values can be either String values or a boolean true:
use ;
use std::collections::HashMap;
let mut attrs = HashMap::new();
// String attribute
attrs.insert;
// Boolean attribute (present without value)
attrs.insert;
let anchor = block;
Dompa provides a convenient helper method for string attributes:
// Instead of:
String
// You can use:
string
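The two kinds of attribute value, and the convenience of a string helper, can be modeled roughly like this. `SketchAttr`, its `string` constructor, and `render_attr` are hypothetical names for illustration; they are assumptions and do not reproduce Dompa's actual type or method signatures.

```rust
/// Simplified model of an attribute value: a string, or a bare `true`.
#[derive(Debug, Clone, PartialEq)]
enum SketchAttr {
    String(String),
    True,
}

impl SketchAttr {
    /// Helper that saves callers from writing `.to_string()` themselves.
    fn string(value: &str) -> Self {
        SketchAttr::String(value.to_string())
    }
}

/// Renders one attribute; a boolean attribute is emitted without a value.
fn render_attr(name: &str, value: &SketchAttr) -> String {
    match value {
        SketchAttr::String(v) => format!("{}=\"{}\"", name, v),
        SketchAttr::True => name.to_string(),
    }
}
```

In this model, `SketchAttr::string("/about")` builds the same value as `SketchAttr::String("/about".to_string())`, just with less noise at the call site.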