Module html

Error-tolerant HTML parser.

This module implements an error-tolerant HTML 4.01 parser, similar to libxml2’s HTMLparser.c. Unlike the strict XML parser, it accepts common HTML patterns that are not well-formed XML, including some that are plainly malformed HTML:

  • Missing closing tags (auto-closed based on HTML content model rules)
  • Unquoted attribute values (<div class=main>)
  • Void elements that never need closing (<br>, <img>, <hr>, etc.)
  • Case-insensitive tag name matching
  • Bare & characters in text (writing &amp; is not required)
  • Missing doctype
  • Boolean attributes without values (<input disabled>)
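
Three of the rules above (void elements, case-insensitive names, and lenient attribute and & handling) can be illustrated with a small standalone sketch. This is not xmloxide's implementation. The names `VOID_ELEMENTS`, `is_void_element`, `parse_attributes`, and `starts_char_reference` are hypothetical, and the logic is simplified: for example, it requires a terminating `;` on character references.

```rust
/// Elements declared EMPTY in the HTML 4.01 DTD; they never take an end tag.
/// (Hypothetical table for illustration, not taken from xmloxide.)
const VOID_ELEMENTS: &[&str] = &[
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
];

/// Case-insensitive check, so `<BR>`, `<Br>` and `<br>` are treated alike.
fn is_void_element(tag: &str) -> bool {
    VOID_ELEMENTS.iter().any(|v| v.eq_ignore_ascii_case(tag))
}

/// Parses the attribute part of a start tag, e.g. `class=main disabled`.
/// Values may be double-quoted, single-quoted, or unquoted; boolean
/// attributes (no `=`) yield `None`. Names are lowercased.
fn parse_attributes(input: &str) -> Vec<(String, Option<String>)> {
    let mut attrs = Vec::new();
    let mut rest = input.trim_start();
    while !rest.is_empty() {
        // The name runs until whitespace or '='.
        let name_end = rest
            .find(|c: char| c.is_whitespace() || c == '=')
            .unwrap_or(rest.len());
        let name = rest[..name_end].to_ascii_lowercase();
        rest = rest[name_end..].trim_start();

        let value = if let Some(after_eq) = rest.strip_prefix('=') {
            let after_eq = after_eq.trim_start();
            let (val, remaining) = match after_eq.chars().next() {
                // Quoted value: read up to the matching quote.
                Some(q @ ('"' | '\'')) => {
                    let body = &after_eq[1..];
                    match body.find(q) {
                        Some(end) => (&body[..end], &body[end + 1..]),
                        None => (body, ""), // unterminated quote: take the rest
                    }
                }
                // Unquoted value: read up to the next whitespace.
                _ => {
                    let end = after_eq
                        .find(char::is_whitespace)
                        .unwrap_or(after_eq.len());
                    (&after_eq[..end], &after_eq[end..])
                }
            };
            rest = remaining.trim_start();
            Some(val.to_string())
        } else {
            None // boolean attribute such as `disabled`
        };

        if !name.is_empty() {
            attrs.push((name, value));
        }
    }
    attrs
}

/// Returns true if `s` begins with something shaped like a character
/// reference (`&name;`, `&#123;`, `&#x1F;`). Anything else starting with
/// `&` can be kept as literal text by a tolerant parser.
fn starts_char_reference(s: &str) -> bool {
    let Some(rest) = s.strip_prefix('&') else { return false };
    let Some(end) = rest.find(';') else { return false };
    let body = &rest[..end];
    let body = body.strip_prefix('#').unwrap_or(body);
    !body.is_empty() && body.chars().all(|c| c.is_ascii_alphanumeric())
}
```

With these helpers, `is_void_element("BR")` is true, `parse_attributes("class=main disabled")` yields a `class` attribute with value `main` and a valueless `disabled`, and `starts_char_reference("& more")` is false, so the ampersand would be kept as text.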

The parser produces the same Document tree structure as the XML parser.

Examples

use xmloxide::html::parse_html;

let doc = parse_html("<p>Hello <b>world</b>").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("html"));

Modules

entities
HTML named character references.

Structs

HtmlParseOptions
Options controlling HTML parser behavior.

Functions

parse_html
Parses an HTML string into a Document with default options.
parse_html_with_options
Parses an HTML string into a Document with the given options.