Html parser
A simple and general purpose html/xhtml parser lib/bin, using Pest.
Features
- Parse html & xhtml (not xml processing instructions)
- Parse html-documents
- Parse html-fragments
- Parse empty documents
- Parse with the same api for both documents and fragments
- Parse custom, non-standard elements: <cat/>, <Cat/> and <C4-t/>
- Remove comments
- Remove dangling elements
- Iterate over all nodes in the dom tree
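To illustrate what iterating over all nodes in a dom tree involves, here is a minimal, self-contained sketch of a depth-first walk. The `Node` enum and the `walk`, `sample_tree` and `element_names` functions are hypothetical names made up for this illustration. They are not this crate's actual types or api, and the real node representation may differ.

```rust
// Hypothetical node type for illustration only; NOT this crate's actual api.
#[derive(Debug)]
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Collects `root` and all of its descendants in depth-first document order.
fn walk(root: &Node) -> Vec<&Node> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        out.push(node);
        if let Node::Element { children, .. } = node {
            // Push children in reverse so the first child is visited first.
            stack.extend(children.iter().rev());
        }
    }
    out
}

/// Builds the tree for `<div><p>hi</p><Cat/></div>` by hand.
fn sample_tree() -> Node {
    Node::Element {
        name: "div".to_string(),
        children: vec![
            Node::Element {
                name: "p".to_string(),
                children: vec![Node::Text("hi".to_string())],
            },
            Node::Element {
                name: "Cat".to_string(),
                children: vec![],
            },
        ],
    }
}

/// Returns the names of all elements in visiting order, skipping text nodes.
fn element_names(root: &Node) -> Vec<&str> {
    walk(root)
        .into_iter()
        .filter_map(|node| match node {
            Node::Element { name, .. } => Some(name.as_str()),
            Node::Text(_) => None,
        })
        .collect()
}
```

Calling `element_names(&sample_tree())` visits the outer element first, then its children from left to right. An explicit stack is used instead of recursion so that deeply nested input cannot overflow the call stack.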
What is it not
- It's not a high-performance browser-grade parser
- It's not suitable for html validation
- It's not a parser that includes element selection or dom manipulation
If your requirements match any of the above, then you're most likely looking for one of the crates below:
Examples bin
Parse html file
html_parser index.html
Parse stdin with pretty output
curl <website> | html_parser -p
Examples lib
Parse html document
use html_parser::Dom;
Parse html fragment
use html_parser::Dom;
Print to json
use ;