//! [![github]](https://github.com/mathiversen/html-parser)
//!
//! [github]: https://img.shields.io/badge/github-8da0cb?style=for-the-badge&labelColor=555555&logo=github
//!
//! # Html parser
//!
//! **WIP - work in progress, use at your own risk**
//!
//! A simple, general-purpose html/xhtml parser built on [Pest](https://pest.rs/).
//!
//! ## Features
//! - Parse html & xhtml (not xml processing instructions)
//! - Parse html-documents
//! - Parse html-fragments
//! - Parse empty documents
//! - Parse with the same api for both documents and fragments
//! - Parse custom, non-standard elements: `<cat/>`, `<Cat/>` and `<C4-t/>` are all ok!
//! - Removes comments
//! - Removes dangling elements
//!
//! ## What it is not
//!
//! - It's not a high-performance browser-grade parser
//! - It's not suitable for html validation
//! - It's not a parser that includes element selection or dom manipulation
//!
//! If you need any of the above, you are most likely looking for one of the crates below:
//!
//! - [html5ever](https://crates.io/crates/html5ever)
//! - [kuchiki](https://crates.io/crates/kuchiki)
//! - [scraper](https://crates.io/crates/scraper)
//! - or other crates using the `html5ever` parser
//!
//! ## Examples
//! Parse an html document
//!
//! ```rust
//!     use html_parser::Dom;
//!
//!     fn main() {
//!         let html = r#"
//!             <!doctype html>
//!             <html lang="en">
//!                 <head>
//!                     <meta charset="utf-8">
//!                     <title>Html parser</title>
//!                 </head>
//!                 <body>
//!                     <h1 id="a" class="b c">Hello world</h1>
//!                     </h1> <!-- comments & dangling elements are ignored -->
//!                 </body>
//!             </html>"#;
//!
//!         assert!(Dom::parse(html).is_ok());
//!     }
//! ```
//!
//! Parse an html fragment
//!
//! ```rust
//!     use html_parser::Dom;
//!
//!     fn main() {
//!         let html = "<div id=cat />";
//!         assert!(Dom::parse(html).is_ok());
//!     }
//! ```
//!
//! Print to json
//!
//! ```rust
//!     use html_parser::{Dom, Result};
//!
//!     fn main() -> Result<()> {
//!         let html = "<div id=cat />";
//!         let json = Dom::parse(html)?.to_json_pretty()?;
//!         println!("{}", json);
//!         Ok(())
//!     }
//! ```
//!
//! ## Contributions
//! I would love to get some feedback if you find my little project useful. Please feel free to report issues with my code or submit a PR if you want to improve it.

#![allow(clippy::needless_doctest_main)]

mod dom;
mod error;
mod grammar;

use grammar::Rule;

pub use crate::dom::element::{Element, ElementVariant};
pub use crate::dom::node::Node;
pub use crate::dom::Dom;
pub use crate::dom::DomVariant;
pub use crate::error::Error;
pub use anyhow::Result;