pub fn clean_html_fragment(
html_fragment: &str,
base_url: &Url,
) -> Result<CleanedHtml, FullTextParserError>
Re-use crate internals to clean HTML fragments of articles before further processing:
- replace H1 with H2
- rename all font nodes to span
- unwrap noscript images
- strip noscript tags
- fix lazy-load images
- fix iframe size
- remove onclick of links
- strip elements using Readability.com and Instapaper.com ignore class names
- strip elements that contain style="display: none;"
- strip styles
- strip input elements
- strip comments
- strip scripts
- strip all external css and fonts
- complete relative urls
- simplify nested elements
Arguments
- html_fragment - HTML content
- base_url - URL used to complete relative URLs
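To make a few of the listed steps concrete, here is a toy sketch of three of them (replace H1 with H2, rename font nodes to span, strip comments) using plain string operations from the standard library. It is not the crate's implementation, and every function name in it (`replace_h1_with_h2`, `rename_font_to_span`, `strip_comments`, `toy_clean`) is hypothetical. It also assumes tags without attributes, which real article markup rarely guarantees.

```rust
// Toy illustration only: NOT how the crate cleans HTML.
// All helper names here are hypothetical.

/// Demote top-level headings: <h1> becomes <h2> (attribute-free tags only).
fn replace_h1_with_h2(html: &str) -> String {
    html.replace("<h1>", "<h2>").replace("</h1>", "</h2>")
}

/// Rename presentational <font> nodes to neutral <span> nodes.
fn rename_font_to_span(html: &str) -> String {
    html.replace("<font>", "<span>").replace("</font>", "</span>")
}

/// Remove every <!-- ... --> comment; an unterminated comment
/// causes the rest of the input to be dropped.
fn strip_comments(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find("<!--") {
        // Keep everything before the comment opener.
        out.push_str(&rest[..start]);
        match rest[start..].find("-->") {
            // Skip past the closing "-->" (3 bytes long).
            Some(end) => rest = &rest[start + end + 3..],
            None => rest = "",
        }
    }
    out.push_str(rest);
    out
}

/// Apply the three toy steps in sequence.
fn toy_clean(html: &str) -> String {
    strip_comments(&rename_font_to_span(&replace_h1_with_h2(html)))
}

fn main() {
    let input = "<h1>Title</h1><!-- ad --><font>Body</font>";
    // prints: <h2>Title</h2><span>Body</span>
    println!("{}", toy_clean(input));
}
```

Literal string replacement misses tags that carry attributes (for example `<font color="red">`) and cannot tell markup from text content, so a robust cleaner would normally work on a parsed document tree rather than on raw strings.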