pub fn slim(html_content: &str) -> Result<String>
Strips non-content elements from the provided HTML content using the scraper crate,
preserving essential head tags, and returns the cleaned HTML as a string.
This function aims to replicate the behavior of slimmer::slim using scraper.
It removes:
- Non-visible tags like <script>, <link>, <style>, <svg>, <base>.
- HTML comments.
- Empty or whitespace-only text nodes.
- Specific tags (like <div>, <span>, <p>, etc.) if they become effectively empty after processing children.
- Attributes except for specific allowlists (class, aria-label, href outside head; property, content for relevant meta tags in head).
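The attribute allowlist rule above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: `Region`, `keep_attr`, and `filter_attrs` are hypothetical names, and only the allowlisted attribute names come from the description above.

```rust
/// Hypothetical marker for where an element sits in the document.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Region {
    Head,
    Body,
}

/// Hypothetical predicate: returns true if an attribute survives slimming.
/// The allowlists mirror the description: class, aria-label, href outside
/// head; property, content for meta tags inside head.
fn keep_attr(region: Region, tag: &str, name: &str) -> bool {
    match region {
        Region::Body => matches!(name, "class" | "aria-label" | "href"),
        Region::Head => tag == "meta" && matches!(name, "property" | "content"),
    }
}

/// Applies `keep_attr` to a list of (name, value) pairs, preserving order.
fn filter_attrs<'a>(
    region: Region,
    tag: &str,
    attrs: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    attrs
        .iter()
        .copied()
        .filter(|(name, _)| keep_attr(region, tag, name))
        .collect()
}
```

Under these assumptions, an `<a class="x" id="main" href="/a">` in the body would keep only `class` and `href`, while a `<meta>` in the head would keep only `property` and `content`.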
It preserves:
- <title> tag within <head>.
- <meta> tags within <head> if their property attribute matches keywords in META_PROPERTY_KEYWORDS.
- Essential body content.
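The `<meta>` rule can be sketched as a simple predicate. The contents of META_PROPERTY_KEYWORDS are not shown on this page, so the list below is a made-up stand-in, and treating "matches" as a case-insensitive substring test is also an assumption. `ASSUMED_KEYWORDS` and `keeps_meta` are hypothetical names.

```rust
// Hypothetical stand-in for META_PROPERTY_KEYWORDS; the real list may differ.
const ASSUMED_KEYWORDS: [&str; 3] = ["title", "description", "url"];

/// Returns true if a <meta> tag with this `property` value would be kept.
/// Assumption: a match means the property contains any keyword,
/// compared case-insensitively. A missing `property` never matches.
fn keeps_meta(property: Option<&str>) -> bool {
    match property {
        Some(p) => {
            let lowered = p.to_ascii_lowercase();
            ASSUMED_KEYWORDS.iter().any(|kw| lowered.contains(*kw))
        }
        None => false,
    }
}
```

With this sketch, a tag such as `<meta charset="utf-8">` has no `property` attribute, so it would not be kept.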
§Arguments
- html_content - A string slice containing the HTML content to be processed.
§Returns
A Result<String> which is:
- Ok(String) containing the cleaned HTML content.
- Err if any errors occur during processing.
Examples found in repository
examples/c01-simple.rs (line 15)
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fx_html = r#"
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="icon" href="favicon.ico">
</head>
<body>
<p>Content</p>
</body>
</html>
    "#;

    let slim = html_helpers::slim(fx_html)?;

    println!("Slim:\n\n{slim}");

    Ok(())
}