pub fn slim(html_content: &str) -> Result<String>
Strips non-content elements from the provided HTML content using the scraper crate,
preserving essential head tags, and returns the cleaned HTML as a string.
This function aims to replicate the behavior of slimmer::slim using scraper.
It removes:
- Non-visible tags like <script>, <link>, <style>, <svg>, <base>.
- HTML comments.
- Empty or whitespace-only text nodes.
- Specific tags (like <div>, <span>, <p>, etc.) if they become effectively empty after their children are processed.
- Attributes, except those on specific allowlists (class, aria-label, href outside head; property, content for relevant meta tags in head).
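The removal rules above can be sketched on a tiny hand-built tree. This is only an illustration under stated assumptions, not the crate's implementation: the Node enum, the slim_node and listed helpers, and the constant names are all hypothetical, the list of tags pruned when empty is limited to the three named above, and the head-specific attribute rules are left out.

```rust
/// Minimal stand-in for a parsed HTML node (hypothetical, not a scraper type).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Comment(String),
    Element { tag: String, attrs: Vec<(String, String)>, children: Vec<Node> },
}

// Tags dropped together with their whole subtree (from the list above).
const REMOVED_TAGS: &[&str] = &["script", "link", "style", "svg", "base"];
// Assumption: only the three tags named above; the original says "etc.".
const PRUNE_IF_EMPTY: &[&str] = &["div", "span", "p"];
// Attributes kept outside <head> (from the list above).
const BODY_ATTRS: &[&str] = &["class", "aria-label", "href"];

/// Returns true if `name` appears in `list`.
fn listed(list: &[&str], name: &str) -> bool {
    list.iter().any(|item| *item == name)
}

/// Applies the removal rules bottom-up; `None` means the node is dropped.
fn slim_node(node: Node) -> Option<Node> {
    match node {
        // HTML comments are always removed.
        Node::Comment(_) => None,
        // Empty or whitespace-only text is removed.
        Node::Text(text) => {
            if text.trim().is_empty() { None } else { Some(Node::Text(text)) }
        }
        Node::Element { tag, attrs, children } => {
            if listed(REMOVED_TAGS, &tag) {
                return None;
            }
            // Children are processed first, so emptiness is judged afterwards.
            let children: Vec<Node> = children.into_iter().filter_map(slim_node).collect();
            if children.is_empty() && listed(PRUNE_IF_EMPTY, &tag) {
                return None;
            }
            // Keep only allowlisted attributes.
            let attrs: Vec<(String, String)> = attrs
                .into_iter()
                .filter(|(name, _)| listed(BODY_ATTRS, name))
                .collect();
            Some(Node::Element { tag, attrs, children })
        }
    }
}
```

Processing children before testing for emptiness is what lets a <span> holding only whitespace disappear, and its parent after it if nothing else remains.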
It preserves:
- The <title> tag within <head>.
- <meta> tags within <head> if their property attribute matches keywords in META_PROPERTY_KEYWORDS.
- Essential body content.
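The <meta> rule can be pictured as a keyword check on the property attribute. The sketch below is hedged on two points: the contents of META_PROPERTY_KEYWORDS are not shown on this page, so the ASSUMED_META_PROPERTY_KEYWORDS values are placeholders, and "matches" is assumed here to mean a case-insensitive substring test. The keep_meta helper is hypothetical.

```rust
// Placeholder values: the real META_PROPERTY_KEYWORDS list is not documented here.
const ASSUMED_META_PROPERTY_KEYWORDS: &[&str] = &["title", "description"];

/// Hypothetical helper: decides whether a <meta> tag in <head> is kept,
/// given the value of its `property` attribute (None if the attribute is absent).
fn keep_meta(property: Option<&str>) -> bool {
    match property {
        Some(value) => {
            // Assumption: matching is a case-insensitive substring test.
            let value = value.to_ascii_lowercase();
            ASSUMED_META_PROPERTY_KEYWORDS
                .iter()
                .any(|keyword| value.contains(*keyword))
        }
        // A <meta> tag without a property attribute has nothing to match.
        None => false,
    }
}
```

Under these assumptions a tag with property="og:title" would be kept, while one with property="og:image" or with no property attribute would not.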
§Arguments
- html_content - A string slice containing the HTML content to be processed.
§Returns
A Result<String> which is:
- Ok(String) containing the cleaned HTML content.
- Err if any errors occur during processing.
Examples found in repository
examples/c01-simple.rs (line 15)
fn main() -> Result<(), Box<dyn std::error::Error>> {
	let fx_html = r#"
	<!DOCTYPE html>
	<html>
	<head>
		<meta charset="utf-8">
		<link rel="icon" href="favicon.ico">
	</head>
	<body>
		<p>Content</p>
	</body>
	</html>
	"#;

	let slim = html_helpers::slim(fx_html)?;

	println!("Slim:\n\n{slim}");

	Ok(())
}