Function slim 

pub fn slim(html_content: &str) -> Result<String>

Strips non-content elements from the provided HTML content using the scraper crate, preserving essential head tags, and returns the cleaned HTML as a string.

This function aims to replicate the behavior of slimmer::slim using scraper. It removes:

  • Non-visible tags like <script>, <link>, <style>, <svg>, <base>.
  • HTML comments.
  • Empty or whitespace-only text nodes.
  • Container tags such as <div>, <span>, and <p> if they become effectively empty after their children are processed.
  • All attributes except those on an allowlist: class, aria-label, and href outside <head>; property and content on relevant <meta> tags inside <head>.
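
The attribute allowlist above can be sketched as a small predicate. This is only an illustration of the rule as described, not the crate's actual code: the function name `keep_attribute` and its parameters are hypothetical, and it assumes attribute names are already lowercased, as an HTML5 parser would normally produce.

```rust
/// Hypothetical helper (not part of the crate's API): decides whether an
/// attribute survives slimming, following the allowlist described above.
/// `in_head` is true when the element sits inside <head>.
fn keep_attribute(tag: &str, attr: &str, in_head: bool) -> bool {
    if in_head {
        // Inside <head>, only `property` and `content` on <meta> are kept.
        tag.eq_ignore_ascii_case("meta") && matches!(attr, "property" | "content")
    } else {
        // Outside <head>, a short fixed allowlist applies to every tag.
        matches!(attr, "class" | "aria-label" | "href")
    }
}
```

Under these assumptions, `keep_attribute("a", "href", false)` is true, while `keep_attribute("a", "onclick", false)` and `keep_attribute("meta", "charset", true)` are both false.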

It preserves:

  • <title> tag within <head>.
  • <meta> tags within <head> if their property attribute matches keywords in META_PROPERTY_KEYWORDS.
  • Essential body content.
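
The <meta> rule can be pictured with a sketch like the one below. Everything in it is an assumption made for illustration: the contents of META_PROPERTY_KEYWORDS are not listed here, so `EXAMPLE_KEYWORDS` is a placeholder, `keep_meta` is a hypothetical name, and matching by case-insensitive substring is a guess at what "matches keywords" means.

```rust
/// Placeholder keywords for illustration only; the crate's real
/// META_PROPERTY_KEYWORDS may contain different values.
const EXAMPLE_KEYWORDS: &[&str] = &["title", "description"];

/// Hypothetical helper: keeps a <meta> tag when its `property` attribute
/// contains one of the keywords (assumed: case-insensitive substring match).
fn keep_meta(property: Option<&str>) -> bool {
    match property {
        Some(p) => {
            let p = p.to_ascii_lowercase();
            EXAMPLE_KEYWORDS.iter().any(|kw| p.contains(kw))
        }
        // A <meta> tag without a `property` attribute is dropped.
        None => false,
    }
}
```

With these placeholder keywords, `keep_meta(Some("og:title"))` is true, while `keep_meta(Some("og:image"))` and `keep_meta(None)` are false.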

§Arguments

  • html_content - A string slice containing the HTML content to be processed.

§Returns

A Result<String> which is:

  • Ok(String) containing the cleaned HTML content.
  • Err if an error occurs during processing.
Examples found in repository
examples/c01-simple.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
	// Fixture: a minimal page with a <meta charset> and an icon <link> in <head>.
	let fx_html = r#"
	<!DOCTYPE html>
	<html>
	<head>
		<meta charset="utf-8">
		<link rel="icon" href="favicon.ico">
	</head>
	<body>
		<p>Content</p>
	</body>
	</html>
	"#;

	// Strip non-content elements; `?` propagates any processing error.
	let slim = html_helpers::slim(fx_html)?;

	println!("Slim:\n\n{slim}");

	Ok(())
}