Function slim

pub fn slim(html_content: &str) -> Result<String>

Strips non-content elements from the provided HTML content, preserving essential head tags, and returns the cleaned HTML as a string.

This function removes:

  • Non-visible tags like <script>, <link>, <style>, <svg>, <base>.
  • HTML comments.
  • Empty or whitespace-only text nodes.
  • Container tags such as <div>, <span>, and <p> when they become effectively empty after their children are processed.
  • All attributes except those on an allowlist: class, aria-label, and href on elements outside <head>; property and content on retained <meta> tags inside <head>.
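
The attribute allowlist in the last item can be pictured with a small standalone sketch. This is not the crate's implementation: `filter_attrs`, `BODY_ATTRS`, and `HEAD_META_ATTRS` are hypothetical names, and attributes are modeled as plain name/value pairs rather than as nodes of a real HTML parser.

```rust
/// Attribute names kept on elements outside `<head>` (taken from the list above).
const BODY_ATTRS: &[&str] = &["class", "aria-label", "href"];
/// Attribute names kept on retained `<meta>` tags inside `<head>`.
const HEAD_META_ATTRS: &[&str] = &["property", "content"];

/// Hypothetical helper: returns only the attributes that pass the allowlist.
/// `in_head_meta` is true when the element is a retained `<meta>` tag in `<head>`.
fn filter_attrs<'a>(
    attrs: &[(&'a str, &'a str)],
    in_head_meta: bool,
) -> Vec<(&'a str, &'a str)> {
    let allowed = if in_head_meta { HEAD_META_ATTRS } else { BODY_ATTRS };
    attrs
        .iter()
        .copied()
        // Keep a pair only when its name equals one of the allowed names.
        .filter(|(name, _)| allowed.iter().any(|a| *a == *name))
        .collect()
}
```

Under these assumptions, a body element carrying class, id, href, and style would keep only class and href, in their original order.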

It preserves:

  • <title> tag within <head>.
  • <meta> tags within <head> if their property attribute matches keywords in META_PROPERTY_KEYWORDS.
  • Essential body content.
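
The <meta> retention rule might look roughly like the predicate below. The page does not say how a property value is matched against META_PROPERTY_KEYWORDS or what the constant contains, so this sketch assumes a case-insensitive substring match and takes the keyword list as a parameter; `keep_meta` is a hypothetical name.

```rust
/// Hypothetical predicate: keep a `<meta>` tag when its `property` value
/// contains any keyword. Case-insensitive substring matching is an assumption,
/// not documented behavior.
fn keep_meta(property: Option<&str>, keywords: &[&str]) -> bool {
    match property {
        Some(value) => {
            let value = value.to_ascii_lowercase();
            keywords
                .iter()
                .any(|kw| value.contains(kw.to_ascii_lowercase().as_str()))
        }
        // A <meta> tag without a `property` attribute has nothing to match.
        None => false,
    }
}
```

With a placeholder keyword list such as `["title", "description"]`, a tag with `property="og:title"` would be kept, while one with `property="fb:app_id"` or with no property attribute would be dropped.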

§Arguments

  • html_content - A string slice containing the HTML content to be processed.

§Returns

A Result<String> which is:

  • Ok(String) containing the cleaned HTML content.
  • Err if any parsing or serialization errors occur.
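
Because both outcomes are possible, a caller has to decide what to do when cleaning fails. One option, sketched below, is to fall back to the untouched input. `slim_or_original` is a hypothetical wrapper, and it accepts any function with the same shape as `slim` so that the sketch compiles on its own; the generic error type `E` stands in for whatever error the crate's `Result` alias carries.

```rust
use std::fmt::Display;

/// Hypothetical wrapper: run a `slim`-shaped function and return the original
/// HTML unchanged if it reports an error.
fn slim_or_original<E: Display>(
    html: &str,
    slim_fn: impl Fn(&str) -> Result<String, E>,
) -> String {
    match slim_fn(html) {
        Ok(cleaned) => cleaned,
        Err(err) => {
            // Log the failure and keep the input so downstream code still has HTML.
            eprintln!("slim failed, keeping original HTML: {}", err);
            html.to_string()
        }
    }
}
```

In real code the second argument would presumably be `slim` itself. Whether a silent fallback is appropriate depends on the caller; propagating the error with `?` is the stricter alternative.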