Struct HtmlCleaner 

pub struct HtmlCleaner { /* private fields */ }

HTML cleaning utility.

Provides methods for removing, stripping, and normalizing HTML elements.

§Example

use html_cleaning::{HtmlCleaner, CleaningOptions};
use dom_query::Document;

let options = CleaningOptions {
    tags_to_remove: vec!["script".into(), "style".into()],
    prune_empty: true,
    ..Default::default()
};

let cleaner = HtmlCleaner::with_options(options);
let doc = Document::from("<div><script>x</script><p>Hello</p></div>");
cleaner.clean(&doc);
assert!(doc.select("script").is_empty());

Implementations§

impl HtmlCleaner

pub fn new() -> Self

Create a cleaner with default options.

pub fn with_options(options: CleaningOptions) -> Self

Create a cleaner with custom options.

pub fn options(&self) -> &CleaningOptions

Get a reference to the current options.

pub fn clean(&self, doc: &Document)

Apply all configured cleaning operations to the document.

Operations are applied in this order:

  1. Remove tags (with children)
  2. Strip tags (keep children)
  3. Remove by CSS selector
  4. Prune empty elements
  5. Normalize whitespace
  6. Clean attributes

pub fn remove_tags(&self, doc: &Document, tags: &[&str])

Remove every element whose tag name is listed in tags, along with all of its descendants.

§Example
use html_cleaning::HtmlCleaner;
use dom_query::Document;

let cleaner = HtmlCleaner::new();
let doc = Document::from("<div><script>bad</script><p>good</p></div>");
cleaner.remove_tags(&doc, &["script"]);
assert!(doc.select("script").is_empty());

pub fn strip_tags(&self, doc: &Document, tags: &[&str])

Strip tags but preserve their children.

The tag wrapper is removed but inner content (text and child elements) is moved to the parent.

§Example
use html_cleaning::HtmlCleaner;
use dom_query::Document;

let cleaner = HtmlCleaner::new();
let doc = Document::from("<div><span>text</span></div>");
cleaner.strip_tags(&doc, &["span"]);
assert!(doc.select("span").is_empty());
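
The unwrapping described above can be sketched on a toy tree. This is only an illustration under assumptions: the Node enum, the el and text helpers, and the free function strip_tags below are hypothetical and are not part of html_cleaning or dom_query.

```rust
// Toy tree for illustration only; this is not dom_query's node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

// Hypothetical helper: builds an element node.
fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

// Hypothetical helper: builds a text node.
fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Unwraps every descendant element whose tag is in `tags`:
/// the wrapper disappears and its children take its place in the parent.
/// The root node itself is never unwrapped.
fn strip_tags(node: &mut Node, tags: &[&str]) {
    if let Node::Element { children, .. } = node {
        let mut kept = Vec::with_capacity(children.len());
        for mut child in std::mem::take(children) {
            // Handle nested wrappers first.
            strip_tags(&mut child, tags);
            match child {
                // Matching wrapper: splice its children into the parent.
                Node::Element { tag, children: inner } if tags.contains(&tag.as_str()) => {
                    kept.extend(inner);
                }
                // Anything else stays where it was.
                other => kept.push(other),
            }
        }
        *children = kept;
    }
}
```

In this sketch, stripping "span" from a tree shaped like <div><span>text</span></div> leaves the text node as a direct child of the div.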

pub fn remove_by_selector(&self, doc: &Document, selector: &str)

Remove elements matching a CSS selector.

§Example
use html_cleaning::HtmlCleaner;
use dom_query::Document;

let cleaner = HtmlCleaner::new();
let doc = Document::from(r#"<div class="ad">Ad</div><p>Content</p>"#);
cleaner.remove_by_selector(&doc, ".ad");
assert!(doc.select(".ad").is_empty());

pub fn prune_empty(&self, doc: &Document)

Remove empty elements.

An element is considered empty when both of the following hold:

  • It has no child elements
  • It has no text content, or its text is only whitespace

Processes in reverse document order (children before parents).
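
The children-before-parents order matters because removing an empty child can leave its parent empty as well. The following is a minimal sketch of that idea on a toy tree, not the crate's implementation: the Node enum, the el and text helpers, and the free functions are hypothetical names introduced for illustration.

```rust
// Toy tree for illustration only; this is not dom_query's node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

// Hypothetical helper: builds an element node.
fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

// Hypothetical helper: builds a text node.
fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// An element is empty when it has no child elements and
/// all of its text (if any) is whitespace. Text nodes are never "empty elements".
fn is_empty_element(node: &Node) -> bool {
    match node {
        Node::Text(_) => false,
        Node::Element { children, .. } => children.iter().all(|c| match c {
            Node::Text(t) => t.trim().is_empty(),
            Node::Element { .. } => false,
        }),
    }
}

/// Removes empty descendant elements, visiting children before parents
/// so that a parent emptied by pruning is removed in the same pass.
/// The root node itself is kept.
fn prune_empty(node: &mut Node) {
    if let Node::Element { children, .. } = node {
        for child in children.iter_mut() {
            prune_empty(child);
        }
        children.retain(|c| !is_empty_element(c));
    }
}
```

In this sketch, a tree shaped like <div><p><span> </span></p><p>Hello</p></div> loses the span first, which empties the first p, so only the second p survives.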

pub fn normalize_text(&self, doc: &Document)

Normalize text nodes (trim, collapse whitespace).

Walks all text nodes and collapses each run of whitespace into a single space.
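
The per-node transformation can be approximated with the standard library alone. This is a sketch under assumptions: normalize_whitespace is a hypothetical helper, and the crate may treat leading or trailing whitespace differently (for example, to keep a space between adjacent inline elements).

```rust
/// Trims the text and collapses each run of whitespace
/// (spaces, tabs, newlines) into a single space.
fn normalize_whitespace(text: &str) -> String {
    // split_whitespace skips leading, trailing, and repeated whitespace,
    // so joining the pieces with one space does both steps at once.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Applied to "  Hello \n\t world  ", the helper returns "Hello world"; a whitespace-only string becomes the empty string.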

pub fn clean_attributes(&self, doc: &Document)

Remove or filter attributes from all elements.

If strip_attributes is true in the options, removes all attributes except those listed in preserved_attributes.
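
The allow-list behavior can be sketched on a plain list of name/value pairs. Everything here is an assumption made for illustration: the AttrOptions struct only mirrors the two option names mentioned above (their real types in CleaningOptions are not shown on this page), and leaving attributes untouched when strip_attributes is false is a guess rather than documented behavior.

```rust
// Hypothetical stand-in for the two relevant options; field types are assumed.
struct AttrOptions {
    strip_attributes: bool,
    preserved_attributes: Vec<String>,
}

/// Keeps only allow-listed attributes when stripping is enabled.
/// `attrs` holds (name, value) pairs for a single element.
fn clean_attributes(attrs: &mut Vec<(String, String)>, opts: &AttrOptions) {
    if !opts.strip_attributes {
        // Assumption: nothing is changed when stripping is disabled.
        return;
    }
    attrs.retain(|(name, _)| opts.preserved_attributes.iter().any(|p| p == name));
}
```

With preserved_attributes set to ["href"], an element carrying href and onclick keeps only href in this sketch.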

Trait Implementations§

impl Clone for HtmlCleaner

fn clone(&self) -> HtmlCleaner

Returns a duplicate of the value.

1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for HtmlCleaner

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for HtmlCleaner

fn default() -> Self

Returns the “default value” for a type.
Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.