HTMLFixinator
A Rust library for cleaning and transforming HTML content through a composable filter system. HTMLFixinator provides a set of filters that can be used individually or chained together to modify HTML documents.
Features
- Attribute Filter: Remove specific HTML attributes while preserving others
- Comment Filter: Strip HTML comments from the document
- Element Filter: Remove or unwrap specific HTML elements
- Empty Filter: Remove empty elements while preserving non-empty ones
- URL Filter: Convert relative URLs to absolute URLs
- Filter Chain: Combine multiple filters for complex transformations
- Case-insensitive: All filters work case-insensitively for robustness
Installation
Add this to your Cargo.toml:
[]
= { = "0.1.0" }
Or run this command:
Usage
Basic Example
use ;
// Create a filter to remove class and style attributes
let filter = attribute;
// Apply the filter to some HTML
let html = r#"<div class="test" style="color: red;">Content</div>"#;
let doc = string_to_node;
let result = filter.apply;
assert_eq!;
Chaining Filters
use ;
// Create a chain of filters
let chain = new
.add // Remove comments
.add // Remove empty elements
.add; // Remove class attributes
let html = r#"<!-- Comment --><div class="test"><span></span><p>Content</p></div>"#;
let doc = string_to_node;
let result = chain.apply;
assert_eq!;
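To illustrate the idea behind chaining, here is a minimal, standard-library-only sketch of a composable chain. It is not HTMLFixinator's implementation: `FilterChain` here is a hypothetical type that works on plain strings rather than parsed nodes, and `demo_chain` is an illustrative helper. The point it shows is only that filters run in the order they were added, each receiving the previous filter's output.

```rust
// Hypothetical sketch: these names are illustrative assumptions,
// not HTMLFixinator's actual API.

/// An ordered list of string-to-string filters.
pub struct FilterChain {
    filters: Vec<Box<dyn Fn(&str) -> String>>,
}

impl FilterChain {
    /// Creates an empty chain; applying it returns the input unchanged.
    pub fn new() -> Self {
        FilterChain { filters: Vec::new() }
    }

    /// Appends a filter and returns the chain, so calls can be chained.
    pub fn add(mut self, filter: impl Fn(&str) -> String + 'static) -> Self {
        self.filters.push(Box::new(filter));
        self
    }

    /// Runs every filter in insertion order, feeding each one the
    /// output of the one before it.
    pub fn apply(&self, html: &str) -> String {
        self.filters
            .iter()
            .fold(html.to_string(), |acc, filter| filter(&acc))
    }
}

/// Example chain: drop `<br>` tags, then trim surrounding whitespace.
pub fn demo_chain() -> FilterChain {
    FilterChain::new()
        .add(|html| html.replace("<br>", ""))
        .add(|html| html.trim().to_string())
}
```

Because each filter sees the result of the previous one, order matters: removing attributes before removing empty elements can produce a different document than the reverse.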
Available Filters
AttributeFilter
Removes specified attributes from all elements.
use Filter;
let filter = attribute;
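As a rough illustration of case-insensitive attribute removal, the sketch below uses a hypothetical `Element` struct and `remove_attributes` function. Neither is part of HTMLFixinator; they only show how a name comparison that ignores ASCII case keeps `CLASS` and `class` equivalent.

```rust
/// Hypothetical stand-in for a parsed element; not an HTMLFixinator type.
#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub tag: String,
    /// Attributes as (name, value) pairs, in document order.
    pub attrs: Vec<(String, String)>,
}

/// Drops every attribute whose name matches one of `names`,
/// ignoring ASCII case. All other attributes keep their order.
pub fn remove_attributes(el: &mut Element, names: &[&str]) {
    el.attrs
        .retain(|(name, _)| !names.iter().any(|n| n.eq_ignore_ascii_case(name)));
}
```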
CommentFilter
Removes all HTML comments from the document.
use Filter;
let filter = comment;
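For intuition, comment stripping can be sketched on a raw string with the standard library alone. The function `strip_comments` below is a hypothetical helper, not HTMLFixinator's implementation, which presumably works on parsed nodes. The sketch does not special-case comment-like text inside `<script>` blocks or attribute values, and it assumes an unterminated comment swallows the rest of the input.

```rust
/// Removes every `<!-- ... -->` span from `html`.
/// Hypothetical helper for illustration only.
pub fn strip_comments(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find("<!--") {
        // Keep everything before the comment opener.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 4..];
        match after_open.find("-->") {
            // Resume scanning just past the comment closer.
            Some(end) => rest = &after_open[end + 3..],
            // Assumption: an unterminated comment runs to end of input.
            None => return out,
        }
    }
    out.push_str(rest);
    out
}
```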
ElementFilter
Either removes elements completely or unwraps them (removes the element but keeps its content).
use Filter;
// Remove mode
let filter = element;
// Unwrap mode
let filter = element;
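The difference between the two modes is easiest to see on a small tree. The sketch below uses a hypothetical `Node` enum, `Mode` enum, and `filter_elements` function; none of them are HTMLFixinator types. Remove mode discards a matching element together with its subtree, while unwrap mode splices the element's children into its parent in its place.

```rust
/// Hypothetical document tree; not an HTMLFixinator type.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

/// What to do with an element whose tag matches.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    /// Drop the element and everything inside it.
    Remove,
    /// Drop the element but keep its children in place.
    Unwrap,
}

/// Convenience constructor for an element node.
pub fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

/// Convenience constructor for a text node.
pub fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Applies `mode` to every element named `tag` (ASCII case-insensitive).
pub fn filter_elements(nodes: Vec<Node>, tag: &str, mode: Mode) -> Vec<Node> {
    let mut out = Vec::new();
    for node in nodes {
        match node {
            Node::Text(t) => out.push(Node::Text(t)),
            Node::Element { tag: name, children } => {
                let matches = name.eq_ignore_ascii_case(tag);
                if matches && mode == Mode::Remove {
                    continue; // discard the whole subtree
                }
                let kids = filter_elements(children, tag, mode);
                if matches {
                    out.extend(kids); // unwrap: splice children into parent
                } else {
                    out.push(Node::Element { tag: name, children: kids });
                }
            }
        }
    }
    out
}
```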
EmptyFilter
Removes elements that have no content, while preserving elements that contain <img> tags.
use Filter;
let filter = empty;
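One possible reading of this rule is sketched below on a hypothetical `Node` tree (not an HTMLFixinator type). The sketch makes two assumptions that the text above does not confirm: whitespace-only text counts as empty, and removal cascades upward, so a parent whose only children were empty is removed too. `<img>` elements are always kept, which in turn keeps any element that contains one.

```rust
/// Hypothetical document tree; not an HTMLFixinator type.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

/// Removes elements with no content, working bottom-up.
/// Assumptions: whitespace-only text is "empty", and `<img>` is
/// always preserved even though it has no children.
pub fn remove_empty(nodes: Vec<Node>) -> Vec<Node> {
    nodes
        .into_iter()
        .filter_map(|node| match node {
            Node::Text(t) => Some(Node::Text(t)),
            Node::Element { tag, children } => {
                // Clean the subtree first so emptiness cascades upward.
                let children = remove_empty(children);
                let is_img = tag.eq_ignore_ascii_case("img");
                let has_content = children.iter().any(|child| match child {
                    Node::Text(t) => !t.trim().is_empty(),
                    Node::Element { .. } => true,
                });
                if is_img || has_content {
                    Some(Node::Element { tag, children })
                } else {
                    None
                }
            }
        })
        .collect()
}
```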
RelativeToAbsoluteFilter
Converts relative URLs in href attributes to absolute URLs.
use Filter;
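To make the conversion concrete, here is a simplified, standard-library-only sketch of resolving an href against a base URL. The function `to_absolute` is a hypothetical helper, not HTMLFixinator's API, and it is deliberately incomplete: it does not normalize `..` segments, it leaves fragment-only links such as `#top` unchanged, and it treats any href containing `://` as already absolute.

```rust
/// Resolves `href` against `base`. Returns `None` when `base` has no
/// `scheme://` prefix. Hypothetical helper for illustration only.
pub fn to_absolute(base: &str, href: &str) -> Option<String> {
    let scheme_end = base.find("://")?;
    let scheme = &base[..scheme_end];
    let after_scheme = &base[scheme_end + 3..];
    // The host ends at the first '/' after the scheme, if there is one.
    let host_end = after_scheme.find('/').unwrap_or(after_scheme.len());
    let origin = &base[..scheme_end + 3 + host_end];
    let path = &after_scheme[host_end..];

    // Already absolute, or a fragment-only link: leave untouched.
    if href.contains("://") || href.starts_with('#') {
        return Some(href.to_string());
    }
    // Protocol-relative: reuse the base scheme.
    if let Some(rest) = href.strip_prefix("//") {
        return Some(format!("{}://{}", scheme, rest));
    }
    // Root-relative: attach to the origin.
    if href.starts_with('/') {
        return Some(format!("{}{}", origin, href));
    }
    // Path-relative: attach to the directory of the base path.
    let dir = match path.rfind('/') {
        Some(i) => &path[..=i],
        None => "/",
    };
    Some(format!("{}{}{}", origin, dir, href))
}
```

For production code, a full resolver such as `Url::join` from the `url` crate handles the cases this sketch skips.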
License
This project is licensed under the GNU Lesser General Public License.