Spider Agent HTML
HTML processing utilities for spider_agent — cleaning, content analysis integration, and diffing.
This crate provides the HTML cleaning functions extracted from spider_agent.
It uses lol_html for fast, streaming HTML rewriting.
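Removing CSS and JS can be approximated, in its simplest form, by dropping `<script>` and `<style>` elements. The crate does its rewriting with lol_html's streaming rewriter. The sketch below is only a dependency-free illustration of the same idea: it is naive and non-streaming, and it scans the whole string. `remove_elements` and `remove_one_kind` are hypothetical names, not part of this crate's API. The scanner also ignores cases a real HTML tokenizer handles, such as comments, `</script>` inside string literals, and malformed markup.

```rust
/// Removes every `<tag ...>...</tag>` block, for each tag name in `tags`.
/// Hypothetical helper for illustration; not part of spider_agent_html.
pub fn remove_elements(html: &str, tags: &[&str]) -> String {
    let mut out = html.to_string();
    for tag in tags {
        out = remove_one_kind(&out, tag);
    }
    out
}

/// Removes all elements with a single tag name, matching case-insensitively.
fn remove_one_kind(html: &str, tag: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical, so indices found in
    // `lower` are valid (and fall on char boundaries) in `html`.
    let lower = html.to_ascii_lowercase();
    let open = format!("<{}", tag.to_ascii_lowercase());
    let close = format!("</{}>", tag.to_ascii_lowercase());
    let mut out = String::with_capacity(html.len());
    let mut pos = 0;

    while let Some(rel) = lower[pos..].find(open.as_str()) {
        let start = pos + rel;
        let name_end = start + open.len();
        // Require a whole tag name: `<style>` or `<style ...>`, not `<styles>`.
        let whole_name = matches!(
            lower.as_bytes().get(name_end),
            Some(&b) if b == b'>' || b == b'/' || b.is_ascii_whitespace()
        );
        if !whole_name {
            // Not our tag: copy it through unchanged and keep scanning.
            out.push_str(&html[pos..name_end]);
            pos = name_end;
            continue;
        }
        // Keep everything before the element, then skip past its closing tag.
        out.push_str(&html[pos..start]);
        pos = match lower[name_end..].find(close.as_str()) {
            Some(rel_end) => name_end + rel_end + close.len(),
            None => html.len(), // unclosed element: drop the rest of the input
        };
    }
    out.push_str(&html[pos..]);
    out
}
```

Usage: `remove_elements(html, &["script", "style"])`. A streaming rewriter such as lol_html avoids holding a lowercased copy of the whole document and copes with real-world markup, which is why the crate relies on it rather than on string scanning like this.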
Dependencies
- lol_html - streaming HTML rewriter
- aho-corasick - pattern matching (via spider_agent_types)
- spider_agent_types - type definitions
Functions
- clean_html - Default cleaner (base level).
- clean_html_base - Clean the HTML, removing CSS and JS (base level).
- clean_html_full - Full/aggressive HTML cleaning.
- clean_html_raw - Raw passthrough - no cleaning.
- clean_html_slim - Slim HTML cleaning - removes heavy elements.
- clean_html_with_profile - Clean HTML using a specific profile.
- clean_html_with_profile_and_intent - Clean HTML with a specific profile and intent.
- smart_clean_html - Smart HTML cleaner that automatically determines the best cleaning level.
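The functions above describe a ladder of cleaning levels (raw passthrough, the base/default level, full/aggressive) plus a smart cleaner that picks a level automatically. This page does not document how `smart_clean_html` decides, so the following is a hypothetical sketch of what a level-selection heuristic could look like. `Level`, `pick_level`, `count_open_tags`, and both thresholds are invented for illustration. They are not this crate's API or its actual criteria. Slim is left out because its relation to the base level is not spelled out here.

```rust
/// Hypothetical cleaning levels, loosely named after the functions above.
/// Not a type exported by spider_agent_html.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Raw,  // cf. clean_html_raw: passthrough
    Base, // cf. clean_html / clean_html_base: strip CSS and JS
    Full, // cf. clean_html_full: aggressive
}

// Invented thresholds, for illustration only.
const LARGE_PAGE_BYTES: usize = 200_000;
const MANY_HEAVY_TAGS: usize = 20;

/// Counts occurrences of `<tag` in already-lowercased HTML.
/// Crude: `<styles>` would also be counted as a `<style` tag.
fn count_open_tags(lower: &str, tag: &str) -> usize {
    lower.matches(format!("<{}", tag).as_str()).count()
}

/// Hypothetical heuristic: no script/style tags -> Raw; a very large page
/// or many script/style tags -> Full; otherwise Base.
pub fn pick_level(html: &str) -> Level {
    let lower = html.to_ascii_lowercase();
    let heavy = count_open_tags(&lower, "script") + count_open_tags(&lower, "style");
    if heavy == 0 {
        Level::Raw
    } else if html.len() > LARGE_PAGE_BYTES || heavy > MANY_HEAVY_TAGS {
        Level::Full
    } else {
        Level::Base
    }
}
```

In this sketch a page with no scripts or styles is passed through untouched regardless of size. A real selector would more likely weigh the actual content, for example the ratio of visible text to markup.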