DOM Content Processing Utilities
This module provides high-performance utilities for processing raw HTML content, extracting clean text, and normalizing web page content for downstream consumption.
§Features
- HTML Cleaning: Remove scripts, styles, and other non-content elements
- Text Extraction: Convert HTML to clean, readable text
- Entity Decoding: Properly decode HTML entities
- Whitespace Normalization: Clean up excessive whitespace while preserving structure
- Truncation: Intelligently truncate content with ellipsis
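The whitespace normalization and truncation features can be illustrated with a small standard-library sketch. It is not the crate's implementation: the function names `normalize_whitespace` and `truncate_with_ellipsis`, the rule that runs of blank lines collapse to a single blank line, and the choice to cut at the last word boundary are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of reasonkit_web): collapses runs of
/// whitespace inside each line to one space and collapses runs of blank
/// lines to a single blank line, so paragraph breaks survive.
fn normalize_whitespace(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    // Start as "blank" so leading blank lines are dropped.
    let mut last_blank = true;
    for line in input.lines() {
        let collapsed = line.split_whitespace().collect::<Vec<_>>().join(" ");
        if collapsed.is_empty() {
            if !last_blank {
                out.push(String::new());
            }
            last_blank = true;
        } else {
            out.push(collapsed);
            last_blank = false;
        }
    }
    // Drop a trailing blank line, if any.
    while out.last().map_or(false, |l| l.is_empty()) {
        out.pop();
    }
    out.join("\n")
}

/// Hypothetical helper (not part of reasonkit_web): truncates `text` to at
/// most `max_chars` characters, ellipsis included, preferring to cut at the
/// last whitespace before the limit.
fn truncate_with_ellipsis(text: &str, max_chars: usize) -> String {
    if max_chars == 0 {
        return String::new();
    }
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    // Reserve one character for the ellipsis.
    let keep = max_chars - 1;
    // Collect by chars so multi-byte characters are never split.
    let head: String = text.chars().take(keep).collect();
    // Back up to the last word boundary when there is one.
    let cut = match head.rfind(char::is_whitespace) {
        Some(idx) if idx > 0 => &head[..idx],
        _ => head.as_str(),
    };
    format!("{}…", cut.trim_end())
}
```

Under these assumptions, `truncate_with_ellipsis("Hello wonderful world", 12)` yields `Hello…` rather than cutting mid-word, and input that already fits within the limit is returned unchanged.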
§Example
use reasonkit_web::processing::{ContentProcessor, ContentProcessorConfig};
let config = ContentProcessorConfig::default();
let processor = ContentProcessor::new(config);
let html = r#"<html><head><script>evil();</script></head>
<body><p>Hello &amp; welcome!</p></body></html>"#;
let result = processor.process(html);
assert!(result.text.contains("Hello & welcome!"));
assert!(!result.text.contains("evil"));
Structs§
- ContentProcessor: Content processor for HTML documents
- ContentProcessorConfig: Configuration for the content processor
- ProcessedContent: Result of content processing