pub struct ContentAnalysis {
pub is_thin_content: bool,
pub has_visual_elements: bool,
pub has_dynamic_content: bool,
pub needs_screenshot: bool,
pub iframe_count: usize,
pub video_count: usize,
pub canvas_count: usize,
pub embed_count: usize,
pub svg_count: usize,
pub text_length: usize,
pub html_length: usize,
pub text_ratio: f32,
pub svg_bytes: usize,
pub script_bytes: usize,
pub style_bytes: usize,
pub base64_bytes: usize,
pub cleanable_bytes: usize,
pub cleanable_ratio: f32,
pub indicators: Vec<String>,
}
Result of analyzing HTML content.
Helps decide whether to rely on HTML text alone or require a screenshot for accurate extraction.
Fields
is_thin_content: bool
Whether the content is “thin” (low text content).
has_visual_elements: bool
Whether visual elements that need a screenshot were detected.
has_dynamic_content: bool
Whether dynamic content indicators were found.
needs_screenshot: bool
Recommendation: true if a screenshot is recommended.
iframe_count: usize
Count of iframe elements.
video_count: usize
Count of video elements.
canvas_count: usize
Count of canvas elements.
embed_count: usize
Count of embed/object elements.
svg_count: usize
Count of SVG elements.
text_length: usize
Approximate visible text length.
html_length: usize
Total HTML length.
text_ratio: f32
Ratio of text to HTML.
svg_bytes: usize
Total bytes of SVG elements.
script_bytes: usize
Total bytes of script elements.
style_bytes: usize
Total bytes of style elements.
base64_bytes: usize
Total bytes of base64-encoded data.
cleanable_bytes: usize
Total bytes that could be cleaned.
cleanable_ratio: f32
Ratio of cleanable bytes to total.
indicators: Vec<String>
Indicators found (for debugging).
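The two ratio fields can be read as simple quotients over the total HTML length. The sketch below is a std-only illustration of that reading, not the crate's implementation: `SizeSummary`, `ratio`, and `summarize` are hypothetical names, and returning 0.0 for empty input is an assumption.

```rust
/// Hypothetical helper: `part / total` as f32, or 0.0 when `total` is zero
/// (the empty-input behavior is an assumption, not documented above).
fn ratio(part: usize, total: usize) -> f32 {
    if total == 0 {
        0.0
    } else {
        part as f32 / total as f32
    }
}

/// Hypothetical stand-in that mirrors only the size-related fields.
#[derive(Debug, Clone, PartialEq)]
struct SizeSummary {
    text_length: usize,
    html_length: usize,
    cleanable_bytes: usize,
    text_ratio: f32,
    cleanable_ratio: f32,
}

/// Derive both ratios from raw lengths, assuming each one is measured
/// against the total HTML length.
fn summarize(text_length: usize, html_length: usize, cleanable_bytes: usize) -> SizeSummary {
    SizeSummary {
        text_length,
        html_length,
        cleanable_bytes,
        text_ratio: ratio(text_length, html_length),
        cleanable_ratio: ratio(cleanable_bytes, html_length),
    }
}
```

Under these assumptions, a 1000-byte page with 250 bytes of visible text and 400 cleanable bytes would have a text ratio of 0.25 and a cleanable ratio of 0.4.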
Implementations
impl ContentAnalysis
pub fn analyze_full(html: &str) -> Self
Analyze HTML content with full byte size calculation.
pub fn quick_needs_screenshot(html: &str) -> bool
Quick check of whether a screenshot is needed (inline, no full analysis).
Uses Aho-Corasick for efficient multi-pattern matching without allocating memory for lowercase conversion.
pub fn has_visual_elements_quick(html: &str) -> bool
Check if HTML has any visual elements (iframe, video, canvas, embed, object).
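The quick checks above rely on matching several tag patterns without first building a lowercased copy of the input. The sketch below illustrates that idea with the standard library only. It swaps in a plain sliding-window scan per pattern instead of Aho-Corasick, so it shows the allocation-free, case-insensitive comparison but not the single-pass automaton. The function names and the exact pattern list are assumptions, taken loosely from the elements named in the description.

```rust
/// Illustrative pattern list (an assumption; the crate's real list is not shown above).
const VISUAL_TAGS: [&str; 5] = ["<iframe", "<video", "<canvas", "<embed", "<object"];

/// Returns true if `haystack` contains `needle`, comparing ASCII letters
/// case-insensitively and without allocating a lowercased copy.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        // An empty needle trivially matches; this also avoids `windows(0)`, which panics.
        return true;
    }
    // Compare each window of `needle`'s length against the needle in place.
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}

/// Hypothetical quick check: does any of the illustrative tag prefixes occur?
fn has_any_visual_tag(html: &str) -> bool {
    VISUAL_TAGS
        .iter()
        .any(|tag| contains_ignore_ascii_case(html, tag))
}
```

This scan walks the input once per pattern, which is fine for a handful of short patterns. An Aho-Corasick automaton would instead find all patterns in a single pass, which matters more as the pattern list or the documents grow.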
pub fn recommended_cleaning(&self) -> HtmlCleaningProfile
Returns the recommended cleaning profile based on the analysis.
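The documentation does not say how the profile is chosen. One plausible reading is that it depends on how much of the page is cleanable. The sketch below illustrates that reading only: `SketchProfile`, its variants, `pick_profile`, and both thresholds are hypothetical and do not describe the real `HtmlCleaningProfile`.

```rust
/// Hypothetical stand-in for a cleaning profile; these variants are NOT
/// taken from the real `HtmlCleaningProfile` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchProfile {
    Minimal,
    Standard,
    Aggressive,
}

/// Pick a profile from the share of cleanable bytes.
/// The 0.2 and 0.5 thresholds are illustrative assumptions.
fn pick_profile(cleanable_ratio: f32) -> SketchProfile {
    if cleanable_ratio >= 0.5 {
        SketchProfile::Aggressive
    } else if cleanable_ratio >= 0.2 {
        SketchProfile::Standard
    } else {
        SketchProfile::Minimal
    }
}
```

The design choice illustrated here is to spend cleaning effort in proportion to the expected savings: pages dominated by scripts, styles, or inline data get the heaviest treatment, while mostly-text pages are left close to untouched.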
Trait Implementations
impl Clone for ContentAnalysis
fn clone(&self) -> ContentAnalysis
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.