Struct ContentAnalysis 

pub struct ContentAnalysis {
    pub is_thin_content: bool,
    pub has_visual_elements: bool,
    pub has_dynamic_content: bool,
    pub needs_screenshot: bool,
    pub iframe_count: usize,
    pub video_count: usize,
    pub canvas_count: usize,
    pub embed_count: usize,
    pub svg_count: usize,
    pub text_length: usize,
    pub html_length: usize,
    pub text_ratio: f32,
    pub svg_bytes: usize,
    pub script_bytes: usize,
    pub style_bytes: usize,
    pub base64_bytes: usize,
    pub cleanable_bytes: usize,
    pub cleanable_ratio: f32,
    pub indicators: Vec<String>,
}

Result of analyzing HTML content.

Helps decide whether to rely on HTML text alone or require a screenshot for accurate extraction.
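
As a rough illustration of how a caller might act on this result, the sketch below branches on needs_screenshot. The struct here is a local stand-in that mirrors only three of the documented fields, and the Strategy enum and choose_strategy function are hypothetical names used for illustration, not part of the crate.

```rust
// Local stand-in mirroring three of the documented fields (assumption:
// the real struct comes from the crate and has 19 fields).
#[derive(Debug, Default, Clone)]
struct ContentAnalysis {
    has_visual_elements: bool,
    needs_screenshot: bool,
    indicators: Vec<String>,
}

// Hypothetical extraction strategies; not part of the documented API.
#[derive(Debug, PartialEq)]
enum Strategy {
    HtmlOnly,
    HtmlPlusScreenshot,
}

// Follow the analysis recommendation: take a screenshot only when asked to.
fn choose_strategy(analysis: &ContentAnalysis) -> Strategy {
    if analysis.needs_screenshot {
        Strategy::HtmlPlusScreenshot
    } else {
        Strategy::HtmlOnly
    }
}

fn main() {
    let analysis = ContentAnalysis {
        has_visual_elements: true,
        needs_screenshot: true,
        indicators: vec!["canvas".to_string()],
    };
    println!(
        "{:?} (visual: {}, indicators: {:?})",
        choose_strategy(&analysis),
        analysis.has_visual_elements,
        analysis.indicators
    );
}
```

With the real type, the value would presumably come from ContentAnalysis::analyze(html) rather than being built by hand.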

Fields§

§is_thin_content: bool

Whether the content is “thin” (low text content).

§has_visual_elements: bool

Whether visual elements that require a screenshot were detected.

§has_dynamic_content: bool

Whether dynamic content indicators were found.

§needs_screenshot: bool

Recommendation: true if a screenshot is recommended.

§iframe_count: usize

Count of iframe elements.

§video_count: usize

Count of video elements.

§canvas_count: usize

Count of canvas elements.

§embed_count: usize

Count of embed/object elements.

§svg_count: usize

Count of SVG elements.
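
The page does not show how these counts are produced. A minimal sketch of one way to count opening tags, assuming a plain byte scan rather than a real HTML parser, could look like this. The function name count_open_tags is hypothetical, and summing embed and object into a single count is an assumption based on the embed_count description.

```rust
// Counts opening tags named `tag`, ignoring ASCII case. A match must be
// followed by whitespace, '/', or '>' so that "<videos>" is not counted
// as "<video". This is a byte scan, not an HTML parser, so tags inside
// comments or scripts are counted too.
fn count_open_tags(html: &str, tag: &str) -> usize {
    let needle = format!("<{tag}");
    let (bytes, n) = (html.as_bytes(), needle.as_bytes());
    bytes
        .windows(n.len())
        .enumerate()
        .filter(|(i, w)| {
            w.eq_ignore_ascii_case(n)
                && matches!(
                    bytes.get(*i + n.len()).copied(),
                    Some(b' ' | b'\t' | b'\n' | b'/' | b'>')
                )
        })
        .count()
}

fn main() {
    let html = "<div><IFRAME src='a'></iframe><iframe></iframe><video></video><videos>";
    let iframe_count = count_open_tags(html, "iframe");
    let video_count = count_open_tags(html, "video");
    // Assumption: embed_count covers both <embed> and <object>.
    let embed_count = count_open_tags(html, "embed") + count_open_tags(html, "object");
    println!("iframe={iframe_count} video={video_count} embed={embed_count}");
}
```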

§text_length: usize

Approximate visible text length.

§html_length: usize

Total HTML length.

§text_ratio: f32

Ratio of visible text length to total HTML length.
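
A small worked example may help here. The sketch assumes the ratio is simply text_length divided by html_length, with an empty document mapped to 0.0. Both the formula and the helper name text_ratio are assumptions, since the page does not spell out the calculation.

```rust
// Hypothetical helper: ratio of visible text length to total HTML length.
// Returns 0.0 for an empty document to avoid dividing by zero.
fn text_ratio(text_length: usize, html_length: usize) -> f32 {
    if html_length == 0 {
        0.0
    } else {
        text_length as f32 / html_length as f32
    }
}

fn main() {
    let html = "<p>Hello</p>"; // 12 bytes of markup and text
    let text = "Hello"; // 5 bytes of visible text
    println!("text_ratio = {:.3}", text_ratio(text.len(), html.len()));
}
```

A low value suggests that most of the document is markup rather than readable text, which is presumably one of the signals behind is_thin_content.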

§svg_bytes: usize

Total bytes of SVG elements.

§script_bytes: usize

Total bytes of script elements.

§style_bytes: usize

Total bytes of style elements.

§base64_bytes: usize

Total bytes of base64-encoded data.

§cleanable_bytes: usize

Total bytes that could be cleaned.

§cleanable_ratio: f32

Ratio of cleanable bytes to the total HTML size.
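
To make the byte-size fields concrete, here is a sketch that measures script and style elements and derives a cleanable ratio from them. It assumes that cleanable bytes are the sum of the measured element sizes and that the ratio is taken against the full HTML length. The helper element_bytes is hypothetical, matches tags case-sensitively, and ignores SVG and base64 data for brevity.

```rust
// Hypothetical helper: total bytes spanned by `<tag ...> ... </tag>` elements.
// Simplifications: case-sensitive, no nesting, and "<script" would also
// match a longer tag name starting with "script".
fn element_bytes(html: &str, tag: &str) -> usize {
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let mut total = 0;
    let mut rest = html;
    while let Some(start) = rest.find(&open) {
        let after = &rest[start..];
        match after.find(&close) {
            Some(end) => {
                // Length from the opening '<' through the closing '>'.
                let len = end + close.len();
                total += len;
                rest = &after[len..];
            }
            None => break, // unterminated element: stop counting
        }
    }
    total
}

fn main() {
    let html = "<p>Hi</p><script>let x = 1;</script><style>p{}</style>";
    let script_bytes = element_bytes(html, "script");
    let style_bytes = element_bytes(html, "style");
    // Assumption: cleanable bytes are the sum of the removable elements.
    let cleanable_bytes = script_bytes + style_bytes;
    let cleanable_ratio = cleanable_bytes as f32 / html.len() as f32;
    println!("script={script_bytes} style={style_bytes} ratio={cleanable_ratio:.3}");
}
```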

§indicators: Vec<String>

Indicators found (for debugging).

Implementations§

impl ContentAnalysis

pub fn analyze(html: &str) -> Self

Analyze HTML content (fast mode).

pub fn analyze_full(html: &str) -> Self

Analyze HTML content with full byte size calculation.

pub fn quick_needs_screenshot(html: &str) -> bool

Quick check of whether a screenshot is needed (inline, no full analysis).

Uses Aho-Corasick for efficient multi-pattern matching without allocating memory for lowercase conversion.
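
The Aho-Corasick matcher itself is not shown on this page. As a stand-in that uses only the standard library, the sketch below illustrates the allocation-free part of the idea: comparing byte windows with eq_ignore_ascii_case instead of first building a lowercase copy of the HTML. It is a naive scan that checks each pattern separately, not Aho-Corasick, and the pattern list and function names are assumptions.

```rust
// Assumed pattern list, based on the visual elements named in this page.
const VISUAL_TAGS: [&str; 5] = ["<iframe", "<video", "<canvas", "<embed", "<object"];

// Case-insensitive substring search over bytes. No lowercase copy of the
// haystack is allocated; each window is compared in place.
fn contains_ignore_ascii_case(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty()
        && haystack
            .windows(needle.len())
            .any(|w| w.eq_ignore_ascii_case(needle))
}

// Returns true as soon as any pattern is found (early exit).
fn has_any_visual_tag(html: &str) -> bool {
    VISUAL_TAGS
        .iter()
        .any(|tag| contains_ignore_ascii_case(html.as_bytes(), tag.as_bytes()))
}

fn main() {
    println!("{}", has_any_visual_tag("<DIV><Canvas id='c'></Canvas></DIV>"));
    println!("{}", has_any_visual_tag("<p>plain text</p>"));
}
```

A real Aho-Corasick automaton finds all patterns in a single pass over the input, whereas this sketch rescans the input once per pattern.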

pub fn has_visual_elements_quick(html: &str) -> bool

Check whether the HTML contains any visual elements (iframe, video, canvas, embed, object).

pub fn recommended_cleaning(&self) -> HtmlCleaningProfile

Get the recommended cleaning profile based on the analysis.

pub fn summary(&self) -> String

Get a summary of the analysis as a string.

Trait Implementations§

impl Clone for ContentAnalysis

fn clone(&self) -> ContentAnalysis

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Stable since 1.0.0.

impl Debug for ContentAnalysis

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ContentAnalysis

fn default() -> ContentAnalysis

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for ContentAnalysis

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for ContentAnalysis

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,