Struct Parser

#[non_exhaustive]
pub struct Parser {
    pub max_elems_to_parse: usize,
    pub n_top_candidates: usize,
    pub char_threshold: usize,
    pub classes_to_preserve: Vec<String>,
    pub keep_classes: bool,
    pub tags_to_score: Vec<String>,
    pub disable_jsonld: bool,
    pub allowed_video_regex: Option<Regex>,
    /* private fields */
}

Port of Parser — the core readability extraction engine.

Create with Parser::new(), configure public fields as needed, then call parse() to extract an article.

A single Parser can be reused for multiple documents: internal state is fully reset at the start of each parse call. However, Parser is not thread-safe. parse() takes &mut self, and the type is neither Send nor Sync (see Auto Trait Implementations), so an instance cannot be moved to or shared with another thread. Create a separate Parser on each thread that needs one.
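The reuse guarantee described above can be sketched with a small stand-in type. ReusableParser, its seen_chars field, and parse_twice are hypothetical names invented for this illustration; the real parse() takes HTML plus an optional URL and returns Result<Article, Error>, whereas this sketch only counts characters to show per-call state being reset.

```rust
/// Hypothetical stand-in for `Parser`; only the reuse pattern is modeled.
#[derive(Default)]
pub struct ReusableParser {
    /// Public knob, mirroring `char_threshold`.
    pub char_threshold: usize,
    /// Private per-document scratch state (an assumption for this sketch).
    seen_chars: usize,
}

impl ReusableParser {
    pub fn new() -> Self {
        Self::default()
    }

    /// Counts non-whitespace characters and rejects input below the threshold.
    /// A plain count stands in for the real `Article` result.
    pub fn parse(&mut self, text: &str) -> Result<usize, String> {
        // Reset per-document state first, so nothing leaks between calls.
        self.seen_chars = 0;
        for c in text.chars() {
            if !c.is_whitespace() {
                self.seen_chars += 1;
            }
        }
        if self.seen_chars < self.char_threshold {
            return Err(format!(
                "only {} characters, need at least {}",
                self.seen_chars, self.char_threshold
            ));
        }
        Ok(self.seen_chars)
    }
}

/// Parses two documents with one parser instance.
pub fn parse_twice() -> (Result<usize, String>, Result<usize, String>) {
    let mut parser = ReusableParser::new();
    parser.char_threshold = 5; // configure a public field directly
    let first = parser.parse("hello world");
    let second = parser.parse("hi there");
    (first, second)
}
```

Because the counter is cleared at the top of parse, the second result reflects only the second input, not the running total of both.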

Fields (Non-exhaustive)§

This struct is marked as non-exhaustive

Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
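In practice this means a non-exhaustive struct is obtained from a constructor or Default and then adjusted field by field. The sketch below uses a hypothetical Options struct rather than Parser itself. Inside a single crate, as here, the attribute is not enforced, so the restrictions are noted in comments.

```rust
/// Hypothetical stand-in for a non-exhaustive configuration struct.
#[non_exhaustive]
#[derive(Debug, Default)]
pub struct Options {
    pub keep_classes: bool,
    pub char_threshold: usize,
}

/// Builds a configured value the way a downstream crate has to.
pub fn configured() -> Options {
    // From another crate, both of these forms are rejected by the compiler:
    //   Options { keep_classes: true, char_threshold: 250 }
    //   Options { keep_classes: true, ..Default::default() }
    // Starting from a constructor and assigning public fields always works.
    let mut opts = Options::default();
    opts.keep_classes = true;
    opts.char_threshold = 250;
    opts
}

/// Reads one field by destructuring.
pub fn threshold_of(opts: &Options) -> usize {
    // The trailing `..` is required from other crates, so that fields
    // added in a later release do not break this pattern.
    let Options { char_threshold, .. } = opts;
    *char_threshold
}
```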

§max_elems_to_parse: usize

Maximum number of DOM nodes to process; 0 means unlimited. Port of MaxElemsToParse.

§n_top_candidates: usize

Number of top candidates to compare during scoring. Port of NTopCandidates.

§char_threshold: usize

Minimum character count for accepted article content. Port of CharThreshold.

§classes_to_preserve: Vec<String>

CSS class names to preserve when keep_classes is false. Port of ClassesToPreserve.

§keep_classes: bool

If true, keep all class attributes. Port of KeepClasses.

§tags_to_score: Vec<String>

Tag names eligible for content scoring. Port of TagsToScore.

§disable_jsonld: bool

Disable JSON-LD metadata extraction. Port of DisableJSONLD.

§allowed_video_regex: Option<Regex>

Optional regex for video URLs to allow. Port of AllowedVideoRegex.

Implementations§

impl Parser

pub fn parse(&mut self, html: &str, page_url: Option<&Url>) -> Result<Article, Error>

pub fn check_html(&self, html: &str) -> bool

impl Parser

pub fn new() -> Self

pub fn with_max_elems_to_parse(self, n: usize) -> Self

pub fn with_n_top_candidates(self, n: usize) -> Self

pub fn with_char_threshold(self, n: usize) -> Self

pub fn with_classes_to_preserve(self, classes: impl IntoIterator<Item = impl Into<String>>) -> Self

pub fn with_keep_classes(self, keep: bool) -> Self

pub fn with_tags_to_score(self, tags: impl IntoIterator<Item = impl Into<String>>) -> Self

pub fn with_disable_jsonld(self, disable: bool) -> Self

pub fn with_allowed_video_regex(self, re: Regex) -> Self
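The with_* methods above take self by value and return Self, which is the usual shape for chained builder calls. A minimal sketch of that pattern follows, using a hypothetical BuilderSketch type with three of the fields. The method bodies are assumptions; in particular, whether the real with_classes_to_preserve replaces or extends the existing list is not stated on this page.

```rust
/// Hypothetical stand-in carrying a few of `Parser`'s public fields.
#[derive(Debug, Default)]
pub struct BuilderSketch {
    pub keep_classes: bool,
    pub char_threshold: usize,
    pub classes_to_preserve: Vec<String>,
}

impl BuilderSketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes the value, sets one field, and hands the value back.
    pub fn with_char_threshold(mut self, n: usize) -> Self {
        self.char_threshold = n;
        self
    }

    pub fn with_keep_classes(mut self, keep: bool) -> Self {
        self.keep_classes = keep;
        self
    }

    /// Accepts any iterable of string-like items (`&str`, `String`, ...).
    /// This sketch replaces the list; the real behavior is an assumption.
    pub fn with_classes_to_preserve(
        mut self,
        classes: impl IntoIterator<Item = impl Into<String>>,
    ) -> Self {
        self.classes_to_preserve = classes.into_iter().map(Into::into).collect();
        self
    }
}

/// Chains the builder calls into a single expression.
pub fn build() -> BuilderSketch {
    BuilderSketch::new()
        .with_char_threshold(250)
        .with_keep_classes(false)
        .with_classes_to_preserve(vec!["caption", "page"])
}
```

The nested impl Trait bound is what lets callers pass a Vec<&str>, a Vec<String>, or any other iterator of string-like values without converting first.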

Trait Implementations§

impl Default for Parser

fn default() -> Self

Auto Trait Implementations§

impl Freeze for Parser

impl !RefUnwindSafe for Parser

impl !Send for Parser

impl !Sync for Parser

impl Unpin for Parser

impl UnsafeUnpin for Parser

impl UnwindSafe for Parser
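Since Parser is !Send and !Sync, one workable pattern is to construct a parser inside each worker thread rather than moving one in. The sketch below uses a hypothetical LocalParser whose Rc field makes it !Send and !Sync as well. Why the real Parser has these bounds is not stated here, since its private fields are not shown.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in: the `Rc` field makes this type `!Send` and `!Sync`.
pub struct LocalParser {
    scratch: Rc<Vec<u8>>,
}

impl LocalParser {
    pub fn new() -> Self {
        Self { scratch: Rc::new(Vec::new()) }
    }

    /// Placeholder for real parsing: stores the bytes and returns their count.
    pub fn parse(&mut self, text: &str) -> usize {
        self.scratch = Rc::new(text.as_bytes().to_vec());
        self.scratch.len()
    }
}

/// Processes each document on its own thread, one parser per thread.
pub fn parse_on_workers(docs: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| {
            thread::spawn(move || {
                // Construct the parser here. Creating it outside and moving
                // it into this closure would not compile for a !Send type.
                let mut parser = LocalParser::new();
                parser.parse(&doc)
            })
        })
        .collect();

    // Join in spawn order so results line up with the input order.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Only the String inputs and the usize results cross thread boundaries; each parser lives and dies on the thread that created it.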

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
