pub struct ExtractOptions {
    pub debug: bool,
    pub remove_style_tags: bool,
    pub ready_for_epub: bool,
    pub strip_unlikelys: bool,
    pub weight_classes: bool,
    pub keep_classes: bool,
    pub classes_to_preserve: HashSet<String>,
    pub clean_conditionally: bool,
    pub max_elements_to_parse: u16,
    pub n_top_candidates: u16,
    pub char_threshold: u16,
    pub link_density_modifier: f64,
}
Knobs that control the behaviour of the extraction algorithm.
All fields have sensible defaults via Default; start there and only
override what you need.
§Examples
use readable_rs::ExtractOptions;
let mut opts = ExtractOptions::default();
opts.char_threshold = 200; // accept shorter articles
opts.remove_style_tags = false; // keep <style> elements
Fields§
§debug: bool
Enable extra eprintln! tracing inside the algorithm (gated behind debug_assertions, so it is compiled out of release builds).
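A minimal sketch of the gating described above; the field name is real, but the helper and the trace message are illustrative, not the crate's actual code:

```rust
/// Tracing fires only when the caller asked for it AND the build
/// has debug_assertions enabled (i.e. not a plain release build).
fn trace_enabled(debug: bool) -> bool {
    debug && cfg!(debug_assertions)
}

fn main() {
    if trace_enabled(true) {
        eprintln!("[extract] candidate scored");
    }
}
```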
§remove_style_tags: bool
Strip all <style> elements from the document before extraction.
§ready_for_epub: bool
When true, apply additional cleanup passes that produce output suitable for embedding in an EPUB (e.g. stricter image handling).
§strip_unlikelys: bool
Remove elements whose class / id / role strongly suggest they are navigation, ads, or other non-content. Disabling this is one of the retry strategies when the first pass yields too little text.
§weight_classes: bool
Use class-name / id heuristics (positive/negative word lists) to adjust candidate scores. Disabling is another retry strategy.
§keep_classes: bool
When true, preserve CSS class attributes on the output nodes (subject to classes_to_preserve). When false, all class attributes are stripped.
§classes_to_preserve: HashSet<String>
The set of class names that are always kept even when keep_classes is false. Readability’s own marker classes (e.g. "page") are added automatically.
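For example, populating the set before extraction; the class names here are purely illustrative, not ones the crate requires:

```rust
use std::collections::HashSet;

// Build the preserve-set: keep syntax-highlighting and math classes
// even when keep_classes is false. Names are illustrative.
fn preserve_set() -> HashSet<String> {
    ["highlight", "katex"].iter().map(|s| s.to_string()).collect()
}

fn main() {
    let classes_to_preserve = preserve_set();
    assert!(classes_to_preserve.contains("katex"));
}
```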
§clean_conditionally: bool
Apply the “clean conditionally” pass, which removes elements with low content density (few commas, high link density, etc.). Disabling is the third retry strategy.
§max_elements_to_parse: u16
Maximum number of elements to parse. 0 means no limit.
§n_top_candidates: u16
How many top-scoring candidate nodes to retain before picking the winner. Higher values make the algorithm slightly more robust against mis-scored nodes.
§char_threshold: u16
Minimum character count the extracted content must reach before it is accepted. If the first pass falls short, the algorithm retries with progressively relaxed options.
§link_density_modifier: f64
An additive modifier applied to the link-density thresholds used in the “clean conditionally” pass. Positive values make the filter more lenient (tolerate higher link density).
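The three retry strategies mentioned in the field docs (disable strip_unlikelys, then weight_classes, then clean_conditionally) can be sketched as a relaxation sequence. The stand-in struct and loop below are illustrative, not the crate's actual retry code:

```rust
#[derive(Clone)]
struct Flags {
    strip_unlikelys: bool,
    weight_classes: bool,
    clean_conditionally: bool,
}

/// Yield progressively relaxed option sets, one heuristic
/// switched off per retry, for when the first pass falls
/// short of char_threshold.
fn relaxations(base: &Flags) -> Vec<Flags> {
    let mut out = Vec::new();
    let mut f = base.clone();
    for step in 0..3 {
        match step {
            0 => f.strip_unlikelys = false,
            1 => f.weight_classes = false,
            _ => f.clean_conditionally = false,
        }
        out.push(f.clone());
    }
    out
}

fn main() {
    let base = Flags {
        strip_unlikelys: true,
        weight_classes: true,
        clean_conditionally: true,
    };
    let tries = relaxations(&base);
    // The final retry has all three heuristics disabled.
    assert!(!tries[2].strip_unlikelys && !tries[2].clean_conditionally);
    println!("retries: {}", tries.len());
}
```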
Trait Implementations§
impl Clone for ExtractOptions
    fn clone(&self) -> ExtractOptions
    fn clone_from(&mut self, source: &Self)