pub struct ExtractOptions {
    pub debug: bool,
    pub remove_style_tags: bool,
    pub ready_for_epub: bool,
    pub strip_unlikelys: bool,
    pub weight_classes: bool,
    pub keep_classes: bool,
    pub classes_to_preserve: HashSet<String>,
    pub clean_conditionally: bool,
    pub max_elements_to_parse: u16,
    pub n_top_candidates: u16,
    pub char_threshold: u16,
    pub link_density_modifier: f64,
}
Knobs that control the behaviour of the extraction algorithm.
All fields have sensible defaults via Default; start there and only
override what you need.
§Examples
use readable_rs::ExtractOptions;
let mut opts = ExtractOptions::default();
opts.char_threshold = 200; // accept shorter articles
opts.remove_style_tags = false; // keep <style> elements
Fields§
§debug: bool
Enable extra eprintln! tracing inside the algorithm (gated behind debug_assertions, so it is compiled out of release builds).
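A minimal sketch of the gating described above; the field name is real, but the helper and the trace message are illustrative, not the crate's actual code:

```rust
/// Tracing fires only when the caller asked for it AND the build
/// has debug_assertions enabled (i.e. not a plain release build).
fn trace_enabled(debug: bool) -> bool {
    debug && cfg!(debug_assertions)
}

fn main() {
    if trace_enabled(true) {
        eprintln!("[extract] candidate scored");
    }
}
```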
§remove_style_tags: bool
Strip all <style> elements from the document before extraction.
§ready_for_epub: bool
When true, apply additional cleanup passes that produce output suitable for embedding in an EPUB (e.g. stricter image handling).
§strip_unlikelys: bool
Remove elements whose class / id / role strongly suggest they are navigation, ads, or other non-content. Disabling this is one of the retry strategies when the first pass yields too little text.
§weight_classes: bool
Use class-name / id heuristics (positive/negative word lists) to adjust candidate scores. Disabling is another retry strategy.
§keep_classes: bool
When true, preserve CSS class attributes on the output nodes (subject to classes_to_preserve). When false, all class attributes are stripped.
§classes_to_preserve: HashSet<String>
The set of class names that are always kept even when keep_classes is false. Readability’s own marker classes (e.g. "page") are added automatically.
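For example, populating the set before extraction; the class names here are purely illustrative, not ones the crate requires:

```rust
use std::collections::HashSet;

// Build the preserve-set: keep syntax-highlighting and math classes
// even when keep_classes is false. Names are illustrative.
fn preserve_set() -> HashSet<String> {
    ["highlight", "katex"].iter().map(|s| s.to_string()).collect()
}

fn main() {
    let classes_to_preserve = preserve_set();
    assert!(classes_to_preserve.contains("katex"));
}
```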
§clean_conditionally: bool
Apply the “clean conditionally” pass, which removes elements with low content density (few commas, high link density, etc.). Disabling is the third retry strategy.
§max_elements_to_parse: u16
Maximum number of elements to parse. 0 means no limit.
§n_top_candidates: u16
How many top-scoring candidate nodes to retain before picking the winner. Higher values make the algorithm slightly more robust against mis-scored nodes.
§char_threshold: u16
Minimum character count the extracted content must reach before it is accepted. If the first pass falls short, the algorithm retries with progressively relaxed options.
§link_density_modifier: f64
An additive modifier applied to the link-density thresholds used in the “clean conditionally” pass. Positive values make the filter more lenient (tolerate higher link density).
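The three retry strategies mentioned in the field docs (disable strip_unlikelys, then weight_classes, then clean_conditionally) can be sketched as a relaxation sequence. The stand-in struct and loop below are illustrative, not the crate's actual retry code:

```rust
#[derive(Clone)]
struct Flags {
    strip_unlikelys: bool,
    weight_classes: bool,
    clean_conditionally: bool,
}

/// Yield progressively relaxed option sets, one heuristic
/// switched off per retry, for when the first pass falls
/// short of char_threshold.
fn relaxations(base: &Flags) -> Vec<Flags> {
    let mut out = Vec::new();
    let mut f = base.clone();
    for step in 0..3 {
        match step {
            0 => f.strip_unlikelys = false,
            1 => f.weight_classes = false,
            _ => f.clean_conditionally = false,
        }
        out.push(f.clone());
    }
    out
}

fn main() {
    let base = Flags {
        strip_unlikelys: true,
        weight_classes: true,
        clean_conditionally: true,
    };
    let tries = relaxations(&base);
    // The final retry has all three heuristics disabled.
    assert!(!tries[2].strip_unlikelys && !tries[2].clean_conditionally);
    println!("retries: {}", tries.len());
}
```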
Trait Implementations§
impl Clone for ExtractOptions
    fn clone(&self) -> ExtractOptions
    fn clone_from(&mut self, source: &Self)