pub struct ReadabilityOptions {
    pub max_elems_to_parse: Option<usize>,
    pub nb_top_candidates: Option<usize>,
    pub char_threshold: Option<usize>,
    pub classes_to_preserve: Option<Vec<String>>,
    pub keep_classes: Option<bool>,
    pub disable_jsonld: Option<bool>,
    pub link_density_modifier: Option<f32>,
}
Configuration options for content extraction. Created with ReadabilityOptions::new and used with Readability::parse_with_options. See also Readability::parse for basic extraction without options.
Examples
use readability_js::ReadabilityOptions;
// Fine-tuned for news sites
let opts = ReadabilityOptions::new()
    .char_threshold(500) // Minimum extracted text length, in characters
    .nb_top_candidates(10) // Consider more candidates
    .keep_classes(true) // Preserve CSS classes
    .classes_to_preserve(vec!["highlight".into(), "code".into()]);
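Every builder method documented below takes self by value and returns Self, which is what makes the chained calls in the example above possible. The self-contained sketch below shows that pattern with a hypothetical Options type (not the crate's real struct) carrying three of the fields. The fallback used when a field is unset is a placeholder chosen for the sketch, not a documented library default.

```rust
// Hypothetical stand-in for ReadabilityOptions: same shape (Option fields
// plus consuming setters), but not the crate's actual type.
#[derive(Clone, Debug, Default)]
struct Options {
    char_threshold: Option<usize>,
    nb_top_candidates: Option<usize>,
    keep_classes: Option<bool>,
}

// Placeholder fallback used only by this sketch.
const FALLBACK_CHAR_THRESHOLD: usize = 500;

impl Options {
    fn new() -> Self {
        Self::default()
    }

    // Each setter consumes `self`, records the value, and hands `self`
    // back, so calls can be chained.
    fn char_threshold(mut self, val: usize) -> Self {
        self.char_threshold = Some(val);
        self
    }

    fn nb_top_candidates(mut self, val: usize) -> Self {
        self.nb_top_candidates = Some(val);
        self
    }

    fn keep_classes(mut self, val: bool) -> Self {
        self.keep_classes = Some(val);
        self
    }

    // A field left as None falls back to a default when it is read.
    fn effective_char_threshold(&self) -> usize {
        self.char_threshold.unwrap_or(FALLBACK_CHAR_THRESHOLD)
    }
}

fn main() {
    let news = Options::new().char_threshold(300).keep_classes(true);
    // Clone before diverging so the base configuration can be reused.
    let wide = news.clone().nb_top_candidates(10);

    println!("{:?}", news);
    println!("{:?}", wide);
    println!("unset threshold resolves to {}", Options::new().effective_char_threshold());
}
```

Because the struct derives Clone (as ReadabilityOptions implements Clone), a shared base configuration can be cloned and then specialized without rebuilding it from scratch.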
Fields
Each field is set by the builder method of the same name.
max_elems_to_parse: Option<usize>
nb_top_candidates: Option<usize>
char_threshold: Option<usize>
classes_to_preserve: Option<Vec<String>>
keep_classes: Option<bool>
disable_jsonld: Option<bool>
link_density_modifier: Option<f32>
Implementations
impl ReadabilityOptions
pub fn max_elems_to_parse(self, val: usize) -> Self
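Set the maximum number of DOM elements to parse.
Guards against performance problems on very large documents. The default is 0 (unlimited). In Mozilla's Readability.js, whose maxElemsToParse option this mirrors, a document with more elements than the limit is rejected with an error rather than partially parsed.
Arguments
val - Maximum number of elements to process (0 = unlimited)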
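The "0 means unlimited" convention is easy to get wrong in a plain comparison. This hypothetical guard (check_element_limit is not a crate function) shows one way to express it. Returning an error instead of truncating is an assumption modelled on Readability.js, which aborts parsing when the limit is exceeded.

```rust
/// Hypothetical guard illustrating `max_elems_to_parse` semantics:
/// a limit of 0 disables the check entirely.
fn check_element_limit(element_count: usize, max_elems: usize) -> Result<(), String> {
    // Only enforce the limit when one is set (non-zero).
    if max_elems > 0 && element_count > max_elems {
        // Assumed behaviour: reject the whole document instead of truncating it.
        return Err(format!(
            "aborting: {} elements found, limit is {}",
            element_count, max_elems
        ));
    }
    Ok(())
}

fn main() {
    for (count, limit) in [(10_000, 0), (800, 1_000), (1_500, 1_000)] {
        match check_element_limit(count, limit) {
            Ok(()) => println!("{count} elements, limit {limit}: ok"),
            Err(e) => println!("{count} elements, limit {limit}: {e}"),
        }
    }
}
```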
pub fn nb_top_candidates(self, val: usize) -> Self
Set the number of top content candidates to consider.
The algorithm identifies potential content containers and ranks them. Higher values may improve accuracy but reduce performance. The default is 5.
Arguments
val - Number of candidates to consider (recommended: 5-15)
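As a rough illustration of "rank the candidates and keep the top N", the sketch below sorts hypothetical Candidate values by score and keeps the best n. It is a deliberate simplification: the real algorithm scores DOM nodes and uses its top candidates to compare competing containers, which this sketch does not attempt.

```rust
/// A scored content container (hypothetical; real candidates are DOM nodes).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: &'static str,
    score: f32,
}

/// Keep the `n` highest-scoring candidates, best first.
fn top_candidates(mut candidates: Vec<Candidate>, n: usize) -> Vec<Candidate> {
    // Sort descending by score; `total_cmp` gives a total order for f32.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates.truncate(n);
    candidates
}

fn main() {
    let cands = vec![
        Candidate { id: "nav", score: 3.0 },
        Candidate { id: "article", score: 42.5 },
        Candidate { id: "sidebar", score: 7.25 },
        Candidate { id: "comments", score: 12.0 },
    ];
    let top = top_candidates(cands, 2);
    let ids: Vec<&str> = top.iter().map(|c| c.id).collect();
    println!("{:?}", ids); // ["article", "comments"]
}
```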
pub fn char_threshold(self, val: usize) -> Self
Set the minimum character threshold for readable content.
Extracted content with fewer characters than this is treated as too short to be an article. Lower values are more permissive but may let navigation or ads through. The default in Mozilla's Readability.js, whose charThreshold option this mirrors, is 500 characters.
Arguments
val - Minimum character count (recommended: 50-500)
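A minimal sketch of a character-threshold test, using a hypothetical meets_char_threshold helper. It counts Unicode scalar values rather than bytes, so accented or non-Latin text is not over-counted; how the library itself measures length is not specified here.

```rust
/// Hypothetical check: is the extracted text at least `threshold`
/// characters long? Counts Unicode scalar values (`char`s), not bytes,
/// and ignores leading/trailing whitespace.
fn meets_char_threshold(text: &str, threshold: usize) -> bool {
    text.trim().chars().count() >= threshold
}

fn main() {
    let teaser = "Subscribe now for more!";
    let body = "Lorem ipsum dolor sit amet. ".repeat(20);

    println!("teaser passes 140: {}", meets_char_threshold(teaser, 140));
    println!("body passes 500: {}", meets_char_threshold(&body, 500));

    // "café" is 5 bytes in UTF-8 but only 4 characters.
    let word = "caf\u{e9}";
    println!("{} bytes, {} chars", word.len(), word.chars().count());
}
```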
pub fn classes_to_preserve(self, val: Vec<String>) -> Self
Specify CSS classes to preserve in the output.
By default, most CSS classes are stripped from the cleaned HTML. Use this to keep classes that matter for styling. The list only has an effect while keep_classes is false, since keep_classes(true) keeps every class anyway.
Arguments
val - Vector of class names to preserve (e.g., vec!["highlight".into()])
pub fn keep_classes(self, val: bool) -> Self
Whether to preserve CSS classes in the output.
When true, all CSS classes are kept in the cleaned HTML. When false (the default), classes are stripped except those listed with classes_to_preserve.
Arguments
val - true to preserve classes, false to strip them
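The two class options interact. Assuming the behaviour described in Mozilla's Readability.js documentation (the preserve list matters only when keep_classes is false), filtering a single class attribute can be sketched as follows. filter_classes is a hypothetical helper, not a crate function.

```rust
/// Hypothetical filter for one element's `class` attribute.
/// keep_classes = true  -> every class is kept unchanged.
/// keep_classes = false -> only classes listed in `preserve` survive.
fn filter_classes(class_attr: &str, keep_classes: bool, preserve: &[String]) -> String {
    if keep_classes {
        return class_attr.to_string();
    }
    class_attr
        .split_whitespace()
        // Exact, case-sensitive match against the preserve list.
        .filter(|c| preserve.iter().any(|p| p.as_str() == *c))
        .collect::<Vec<&str>>()
        .join(" ")
}

fn main() {
    let preserve = vec!["highlight".to_string(), "code".to_string()];
    let attr = "post-body highlight code-block";

    println!("{:?}", filter_classes(attr, false, &preserve));
    println!("{:?}", filter_classes(attr, true, &preserve));
}
```

Note that "code-block" is dropped even though "code" is preserved: the match is on whole class names, not prefixes.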
pub fn disable_jsonld(self, val: bool) -> Self
Disable JSON-LD metadata extraction.
JSON-LD structured data can provide additional article metadata (author, publish date, etc.); in Readability.js, Schema.org fields found in JSON-LD take precedence over other metadata sources. Disable this if you don’t need that metadata or if it causes issues.
Arguments
val - true to disable JSON-LD parsing, false (the default) to enable it
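The sketch below models metadata precedence for a single field using a hypothetical MetadataSources type and resolve function. It assumes, as the Readability.js documentation describes, that a JSON-LD value wins over other metadata when both exist, and that disabling JSON-LD simply removes that source from consideration.

```rust
/// Hypothetical metadata sources for one field (e.g. the article author).
struct MetadataSources<'a> {
    json_ld: Option<&'a str>,  // e.g. from <script type="application/ld+json">
    meta_tag: Option<&'a str>, // e.g. from <meta name="author">
}

/// Pick a value, preferring JSON-LD unless it has been disabled.
fn resolve<'a>(src: &MetadataSources<'a>, disable_jsonld: bool) -> Option<&'a str> {
    if disable_jsonld {
        src.meta_tag
    } else {
        // Fall back to the meta tag only when JSON-LD has no value.
        src.json_ld.or(src.meta_tag)
    }
}

fn main() {
    let src = MetadataSources { json_ld: Some("Jane Doe"), meta_tag: Some("J. Doe") };
    println!("JSON-LD enabled:  {:?}", resolve(&src, false));
    println!("JSON-LD disabled: {:?}", resolve(&src, true));
}
```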
pub fn link_density_modifier(self, val: f32) -> Self
Modify the link density threshold.
Content with high link density is often navigation rather than article content. In Mozilla's Readability.js, whose linkDensityModifier option this mirrors, the modifier is added to the base link-density thresholds: positive values are more permissive, negative values are stricter, and the default is 0.
Arguments
val - Amount added to the link density threshold (default: 0.0)
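Link density is the share of a block's text that sits inside links. The sketch below assumes, as in Readability.js's linkDensityModifier, that the modifier is added to a base threshold; the 0.2 base value and both helper functions are illustrative assumptions, not the crate's API.

```rust
/// Fraction of a block's text that sits inside links (0.0 to 1.0).
fn link_density(link_text_len: usize, total_text_len: usize) -> f32 {
    if total_text_len == 0 {
        return 0.0;
    }
    link_text_len as f32 / total_text_len as f32
}

/// Assumed base threshold, for illustration only.
const BASE_THRESHOLD: f32 = 0.2;

/// Treat a block as link-heavy (likely navigation) when its density exceeds
/// the base threshold plus the modifier (additive, as assumed above).
fn is_link_heavy(link_text_len: usize, total_text_len: usize, modifier: f32) -> bool {
    link_density(link_text_len, total_text_len) > BASE_THRESHOLD + modifier
}

fn main() {
    // 30 of 100 characters are link text: density 0.3.
    for modifier in [-0.1_f32, 0.0, 0.2] {
        println!(
            "modifier {:+.1}: link-heavy = {}",
            modifier,
            is_link_heavy(30, 100, modifier)
        );
    }
}
```

With a positive modifier the same block is tolerated (more permissive); with a negative one, blocks with even modest link density are flagged (stricter).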
Trait Implementations
impl Clone for ReadabilityOptions
fn clone(&self) -> ReadabilityOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.