Struct Readability 

pub struct Readability { /* private fields */ }

The main readability parser that extracts clean content from HTML.

Uses Mozilla’s Readability.js algorithm running in an embedded JavaScript engine. Create once and reuse for multiple extractions - the JS context initialization is expensive.

§Examples

use readability_js::{Readability, ReadabilityOptions};

// Create parser (expensive - reuse this!)
let reader = Readability::new()?;

// Basic extraction with URL context
let article = reader.parse_with_url(html, "https://example.com")?;

// With custom options
let options = ReadabilityOptions::new()
    .char_threshold(500);
let article = reader.parse_with_options(html, Some("https://example.com"), Some(options))?;

§Thread Safety

Readability instances are not thread-safe (!Send + !Sync). Each instance contains an embedded JavaScript engine that can be neither moved to another thread nor shared across threads.
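One common way to work with a !Send + !Sync type is to give every thread its own lazily created instance. The sketch below shows that pattern using only the standard library. FakeReader is a hypothetical stand-in for Readability (its Rc field makes it !Send + !Sync), and its parse method is a placeholder that returns the input length rather than an Article.

```rust
use std::rc::Rc;
use std::thread;

// Hypothetical stand-in for `Readability`: the `Rc` field makes it
// !Send + !Sync, like a type that owns a single-threaded JS engine.
struct FakeReader {
    _engine: Rc<()>,
}

impl FakeReader {
    fn new() -> Self {
        FakeReader { _engine: Rc::new(()) }
    }

    // Placeholder for real extraction: returns the input length.
    fn parse(&self, html: &str) -> usize {
        html.len()
    }
}

thread_local! {
    // One lazily created instance per thread; it never crosses threads.
    static READER: FakeReader = FakeReader::new();
}

// Parse on whichever thread calls this, using that thread's own instance.
fn parse_on_this_thread(html: &str) -> usize {
    READER.with(|reader| reader.parse(html))
}

// Fan documents out to worker threads; only `String`s cross thread boundaries.
fn parse_in_parallel(docs: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| thread::spawn(move || parse_on_this_thread(&doc)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With the real type, the thread-local initializer would also need to handle the Result returned by Readability::new(), and each thread would pay the initialization cost once.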

Implementations§

impl Readability

pub fn new() -> Result<Self, ReadabilityError>

Creates a new readability parser.

§Performance

This operation is expensive (50-100ms) as it initializes a JavaScript engine and loads the Readability.js library. Create one instance and reuse it for multiple extractions.

§JavaScript Engine

This method initializes an embedded QuickJS runtime. The JavaScript code executed is Mozilla’s Readability.js library and is considered safe for processing untrusted HTML input.

pub fn parse(&self, html: &str) -> Result<Article, ReadabilityError>

Extract readable content from HTML.

This is the main extraction method. It processes the HTML to remove ads, navigation, sidebars and other clutter, leaving just the main article content.

§Arguments

  • html - The HTML content to process. Should be a complete HTML document.

§Examples

use readability_js::Readability;

let html = r#"
  <html>
    <body>
      <article>
        <h1>Breaking News</h1>
        <p>Important news content here...</p>
      </article>
      <nav>Navigation menu</nav>
      <aside>Advertisement</aside>
    </body>
  </html>
"#;

let reader = Readability::new()?;
let article = reader.parse(html)?;

assert_eq!(article.title, "Breaking News");
assert!(article.content.contains("Important news content"));
// Navigation and ads are removed from the output

§Errors

Returns ReadabilityError if:

  • The HTML is malformed or empty (HtmlParseError)
  • The page fails readability checks (ReadabilityCheckFailed)
  • JavaScript evaluation fails (JsEvaluation)
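The three failure classes listed above usually call for different reactions from the caller. The following is a minimal sketch of such a dispatch. SketchError is a hypothetical mirror of ReadabilityError: the variant names come from the list above, but the payload types are assumptions, and the Action policy is purely illustrative.

```rust
// Hypothetical mirror of `ReadabilityError`. Variant names follow the
// documented list; the String payloads are an assumption for illustration.
#[derive(Debug)]
enum SketchError {
    HtmlParseError(String),
    ReadabilityCheckFailed,
    JsEvaluation(String),
}

// Illustrative follow-up actions a caller might choose.
#[derive(Debug, PartialEq)]
enum Action {
    SkipDocument,
    FallBackToRawHtml,
    ReportBug,
}

// Map each failure class to a follow-up action.
fn decide(err: &SketchError) -> Action {
    match err {
        // Malformed or empty input: there is nothing to extract.
        SketchError::HtmlParseError(_) => Action::SkipDocument,
        // The page parsed but does not look like an article.
        SketchError::ReadabilityCheckFailed => Action::FallBackToRawHtml,
        // The embedded engine failed, which is unexpected for valid input.
        SketchError::JsEvaluation(_) => Action::ReportBug,
    }
}
```

Matching exhaustively, without a wildcard arm, means the compiler flags the match if a variant is added later.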
§Performance

This method is fast (typically <10ms) once the Readability instance is created. The expensive step is Readability::new(); call it once and reuse the resulting instance.

pub fn parse_with_url( &self, html: &str, base_url: &str, ) -> Result<Article, ReadabilityError>

Extract readable content from HTML with URL context.

The URL improves link resolution and metadata extraction.

§Arguments

  • html - The HTML content to extract from
  • base_url - The original URL of the page for link resolution

§Examples

use readability_js::Readability;

let reader = Readability::new()?;
let article = reader.parse_with_url(html, "https://example.com/article")?;
// Links in the article will be properly resolved
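To make "link resolution" concrete, the sketch below shows what resolving a relative link against a base URL means in general. It is a deliberately simplified, standard-library-only illustration, not the crate's implementation: the resolve function is hypothetical, it covers only absolute, root-relative, and plain relative links, and it ignores cases such as "../" segments, query strings, and protocol-relative URLs. The actual behavior comes from Readability.js and may differ.

```rust
// Simplified illustration of resolving `href` against `base`.
// Not a full RFC 3986 resolver; see the caveats in the lead-in.
fn resolve(base: &str, href: &str) -> String {
    // Already absolute: keep as is.
    if href.contains("://") {
        return href.to_string();
    }

    // Split the base into origin ("https://example.com") and path ("/blog/post").
    let scheme_end = base.find("://").map(|i| i + 3).unwrap_or(0);
    let path_start = base[scheme_end..]
        .find('/')
        .map(|i| i + scheme_end)
        .unwrap_or(base.len());
    let (origin, path) = base.split_at(path_start);

    if href.starts_with('/') {
        // Root-relative: replace the whole path.
        return format!("{}{}", origin, href);
    }

    // Relative: replace everything after the last '/' in the base path.
    let dir = match path.rfind('/') {
        Some(i) => &path[..=i],
        None => "/",
    };
    format!("{}{}{}", origin, dir, href)
}
```

Without a base URL there is no origin or directory to resolve against, which is why relative links can only be made absolute when the page's URL is supplied.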

§Errors

Returns a ReadabilityError if extraction fails.

pub fn parse_with_options( &self, html: &str, base_url: Option<&str>, options: Option<ReadabilityOptions>, ) -> Result<Article, ReadabilityError>

Extract readable content with custom parsing options.

§Arguments

  • html - The HTML content to extract from
  • base_url - Optional URL for link resolution
  • options - Custom parsing options

§Examples

use readability_js::{Readability, ReadabilityOptions};

let options = ReadabilityOptions::new()
    .char_threshold(500);

let reader = Readability::new()?;
let article = reader.parse_with_options(html, Some("https://example.com"), Some(options))?;

§Errors

Returns a ReadabilityError if extraction fails.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> ParallelSend for T