Crate libreadability

Readability article extraction library.

libreadability extracts the main article content from web pages by analyzing DOM structure, scoring content density, and removing boilerplate. It is a Rust port of readability by readeck, itself a Go port of Mozilla’s Readability.js.
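To give a feel for what "scoring content density" can mean, here is a minimal, self-contained sketch. It is not libreadability's implementation and uses none of its API: `Block`, `score`, and `best_block` are hypothetical names invented for this illustration, and the formula (text length discounted by the share of text inside links) is a deliberate simplification of the kind of heuristic readability-style extractors use.

```rust
/// Hypothetical stand-in for a DOM block; not a libreadability type.
struct Block<'a> {
    tag: &'a str,
    text: &'a str,
    /// Number of characters of `text` that sit inside link elements.
    link_chars: usize,
}

/// Toy score (an assumption for illustration): text length discounted
/// by link density, so link-heavy blocks such as navigation score low.
fn score(block: &Block) -> f64 {
    let total = block.text.chars().count();
    if total == 0 {
        return 0.0;
    }
    // Clamp so inconsistent input cannot produce a negative score.
    let link_density = (block.link_chars as f64 / total as f64).min(1.0);
    total as f64 * (1.0 - link_density)
}

/// Return the block with the highest toy score, or None for empty input.
fn best_block<'a, 'b>(blocks: &'b [Block<'a>]) -> Option<&'b Block<'a>> {
    blocks.iter().max_by(|a, b| score(a).total_cmp(&score(b)))
}

fn main() {
    let blocks = [
        Block { tag: "nav", text: "Home About Contact", link_chars: 18 },
        Block {
            tag: "article",
            text: "This is the main article body with enough text to be extracted.",
            link_chars: 0,
        },
        Block { tag: "aside", text: "Sidebar content", link_chars: 0 },
    ];

    // The long, link-free block wins over the navigation and the sidebar.
    let best = best_block(&blocks).expect("non-empty input");
    println!("best candidate: <{}>", best.tag);
}
```

A real extractor works on the parsed DOM and combines many more signals; the sketch only shows why a link-heavy `<nav>` tends to lose to a text-heavy `<article>`.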

§Quick start

use libreadability::Parser;

let html = r#"<html><body>
  <nav>Navigation links</nav>
  <article><p>This is the main article body with enough text to be extracted.</p>
  <p>The readability algorithm scores content density and identifies the
  primary article content, stripping navigation, ads, and other boilerplate.</p></article>
  <aside>Sidebar content</aside>
</body></html>"#;

let mut parser = Parser::new();
let article = parser.parse(html, None).expect("valid HTML");
assert!(!article.content.is_empty());
assert!(!article.text_content.is_empty());

§Output

Article contains both cleaned HTML (content) and plain text (text_content), plus metadata like title, byline, excerpt, published time, and text direction.

Structs§

Article: The extracted article content and metadata.
Parser: The core readability extraction engine, ported from the upstream Parser type.

Enums§

Error
