Crate readable_rs

A Rust port of Mozilla’s Readability algorithm for extracting the main article content from an HTML page.

§Quick start

use readable_rs::{extract, ExtractOptions};

let html = "<html><body><article><p>The actual article text goes here.</p></article></body></html>";
let product = extract(html, "https://example.com/article", ExtractOptions::default());

// product.content holds the extracted DOM (or None if nothing was found)
// product.title, product.by_line, product.sitename, etc. hold metadata

§Module layout

Top level – extract is the single entry-point. Product and ExtractOptions are the main public types.
parser – thin wrappers around the underlying HTML parser (parser::NodeRef, parser::parse_html).
shared_utils – a curated set of DOM helpers useful when post-processing the extracted content (URL resolution, text normalisation, etc.).
NodeExt / NodeScoreStore – the trait and store that the scorer uses to attach readability metadata to DOM nodes without modifying the nodes themselves.
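
The "external store" idea behind NodeScoreStore can be illustrated with a minimal, self-contained sketch. It does not use this crate's types or reflect its actual implementation: `Node`, `ScoreTable`, `add`, `get`, and `demo` are hypothetical names invented for this example, and keying by pointer identity is an assumption about one way such a store could work.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for a DOM node; NOT the crate's `parser::NodeRef`.
struct Node {
    tag: &'static str,
}

// Hypothetical side table: scores live outside the nodes and are keyed by
// the node's address, so the nodes themselves are never mutated.
#[derive(Default)]
struct ScoreTable {
    scores: HashMap<*const Node, f64>,
}

impl ScoreTable {
    // Add `delta` to the node's score, starting from 0.0 on first use.
    fn add(&mut self, node: &Rc<Node>, delta: f64) {
        *self.scores.entry(Rc::as_ptr(node)).or_insert(0.0) += delta;
    }

    // Look up a score; `None` means the node was never scored.
    fn get(&self, node: &Rc<Node>) -> Option<f64> {
        self.scores.get(&Rc::as_ptr(node)).copied()
    }
}

// Score one node twice and leave another untouched.
fn demo() -> (Option<f64>, Option<f64>) {
    let article = Rc::new(Node { tag: "article" });
    let aside = Rc::new(Node { tag: "aside" });

    let mut table = ScoreTable::default();
    table.add(&article, 5.0);
    table.add(&article, 2.5);

    println!("{}: {:?}", article.tag, table.get(&article));
    println!("{}: {:?}", aside.tag, table.get(&aside));
    (table.get(&article), table.get(&aside))
}

fn main() {
    demo();
}
```

In a sketch like this, the table is only meaningful while the scored nodes are alive, because an address can be reused once a node is dropped.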

Modules§

parser: Thin wrappers around the underlying HTML parser.
shared_utils: Convenience re-exports of DOM helpers for post-processing extracted content.

Structs§

ExtractOptions: Knobs that control the behaviour of the extraction algorithm.
NodeScoreStore: An external store that maps DOM nodes to readability metadata without mutating the nodes themselves.
Product: The output of crate::extract. Contains the extracted article content as a DOM subtree together with any metadata that was found.

Traits§

NodeExt: DOM-navigation and element-manipulation helpers implemented on NodeRef.

Functions§

extract: Extract the main article content from an HTML page.
new_html_element: Create a new, detached HTML element node with the given tag name and no attributes or children.
