Crate readable_rs

A Rust port of Mozilla’s Readability algorithm for extracting the main article content from an HTML page.

§Quick start

use readable_rs::{extract, ExtractOptions};

let html = "<html><body><article><p>The actual article text goes here.</p></article></body></html>";
let product = extract(html, "https://example.com/article", ExtractOptions::default());

// product.content holds the extracted DOM (or None if nothing was found)
// product.title, product.by_line, product.sitename, etc. hold metadata

§Module layout

Modules§

parser
Thin wrappers around the underlying HTML parser.
shared_utils
Convenience re-exports of DOM helpers for post-processing extracted content.

Structs§

ExtractOptions
Knobs that control the behaviour of the extraction algorithm.
NodeScoreStore
An external store that maps DOM nodes to readability metadata without mutating the nodes themselves.
Product
The output of crate::extract. Contains the extracted article content as a DOM subtree together with any metadata that was found.

Traits§

NodeExt
DOM-navigation and element-manipulation helpers implemented on NodeRef.

Functions§

extract
Extract the main article content from an HTML page.
new_html_element
Create a new, detached HTML element node with the given tag name and no attributes or children.
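
The NodeScoreStore entry above describes an external store that maps DOM nodes to readability metadata without mutating the nodes. The sketch below illustrates that general side-table pattern using only the standard library. It is not the crate's implementation: `Node`, `Score`, and `SideTable` are hypothetical stand-ins, and keying by pointer identity is an assumption made for illustration.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for a DOM node (the crate itself works with NodeRef).
pub struct Node {
    pub tag: &'static str,
}

/// Hypothetical metadata record kept outside the node.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Score {
    pub content_score: f64,
}

/// Side table keyed by node identity (the address of the Rc allocation).
/// Keys are only meaningful while the corresponding nodes are still alive.
#[derive(Default)]
pub struct SideTable {
    scores: HashMap<*const Node, Score>,
}

impl SideTable {
    /// Add `delta` to the node's score, creating a zeroed entry on first use.
    pub fn add(&mut self, node: &Rc<Node>, delta: f64) {
        let entry = self.scores.entry(Rc::as_ptr(node)).or_default();
        entry.content_score += delta;
    }

    /// Look up the metadata recorded for a node, if any.
    pub fn get(&self, node: &Rc<Node>) -> Option<Score> {
        self.scores.get(&Rc::as_ptr(node)).copied()
    }
}
```

Because the scores live in the table rather than on the nodes, dropping the table discards all scoring state and leaves the tree exactly as it was parsed.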