pub fn extract(
    html_str: &str,
    doc_uri: &str,
    options: ExtractOptions,
) -> Product
Extract the main article content from an HTML page.
This is the primary entry point of the crate. It implements the Readability algorithm: it scores candidate nodes by content density, prunes navigation and boilerplate, and returns the best content subtree along with any metadata (title, byline, etc.) that could be extracted.
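To make "scoring candidate nodes by content density" concrete, here is a minimal, std-only sketch of the general idea. It is not this crate's implementation: the `Candidate` struct, the `link_density`, `score`, and `best_candidate` functions, and the weighting formula are all hypothetical and chosen only for illustration. The intuition is that a node with a lot of text and punctuation scores high, while a node whose text is mostly links (typical of navigation) is penalized.

```rust
/// Hypothetical summary of a candidate node. Not a type from this crate.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: &'static str,
    /// Characters of text inside the node.
    text_len: usize,
    /// Characters of that text that sit inside link elements.
    link_text_len: usize,
    /// Number of commas in the text, a rough proxy for prose.
    commas: usize,
}

/// Fraction of the node's text that is link text (0.0 for an empty node).
fn link_density(c: &Candidate) -> f64 {
    if c.text_len == 0 {
        return 0.0;
    }
    c.link_text_len as f64 / c.text_len as f64
}

/// Illustrative score (an assumption, not the crate's formula):
/// a base point, one point per comma, up to three points for length,
/// all scaled down by the link density.
fn score(c: &Candidate) -> f64 {
    let base = 1.0 + c.commas as f64 + (c.text_len / 100).min(3) as f64;
    base * (1.0 - link_density(c))
}

/// Pick the highest-scoring candidate, or None if there are no candidates.
fn best_candidate(cands: &[Candidate]) -> Option<&Candidate> {
    cands.iter().max_by(|a, b| score(a).total_cmp(&score(b)))
}

fn main() {
    let cands = vec![
        Candidate { name: "nav", text_len: 200, link_text_len: 190, commas: 0 },
        Candidate { name: "article", text_len: 1200, link_text_len: 60, commas: 14 },
    ];
    let best = best_candidate(&cands).expect("candidate list is non-empty");
    println!("best candidate: {} (score {:.2})", best.name, score(best));
}
```

In this sketch the link-heavy `nav` node loses to the prose-heavy `article` node even though both contain text, which is the behavior the description above attributes to the algorithm.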
§Arguments
html_str – the raw HTML source of the page.
doc_uri – the URL the page was fetched from. Used to resolve relative URLs in <a href>, <img src>, srcset, etc.
options – tuning knobs for the extraction algorithm. ExtractOptions::default() is a sensible starting point.
§Returns
A Product whose content field is Some if article content was found,
or None if the page did not contain extractable content.
§Examples
use readable_rs::{extract, ExtractOptions};
let html = "<html><body><p>Short.</p></body></html>";
let product = extract(html, "https://example.com", ExtractOptions::default());
// product.content may be None: the paragraph is below the default char_threshold.
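The comment above says a short paragraph can fall below the character threshold. The following std-only sketch shows one way such a check could behave. It is an assumption for illustration only: `accept_content` and `ASSUMED_CHAR_THRESHOLD` are hypothetical names, the value 500 is not taken from this crate, and the crate's real logic may differ.

```rust
/// Hypothetical threshold used only in this sketch; the crate's actual
/// default char_threshold is not stated in this documentation.
const ASSUMED_CHAR_THRESHOLD: usize = 500;

/// Return the trimmed text if it has at least `char_threshold` characters,
/// otherwise None. Counts Unicode scalar values rather than bytes.
fn accept_content(text: &str, char_threshold: usize) -> Option<&str> {
    let trimmed = text.trim();
    if trimmed.chars().count() >= char_threshold {
        Some(trimmed)
    } else {
        None
    }
}

fn main() {
    // Far below the assumed threshold, so it is rejected.
    let short = "Short.";
    assert!(accept_content(short, ASSUMED_CHAR_THRESHOLD).is_none());

    // 200 repetitions of "word " is well above the assumed threshold.
    let long = "word ".repeat(200);
    assert!(accept_content(&long, ASSUMED_CHAR_THRESHOLD).is_some());
}
```

Callers who need short pages to be accepted would presumably lower the threshold through ExtractOptions; check the ExtractOptions documentation for the actual field and its default.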