Function extract

pub fn extract(
    html_str: &str,
    doc_uri: &str,
    options: ExtractOptions,
) -> Product

Extract the main article content from an HTML page.

This is the crate's primary entry point. It implements the Readability algorithm: it scores candidate nodes by content density, prunes navigation and other boilerplate, and returns the best content subtree along with any metadata (title, byline, etc.) that could be extracted.
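To make "scoring by content density" concrete, here is a minimal sketch of one such heuristic. It is not the crate's implementation: the `Candidate` type, the comma and length bonuses, and the link-density penalty are all assumptions chosen for illustration.

```rust
/// Toy stand-in for a DOM candidate node (hypothetical; not a type from the crate).
struct Candidate {
    /// All visible text inside the node.
    text: String,
    /// The portion of `text` that sits inside <a> elements.
    link_text: String,
}

/// Fraction of the node's text that is link text, clamped to 0.0..=1.0.
fn link_density(c: &Candidate) -> f64 {
    let total = c.text.chars().count();
    if total == 0 {
        return 0.0;
    }
    (c.link_text.chars().count() as f64 / total as f64).min(1.0)
}

/// Assumed heuristic: one base point, one point per comma, and up to three
/// points for length (one per 100 characters), scaled down by link density.
fn content_score(c: &Candidate) -> f64 {
    let commas = c.text.matches(',').count() as f64;
    let length_bonus = (c.text.chars().count() / 100).min(3) as f64;
    (1.0 + commas + length_bonus) * (1.0 - link_density(c))
}

/// Index of the highest-scoring candidate, or None if the slice is empty.
fn best_candidate(cands: &[Candidate]) -> Option<usize> {
    cands
        .iter()
        .enumerate()
        .max_by(|a, b| content_score(a.1).total_cmp(&content_score(b.1)))
        .map(|(i, _)| i)
}
```

Under these assumptions, a navigation block whose text is entirely links scores zero, while a prose paragraph with a few commas and no links scores higher and is selected.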

§Arguments

html_str – the raw HTML source of the page.
doc_uri – the URL the page was fetched from. Used to resolve relative URLs in <a href>, <img src>, srcset, etc.
options – tuning knobs for the extraction algorithm. ExtractOptions::default() is a sensible starting point.
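The role of doc_uri can be illustrated with a deliberately simplified resolver. This is a sketch under stated assumptions, not the crate's code: the `resolve` function is hypothetical and handles only absolute, root-relative, and plain path-relative references. It ignores `..` segments, protocol-relative `//host` references, queries, and fragments, all of which a full RFC 3986 resolver (for example `Url::join` in the `url` crate) would cover.

```rust
/// Resolve `reference` against `base` for a few common cases only.
/// Assumes `base` is an absolute URL such as "https://host/dir/page.html".
fn resolve(base: &str, reference: &str) -> String {
    // Already absolute: leave untouched.
    if reference.contains("://") {
        return reference.to_string();
    }

    // Split the base into "scheme://host" and the path that follows it.
    let after_scheme = base.find("://").map(|i| i + 3).unwrap_or(0);
    let path_start = base[after_scheme..]
        .find('/')
        .map(|i| i + after_scheme)
        .unwrap_or(base.len());
    let origin = &base[..path_start];
    let path = &base[path_start..];

    // Root-relative: replace the whole path.
    if reference.starts_with('/') {
        return format!("{}{}", origin, reference);
    }

    // Path-relative: keep the base directory (everything up to the last '/').
    let dir = match path.rfind('/') {
        Some(i) => &path[..=i],
        None => "/",
    };
    format!("{}{}{}", origin, dir, reference)
}
```

With a base of `https://example.com/blog/post.html`, the reference `img/cat.png` resolves to `https://example.com/blog/img/cat.png`, and `/about` resolves to `https://example.com/about`.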

§Returns

A Product whose content field is Some if article content was found, or None if the page did not contain extractable content.
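Callers need to handle both outcomes of the content field. The sketch below shows one way to do that with a local stand-in type. `ProductSketch` and `describe` are hypothetical, and the `String` payload is an assumption; only the fact that content is an `Option` comes from the description above.

```rust
/// Hypothetical stand-in for the crate's `Product`. Only the optionality of
/// `content` is taken from the documentation; the `String` payload is assumed.
struct ProductSketch {
    content: Option<String>,
}

/// Turn the optional content into a short status message for the caller.
fn describe(product: &ProductSketch) -> String {
    match &product.content {
        // Article content was found: report its size in bytes.
        Some(body) => format!("extracted {} bytes", body.len()),
        // Nothing extractable: the caller can fall back to the raw page.
        None => "no extractable content".to_string(),
    }
}
```

Matching on the field explicitly, rather than calling `unwrap()`, keeps short or boilerplate-only pages from panicking at the call site.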

§Examples

use readable_rs::{extract, ExtractOptions};

let html = "<html><body><p>Short.</p></body></html>";
let product = extract(html, "https://example.com", ExtractOptions::default());
// product.content may be None: the paragraph is shorter than the default char_threshold.
