Trait Extractor 

pub trait Extractor: Send + Sync {
    // Required method
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}
Trait implemented by each extraction format (JSON-LD, Microdata, RDFa).

Provides a unified interface for extracting structured data from raw HTML. Each implementation parses the HTML internally using scraper.

When running several extractors over the same page, prefer the format-specific extract_from_document() methods, which accept a pre-parsed scraper::Html document so the HTML is parsed only once.
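
Because extract takes &self and has no generic parameters, the trait can be used as a trait object, so extractors for different formats can be stored and driven through one interface. The following is a minimal sketch of that pattern, not the crate's own code: Node, ExtractionOutput, and ExtractionError are simplified stand-ins for the real types (only the nodes and types fields seen in the example below are modeled), and MarkerExtractor and extract_all are hypothetical names introduced for illustration.

```rust
// Stand-in types: the real ExtractionOutput and ExtractionError live in
// schemaorg_rs and carry more information than shown here.
#[derive(Debug, Default)]
pub struct Node {
    pub types: Vec<String>,
}

#[derive(Debug, Default)]
pub struct ExtractionOutput {
    pub nodes: Vec<Node>,
}

#[derive(Debug)]
pub struct ExtractionError(pub String);

// Same shape as the trait documented on this page.
pub trait Extractor: Send + Sync {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

// Hypothetical toy extractor: reports one node of a fixed type whenever the
// marker substring occurs in the input. It does no real HTML parsing.
pub struct MarkerExtractor {
    pub marker: &'static str,
    pub type_name: &'static str,
}

impl Extractor for MarkerExtractor {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError> {
        let mut output = ExtractionOutput::default();
        if html.contains(self.marker) {
            output.nodes.push(Node {
                types: vec![self.type_name.to_string()],
            });
        }
        Ok(output)
    }
}

// Runs every extractor over the same input and collects all nodes,
// stopping at the first fatal error.
pub fn extract_all(
    extractors: &[Box<dyn Extractor>],
    html: &str,
) -> Result<Vec<Node>, ExtractionError> {
    let mut nodes = Vec::new();
    for extractor in extractors {
        nodes.extend(extractor.extract(html)?.nodes);
    }
    Ok(nodes)
}
```

Note that in this sketch each extractor receives the raw &str, so a real implementation would re-parse the HTML once per extractor; that repeated parsing is the cost the pre-parsed document route avoids.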

§Examples

use schemaorg_rs::extraction::{Extractor, MicrodataExtractor};

let html = r#"<html><body>
<div itemscope itemtype="https://schema.org/Product">
<span itemprop="name">Widget</span>
</div>
</body></html>"#;

let output = MicrodataExtractor.extract(html).unwrap();
assert_eq!(output.nodes[0].types, vec!["Product"]);

Required Methods§

fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>

Extracts structured data nodes from an HTML document.

§Errors

Returns ExtractionError if a fatal error prevents extraction. Most issues are non-fatal and are captured as warnings in the returned ExtractionOutput instead.
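
A caller therefore has three outcomes to handle: a fatal error, a clean result, and a result that carries warnings. The sketch below illustrates that split with stand-in types. The warnings field name, the node_count field, the string-based error, and ToyExtractor are all assumptions made for illustration; the actual layout of ExtractionOutput and ExtractionError is not shown on this page.

```rust
// Stand-in output type. Both fields are hypothetical, in particular the
// name `warnings`; check the real ExtractionOutput for its actual layout.
#[derive(Debug, Default)]
pub struct ExtractionOutput {
    pub node_count: usize,
    pub warnings: Vec<String>,
}

// Stand-in error type wrapping a message.
#[derive(Debug)]
pub struct ExtractionError(pub String);

// Same shape as the trait documented on this page.
pub trait Extractor: Send + Sync {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

// Hypothetical extractor: blank input is treated as fatal, while input
// without an `itemscope` attribute only produces a warning.
pub struct ToyExtractor;

impl Extractor for ToyExtractor {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError> {
        if html.trim().is_empty() {
            return Err(ExtractionError("empty document".to_string()));
        }
        let mut output = ExtractionOutput::default();
        if html.contains("itemscope") {
            output.node_count = 1;
        } else {
            output.warnings.push("no itemscope attribute found".to_string());
        }
        Ok(output)
    }
}

// Maps the three outcomes to a one-line summary.
pub fn summarize(extractor: &dyn Extractor, html: &str) -> String {
    match extractor.extract(html) {
        Ok(output) if output.warnings.is_empty() => {
            format!("ok: {} node(s)", output.node_count)
        }
        Ok(output) => format!(
            "ok with {} warning(s): {}",
            output.warnings.len(),
            output.warnings.join("; ")
        ),
        Err(ExtractionError(message)) => format!("fatal: {}", message),
    }
}
```

Checking the warnings on an Ok result matters as much as matching on Err, since a successful return does not by itself mean the markup was clean.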

Implementors§