pub trait Extractor: Send + Sync {
    // Required method
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}
Trait implemented by each extraction format (JSON-LD, Microdata, RDFa).
Provides a unified interface for extracting structured data from raw HTML.
Each implementation parses the HTML internally using scraper, so running
several extractors over the same input parses it once per extractor.
To avoid the repeated parsing, use the format-specific
extract_from_document() methods, which accept a pre-parsed
scraper::Html document.
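Because the trait has a single `&self` method and no generics, it can be used as a trait object, which is one way to read "unified interface". The sketch below is std-only and does not use the real crate: `Node`, `ExtractionOutput` and `ExtractionError` are simplified stand-ins, and `MarkerExtractor` and `extract_all` are hypothetical names invented for illustration.

```rust
// Simplified stand-ins for the crate's types; the real ones carry more data.
#[derive(Debug, Default)]
pub struct Node {
    pub types: Vec<String>,
}

#[derive(Debug, Default)]
pub struct ExtractionOutput {
    pub nodes: Vec<Node>,
}

#[derive(Debug)]
pub struct ExtractionError(pub String);

// Same shape as the trait documented on this page.
pub trait Extractor: Send + Sync {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

// Hypothetical toy extractor: emits one node of a fixed type whenever the
// marker substring occurs in the input. It does no real HTML parsing.
pub struct MarkerExtractor {
    pub marker: &'static str,
    pub type_name: &'static str,
}

impl Extractor for MarkerExtractor {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError> {
        let mut output = ExtractionOutput::default();
        if html.contains(self.marker) {
            output.nodes.push(Node {
                types: vec![self.type_name.to_string()],
            });
        }
        Ok(output)
    }
}

// Runs every extractor over the same input and merges the resulting nodes.
// The first fatal error aborts the whole run.
pub fn extract_all(
    extractors: &[Box<dyn Extractor>],
    html: &str,
) -> Result<Vec<Node>, ExtractionError> {
    let mut nodes = Vec::new();
    for extractor in extractors {
        nodes.extend(extractor.extract(html)?.nodes);
    }
    Ok(nodes)
}

fn main() {
    let extractors: Vec<Box<dyn Extractor>> = vec![
        Box::new(MarkerExtractor { marker: "itemscope", type_name: "Product" }),
        Box::new(MarkerExtractor { marker: "application/ld+json", type_name: "Article" }),
    ];
    let html = r#"<div itemscope itemtype="https://schema.org/Product"></div>"#;
    let nodes = extract_all(&extractors, html).unwrap();
    println!("{} node(s) found", nodes.len());
}
```

With the real extractors, each `extract` call in such a loop would parse the HTML again, which is the cost the `extract_from_document()` methods are meant to avoid.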
§Examples
use schemaorg_rs::extraction::{Extractor, MicrodataExtractor};
let html = r#"<html><body>
<div itemscope itemtype="https://schema.org/Product">
<span itemprop="name">Widget</span>
</div>
</body></html>"#;
let output = MicrodataExtractor.extract(html).unwrap();
assert_eq!(output.nodes[0].types, vec!["Product"]);
§Required Methods
fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>
Extracts structured data nodes from an HTML document.
§Errors
Returns ExtractionError if a fatal error prevents extraction.
Most issues are captured as warnings in the returned
ExtractionOutput instead.
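The split between fatal errors and warnings suggests that callers match on the `Result` and then inspect the output separately. The following is a minimal std-only sketch of that pattern, not the crate's actual types: the `warnings` field name, the `Fatal` variant, `ToyExtractor` and `summarize` are all assumptions made for illustration.

```rust
use std::fmt;

// Simplified stand-in for the crate's output type.
#[derive(Debug, Default)]
pub struct ExtractionOutput {
    pub nodes: Vec<String>,    // simplified: the real nodes are structured
    pub warnings: Vec<String>, // hypothetical field name
}

// Stand-in error type with a single hypothetical variant.
#[derive(Debug)]
pub enum ExtractionError {
    Fatal(String),
}

impl fmt::Display for ExtractionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExtractionError::Fatal(msg) => write!(f, "{}", msg),
        }
    }
}

pub trait Extractor: Send + Sync {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

// Toy extractor: empty input is treated as fatal, while an itemscope
// without an itemtype only produces a warning.
pub struct ToyExtractor;

impl Extractor for ToyExtractor {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError> {
        if html.trim().is_empty() {
            return Err(ExtractionError::Fatal("empty document".to_string()));
        }
        let mut output = ExtractionOutput::default();
        if html.contains("itemscope") {
            if html.contains("itemtype") {
                output.nodes.push("typed item".to_string());
            } else {
                output.warnings.push("itemscope without itemtype".to_string());
            }
        }
        Ok(output)
    }
}

// Reports a fatal error and a successful-but-noisy result differently.
pub fn summarize(extractor: &dyn Extractor, html: &str) -> String {
    match extractor.extract(html) {
        Ok(output) => format!(
            "{} node(s), {} warning(s)",
            output.nodes.len(),
            output.warnings.len()
        ),
        Err(err) => format!("extraction failed: {}", err),
    }
}

fn main() {
    println!("{}", summarize(&ToyExtractor, "<div itemscope></div>"));
    println!("{}", summarize(&ToyExtractor, ""));
}
```

An `Ok` result with an empty node list and a non-empty warning list is still a successful extraction in this sketch, so callers that care about data quality would check both.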