

Trait Extractor 

pub trait Extractor: Send + Sync {
    // Required method
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

Trait implemented by each extraction format (JSON-LD, Microdata, RDFa).

Provides a unified interface for extracting structured data from raw HTML. Each implementation parses the HTML internally using scraper.

When running several extractors over the same page, prefer the format-specific extract_from_document() methods, which accept a pre-parsed scraper::Html document so that the HTML is parsed only once.

§Examples

use schemaorg_rs::extraction::{Extractor, MicrodataExtractor};

let html = r#"<html><body>
<div itemscope itemtype="https://schema.org/Product">
<span itemprop="name">Widget</span>
</div>
</body></html>"#;

let output = MicrodataExtractor.extract(html).unwrap();
assert_eq!(output.nodes[0].types, vec!["Product"]);

Required Methods§


fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>

Extracts structured data nodes from an HTML document.

§Errors

Returns ExtractionError if a fatal error prevents extraction. Most issues are captured as warnings in the returned ExtractionOutput instead.
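The split between fatal errors and warnings can be sketched with stand-in types. Everything below is an assumption made for illustration: the trait is redeclared locally with the signature shown above, the fields of ExtractionOutput (in particular `warnings`) and the shape of ExtractionError are guesses, and TitleExtractor and summarize are hypothetical names that do not exist in schemaorg_rs.

```rust
use std::fmt;

// Stand-in types: the real ExtractionOutput and ExtractionError live in
// schemaorg_rs. Their fields here are assumptions for illustration only.
#[derive(Debug, Default)]
pub struct ExtractionOutput {
    pub nodes: Vec<String>,
    pub warnings: Vec<String>,
}

#[derive(Debug)]
pub struct ExtractionError(pub String);

impl fmt::Display for ExtractionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for ExtractionError {}

// Local copy of the trait signature documented on this page.
pub trait Extractor: Send + Sync {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

// Hypothetical extractor: pulls the text of the first <title> element.
pub struct TitleExtractor;

impl Extractor for TitleExtractor {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError> {
        // Fatal: there is nothing to extract from, so return Err.
        if html.trim().is_empty() {
            return Err(ExtractionError("empty input".to_string()));
        }
        let mut output = ExtractionOutput::default();
        match (html.find("<title>"), html.find("</title>")) {
            (Some(start), Some(end)) if start + 7 <= end => {
                output.nodes.push(html[start + 7..end].trim().to_string());
            }
            // Non-fatal: record a warning and still return Ok.
            _ => output.warnings.push("no <title> element found".to_string()),
        }
        Ok(output)
    }
}

// Callers handle Err for fatal failures and inspect warnings on success.
pub fn summarize(extractor: &dyn Extractor, html: &str) -> String {
    match extractor.extract(html) {
        Ok(output) => format!(
            "{} node(s), {} warning(s)",
            output.nodes.len(),
            output.warnings.len()
        ),
        Err(err) => format!("extraction failed: {err}"),
    }
}
```

Under these assumptions, a caller only needs the Err arm for input that cannot be processed at all. Recoverable problems arrive alongside whatever data was extracted.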

Dyn Compatibility§

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".
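Because the trait is dyn compatible, extractors of different concrete types can be stored together as trait objects and run in sequence. The sketch below illustrates this with stand-in definitions. The trait is redeclared locally with the signature shown above, ExtractionOutput and ExtractionError are simplified placeholders, and MarkerExtractor, default_extractors, and run_all are hypothetical names, not part of schemaorg_rs.

```rust
// Stand-in types, simplified for illustration. The real ones are defined
// in schemaorg_rs and will differ.
#[derive(Debug, Default, PartialEq)]
pub struct ExtractionOutput {
    pub nodes: Vec<String>,
}

#[derive(Debug)]
pub struct ExtractionError;

// Local copy of the trait signature documented on this page.
pub trait Extractor: Send + Sync {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError>;
}

// Hypothetical extractor: reports its label when a marker string appears.
pub struct MarkerExtractor {
    marker: &'static str,
    label: &'static str,
}

impl Extractor for MarkerExtractor {
    fn extract(&self, html: &str) -> Result<ExtractionOutput, ExtractionError> {
        let nodes = if html.contains(self.marker) {
            vec![self.label.to_string()]
        } else {
            Vec::new()
        };
        Ok(ExtractionOutput { nodes })
    }
}

// A heterogeneous collection is possible because `dyn Extractor` is a
// valid trait-object type.
pub fn default_extractors() -> Vec<Box<dyn Extractor>> {
    vec![
        Box::new(MarkerExtractor { marker: "itemscope", label: "microdata" }),
        Box::new(MarkerExtractor { marker: "application/ld+json", label: "json-ld" }),
    ]
}

// Runs every extractor in order, stopping at the first fatal error.
pub fn run_all(
    extractors: &[Box<dyn Extractor>],
    html: &str,
) -> Result<Vec<String>, ExtractionError> {
    let mut found = Vec::new();
    for extractor in extractors {
        found.extend(extractor.extract(html)?.nodes);
    }
    Ok(found)
}
```

The `Send + Sync` supertraits mean a `Box<dyn Extractor>` can also be moved to or shared with other threads, although this sketch runs everything on one thread.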

Implementors§