Module graph

Unified structured data graph combining all extraction formats.

This module provides the primary entry point, extract_all, which runs all three extractors (JSON-LD, Microdata, and RDFa Lite) against an HTML document and merges the results into a single StructuredDataGraph.

§Pipeline

  1. Parse the HTML once using scraper::Html
  2. Run each extractor against the parsed DOM
  3. Merge all nodes and warnings into a single graph
  4. Record any individual extractor failure as a warning in the graph rather than returning an error
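
The merge-and-warn behaviour of the pipeline can be sketched with the standard library alone. This is a simplified illustration, not the crate's implementation: `Node`, `Graph`, `Extractor`, `merge_extractors`, and the two stub extractors are hypothetical names, and the raw `&str` stands in for the DOM that the real pipeline parses once with scraper::Html.

```rust
// Hypothetical, simplified stand-ins for the crate's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub types: Vec<String>,
}

#[derive(Debug, Default)]
pub struct Graph {
    pub nodes: Vec<Node>,
    pub warnings: Vec<String>,
}

/// Assumed extractor shape: takes the (already parsed) document and
/// returns either the nodes it found or an error message.
pub type Extractor = fn(&str) -> Result<Vec<Node>, String>;

/// Runs every extractor against the same document and merges the results.
/// A failing extractor contributes a warning instead of aborting the run.
pub fn merge_extractors(doc: &str, extractors: &[(&str, Extractor)]) -> Graph {
    let mut graph = Graph::default();
    for (name, extract) in extractors {
        match extract(doc) {
            Ok(nodes) => graph.nodes.extend(nodes),
            Err(e) => graph.warnings.push(format!("{name}: {e}")),
        }
    }
    graph
}

/// Stub extractor: reports one Product node if a JSON-LD script tag is present.
pub fn json_ld_stub(doc: &str) -> Result<Vec<Node>, String> {
    if doc.contains("application/ld+json") {
        Ok(vec![Node { types: vec!["Product".to_string()] }])
    } else {
        Ok(Vec::new())
    }
}

/// Stub extractor that always fails, to show the warning path.
pub fn failing_stub(_doc: &str) -> Result<Vec<Node>, String> {
    Err("malformed markup".to_string())
}
```

Calling `merge_extractors` with both stubs on a document containing a JSON-LD script tag yields a graph with one node and one warning, so a single broken extractor does not discard the results of the others.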

§Examples

use schemaorg_rs::extract_all;

let html = r#"<html><head>
<script type="application/ld+json">{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Widget"
}</script>
</head></html>"#;

let graph = extract_all(html).unwrap();
assert_eq!(graph.nodes.len(), 1);
assert_eq!(graph.nodes[0].types, vec!["Product"]);
assert!(graph.warnings.is_empty());

Structs§

StructuredDataGraph
A unified graph of all structured data extracted from an HTML document.

Functions§

extract_all
Extracts all structured data from an HTML document.