Module structured

Parse structured data from raw HTML without DOM rendering.

This module is the core of the no-browser acquisition engine. It extracts JSON-LD, OpenGraph data, meta tags, links, headings, and forms from raw HTML, using the scraper crate for CSS-selector-based parsing.

Structs

BreadcrumbItem
A breadcrumb item from a JSON-LD BreadcrumbList.
ExtractedForm
A form extracted from HTML.
ExtractedLink
A link extracted from HTML.
FormField
A field within a form.
JsonLdArticle
Article data extracted from JSON-LD.
JsonLdProduct
Product data extracted from JSON-LD.
MetaTags
Standard meta tags.
OpenGraphData
OpenGraph metadata.
StructuredData
All structured data extracted from a single HTML page.

Functions

data_completeness
Compute a data completeness score for structured data (0.0 to 1.0).
extract_links_from_html
Extract links from raw HTML as a simple list of internal URLs.
extract_structured_data
Extract all structured data from raw HTML.
jsonld_type_to_page_type
Map a JSON-LD @type to a PageType, together with a confidence value.
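
The signatures of data_completeness and jsonld_type_to_page_type are not shown on this page, so the following is only a minimal sketch of the two ideas, not the crate's implementation. PageKind, PageData, type_to_kind, completeness, the field names, and the confidence values are all hypothetical stand-ins chosen for illustration. The sketch assumes the score is the fraction of populated fields and that the type mapping returns a (kind, confidence) pair.

```rust
// Hypothetical stand-ins: the real PageType and StructuredData definitions
// are not shown on this page, so every variant and field here is assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageKind {
    Product,
    Article,
    Unknown,
}

#[derive(Default)]
struct PageData {
    title: Option<String>,
    description: Option<String>,
    og_image: Option<String>,
    json_ld_blocks: Vec<String>,
    links: Vec<String>,
}

/// Map a JSON-LD `@type` string to a page kind plus a confidence in [0.0, 1.0].
/// The confidence numbers are illustrative, not taken from the crate.
fn type_to_kind(json_ld_type: &str) -> (PageKind, f32) {
    match json_ld_type {
        "Product" => (PageKind::Product, 0.9),
        "Article" | "NewsArticle" | "BlogPosting" => (PageKind::Article, 0.9),
        // Unrecognized types fall through with zero confidence.
        _ => (PageKind::Unknown, 0.0),
    }
}

/// Fraction of tracked slots that are populated, from 0.0 to 1.0.
fn completeness(data: &PageData) -> f32 {
    let slots = [
        data.title.is_some(),
        data.description.is_some(),
        data.og_image.is_some(),
        !data.json_ld_blocks.is_empty(),
        !data.links.is_empty(),
    ];
    let filled = slots.iter().filter(|&&present| present).count();
    filled as f32 / slots.len() as f32
}
```

Under these assumptions, a page with only a title and one JSON-LD block fills two of the five tracked slots, so completeness returns 0.4. Weighting every slot equally is the simplest choice; the real function may weight fields differently.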