Parse structured data from raw HTML without DOM rendering.
This is the core of the no-browser acquisition engine. Extracts JSON-LD,
OpenGraph, meta tags, links, headings, and forms from raw HTML using
the scraper crate for CSS selector-based parsing.
Structs
- BreadcrumbItem - A breadcrumb item from JSON-LD BreadcrumbList.
- ExtractedForm - A form extracted from HTML.
- ExtractedLink - A link extracted from HTML.
- FormField - A field within a form.
- JsonLdArticle - Article data extracted from JSON-LD.
- JsonLdProduct - Product data extracted from JSON-LD.
- MetaTags - Standard meta tags.
- OpenGraphData - OpenGraph metadata.
- StructuredData - All structured data extracted from a single HTML page.
Functions
- data_completeness - Compute a data completeness score for structured data (0.0 to 1.0).
- extract_links_from_html - Extract links from raw HTML as a simple list of internal URLs.
- extract_structured_data - Extract all structured data from raw HTML.
- jsonld_type_to_page_type - Map JSON-LD @type to PageType with confidence.
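
The listing gives no signatures or fields, so the following is only a rough sketch of how a completeness score and a JSON-LD type mapping might look. Everything in it is an assumption made for illustration: the `PageType` variants, the `StructuredDataSketch` struct and its fields, the `_sketch` function names, and the confidence values are hypothetical and are not taken from this module.

```rust
/// Hypothetical page categories; the real `PageType` enum is not shown in this listing.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageType {
    Article,
    Product,
    Unknown,
}

/// Hypothetical, trimmed-down stand-in for `StructuredData` (fields are assumed).
#[derive(Debug, Default)]
struct StructuredDataSketch {
    title: Option<String>,
    description: Option<String>,
    canonical_url: Option<String>,
    jsonld_types: Vec<String>,
}

/// Map a JSON-LD `@type` string to a page type plus a confidence in 0.0..=1.0.
/// The recognized types and confidence values are illustrative assumptions.
fn jsonld_type_to_page_type_sketch(ld_type: &str) -> (PageType, f32) {
    match ld_type {
        "Article" | "NewsArticle" | "BlogPosting" => (PageType::Article, 0.9),
        "Product" => (PageType::Product, 0.9),
        _ => (PageType::Unknown, 0.0),
    }
}

/// Fraction of the tracked fields that are populated, from 0.0 to 1.0.
fn data_completeness_sketch(data: &StructuredDataSketch) -> f32 {
    let checks = [
        data.title.is_some(),
        data.description.is_some(),
        data.canonical_url.is_some(),
        !data.jsonld_types.is_empty(),
    ];
    let filled = checks.iter().filter(|&&present| present).count();
    filled as f32 / checks.len() as f32
}

fn main() {
    let data = StructuredDataSketch {
        title: Some("Example".to_string()),
        description: None,
        canonical_url: Some("https://example.com/".to_string()),
        jsonld_types: vec!["Product".to_string()],
    };
    println!("completeness = {:.2}", data_completeness_sketch(&data));
    for t in &data.jsonld_types {
        println!("{t} -> {:?}", jsonld_type_to_page_type_sketch(t));
    }
}
```

Weighting every field equally keeps the score easy to reason about; the actual data_completeness function may well weight fields differently or consider more of them.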