schemaorg-rs
Extract, validate, and profile Schema.org structured data from HTML.
Parse JSON-LD, Microdata, and RDFa into a unified data model. Validate against the official Schema.org vocabulary. Check Google Rich Results eligibility. All offline, all embeddable, all from a single Rust library.
Quick Start
As a Library
use ;
use ;
let html = r#"<script type="application/ld+json">{
"@context": "https://schema.org",
"@type": "Product",
"name": "Widget",
"offers": { "@type": "Offer", "price": "29.99", "priceCurrency": "EUR" }
}</script>"#;
// Extract -> Validate -> Profile
let graph = extract_all.unwrap;
let result = validate;
let registry = with_google;
let profile = registry.evaluate.unwrap;
match profile.eligibility
As a CLI Tool
# Install
# Validate a file
# Validate a URL
# JSON output for CI
# SARIF for GitHub Code Scanning
As an npm Package (WASM)
import from '@schemaorg-rs/wasm';
const result = JSON.;
Features
| Feature | Description |
|---|---|
| Extraction | JSON-LD, Microdata, and RDFa Lite into a unified SchemaNode model |
| Validation | Type/property/value checking against Schema.org v30.0 |
| Profiles | Google Rich Results eligibility for 7 schema types |
| CLI | schemaorg-validate with text, JSON, and SARIF output |
| WASM | Browser/Node.js via WebAssembly |
| Offline | Vocabulary vendored at compile time, zero network calls |
Extraction
- JSON-LD -- @graph arrays, @id cross-references, nested objects, source locations
- Microdata -- itemscope/itemprop, itemref, value extraction by element type
- RDFa Lite -- vocab/typeof/property, prefix namespaces, resource identifiers
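To illustrate what resolving @id cross-references involves, here is a minimal sketch using only the standard library. The `Node` and `Value` types and the `index_by_id` and `follow` helpers are hypothetical names invented for this example; the library's actual SchemaNode model may be shaped differently.

```rust
use std::collections::HashMap;

/// A property value: literal text, or a reference to another node's `@id`.
#[derive(Debug, PartialEq)]
enum Value {
    Text(String),
    Ref(String),
}

/// Hypothetical, simplified node; not the library's real `SchemaNode`.
#[derive(Debug)]
struct Node {
    id: Option<String>,
    type_name: String,
    properties: Vec<(String, Value)>,
}

/// Index every node that carries an `@id` so references resolve in one lookup.
fn index_by_id(nodes: &[Node]) -> HashMap<&str, &Node> {
    nodes
        .iter()
        .filter_map(|n| n.id.as_deref().map(|id| (id, n)))
        .collect()
}

/// Follow a reference-valued property to its target node.
/// Returns `None` if the property is missing, is literal text,
/// or points at an `@id` that is not in the index.
fn follow<'a>(
    node: &Node,
    property: &str,
    index: &HashMap<&str, &'a Node>,
) -> Option<&'a Node> {
    node.properties.iter().find_map(|(name, value)| match value {
        Value::Ref(target) if name == property => index.get(target.as_str()).copied(),
        _ => None,
    })
}
```

With a Product node whose `brand` property is `Value::Ref("#acme")` and an Organization node whose `id` is `#acme`, `follow(&product, "brand", &index)` yields the Organization node. A dangling reference yields `None` rather than an error.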
Validation
- Unknown/deprecated/pending types and properties
- Property domain checking (property used on a type that does not define it)
- Value type mismatches (Number where URL expected)
- Enum validation, boolean/number coercion warnings
- "Did you mean?" suggestions via Levenshtein distance
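As a rough illustration of how "Did you mean?" suggestions can be derived from Levenshtein distance, the sketch below computes the edit distance between an unknown name and each known name and returns the closest match within a threshold. The `suggest` function, its `known` list, and the `max_distance` threshold are assumptions made for this example, not the library's actual API or tuning.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar
/// values, keeping only two rows of the table in memory.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Returns the closest known name within `max_distance` edits, if any.
/// `known` and `max_distance` are hypothetical parameters for this sketch.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|k| (levenshtein(unknown, k), *k))
        .filter(|(d, _)| *d <= max_distance)
        .min_by_key(|(d, _)| *d)
        .map(|(_, k)| k)
}
```

For example, `suggest("priceCurency", &["price", "priceCurrency", "name"], 2)` returns `Some("priceCurrency")`, because a single inserted letter separates the two names. The threshold keeps unrelated names from being offered as suggestions.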
Google Rich Results Profiles
| Type | Subtypes |
|---|---|
| Product | -- |
| Article | NewsArticle, BlogPosting |
| FAQPage | -- (restricted since 2023) |
| BreadcrumbList | -- |
| LocalBusiness | Restaurant, Store, all subtypes |
| Event | -- |
| Recipe | -- |
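The Subtypes column implies that a node typed as, say, BlogPosting or Restaurant is checked against the profile of an ancestor type. One way to sketch that lookup is to walk the Schema.org parent chain until a profiled type is found. The `resolve_profile` and `sample_parents` functions below are hypothetical, and the hard-coded map is only a small excerpt of the class hierarchy; the library may resolve subtypes differently.

```rust
use std::collections::HashMap;

/// Finds the type itself or its nearest ancestor that has a profile.
/// `parents` maps a type to its direct parent; `profiled` lists the types
/// with a dedicated profile. Both are hypothetical inputs for this sketch.
fn resolve_profile<'a>(
    ty: &'a str,
    parents: &HashMap<&'a str, &'a str>,
    profiled: &[&'a str],
) -> Option<&'a str> {
    let mut current = ty;
    // Bound the walk so a malformed (cyclic) parent map cannot loop forever.
    for _ in 0..=parents.len() {
        if profiled.contains(&current) {
            return Some(current);
        }
        current = *parents.get(current)?;
    }
    None
}

/// A small excerpt of the Schema.org class hierarchy, hard-coded for the sketch.
fn sample_parents() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("NewsArticle", "Article"),
        ("BlogPosting", "SocialMediaPosting"),
        ("SocialMediaPosting", "Article"),
        ("Restaurant", "FoodEstablishment"),
        ("FoodEstablishment", "LocalBusiness"),
        ("Store", "LocalBusiness"),
    ])
}
```

Given the seven profiled types from the table, `resolve_profile("BlogPosting", ...)` resolves to `Some("Article")` and `resolve_profile("Restaurant", ...)` to `Some("LocalBusiness")`, while a type with no profiled ancestor in the map returns `None`.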
CLI Output Formats
- Text -- colored, human-readable with eligibility summary
- JSON -- structured output for programmatic consumption
- SARIF 2.1.0 -- GitHub Code Scanning compatible
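For readers unfamiliar with SARIF, the sketch below shows roughly what a minimal SARIF 2.1.0 log looks like when built by hand: one run, one tool driver, and one result per finding with a file and line. The `Finding` struct, the `to_sarif` function, and any rule ID passed in are assumptions for illustration; the CLI's real rule IDs and output contain more detail.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical finding record; field names are invented for this sketch.
struct Finding<'a> {
    rule_id: &'a str,
    level: &'a str, // SARIF levels: "none", "note", "warning", "error"
    message: &'a str,
    uri: &'a str,
    line: u32, // SARIF line numbers are 1-based
}

/// Render findings as a single-run SARIF 2.1.0 log (compact JSON).
fn to_sarif(tool_name: &str, findings: &[Finding<'_>]) -> String {
    let results: Vec<String> = findings
        .iter()
        .map(|f| {
            format!(
                r#"{{"ruleId":"{}","level":"{}","message":{{"text":"{}"}},"locations":[{{"physicalLocation":{{"artifactLocation":{{"uri":"{}"}},"region":{{"startLine":{}}}}}}}]}}"#,
                json_escape(f.rule_id),
                json_escape(f.level),
                json_escape(f.message),
                json_escape(f.uri),
                f.line
            )
        })
        .collect();
    format!(
        r#"{{"version":"2.1.0","runs":[{{"tool":{{"driver":{{"name":"{}"}}}},"results":[{}]}}]}}"#,
        json_escape(tool_name),
        results.join(",")
    )
}
```

GitHub Code Scanning reads the `ruleId`, `level`, `message.text`, and the physical location of each result to place annotations on the affected file and line.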
Installation
Library
[]
= "0.1"
CLI
Feature Flags
| Flag | Default | Enables |
|---|---|---|
| extraction | Yes | HTML parsing, all 3 extractors |
| validation | No | Schema.org vocabulary validation |
| profiles | No | Rich Results profiles |
| wasm | No | WASM bindings |
| cli | No | CLI binary (schemaorg-validate) |
| full | No | extraction + validation + profiles |
# Full library (no CLI/WASM)
= { = "0.1", = ["full"] }
# Core types only (no HTML parsing)
= { = "0.1", = false }
Usage
Extract all formats
use ;
let graph = extract_all?;
for node in &graph.nodes
Validate against vocabulary
use ;
let graph = extract_all?;
let result = validate;
for diag in &result.diagnostics
Check Rich Results eligibility
use ;
let registry = with_google;
let result = registry.evaluate?;
for tr in &result.type_results
GitHub Action
- uses: mitrovicsinisaa/schemaorg-rs/.github/actions/schemaorg-validate@main
  with:
    files: 'dist/**/*.html'
    profile: google
    upload-sarif: 'true'
See Action README for full options.
Documentation
- User Guide -- getting started, all features
- CLI Reference -- all options, output formats, SARIF rule IDs
- Architecture -- internal design, data flow, codegen
- Profile Docs -- required/recommended fields per type
- API Reference -- full Rust docs
- Contributing -- dev setup, code standards, adding profiles
- Changelog -- version history
Why This Exists
Schema.org structured data is embedded in hundreds of millions of web pages.
When it's broken -- a missing name on a Product, a wrong value type on
offers.price -- search engines silently ignore it. No rich results. No AI
citations. No visibility.
The only validators that understand Schema.org semantically are closed-source, hosted by Google, and require sending your URLs to their servers.
schemaorg-rs is the first open-source, offline, embeddable Schema.org
validator. It runs as a Rust library, a WASM module, and a CLI tool, and it
validates vocabulary correctness and Rich Results eligibility in one pass.
License
MIT -- see LICENSE