schemaorg-rs
The first open-source, offline, embeddable Schema.org structured data validator.
Parse JSON-LD, Microdata, and RDFa into a unified data model. Validate against the official Schema.org vocabulary (v30.0). Check Google Rich Results eligibility. All offline, all embeddable, all from a single Rust library.
Current Status
| Component | Status | Details |
|---|---|---|
| Extraction Engine | ✅ Stable | JSON-LD, Microdata, RDFa Lite — 3 formats, unified output |
| Vocabulary Validation | ✅ Stable | 800+ types, 1400+ properties, Schema.org v30.0 |
| Rich Results Profiles | ✅ Stable | 7 Google profiles + baseline |
| CLI | ✅ Stable | schemaorg-validate — text, JSON, SARIF output |
| WASM / npm | ✅ Stable | @schemaorg-rs/wasm |
| Test Suite | ✅ 290 tests | All passing, all features |
Published on crates.io and npm.
Quick Start
As a Library
use ;
use ;
let html = r#"<script type="application/ld+json">{
"@context": "https://schema.org",
"@type": "Product",
"name": "Widget",
"offers": { "@type": "Offer", "price": "29.99", "priceCurrency": "EUR" }
}</script>"#;
// Extract -> Validate -> Profile
let graph = extract_all.unwrap;
let result = validate;
let registry = with_google;
let profile = registry.evaluate.unwrap;
match profile.eligibility
As a CLI Tool
# Install
# Validate a file
# Validate a URL
# JSON output for CI
# SARIF for GitHub Code Scanning
As an npm Package (WASM)
import from '@schemaorg-rs/wasm';
const result = JSON.;
Features
| Feature | Description |
|---|---|
| Extraction | JSON-LD, Microdata, RDFa Lite into unified SchemaNode model |
| Validation | Type/property/value checking against Schema.org v30.0 |
| Profiles | Google Rich Results eligibility for 7 schema types |
| CLI | schemaorg-validate with text, JSON, and SARIF output |
| WASM | Browser/Node.js via WebAssembly |
| Offline | Vocabulary vendored at compile time, zero network calls |
Extraction
- JSON-LD — @graph arrays, @id cross-references, nested objects, source locations
- Microdata — itemscope/itemprop, itemref, value extraction by element type
- RDFa Lite — vocab/typeof/property, prefix namespaces, resource identifiers
Validation
- Unknown/deprecated/pending types and properties
- Property domain checking (property used on a type that does not define it)
- Value type mismatches (Number where URL expected)
- Enum validation, boolean/number coercion warnings
- "Did you mean?" suggestions via Levenshtein distance
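The "Did you mean?" suggestions above can be illustrated with a minimal sketch. This is not the library's actual implementation: the `levenshtein` and `suggest` functions, their signatures, and the `max_distance` cutoff are assumptions made for illustration only.

```rust
/// Levenshtein distance using two rolling rows of the DP table.
/// Compares Unicode scalar values, not bytes.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical helper: returns the closest known name within
/// `max_distance` edits, or `None` if nothing is close enough.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|&k| (levenshtein(unknown, k), k))
        .filter(|&(d, _)| d <= max_distance)
        .min_by_key(|&(d, _)| d)
        .map(|(_, k)| k)
}
```

With a cutoff of 2, a misspelling such as `priceCurency` would resolve to `priceCurrency`, while an unrelated string yields no suggestion.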
Google Rich Results Profiles
| Type | Subtypes |
|---|---|
| Product | — |
| Article | NewsArticle, BlogPosting |
| FAQPage | — (restricted since 2023) |
| BreadcrumbList | — |
| LocalBusiness | Restaurant, Store, all subtypes |
| Event | — |
| Recipe | — |
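As a rough illustration of how a profile could turn required and recommended fields into an eligibility verdict, here is a small sketch. The `Profile` struct, the `Eligibility` enum, the `evaluate` function, and the field lists in `EXAMPLE_PROFILE` are all hypothetical; they are not the crate's API and not Google's actual requirements.

```rust
/// Hypothetical profile: property names a rich result needs or benefits from.
struct Profile {
    required: &'static [&'static str],
    recommended: &'static [&'static str],
}

#[derive(Debug, PartialEq, Eq)]
enum Eligibility {
    Eligible,
    EligibleWithWarnings { missing_recommended: Vec<&'static str> },
    NotEligible { missing_required: Vec<&'static str> },
}

/// Placeholder field lists, chosen only to make the example concrete.
const EXAMPLE_PROFILE: Profile = Profile {
    required: &["name", "offers"],
    recommended: &["image", "description"],
};

/// Returns the entries of `wanted` that do not appear in `present`,
/// preserving the order of `wanted`.
fn missing(wanted: &'static [&'static str], present: &[&str]) -> Vec<&'static str> {
    wanted.iter().copied().filter(|p| !present.contains(p)).collect()
}

/// Missing required fields block eligibility outright; missing
/// recommended fields only downgrade the result to a warning.
fn evaluate(profile: &Profile, present: &[&str]) -> Eligibility {
    let missing_required = missing(profile.required, present);
    if !missing_required.is_empty() {
        return Eligibility::NotEligible { missing_required };
    }
    let missing_recommended = missing(profile.recommended, present);
    if missing_recommended.is_empty() {
        Eligibility::Eligible
    } else {
        Eligibility::EligibleWithWarnings { missing_recommended }
    }
}
```

Keeping the verdict as an enum with the missing fields attached lets a caller `match` on the result and report exactly what to add.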
CLI Output Formats
- Text — colored, human-readable with eligibility summary
- JSON — structured output for programmatic consumption
- SARIF 2.1.0 — GitHub Code Scanning compatible
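To give a sense of what a SARIF 2.1.0 report contains, the sketch below serializes findings into a minimal SARIF log (`version`, `runs`, `tool.driver.name`, `results`). The `Finding` struct, the `to_sarif` function, and the rule ID used in the usage note are hypothetical and do not reflect the CLI's real output. The JSON is hand-built only to keep the example dependency-free; real code would more likely use a serialization crate.

```rust
/// Hypothetical diagnostic record, for illustration only.
struct Finding<'a> {
    rule_id: &'a str,
    level: &'a str, // SARIF levels: "none", "note", "warning", "error"
    message: &'a str,
    uri: &'a str,
    line: u32,
}

/// Escapes the characters that must be escaped inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a single-run SARIF 2.1.0 log as a compact JSON string.
fn to_sarif(tool_name: &str, findings: &[Finding]) -> String {
    let results: Vec<String> = findings
        .iter()
        .map(|f| {
            format!(
                r#"{{"ruleId":"{}","level":"{}","message":{{"text":"{}"}},"locations":[{{"physicalLocation":{{"artifactLocation":{{"uri":"{}"}},"region":{{"startLine":{}}}}}}}]}}"#,
                json_escape(f.rule_id),
                json_escape(f.level),
                json_escape(f.message),
                json_escape(f.uri),
                f.line
            )
        })
        .collect();
    format!(
        r#"{{"version":"2.1.0","runs":[{{"tool":{{"driver":{{"name":"{}"}}}},"results":[{}]}}]}}"#,
        json_escape(tool_name),
        results.join(",")
    )
}
```

Each finding becomes one entry in `results`, with its file and line carried in `locations` so that a code-scanning UI can annotate the source.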
Installation
Library
[]
= "0.3"
CLI
Feature Flags
| Flag | Default | Enables |
|---|---|---|
| extraction | Yes | HTML parsing, all 3 extractors |
| validation | No | Schema.org vocabulary validation |
| profiles | No | Rich Results profiles |
| wasm | No | WASM bindings |
| cli | No | CLI binary (schemaorg-validate) |
| full | No | extraction + validation + profiles |
# Full library (no CLI/WASM)
= { = "0.3", = ["full"] }
# Core types only (no HTML parsing)
= { = "0.3", = false }
Usage
Extract all formats
use ;
let graph = extract_all?;
for node in &graph.nodes
Validate against vocabulary
use ;
let graph = extract_all?;
let result = validate;
for diag in &result.diagnostics
Check Rich Results eligibility
use ;
let registry = with_google;
let result = registry.evaluate?;
for tr in &result.type_results
GitHub Action
- uses: mitrovicsinisaa/schemaorg-rs/.github/actions/schemaorg-validate@main
with:
files: 'dist/**/*.html'
profile: google
upload-sarif: 'true'
See Action README for full options.
Architecture
HTML input
│
├─ JSON-LD extractor ──────┐
├─ Microdata extractor ────┤──▶ StructuredDataGraph
└─ RDFa Lite extractor ────┘ │
├──▶ Vocabulary Validator (Schema.org v30.0)
│ │
│ ▼
│ ValidationResult (diagnostics)
│ │
└──▶ Profile Engine ──▶ ProfileResult (eligibility)
│
┌───┴────┐
Google Baseline
(7 types) (generic)
The Schema.org vocabulary (800+ types, 1400+ properties) is resolved entirely at
compile time via build.rs codegen. Runtime validation uses static match
trees — zero heap allocation, zero parsing, ideal for WASM.
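The idea of resolving the vocabulary into static match trees can be sketched as follows. This is a hand-written stand-in for what a build script might generate, covering only a handful of types; the function names `parents_of` and `is_subtype_of` are assumptions, not the crate's generated code.

```rust
/// Hypothetical generated lookup: maps a type name to its direct parents.
/// Every slice is a promoted `'static` constant, so a lookup allocates nothing.
fn parents_of(name: &str) -> Option<&'static [&'static str]> {
    let parents: &'static [&'static str] = match name {
        "Thing" => &[],
        "CreativeWork" | "Organization" | "Place" => &["Thing"],
        "Article" => &["CreativeWork"],
        "NewsArticle" => &["Article"],
        "LocalBusiness" => &["Organization", "Place"],
        "Store" => &["LocalBusiness"],
        _ => return None, // not part of this (tiny) illustrative vocabulary
    };
    Some(parents)
}

/// Walks the static parent table upward; returns false for unknown types.
fn is_subtype_of(name: &str, ancestor: &str) -> bool {
    match parents_of(name) {
        None => false,
        Some(_) if name == ancestor => true,
        Some(parents) => parents.iter().any(|p| is_subtype_of(p, ancestor)),
    }
}
```

Because the table lives in the compiled code, there is no vocabulary file to load or parse at startup, which is what makes this approach a good fit for WASM targets.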
Documentation
- User Guide — getting started, all features
- CLI Reference — all options, output formats, SARIF rule IDs
- Architecture — internal design, data flow, codegen
- Profile Docs — required/recommended fields per type
- API Reference — full Rust docs
- Contributing — dev setup, code standards, adding profiles
- Changelog — version history
Why This Exists
Schema.org structured data is embedded in hundreds of millions of web pages.
When it's broken — a missing name on a Product, a wrong value type on
offers.price — search engines silently ignore it. No rich results. No AI
citations. No visibility.
The only validators that understand Schema.org semantically are closed-source, hosted by Google, and require sending your URLs to their servers.
schemaorg-rs is the first open-source, offline, embeddable Schema.org
validator. It runs in Rust, WASM, and CLI. It validates vocabulary correctness
and Rich Results eligibility in one pass.
Roadmap
The core library is stable and shipping. Future work focuses on ecosystem integration and expanded coverage:
- CMS Integrations — Shopware 6 plugin, TYPO3 extension, WordPress plugin
- Language Bindings — Python (PyO3), PHP extension, native Node.js (napi)
- Additional Profiles — VideoObject, JobPosting, HowTo, Course, Review, Dataset
- Hosted API — Self-hostable HTTP API + Docker image (open-source Rich Results Test alternative)
- Auto-fix Engine — Not just "this is broken" but "here's the corrected JSON-LD"
- Schema.org W3C Engagement — schema:pending support, upstream test fixtures
See CHANGELOG.md for version history.
License
MIT — see LICENSE