schemaorg-validate 0.3.0

Parse and validate Schema.org structured data (JSON-LD, Microdata, RDFa) against the official vocabulary and Google Rich Results profiles.

schemaorg-rs

The first open-source, offline, embeddable Schema.org structured data validator.

Parse JSON-LD, Microdata, and RDFa into a unified data model. Validate against the official Schema.org vocabulary (v30.0). Check Google Rich Results eligibility. All offline, all embeddable, all from a single Rust library.


Current Status

Component             | Status       | Details
----------------------|--------------|------------------------------------------------------------
Extraction Engine     | ✅ Stable    | JSON-LD, Microdata, RDFa Lite: 3 formats, unified output
Vocabulary Validation | ✅ Stable    | 800+ types, 1400+ properties, Schema.org v30.0
Rich Results Profiles | ✅ Stable    | 7 Google profiles + baseline
CLI                   | ✅ Stable    | schemaorg-validate: text, JSON, SARIF output
WASM / npm            | ✅ Stable    | @schemaorg-rs/wasm
Test Suite            | ✅ 290 tests | All passing, all features

Published on crates.io and npm.


Quick Start

As a Library

use schemaorg_rs::{extract_all, validation};
use schemaorg_rs::profiles::{ProfileRegistry, Eligibility};

let html = r#"<script type="application/ld+json">{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Widget",
  "offers": { "@type": "Offer", "price": "29.99", "priceCurrency": "EUR" }
}</script>"#;

// Requires the `validation` and `profiles` features (or `full`); see Feature Flags.
// Extract -> Validate -> Profile
let graph = extract_all(html).unwrap();
let result = validation::validate(&graph);
let registry = ProfileRegistry::with_google();
let profile = registry.evaluate("google", &graph, &result.diagnostics).unwrap();

match profile.eligibility {
    Eligibility::Eligible => println!("Rich result eligible!"),
    Eligibility::WarningsOnly => println!("Eligible with warnings"),
    Eligibility::NotEligible => println!("Not eligible"),
    Eligibility::Restricted => println!("Restricted"),
}

As a CLI Tool

# Install
cargo install schemaorg-validate

# Validate a file
schemaorg-validate --file page.html --profile google

# Validate a URL
schemaorg-validate --url https://example.com --profile google

# JSON output for CI
schemaorg-validate --file page.html --format json

# SARIF for GitHub Code Scanning
schemaorg-validate --file page.html --format sarif > results.sarif

As an npm Package (WASM)

# Install
npm install @schemaorg-rs/wasm

// validateHtml returns a JSON string, so parse it before use
import { validateHtml } from '@schemaorg-rs/wasm';
const result = JSON.parse(validateHtml(htmlString));

Features

Feature    | Description
-----------|------------------------------------------------------------------
Extraction | JSON-LD, Microdata, RDFa Lite into unified SchemaNode model
Validation | Type/property/value checking against Schema.org v30.0
Profiles   | Google Rich Results eligibility for 7 schema types
CLI        | schemaorg-validate with text, JSON, and SARIF output
WASM       | Browser/Node.js via WebAssembly
Offline    | Vocabulary vendored at compile time, zero network calls

Extraction

  • JSON-LD: @graph arrays, @id cross-references, nested objects, source locations
  • Microdata: itemscope/itemprop, itemref, value extraction by element type
  • RDFa Lite: vocab/typeof/property, prefix namespaces, resource identifiers
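
As a rough illustration of what resolving @id cross-references involves, the sketch below indexes nodes by identifier and follows a reference to its target. The `Node` and `Value` types and both function names are hypothetical stand-ins for illustration only; they are not the crate's SchemaNode model or API.

```rust
use std::collections::HashMap;

/// Hypothetical property value: plain text, or a reference such as {"@id": "#org"}.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Ref(String),
}

/// Hypothetical, heavily simplified node (not the crate's SchemaNode).
#[derive(Debug)]
struct Node {
    id: Option<String>,
    types: Vec<String>,
}

/// Builds an index from @id to node; nodes without an @id are skipped.
fn index_by_id(nodes: &[Node]) -> HashMap<&str, &Node> {
    nodes
        .iter()
        .filter_map(|node| node.id.as_deref().map(|id| (id, node)))
        .collect()
}

/// Follows a reference value to its target node, if the target exists.
fn resolve<'a>(index: &HashMap<&'a str, &'a Node>, value: &Value) -> Option<&'a Node> {
    match value {
        Value::Ref(id) => index.get(id.as_str()).copied(),
        Value::Text(_) => None,
    }
}
```

A dangling reference simply resolves to `None` here; a real extractor would more likely report it as a diagnostic.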

Validation

  • Unknown/deprecated/pending types and properties
  • Property domain checking (wrong type for property)
  • Value type mismatches (Number where URL expected)
  • Enum validation, boolean/number coercion warnings
  • "Did you mean?" suggestions via Levenshtein distance
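
To make the "Did you mean?" idea concrete, here is a minimal, standalone sketch of Levenshtein-based suggestions. The function names and the `max_distance` threshold are assumptions made for illustration; the crate's own implementation and cutoff may differ.

```rust
/// Two-row dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Returns the closest known name within `max_distance` edits, if any.
/// On ties, the first candidate in `known` wins.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|&name| (levenshtein(unknown, name), name))
        .filter(|&(dist, _)| dist <= max_distance)
        .min_by_key(|&(dist, _)| dist)
        .map(|(_, name)| name)
}
```

With a threshold of 2, a misspelling such as `priceCurency` is one edit away from `priceCurrency` and would be suggested, while an unrelated string yields no suggestion at all.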

Google Rich Results Profiles

Type           | Subtypes
---------------|---------------------------------
Product        |
Article        | NewsArticle, BlogPosting
FAQPage        | none (restricted since 2024)
BreadcrumbList |
LocalBusiness  | Restaurant, Store, all subtypes
Event          |
Recipe         |
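
Conceptually, a profile check compares the properties present on a node with the properties a profile requires for that type. The sketch below shows that comparison in isolation. The required-property lists are illustrative placeholders, not the crate's actual rules or Google's full requirements, and `TypeResult` here is a hypothetical struct that only borrows its field names from the usage example in this README.

```rust
/// Hypothetical per-type result; field names follow the README's usage example.
#[derive(Debug, PartialEq)]
struct TypeResult {
    schema_type: String,
    eligible: bool,
    required_missing: Vec<&'static str>,
}

// Illustrative required-property lists (assumptions, not the shipped profiles).
const PRODUCT_REQUIRED: &[&str] = &["name", "offers"];
const RECIPE_REQUIRED: &[&str] = &["name", "image"];
const EVENT_REQUIRED: &[&str] = &["name", "startDate", "location"];

/// Returns the required properties for a type, or None if no profile covers it.
fn required_properties(schema_type: &str) -> Option<&'static [&'static str]> {
    match schema_type {
        "Product" => Some(PRODUCT_REQUIRED),
        "Recipe" => Some(RECIPE_REQUIRED),
        "Event" => Some(EVENT_REQUIRED),
        _ => None,
    }
}

/// Lists the required properties that are absent from `present`.
fn check_type(schema_type: &str, present: &[&str]) -> Option<TypeResult> {
    let required = required_properties(schema_type)?;
    let required_missing: Vec<&'static str> = required
        .iter()
        .copied()
        .filter(|prop| !present.contains(prop))
        .collect();
    Some(TypeResult {
        schema_type: schema_type.to_string(),
        eligible: required_missing.is_empty(),
        required_missing,
    })
}
```

Returning `None` for types without a profile keeps "not covered" distinct from "covered and eligible", which an empty requirement list would otherwise blur.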

CLI Output Formats

  • Text — colored, human-readable with eligibility summary
  • JSON — structured output for programmatic consumption
  • SARIF 2.1.0 — GitHub Code Scanning compatible

Installation

Library

[dependencies]
schemaorg-validate = "0.3"

CLI

cargo install schemaorg-validate

Feature Flags

Flag       | Default | Enables
-----------|---------|--------------------------------------
extraction | Yes     | HTML parsing, all 3 extractors
validation | No      | Schema.org vocabulary validation
profiles   | No      | Rich Results profiles
wasm       | No      | WASM bindings
cli        | No      | CLI binary (schemaorg-validate)
full       | No      | extraction + validation + profiles

# Full library (no CLI/WASM)
schemaorg-validate = { version = "0.3", features = ["full"] }

# Core types only (no HTML parsing)
schemaorg-validate = { version = "0.3", default-features = false }

Usage

Extract all formats

use schemaorg_rs::{extract_all, SourceFormat};

let graph = extract_all(html)?;
for node in &graph.nodes {
    println!("{:?}: {:?}", node.source_format, node.types);
}

Validate against vocabulary

use schemaorg_rs::{extract_all, validation};

let graph = extract_all(html)?;
let result = validation::validate(&graph);

for diag in &result.diagnostics {
    println!("[{}] {} -- {}", diag.severity, diag.path, diag.message);
}

Check Rich Results eligibility

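use schemaorg_rs::{extract_all, validation};
use schemaorg_rs::profiles::ProfileRegistry;

// Profiles consume the graph plus the diagnostics from vocabulary validation.
let graph = extract_all(html)?;
let validation_result = validation::validate(&graph);

let registry = ProfileRegistry::with_google();
let result = registry.evaluate("google", &graph, &validation_result.diagnostics)?;

for tr in &result.type_results {
    println!("{}: eligible={}, missing={:?}",
        tr.schema_type, tr.eligible, tr.required_missing);
}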

GitHub Action

- uses: mitrovicsinisaa/schemaorg-rs/.github/actions/schemaorg-validate@main
  with:
    files: 'dist/**/*.html'
    profile: google
    upload-sarif: 'true'

See Action README for full options.


Architecture

HTML input
  │
  ├─ JSON-LD extractor ──────┐
  ├─ Microdata extractor ────┤──▶ StructuredDataGraph
  └─ RDFa Lite extractor ────┘         │
                                        ├──▶ Vocabulary Validator (Schema.org v30.0)
                                        │         │
                                        │         ▼
                                        │    ValidationResult (diagnostics)
                                        │         │
                                        └──▶ Profile Engine ──▶ ProfileResult (eligibility)
                                                  │
                                              ┌───┴────┐
                                          Google    Baseline
                                        (7 types)  (generic)

The Schema.org vocabulary (800+ types, 1400+ properties) is resolved entirely at compile time via build.rs codegen. Runtime validation uses static match trees — zero heap allocation, zero parsing, ideal for WASM.
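
A minimal sketch of what a static match-based lookup can look like, assuming a generated function that maps a type name to its parent. The function names are hypothetical, only a handful of types are listed, and multiple inheritance (for example, LocalBusiness also being a Place) is ignored for brevity; the real generated code is presumably much larger and may be shaped differently.

```rust
/// Returns one direct parent of a Schema.org type, or None for `Thing`
/// and for types missing from this illustrative subset.
fn parent_of(schema_type: &str) -> Option<&'static str> {
    match schema_type {
        "Restaurant" => Some("FoodEstablishment"),
        "FoodEstablishment" | "Store" => Some("LocalBusiness"),
        "LocalBusiness" => Some("Organization"),
        "BlogPosting" => Some("SocialMediaPosting"),
        "NewsArticle" | "SocialMediaPosting" => Some("Article"),
        "Article" => Some("CreativeWork"),
        "Organization" | "CreativeWork" | "Product" => Some("Thing"),
        _ => None,
    }
}

/// Walks up the parent chain using only borrowed and static strings,
/// so the lookup itself performs no heap allocation.
fn is_subtype_of(schema_type: &str, ancestor: &str) -> bool {
    let mut current = schema_type;
    loop {
        if current == ancestor {
            return true;
        }
        match parent_of(current) {
            Some(parent) => current = parent,
            None => return false,
        }
    }
}
```

A lookup of this shape is how a profile for LocalBusiness could also accept Restaurant or Store: the subtype check climbs the chain until it reaches the profiled type or runs out of parents.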


Why This Exists

Schema.org structured data is embedded in hundreds of millions of web pages. When it's broken — a missing name on a Product, a wrong value type on offers.price — search engines silently ignore it. No rich results. No AI citations. No visibility.

The only validators that understand Schema.org semantically are closed-source, hosted by Google, and require sending your URLs to their servers.

schemaorg-rs is the first open-source, offline, embeddable Schema.org validator. It runs in Rust, WASM, and CLI. It validates vocabulary correctness and Rich Results eligibility in one pass.


Roadmap

The core library is stable and shipping. Future work focuses on ecosystem integration and expanded coverage:

  • CMS Integrations — Shopware 6 plugin, TYPO3 extension, WordPress plugin
  • Language Bindings — Python (PyO3), PHP extension, native Node.js (napi)
  • Additional Profiles — VideoObject, JobPosting, HowTo, Course, Review, Dataset
  • Hosted API — Self-hostable HTTP API + Docker image (open-source Rich Results Test alternative)
  • Auto-fix Engine — Not just "this is broken" but "here's the corrected JSON-LD"
  • Schema.org W3C Engagement: schema:pending support, upstream test fixtures

See CHANGELOG.md for version history.


License

MIT — see LICENSE