schemaorg-validate 0.2.0

Parse and validate Schema.org structured data (JSON-LD, Microdata, RDFa) against the official vocabulary and Google Rich Results profiles.

schemaorg-rs

Extract, validate, and profile Schema.org structured data from HTML.

Parse JSON-LD, Microdata, and RDFa into a unified data model. Validate against the official Schema.org vocabulary. Check Google Rich Results eligibility. All offline, all embeddable, all from a single Rust library.


Quick Start

As a Library

// Requires the `validation` and `profiles` features (or `full`); see Feature Flags.
use schemaorg_rs::{extract_all, validation};
use schemaorg_rs::profiles::{ProfileRegistry, Eligibility};

let html = r#"<script type="application/ld+json">{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Widget",
  "offers": { "@type": "Offer", "price": "29.99", "priceCurrency": "EUR" }
}</script>"#;

// Extract -> Validate -> Profile
let graph = extract_all(html).unwrap();
let result = validation::validate(&graph);
let registry = ProfileRegistry::with_google();
let profile = registry.evaluate("google", &graph, &result.diagnostics).unwrap();

match profile.eligibility {
    Eligibility::Eligible => println!("Rich result eligible!"),
    Eligibility::WarningsOnly => println!("Eligible with warnings"),
    Eligibility::NotEligible => println!("Not eligible"),
    Eligibility::Restricted => println!("Restricted"),
}

As a CLI Tool

# Install
cargo install schemaorg-validate

# Validate a file
schemaorg-validate --file page.html --profile google

# Validate a URL
schemaorg-validate --url https://example.com --profile google

# JSON output for CI
schemaorg-validate --file page.html --format json

# SARIF for GitHub Code Scanning
schemaorg-validate --file page.html --format sarif > results.sarif

As an npm Package (WASM)

npm install @schemaorg-rs/wasm

import { validateHtml } from '@schemaorg-rs/wasm';
const result = JSON.parse(validateHtml(htmlString));

Features

Feature     Description
Extraction  JSON-LD, Microdata, RDFa Lite into unified SchemaNode model
Validation  Type/property/value checking against Schema.org v30.0
Profiles    Google Rich Results eligibility for 7 schema types
CLI         schemaorg-validate with text, JSON, and SARIF output
WASM        Browser/Node.js via WebAssembly
Offline     Vocabulary vendored at compile time, zero network calls

Extraction

  • JSON-LD -- @graph arrays, @id cross-references, nested objects, source locations
  • Microdata -- itemscope/itemprop, itemref, value extraction by element type
  • RDFa Lite -- vocab/typeof/property, prefix namespaces, resource identifiers
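
To illustrate what resolving JSON-LD @id cross-references into one graph can involve, here is a minimal sketch using only the standard library. The `Node` and `Value` types and the `index_by_id` and `resolve` functions are hypothetical stand-ins invented for this example; they are not the crate's SchemaNode API, which may be shaped quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical property value: a literal, or a reference to another node's @id.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Ref(String),
}

/// Hypothetical, simplified stand-in for an extracted node (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    id: Option<String>,
    types: Vec<String>,
    properties: Vec<(String, Value)>,
}

/// Build an index from each node's @id to its position in the node list.
/// Nodes without an @id are skipped because nothing can refer to them.
fn index_by_id(nodes: &[Node]) -> HashMap<&str, usize> {
    nodes
        .iter()
        .enumerate()
        .filter_map(|(i, n)| n.id.as_deref().map(|id| (id, i)))
        .collect()
}

/// Follow a reference value to the node it points at, if that node exists.
/// Literal values and dangling references both yield `None`.
fn resolve<'a>(
    nodes: &'a [Node],
    index: &HashMap<&str, usize>,
    value: &Value,
) -> Option<&'a Node> {
    match value {
        Value::Ref(id) => index.get(id.as_str()).map(|&i| &nodes[i]),
        Value::Text(_) => None,
    }
}
```

Indexing once and looking references up on demand keeps the node list flat, so a dangling @id shows up as a `None` that a validator could report rather than as a parse failure.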

Validation

  • Unknown/deprecated/pending types and properties
  • Property domain checking (wrong type for property)
  • Value type mismatches (Number where URL expected)
  • Enum validation, boolean/number coercion warnings
  • "Did you mean?" suggestions via Levenshtein distance
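
The "Did you mean?" idea can be sketched with a textbook Levenshtein distance. This is an illustration of the technique only: the `levenshtein` and `suggest` functions and the `max_distance` cutoff are assumptions made for this example, not the crate's internal implementation or thresholds.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i chars of `a` and the first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        // Turning i + 1 chars of `a` into an empty prefix takes i + 1 deletions.
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let val = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1) // deletion from `a`
                .min(curr[j] + 1); // insertion into `a`
            curr.push(val);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Return the closest known name within `max_distance` edits, if any.
/// On ties, the earliest candidate in `known` wins.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|k| (levenshtein(unknown, k), *k))
        .filter(|(d, _)| *d <= max_distance)
        .min_by_key(|(d, _)| *d)
        .map(|(_, k)| k)
}
```

A small cutoff (for instance 2) keeps suggestions plausible: a typo such as `priceCurency` would match `priceCurrency`, while an unrelated name yields no suggestion at all.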

Google Rich Results Profiles

Type            Subtypes
Product         --
Article         NewsArticle, BlogPosting
FAQPage         -- (restricted since 2024)
BreadcrumbList  --
LocalBusiness   Restaurant, Store, all subtypes
Event           --
Recipe          --
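
As a rough sketch of how a per-type eligibility verdict might be derived from required and recommended properties, consider the following. Everything here is an assumption made for illustration: the `Rule` struct, the `missing` and `classify` functions, the local `Eligibility` enum (which only mirrors the variant names used in Quick Start), and the precedence order between the outcomes. The crate's actual profile rules may differ.

```rust
/// Local stand-in mirroring the variant names shown in Quick Start.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Eligibility {
    Eligible,
    WarningsOnly,
    NotEligible,
    Restricted,
}

/// Hypothetical profile rule for one schema type.
struct Rule<'a> {
    required: &'a [&'a str],
    recommended: &'a [&'a str],
    /// True when the rich result is only shown for a limited set of sites.
    restricted: bool,
}

/// Names from `wanted` that do not appear in `present`, in their original order.
fn missing<'a>(wanted: &[&'a str], present: &[&str]) -> Vec<&'a str> {
    wanted
        .iter()
        .copied()
        .filter(|w| !present.iter().any(|p| p == w))
        .collect()
}

/// Assumed precedence: missing required properties outrank everything,
/// then the restricted flag, then missing recommended properties.
fn classify(rule: &Rule<'_>, present: &[&str]) -> Eligibility {
    if !missing(rule.required, present).is_empty() {
        Eligibility::NotEligible
    } else if rule.restricted {
        Eligibility::Restricted
    } else if !missing(rule.recommended, present).is_empty() {
        Eligibility::WarningsOnly
    } else {
        Eligibility::Eligible
    }
}
```

Keeping `missing` separate from `classify` means the same list that drives the verdict can also be reported back to the user, much like the `required_missing` field printed in the Usage section.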

CLI Output Formats

  • Text -- colored, human-readable with eligibility summary
  • JSON -- structured output for programmatic consumption
  • SARIF 2.1.0 -- GitHub Code Scanning compatible
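
For readers unfamiliar with SARIF, the sketch below shows how a diagnostic could map onto a SARIF 2.1.0 `result` object, whose `level` field accepts "none", "note", "warning", or "error". The `Severity` enum and the `sarif_level`, `json_escape`, and `sarif_result` functions are hypothetical names for this example; the CLI's real serializer and its rule identifiers are not shown in this README.

```rust
/// Hypothetical diagnostic severity (not the crate's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Error,
    Warning,
    Info,
}

/// Map a severity onto one of the SARIF 2.1.0 `level` strings.
fn sarif_level(severity: Severity) -> &'static str {
    match severity {
        Severity::Error => "error",
        Severity::Warning => "warning",
        Severity::Info => "note",
    }
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one SARIF `result` object with `ruleId`, `level`, and `message.text`.
fn sarif_result(rule_id: &str, severity: Severity, message: &str) -> String {
    format!(
        r#"{{"ruleId":"{}","level":"{}","message":{{"text":"{}"}}}}"#,
        json_escape(rule_id),
        sarif_level(severity),
        json_escape(message)
    )
}
```

A real SARIF log wraps such results in `runs[].results` together with tool metadata and source locations; in practice a JSON library would be a safer choice than hand-built strings.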

Installation

Library

[dependencies]
schemaorg-rs = "0.1"

CLI

cargo install schemaorg-validate

Feature Flags

Flag        Default  Enables
extraction  Yes      HTML parsing, all 3 extractors
validation  No       Schema.org vocabulary validation
profiles    No       Rich Results profiles
wasm        No       WASM bindings
cli         No       CLI binary (schemaorg-validate)
full        No       extraction + validation + profiles

# Full library (no CLI/WASM)
schemaorg-rs = { version = "0.1", features = ["full"] }

# Core types only (no HTML parsing)
schemaorg-rs = { version = "0.1", default-features = false }

Usage

Extract all formats

use schemaorg_rs::extract_all;

let graph = extract_all(html)?;
for node in &graph.nodes {
    println!("{:?}: {:?}", node.source_format, node.types);
}

Validate against vocabulary

use schemaorg_rs::{extract_all, validation};

let graph = extract_all(html)?;
let result = validation::validate(&graph);

for diag in &result.diagnostics {
    println!("[{}] {} -- {}", diag.severity, diag.path, diag.message);
}

Check Rich Results eligibility

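use schemaorg_rs::{extract_all, validation};
use schemaorg_rs::profiles::ProfileRegistry;

// Profiles build on the extracted graph and its validation diagnostics.
let graph = extract_all(html)?;
let validated = validation::validate(&graph);

let registry = ProfileRegistry::with_google();
let result = registry.evaluate("google", &graph, &validated.diagnostics)?;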

for tr in &result.type_results {
    println!("{}: eligible={}, missing={:?}",
        tr.schema_type, tr.eligible, tr.required_missing);
}

GitHub Action

- uses: mitrovicsinisaa/schemaorg-rs/.github/actions/schemaorg-validate@main
  with:
    files: 'dist/**/*.html'
    profile: google
    upload-sarif: 'true'

See Action README for full options.


Why This Exists

Schema.org structured data is embedded in hundreds of millions of web pages. When it's broken -- a missing name on a Product, a wrong value type on offers.price -- search engines silently ignore it. No rich results. No AI citations. No visibility.

The only validators that understand Schema.org semantically are closed-source, hosted by Google, and require sending your URLs to their servers.

schemaorg-rs is the first open-source, offline, embeddable Schema.org validator. It ships as a Rust library, a WASM package, and a CLI, and it checks vocabulary correctness and Rich Results eligibility in one pass.


License

MIT -- see LICENSE