# schemaorg-rs
[crates.io](https://crates.io/crates/schemaorg-validate)
[docs.rs](https://docs.rs/schemaorg-validate)
[CI](https://github.com/mitrovicsinisaa/schemaorg-rs/actions/workflows/ci.yml)
[License](LICENSE)
[npm](https://www.npmjs.com/package/@schemaorg-rs/wasm)
**The first open-source, offline, embeddable Schema.org structured data validator.**
Parse JSON-LD, Microdata, and RDFa into a unified data model. Validate against
the official Schema.org vocabulary (v30.0). Check Google Rich Results eligibility.
All offline, all embeddable, all from a single Rust library.
---
## Current Status
| Component | Status | Details |
|-----------|--------|---------|
| **Extraction Engine** | ✅ Stable | JSON-LD, Microdata, RDFa Lite: 3 formats, unified output |
| **Vocabulary Validation** | ✅ Stable | 800+ types, 1400+ properties, Schema.org v30.0 |
| **Rich Results Profiles** | ✅ Stable | 7 Google profiles + baseline |
| **CLI** | ✅ Stable | `schemaorg-validate`: text, JSON, SARIF output |
| **WASM / npm** | ✅ Stable | [`@schemaorg-rs/wasm`](https://www.npmjs.com/package/@schemaorg-rs/wasm) |
| **Test Suite** | ✅ 290 tests | All passing, all features |
Published on [crates.io](https://crates.io/crates/schemaorg-validate) and [npm](https://www.npmjs.com/package/@schemaorg-rs/wasm).
---
## Quick Start
### As a Library
```rust
use schemaorg_rs::{extract_all, validation};
use schemaorg_rs::profiles::{ProfileRegistry, Eligibility};
fn main() {
    let html = r#"<script type="application/ld+json">{
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Widget",
        "offers": { "@type": "Offer", "price": "29.99", "priceCurrency": "EUR" }
    }</script>"#;
    // Extract -> Validate -> Profile
    let graph = extract_all(html).unwrap();
    let result = validation::validate(&graph);
    let registry = ProfileRegistry::with_google();
    // The profile check takes both the graph and the validator's diagnostics.
    let profile = registry.evaluate("google", &graph, &result.diagnostics).unwrap();
    match profile.eligibility {
        Eligibility::Eligible => println!("Rich result eligible!"),
        Eligibility::WarningsOnly => println!("Eligible with warnings"),
        Eligibility::NotEligible => println!("Not eligible"),
        Eligibility::Restricted => println!("Restricted"),
    }
}
```
### As a CLI Tool
```bash
# Install
cargo install schemaorg-validate
# Validate a file
schemaorg-validate --file page.html --profile google
# Validate a URL
schemaorg-validate --url https://example.com --profile google
# JSON output for CI
schemaorg-validate --file page.html --format json
# SARIF for GitHub Code Scanning
schemaorg-validate --file page.html --format sarif > results.sarif
```
### As an npm Package (WASM)
```bash
npm install @schemaorg-rs/wasm
```
```javascript
import { validateHtml } from '@schemaorg-rs/wasm';
const result = JSON.parse(validateHtml(htmlString));
```
---
## Features
| Feature | Description |
|---------|-------------|
| **Extraction** | JSON-LD, Microdata, RDFa Lite into unified `SchemaNode` model |
| **Validation** | Type/property/value checking against Schema.org v30.0 |
| **Profiles** | Google Rich Results eligibility for 7 schema types |
| **CLI** | `schemaorg-validate` with text, JSON, and SARIF output |
| **WASM** | Browser/Node.js via WebAssembly |
| **Offline** | Vocabulary vendored at compile time, zero network calls |
### Extraction
- **JSON-LD** — `@graph` arrays, `@id` cross-references, nested objects, source locations
- **Microdata** — `itemscope`/`itemprop`, `itemref`, value extraction by element type
- **RDFa Lite** — `vocab`/`typeof`/`property`, `prefix` namespaces, `resource` identifiers
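The Microdata bullet mentions value extraction by element type. As a rough illustration, the sketch below follows the element-specific rules of the HTML Microdata specification. `microdata_value` is a hypothetical helper written for this README, not the crate's API, and it skips nested `itemscope` items and URL resolution.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not the crate's API): picks the value of an
/// `itemprop` element based on its tag name, per the HTML Microdata rules.
/// Nested items and resolving URLs against the document base are omitted.
fn microdata_value(tag: &str, attrs: &HashMap<&str, &str>, text: &str) -> String {
    // Missing attributes yield an empty string in this sketch.
    let attr = |name: &str| attrs.get(name).copied().unwrap_or("").to_string();
    match tag {
        "meta" => attr("content"),
        "audio" | "embed" | "iframe" | "img" | "source" | "track" | "video" => attr("src"),
        "a" | "area" | "link" => attr("href"),
        "object" => attr("data"),
        "data" | "meter" => attr("value"),
        // <time> prefers its `datetime` attribute and falls back to its text.
        "time" => attrs
            .get("datetime")
            .map(|v| v.to_string())
            .unwrap_or_else(|| text.to_string()),
        // Every other element contributes its text content.
        _ => text.to_string(),
    }
}

fn main() {
    let attrs = HashMap::from([("content", "29.99")]);
    println!("price = {}", microdata_value("meta", &attrs, ""));
}
```

Keying the rule on the tag name keeps all three extractors free to emit plain strings into the same node model.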
### Validation
- Unknown/deprecated/pending types and properties
- Property domain checking (wrong type for property)
- Value type mismatches (Number where URL expected)
- Enum validation, boolean/number coercion warnings
- "Did you mean?" suggestions via Levenshtein distance
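The "Did you mean?" suggestions can be illustrated with a small, self-contained sketch. The names `levenshtein` and `suggest`, the sample property list, and the distance threshold of 2 are assumptions made for this example; the crate's actual implementation and tuning may differ.

```rust
/// Two-row dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // `prev[j]` holds the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical helper: the closest known name within `max_distance`, if any.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|&name| (levenshtein(unknown, name), name))
        .filter(|&(d, _)| d <= max_distance)
        .min_by_key(|&(d, _)| d)
        .map(|(_, name)| name)
}

fn main() {
    let known = ["name", "description", "offers", "priceCurrency"];
    if let Some(s) = suggest("priceCurency", &known, 2) {
        println!("Unknown property `priceCurency`. Did you mean `{s}`?");
    }
}
```

A small threshold keeps suggestions limited to likely typos instead of unrelated terms.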
### Google Rich Results Profiles
| Type | Subtypes also covered |
|------|-----------------------|
| Product | - |
| Article | NewsArticle, BlogPosting |
| FAQPage | - (restricted since 2024) |
| BreadcrumbList | - |
| LocalBusiness | Restaurant, Store, all subtypes |
| Event | - |
| Recipe | - |
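The second column of the table above can be read as a walk up the Schema.org type hierarchy: a node typed `Restaurant` is checked against the `LocalBusiness` profile. The sketch below shows one way this could work. `parent_of` and `profile_for` are hypothetical names, and the hierarchy excerpt is hand-written and simplified to one parent per type; it is not the crate's generated data.

```rust
/// Types with a Google profile, taken from the table above.
const PROFILED: [&str; 7] = [
    "Product", "Article", "FAQPage", "BreadcrumbList", "LocalBusiness", "Event", "Recipe",
];

/// Hypothetical, hand-written excerpt of the Schema.org hierarchy (child -> parent).
/// Simplified to a single parent per type.
fn parent_of(ty: &str) -> Option<&'static str> {
    match ty {
        "NewsArticle" | "SocialMediaPosting" => Some("Article"),
        "BlogPosting" => Some("SocialMediaPosting"),
        "Restaurant" => Some("FoodEstablishment"),
        "FoodEstablishment" | "Store" => Some("LocalBusiness"),
        _ => None,
    }
}

/// Returns the profiled type that `ty` is, or inherits from, if any.
fn profile_for(ty: &str) -> Option<&'static str> {
    if let Some(exact) = PROFILED.iter().copied().find(|p| *p == ty) {
        return Some(exact);
    }
    // Climb the hierarchy until a profiled ancestor is found or the chain ends.
    let mut current = parent_of(ty);
    while let Some(t) = current {
        if PROFILED.contains(&t) {
            return Some(t);
        }
        current = parent_of(t);
    }
    None
}

fn main() {
    for ty in ["Restaurant", "BlogPosting", "Person"] {
        println!("{ty} -> {:?}", profile_for(ty));
    }
}
```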
### CLI Output Formats
- **Text** — colored, human-readable with eligibility summary
- **JSON** — structured output for programmatic consumption
- **SARIF 2.1.0** — GitHub Code Scanning compatible
---
## Installation
### Library
```toml
[dependencies]
schemaorg-validate = "0.3"
```
### CLI
```bash
cargo install schemaorg-validate
```
### Feature Flags
| Feature | Default | Description |
|---------|---------|-------------|
| `extraction` | Yes | HTML parsing, all 3 extractors |
| `validation` | No | Schema.org vocabulary validation |
| `profiles` | No | Rich Results profiles |
| `wasm` | No | WASM bindings |
| `cli` | No | CLI binary (`schemaorg-validate`) |
| `full` | No | extraction + validation + profiles |
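```toml
# Pick one of the following entries under [dependencies].
# Full library (no CLI/WASM); enables the validation and profiles features
schemaorg-validate = { version = "0.3", features = ["full"] }
# Core types only (no HTML parsing)
schemaorg-validate = { version = "0.3", default-features = false }
```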
---
## Usage
### Extract all formats
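```rust
use schemaorg_rs::extract_all;
// Assumes `html: &str` holds the page markup and that the enclosing
// function returns a `Result`, so `?` can propagate extraction errors.
let graph = extract_all(html)?;
for node in &graph.nodes {
    // Each node records the format it came from and its Schema.org types.
    println!("{:?}: {:?}", node.source_format, node.types);
}
```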
### Validate against vocabulary
```rust
use schemaorg_rs::{extract_all, validation};
// Assumes `html: &str` and an enclosing function that returns a `Result`.
let graph = extract_all(html)?;
let result = validation::validate(&graph);
for diag in &result.diagnostics {
    println!("[{}] {} -- {}", diag.severity, diag.path, diag.message);
}
```
### Check Rich Results eligibility
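```rust
use schemaorg_rs::{extract_all, validation};
use schemaorg_rs::profiles::ProfileRegistry;
// Profiles consume the extracted graph plus the validator's diagnostics.
// Assumes `html: &str` and an enclosing function that returns a `Result`.
let graph = extract_all(html)?;
let validated = validation::validate(&graph);
let registry = ProfileRegistry::with_google();
let result = registry.evaluate("google", &graph, &validated.diagnostics)?;
for tr in &result.type_results {
    println!("{}: eligible={}, missing={:?}",
        tr.schema_type, tr.eligible, tr.required_missing);
}
```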
---
## GitHub Action
```yaml
- uses: mitrovicsinisaa/schemaorg-rs/.github/actions/schemaorg-validate@main
  with:
    files: 'dist/**/*.html'
    profile: google
    upload-sarif: 'true'
```
See [Action README](.github/actions/schemaorg-validate/README.md) for full options.
---
## Architecture
```
HTML input
    │
    ├─ JSON-LD extractor ──────┐
    ├─ Microdata extractor ────┤──▶ StructuredDataGraph
    └─ RDFa Lite extractor ────┘            │
                                            ├──▶ Vocabulary Validator (Schema.org v30.0)
                                            │         │
                                            │         ▼
                                            │    ValidationResult (diagnostics)
                                            │         │
                                            └──▶ Profile Engine ──▶ ProfileResult (eligibility)
                                                      │
                                                  ┌───┴────┐
                                              Google   Baseline
                                             (7 types) (generic)
```
The Schema.org vocabulary (800+ types, 1400+ properties) is resolved entirely at
**compile time** via `build.rs` codegen. Runtime validation uses static `match`
trees — zero heap allocation, zero parsing, ideal for WASM.
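To make the static `match` idea concrete, here is a hand-written sketch of what generated lookup code could look like. `TypeInfo` and `lookup_type` are hypothetical names chosen for illustration; the code that `build.rs` actually emits may be shaped differently and covers the full vocabulary instead of four types.

```rust
/// Hypothetical record for one vocabulary type; the real generated data may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TypeInfo {
    name: &'static str,
    parent: Option<&'static str>,
}

/// Stand-in for a function a build script could emit: one `match` arm per type.
/// It only compares against string literals and returns `Copy` data, so the
/// lookup needs no heap allocation and no parsing at runtime.
fn lookup_type(name: &str) -> Option<TypeInfo> {
    match name {
        "Thing" => Some(TypeInfo { name: "Thing", parent: None }),
        "CreativeWork" => Some(TypeInfo { name: "CreativeWork", parent: Some("Thing") }),
        "Article" => Some(TypeInfo { name: "Article", parent: Some("CreativeWork") }),
        "Product" => Some(TypeInfo { name: "Product", parent: Some("Thing") }),
        _ => None,
    }
}

fn main() {
    for name in ["Article", "Prodcut"] {
        match lookup_type(name) {
            Some(info) => println!("{} (parent: {:?})", info.name, info.parent),
            None => println!("{name}: unknown type"),
        }
    }
}
```

Because the table lives in the compiled binary, the same lookup works unchanged in a WASM build with no vocabulary file to load.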
---
## Documentation
- [User Guide](docs/guide.md) — getting started, all features
- [CLI Reference](docs/cli.md) — all options, output formats, SARIF rule IDs
- [Architecture](docs/architecture.md) — internal design, data flow, codegen
- [Profile Docs](docs/profiles/) — required/recommended fields per type
- [API Reference](https://docs.rs/schemaorg-validate) — full Rust docs
- [Contributing](CONTRIBUTING.md) — dev setup, code standards, adding profiles
- [Changelog](CHANGELOG.md) — version history
---
## Why This Exists
Schema.org structured data is embedded in hundreds of millions of web pages.
When it's broken — a missing `name` on a `Product`, a wrong value type on
`offers.price` — search engines silently ignore it. No rich results. No AI
citations. No visibility.
The only validators that understand Schema.org semantically are closed-source,
hosted by Google, and require sending your URLs to their servers.
`schemaorg-rs` is the first open-source, offline, embeddable Schema.org
validator. It runs in Rust, WASM, and CLI. It validates vocabulary correctness
*and* Rich Results eligibility in one pass.
---
## Roadmap
The core library is stable and shipping. Future work focuses on ecosystem
integration and expanded coverage:
- **CMS Integrations** — Shopware 6 plugin, TYPO3 extension, WordPress plugin
- **Language Bindings** — Python (PyO3), PHP extension, native Node.js (napi)
- **Additional Profiles** — VideoObject, JobPosting, HowTo, Course, Review, Dataset
- **Hosted API** — Self-hostable HTTP API + Docker image (open-source Rich Results Test alternative)
- **Auto-fix Engine** — Not just "this is broken" but "here's the corrected JSON-LD"
- **Schema.org W3C Engagement** — `schema:pending` support, upstream test fixtures
See [CHANGELOG.md](CHANGELOG.md) for version history.
---
## License
MIT — see [LICENSE](LICENSE)