# schemaorg-rs
**Extract, validate, and profile Schema.org structured data from HTML.**

Parse JSON-LD, Microdata, and RDFa into a unified data model. Validate against
the official Schema.org vocabulary. Check Google Rich Results eligibility.
All offline, all embeddable, all from a single Rust library.

---
## Quick Start
### As a Library
```rust
// Needs the `validation` and `profiles` features, e.g. `features = ["full"]`.
use schemaorg_rs::profiles::{Eligibility, ProfileRegistry};
use schemaorg_rs::{extract_all, validation};

fn main() {
    let html = r#"<script type="application/ld+json">{
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Widget",
        "offers": { "@type": "Offer", "price": "29.99", "priceCurrency": "EUR" }
    }</script>"#;

    // Extract -> Validate -> Profile
    let graph = extract_all(html).unwrap();
    let result = validation::validate(&graph);
    let registry = ProfileRegistry::with_google();
    let profile = registry.evaluate("google", &graph, &result.diagnostics).unwrap();

    match profile.eligibility {
        Eligibility::Eligible => println!("Rich result eligible!"),
        Eligibility::WarningsOnly => println!("Eligible with warnings"),
        Eligibility::NotEligible => println!("Not eligible"),
        Eligibility::Restricted => println!("Restricted"),
    }
}
```
### As a CLI Tool
```bash
# Install
cargo install schemaorg-validate
# Validate a file
schemaorg-validate --file page.html --profile google
# Validate a URL
schemaorg-validate --url https://example.com --profile google
# JSON output for CI
schemaorg-validate --file page.html --format json
# SARIF for GitHub Code Scanning
schemaorg-validate --file page.html --format sarif > results.sarif
```
### As an npm Package (WASM)
```bash
npm install @schemaorg-rs/wasm
```
```javascript
import { validateHtml } from '@schemaorg-rs/wasm';
const result = JSON.parse(validateHtml(htmlString));
```
---
## Features
| Feature | Description |
|---------|-------------|
| **Extraction** | JSON-LD, Microdata, RDFa Lite into unified `SchemaNode` model |
| **Validation** | Type/property/value checking against Schema.org v30.0 |
| **Profiles** | Google Rich Results eligibility for 7 schema types |
| **CLI** | `schemaorg-validate` with text, JSON, and SARIF output |
| **WASM** | Browser/Node.js via WebAssembly |
| **Offline** | Vocabulary vendored at compile time, zero network calls |
### Extraction
- **JSON-LD** -- `@graph` arrays, `@id` cross-references, nested objects, source locations
- **Microdata** -- `itemscope`/`itemprop`, `itemref`, value extraction by element type
- **RDFa Lite** -- `vocab`/`typeof`/`property`, `prefix` namespaces, `resource` identifiers
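
To give a feel for the first step of JSON-LD extraction, here is a minimal sketch that locates the raw `<script type="application/ld+json">` blocks in a page. It is an illustration only, not the crate's implementation: `jsonld_blocks` is a hypothetical helper, and it uses naive string scanning that assumes the exact lowercase, double-quoted attribute spelling shown, which a real HTML parser would not need.

```rust
/// Returns the trimmed text of every `<script type="application/ld+json">` block.
/// Hypothetical helper: naive scanning, not a real HTML parser.
fn jsonld_blocks(html: &str) -> Vec<&str> {
    const OPEN: &str = r#"<script type="application/ld+json">"#;
    const CLOSE: &str = "</script>";

    let mut blocks = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(OPEN) {
        // Everything after the opening tag.
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                blocks.push(after_open[..end].trim());
                // Continue scanning after the closing tag.
                rest = &after_open[end + CLOSE.len()..];
            }
            // Unterminated block: stop scanning.
            None => break,
        }
    }
    blocks
}

fn main() {
    let html = r#"<html><script type="application/ld+json">
        {"@context": "https://schema.org", "@type": "Product", "name": "Widget"}
    </script></html>"#;

    for block in jsonld_blocks(html) {
        println!("{block}");
    }
}
```

Each returned slice would then be handed to a JSON parser; Microdata and RDFa need attribute-level DOM traversal instead, which is why a sketch like this does not generalize to them.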
### Validation
- Unknown/deprecated/pending types and properties
- Property domain checking (wrong type for property)
- Value type mismatches (Number where URL expected)
- Enum validation, boolean/number coercion warnings
- "Did you mean?" suggestions via Levenshtein distance
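
The "Did you mean?" idea can be sketched as follows: compute the Levenshtein distance from an unknown name to each known name and suggest the closest one within a threshold. This is a sketch under assumptions, not the crate's code: the `levenshtein` and `suggest` functions, the threshold of 2, and the tiny list of property names are all hypothetical.

```rust
/// Two-row dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] = distance between the processed prefix of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Returns the closest known name within `max_distance` edits, if any.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|k| (levenshtein(unknown, k), *k))
        .filter(|(d, _)| *d <= max_distance)
        .min_by_key(|(d, _)| *d)
        .map(|(_, k)| k)
}

fn main() {
    // Hypothetical subset of property names; the real vocabulary is far larger.
    let known = ["name", "description", "priceCurrency", "offers"];
    match suggest("priceCurency", &known, 2) {
        Some(s) => println!("Unknown property `priceCurency`. Did you mean `{s}`?"),
        None => println!("No suggestion"),
    }
}
```

A small threshold keeps suggestions relevant: with a limit of 2, a typo such as `priceCurency` matches `priceCurrency`, while an unrelated name yields no suggestion at all.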
### Google Rich Results Profiles
| Profile type | Subtypes also covered |
|--------------|-----------------------|
| Product | -- |
| Article | NewsArticle, BlogPosting |
| FAQPage | -- (restricted since 2024) |
| BreadcrumbList | -- |
| LocalBusiness | Restaurant, Store, all subtypes |
| Event | -- |
| Recipe | -- |
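
Conceptually, a profile check maps a node's type to the profile that covers it, then compares the node's properties against that profile's required fields. The sketch below illustrates this for three of the types in the table. Everything in it is hypothetical: the `Outcome` enum, the function names, and especially the required-property lists, which are placeholders and not Google's actual requirements (see the Profile Docs for those).

```rust
use std::collections::HashSet;

/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    Eligible,
    NotEligible { missing: Vec<&'static str> },
    NoProfile,
}

/// Maps a type to the profile that covers it (illustrative subset of the table).
fn profile_base(schema_type: &str) -> Option<&'static str> {
    match schema_type {
        "Product" => Some("Product"),
        "Article" | "NewsArticle" | "BlogPosting" => Some("Article"),
        "LocalBusiness" | "Restaurant" | "Store" => Some("LocalBusiness"),
        _ => None,
    }
}

/// Placeholder required properties per profile; NOT Google's actual requirements.
fn required_props(base: &str) -> &'static [&'static str] {
    match base {
        "Product" => &["name", "offers"],
        "Article" => &["headline"],
        "LocalBusiness" => &["name", "address"],
        _ => &[],
    }
}

/// Checks which required properties are absent from `present`.
fn evaluate(schema_type: &str, present: &[&str]) -> Outcome {
    let base = match profile_base(schema_type) {
        Some(base) => base,
        None => return Outcome::NoProfile,
    };
    let present: HashSet<&str> = present.iter().copied().collect();
    let missing: Vec<&'static str> = required_props(base)
        .iter()
        .copied()
        .filter(|p| !present.contains(*p))
        .collect();

    if missing.is_empty() {
        Outcome::Eligible
    } else {
        Outcome::NotEligible { missing }
    }
}

fn main() {
    // A subtype is evaluated against its parent profile.
    println!("{:?}", evaluate("NewsArticle", &["headline", "author"]));
    println!("{:?}", evaluate("Product", &["name"]));
}
```

Resolving subtypes to a base profile first is what lets a `Restaurant` or `NewsArticle` be checked without a dedicated rule set of its own.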
### CLI Output Formats
- **Text** -- colored, human-readable with eligibility summary
- **JSON** -- structured output for programmatic consumption
- **SARIF 2.1.0** -- GitHub Code Scanning compatible
---
## Installation
### Library
```toml
[dependencies]
schemaorg-rs = "0.1"
```
### CLI
```bash
cargo install schemaorg-validate
```
### Feature Flags
| Feature | Default | Description |
|---------|---------|-------------|
| `extraction` | Yes | HTML parsing, all 3 extractors |
| `validation` | No | Schema.org vocabulary validation |
| `profiles` | No | Rich Results profiles |
| `wasm` | No | WASM bindings |
| `cli` | No | CLI binary (`schemaorg-validate`) |
| `full` | No | extraction + validation + profiles |
```toml
# Full library (no CLI/WASM)
schemaorg-rs = { version = "0.1", features = ["full"] }
# Core types only (no HTML parsing)
schemaorg-rs = { version = "0.1", default-features = false }
```
---
## Usage
### Extract all formats
```rust
use schemaorg_rs::extract_all;

// `html` is the page source as a `&str`.
let graph = extract_all(html)?;
for node in &graph.nodes {
    println!("{:?}: {:?}", node.source_format, node.types);
}
```
### Validate against vocabulary
```rust
use schemaorg_rs::{extract_all, validation};

let graph = extract_all(html)?;
let result = validation::validate(&graph);
for diag in &result.diagnostics {
    println!("[{}] {} -- {}", diag.severity, diag.path, diag.message);
}
```
### Check Rich Results eligibility
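```rust
use schemaorg_rs::profiles::ProfileRegistry;
use schemaorg_rs::{extract_all, validation};

// Profiles are evaluated on an extracted graph plus its validation diagnostics.
let graph = extract_all(html)?;
let report = validation::validate(&graph);

let registry = ProfileRegistry::with_google();
let result = registry.evaluate("google", &graph, &report.diagnostics)?;
for tr in &result.type_results {
    println!("{}: eligible={}, missing={:?}",
        tr.schema_type, tr.eligible, tr.required_missing);
}
```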
---
## GitHub Action
```yaml
- uses: mitrovicsinisaa/schemaorg-rs/.github/actions/schemaorg-validate@main
  with:
    files: 'dist/**/*.html'
    profile: google
    upload-sarif: 'true'
```
See [Action README](.github/actions/schemaorg-validate/README.md) for full options.

---
## Documentation
- [User Guide](docs/guide.md) -- getting started, all features
- [CLI Reference](docs/cli.md) -- all options, output formats, SARIF rule IDs
- [Architecture](docs/architecture.md) -- internal design, data flow, codegen
- [Profile Docs](docs/profiles/) -- required/recommended fields per type
- [API Reference](https://docs.rs/schemaorg-rs) -- full Rust docs
- [Contributing](CONTRIBUTING.md) -- dev setup, code standards, adding profiles
- [Changelog](CHANGELOG.md) -- version history
---
## Why This Exists
Schema.org structured data is embedded in hundreds of millions of web pages.
When it's broken -- a missing `name` on a `Product`, a wrong value type on
`offers.price` -- search engines silently ignore it. No rich results. No AI
citations. No visibility.

The only validators that understand Schema.org semantically are closed-source,
hosted by Google, and require sending your URLs to their servers.

`schemaorg-rs` is the first open-source, offline, embeddable Schema.org
validator. It runs as a Rust library, a WASM module, and a CLI. It validates
vocabulary correctness *and* Rich Results eligibility in one pass.

---
## License
MIT -- see [LICENSE](LICENSE)