semantic-dom-ssg
Machine-readable web semantics for AI agents.
O(1) element lookup, deterministic navigation, and token-efficient serialization optimized for LLM consumption.
Features
- O(1) Lookup: Hash-indexed nodes via
AHashMapfor constant-time element access - Semantic State Graph: Explicit FSM for UI states and transitions
- Agent Summary: ~100 tokens vs ~800 for JSON (87% reduction)
- Security Hardened: Input validation, URL sanitization, size limits
Quick Start
use ;
let html = r#"
<html>
<body>
<nav><a href="/">Home</a></nav>
<main><button>Submit</button></main>
</body>
</html>
"#;
let sdom = parse.unwrap;
// O(1) lookup by iterating index
for in &sdom.index
// Token-efficient summary (~100 tokens)
let summary = sdom.to_agent_summary;
println!;
Installation
Add to your Cargo.toml:
[]
= "0.2"
CLI Tool
# Install CLI
# Parse HTML to JSON
# Token-efficient summary
# One-line summary (~20 tokens)
# Validate for agent compatibility
# Compare token usage
Output Formats
JSON (Full)
Agent Summary (~100 tokens)
PAGE: My Page
LANDMARKS: nav(nav), main(main)
ACTIONS: [nav]Home, [act]Submit
STATE: initial -> Home
STATS: 2L 2A 0H
One-liner (~20 tokens)
My Page | 2L 2A | nav,main | lnk:Home,btn:Submit
Security
This crate implements security hardening per ISO/IEC-SDOM-SSG-DRAFT-2024:
- Input Size Limits: 10MB default maximum
- URL Validation: Only
https,http,fileprotocols allowed - Protocol Blocking:
javascript:,data:,vbscript:,blob:blocked - No Script Execution: HTML parsing only, no JS evaluation
use validate_url;
assert!;
assert!;
Agent Certification
Validate HTML documents for AI agent compatibility:
use ;
let sdom = parse.unwrap;
let cert = certify;
println!;
Certification Levels
| Level | Badge | Requirements |
|---|---|---|
| AAA | 🥇 | Score 90+ (full compliance) |
| AA | 🥈 | Score 70-89 (deterministic FSM) |
| A | 🥉 | Score 50-69 (basic compliance) |
| None | ❌ | Score < 50 |
Performance
Benchmarks on standard HTML documents:
| Operation | Time |
|---|---|
| Parse (10KB) | ~500μs |
| Parse (100KB) | ~5ms |
| O(1) Lookup | ~10ns |
| Agent Summary | ~50μs |
Standards
Implements ISO/IEC-SDOM-SSG-DRAFT-2024 specification for:
- Semantic element classification
- State graph construction
- Agent-ready certification
- Token-efficient serialization
Related
- semantic-dom-ssg (npm) - TypeScript implementation
- ISO/IEC-SDOM-SSG-DRAFT-2024 - Specification
License
MIT License - see LICENSE for details.
Author
George Alexander info@gorgalxandr.com