HTML Translator
ucp-translator-html provides conversion from HTML to UCM documents.
Overview
The HTML translator enables:
- Parsing - Convert HTML to UCM documents
- Semantic mapping - Map HTML elements to semantic roles
- Heading strategies - Configure how headings are processed
Installation
[]
= "0.1"
Quick Start
use parse_html;
let html = r#"
<!DOCTYPE html>
<html>
<head><title>My Document</title></head>
<body>
<h1>Introduction</h1>
<p>Welcome to the guide.</p>
<h2>Getting Started</h2>
<p>Here's some content.</p>
</body>
</html>
"#;
let doc = parse_html.unwrap;
println!;
Parsing HTML
HtmlParser
use ;
// Default parser
let parser = new;
let doc = parser.parse?;
// With custom configuration
let config = HtmlParserConfig ;
let parser = with_config;
let doc = parser.parse?;
Supported Elements
| HTML Element | UCM Content Type | Semantic Role |
|---|---|---|
<h1> |
Text | heading1 |
<h2> |
Text | heading2 |
<h3> |
Text | heading3 |
<h4> |
Text | heading4 |
<h5> |
Text | heading5 |
<h6> |
Text | heading6 |
<p> |
Text | paragraph |
<pre><code> |
Code | code |
<ul>/<ol> |
Text | list |
<blockquote> |
Text | quote |
<table> |
Table | table |
Heading Strategies
| Strategy | Description |
|---|---|
FromTags |
Use HTML heading tags directly (h1-h6) |
FromHierarchy |
Derive heading level from document structure |
Public API
pub use ;
pub use ;
pub use parse_html;
See Also
- UCM Core Content Types - Content type reference
- Markdown Translator - Markdown support
- Documents - Document structure