ucp-translator-html 0.1.18

HTML to UCM document translator

HTML Translator

ucp-translator-html converts HTML into UCM documents.

Overview

The HTML translator enables:

  • Parsing - Convert HTML to UCM documents
  • Semantic mapping - Map HTML elements to semantic roles
  • Heading strategies - Configure how headings are processed

Installation

[dependencies]
ucp-translator-html = "0.1"

Quick Start

use ucp_translator_html::parse_html;

let html = r#"
<!DOCTYPE html>
<html>
<head><title>My Document</title></head>
<body>
  <h1>Introduction</h1>
  <p>Welcome to the guide.</p>
  <h2>Getting Started</h2>
  <p>Here's some content.</p>
</body>
</html>
"#;

let doc = parse_html(html).unwrap();
println!("Parsed {} blocks", doc.block_count());

Parsing HTML

HtmlParser

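use ucp_translator_html::{HeadingStrategy, HtmlError, HtmlParser, HtmlParserConfig};

fn main() -> Result<(), HtmlError> {
    // Sample input so the snippet stands on its own
    let html = "<h1>Title</h1><p>Some text.</p>";

    // Default parser
    let parser = HtmlParser::new();
    let _doc = parser.parse(html)?;

    // With custom configuration: override the heading strategy, keep other defaults
    let config = HtmlParserConfig {
        heading_strategy: HeadingStrategy::FromHierarchy,
        ..Default::default()
    };
    let parser = HtmlParser::with_config(config);
    let _doc = parser.parse(html)?;

    Ok(())
}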

Supported Elements

HTML Element    UCM Content Type    Semantic Role
<h1>            Text                heading1
<h2>            Text                heading2
<h3>            Text                heading3
<h4>            Text                heading4
<h5>            Text                heading5
<h6>            Text                heading6
<p>             Text                paragraph
<pre><code>     Code                code
<ul>/<ol>       Text                list
<blockquote>    Text                quote
<table>         Table               table
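
The mapping in the table can be read as a simple lookup from tag name to a (content type, semantic role) pair. The sketch below restates it in plain Rust for illustration only. `ContentType` and `map_element` are hypothetical names that are not part of ucp-translator-html, and returning `None` for unlisted tags is an assumption of this sketch, not documented crate behavior.

```rust
/// Hypothetical stand-in for the UCM content types named in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentType {
    Text,
    Code,
    Table,
}

/// Illustrative lookup that mirrors the Supported Elements table.
/// Not the crate's API: tags absent from the table simply yield `None` here.
fn map_element(tag: &str) -> Option<(ContentType, &'static str)> {
    let mapping = match tag.to_ascii_lowercase().as_str() {
        "h1" => (ContentType::Text, "heading1"),
        "h2" => (ContentType::Text, "heading2"),
        "h3" => (ContentType::Text, "heading3"),
        "h4" => (ContentType::Text, "heading4"),
        "h5" => (ContentType::Text, "heading5"),
        "h6" => (ContentType::Text, "heading6"),
        "p" => (ContentType::Text, "paragraph"),
        // The table lists <pre><code>; this sketch keys on the outer <pre>.
        "pre" => (ContentType::Code, "code"),
        "ul" | "ol" => (ContentType::Text, "list"),
        "blockquote" => (ContentType::Text, "quote"),
        "table" => (ContentType::Table, "table"),
        _ => return None,
    };
    Some(mapping)
}

fn main() {
    for tag in ["h2", "pre", "ul", "table", "div"] {
        match map_element(tag) {
            Some((content, role)) => println!("<{tag}> -> {content:?} / {role}"),
            None => println!("<{tag}> -> not listed in the table"),
        }
    }
}
```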

Heading Strategies

Strategy         Description
FromTags         Use HTML heading tags directly (h1-h6)
FromHierarchy    Derive heading level from document structure
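
The source does not say how FromHierarchy computes levels. To make the difference between the two strategies concrete, the sketch below uses one possible interpretation: a heading's level is its nesting depth among the headings that precede it. The `Strategy` enum and `heading_levels` function are hypothetical and self-contained, and the crate's actual rule may differ.

```rust
/// Local stand-in for the crate's `HeadingStrategy`; variant names follow the table.
#[derive(Debug, Clone, Copy)]
enum Strategy {
    FromTags,
    FromHierarchy,
}

/// Given the numeric levels of the heading tags in document order
/// (1 for <h1>, 2 for <h2>, ...), return the level assigned to each heading.
fn heading_levels(tag_levels: &[u8], strategy: Strategy) -> Vec<u8> {
    match strategy {
        // Use the tag's own level unchanged.
        Strategy::FromTags => tag_levels.to_vec(),
        // Assumed interpretation: level = depth in the stack of open headings.
        Strategy::FromHierarchy => {
            let mut open: Vec<u8> = Vec::new();
            let mut levels = Vec::with_capacity(tag_levels.len());
            for &tag in tag_levels {
                // Close every open heading that is not an ancestor of this one.
                while open.last().map_or(false, |&top| top >= tag) {
                    open.pop();
                }
                open.push(tag);
                levels.push(open.len() as u8);
            }
            levels
        }
    }
}

fn main() {
    // <h1>, <h3>, <h3>, <h2>, <h4>: the document skips from h1 straight to h3.
    let tags: [u8; 5] = [1, 3, 3, 2, 4];
    println!("FromTags:      {:?}", heading_levels(&tags, Strategy::FromTags));
    println!("FromHierarchy: {:?}", heading_levels(&tags, Strategy::FromHierarchy));
}
```

Under this reading, FromTags preserves skipped levels exactly as written, while FromHierarchy compacts them so that each heading sits one level below its nearest enclosing heading.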

Public API

pub use error::{HtmlError, Result};
pub use parser::{HtmlParser, HtmlParserConfig, HeadingStrategy};
pub use parse_html;

See Also