Crate oak_html

🛠️ HTML Parser Developer Guide

HTML support for the Oak language framework.

This guide helps you get started quickly with developing against and integrating oak-html.

🚦 Quick Start

Add the dependency to your Cargo.toml:

[dependencies]
oak-html = { path = "..." }

Basic Parsing Example

The following is a standard workflow for parsing a modern HTML5 document with attributes and nested elements:

use oak_html::{HtmlParser, SourceText, HtmlLanguage};

fn main() {
    // 1. Prepare source code
    let code = r#"
        <!DOCTYPE html>
        <html lang="en">
        <head>
            <meta charset="UTF-8">
            <title>Oak HTML Example</title>
        </head>
        <body>
            <div id="app" class="container">
                <h1>Hello, Oak!</h1>
                <img src="logo.png" alt="Oak Logo" />
            </div>
            <script src="app.js"></script>
        </body>
        </html>
    "#;
    let source = SourceText::new(code);

    // 2. Initialize parser
    let config = HtmlLanguage::new();
    let parser = HtmlParser::new(&config);

    // 3. Execute parsing
    let result = parser.parse(&source);

    // 4. Handle results
    if result.is_success() {
        println!("Parsing successful! AST node count: {}", result.node_count());
    } else {
        eprintln!("Errors found during parsing.");
    }
}

🔍 Core API Usage

1. Syntax Tree Traversal

After a successful parse, you can either use the built-in visitor pattern or traverse the green/red tree manually to extract HTML-specific constructs such as element tags, attribute values, text content, and script or style blocks.
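To make the visitor idea concrete, here is a minimal self-contained sketch. The `Node`, `Visitor`, `walk`, `SrcCollector`, `element`, and `sample_tree` names are hypothetical and invented for illustration; they are not oak-html's AST or green/red tree types, whose exact shape this guide does not show.

```rust
// Hypothetical, simplified node type for illustration only.
// This is NOT oak-html's actual AST or green/red tree.
#[derive(Debug)]
enum Node {
    Element {
        tag: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
    Text(String),
}

/// Visitor with default no-op hooks, so implementors override only what they need.
trait Visitor {
    fn visit_element(&mut self, _tag: &str, _attrs: &[(String, String)]) {}
    fn visit_text(&mut self, _text: &str) {}
}

/// Depth-first, pre-order walk that dispatches each node to the visitor.
fn walk(node: &Node, visitor: &mut dyn Visitor) {
    match node {
        Node::Element { tag, attrs, children } => {
            visitor.visit_element(tag, attrs);
            for child in children {
                walk(child, visitor);
            }
        }
        Node::Text(text) => visitor.visit_text(text),
    }
}

/// Collects the value of every `src` attribute (for example on <img> and <script>).
#[derive(Default)]
struct SrcCollector {
    sources: Vec<String>,
}

impl Visitor for SrcCollector {
    fn visit_element(&mut self, _tag: &str, attrs: &[(String, String)]) {
        for (name, value) in attrs {
            if name.as_str() == "src" {
                self.sources.push(value.clone());
            }
        }
    }
}

/// Small helper to build an element node from string slices.
fn element(tag: &str, attrs: &[(&str, &str)], children: Vec<Node>) -> Node {
    Node::Element {
        tag: tag.to_string(),
        attrs: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
        children,
    }
}

/// Hand-built tree mirroring the <body> of the Quick Start document.
fn sample_tree() -> Node {
    element("body", &[], vec![
        element("div", &[("id", "app")], vec![
            element("h1", &[], vec![Node::Text("Hello, Oak!".to_string())]),
            element("img", &[("src", "logo.png"), ("alt", "Oak Logo")], vec![]),
        ]),
        element("script", &[("src", "app.js")], vec![]),
    ])
}
```

Walking `sample_tree()` with a `SrcCollector` visits the elements in document order and leaves `logo.png` followed by `app.js` in `sources`. Default no-op methods on the trait keep each visitor focused on the node kinds it cares about.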

2. Incremental Parsing

When a small edit is made to a large HTML document, you can reparse incrementally instead of parsing the whole document again:

// Assuming you have an old parse result 'old_result' and new source text 'new_source'
let new_result = parser.reparse(&new_source, &old_result);
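How `reparse` decides what to reuse is not described here. As a rough illustration of the general idea only, the sketch below finds the smallest byte range that differs between two versions of a text by trimming their common prefix and suffix; an incremental parser could limit its work to nodes overlapping that range. The `Edit` struct and `changed_range` function are hypothetical names, not part of oak-html.

```rust
/// Hypothetical description of a single edit, in byte offsets.
/// `start..old_end` in the old text was replaced by `start..new_end` in the new text.
#[derive(Debug, PartialEq)]
struct Edit {
    start: usize,
    old_end: usize,
    new_end: usize,
}

/// Returns the minimal differing range between `old` and `new`,
/// or `None` when the two texts are identical.
fn changed_range(old: &str, new: &str) -> Option<Edit> {
    if old == new {
        return None;
    }
    let (o, n) = (old.as_bytes(), new.as_bytes());

    // Longest common prefix, in bytes.
    let mut start = o.iter().zip(n).take_while(|(a, b)| a == b).count();
    // Back up so the edit never starts inside a multi-byte UTF-8 character.
    while !old.is_char_boundary(start) {
        start -= 1;
    }

    // Longest common suffix that does not overlap the prefix.
    let limit = o.len().min(n.len()) - start;
    let mut suffix = o
        .iter()
        .rev()
        .zip(n.iter().rev())
        .take_while(|(a, b)| a == b)
        .count()
        .min(limit);
    // Shrink the suffix until it begins on a character boundary.
    while !old.is_char_boundary(o.len() - suffix) {
        suffix -= 1;
    }

    Some(Edit {
        start,
        old_end: o.len() - suffix,
        new_end: n.len() - suffix,
    })
}
```

For example, changing `<p>Hello</p>` to `<p>Hello, Oak!</p>` yields `Edit { start: 8, old_end: 8, new_end: 14 }`: a pure insertion of `, Oak!` at byte 8, with everything before and after it unchanged.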

3. Diagnostics

oak-html attaches error context to each diagnostic, covering cases such as unclosed tags and malformed attribute syntax. Each diagnostic carries a line, a column, and a message:

for diag in result.diagnostics() {
    println!("[{}:{}] {}", diag.line, diag.column, diag.message);
}
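If you need to produce the same `line:column` form for your own byte offsets (for example, when mapping a tree node back to the source), a line index is one common approach. The sketch below is an assumption-labeled illustration: `LineIndex` and `line_col` are hypothetical names, and this guide does not state whether oak-html counts columns in bytes or characters; this version counts bytes.

```rust
/// Hypothetical helper: byte offsets at which each line of a text starts.
struct LineIndex {
    starts: Vec<usize>,
}

impl LineIndex {
    /// Scans the text once and records the offset following every '\n'.
    fn new(text: &str) -> Self {
        let mut starts = vec![0];
        starts.extend(text.match_indices('\n').map(|(i, _)| i + 1));
        Self { starts }
    }

    /// Converts a byte offset into a 1-based (line, column) pair.
    /// The column is measured in bytes from the start of the line.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Number of line starts at or before `offset`, minus one, is the line index.
        let line = self.starts.partition_point(|&s| s <= offset) - 1;
        (line + 1, offset - self.starts[line] + 1)
    }
}
```

With the text `"<div>\n  <p>\n</div>"`, the `<p>` tag begins at byte offset 8, and `line_col(8)` returns `(2, 3)`. Building the index once and using a binary search per lookup avoids rescanning the document for every diagnostic.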

🏗️ Architecture Overview

  • Lexer: Tokenizes HTML source text into a stream of tokens, covering tags, attributes, and text nodes, with special handling for script and style content.
  • Parser: A syntax analyzer based on the Pratt parsing algorithm that handles HTML's hierarchical structure, void elements, and self-closing tags.
  • AST: A strongly typed syntax abstraction layer designed for high-performance HTML analysis tools, scrapers, and IDEs.
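The lexer and parser stages above can be illustrated with a deliberately tiny sketch. It is not oak-html's implementation: the `Token`, `lex`, `is_void`, and `unclosed_tags` names are hypothetical, the nesting check uses a plain stack rather than Pratt parsing, and the lexer ignores attributes, comments, doctypes, raw script/style text, and tag-name case.

```rust
/// Hypothetical token type; attributes are dropped for brevity.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    OpenTag(&'a str),
    CloseTag(&'a str),
    Text(&'a str),
}

/// Splits input into tag and text tokens. A '<' starts a tag that runs to
/// the next '>' (or to the end of input if the tag is unterminated).
fn lex(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let inner = &after_lt[..end];
            rest = after_lt.get(end + 1..).unwrap_or("");
            if let Some(name) = inner.strip_prefix('/') {
                tokens.push(Token::CloseTag(name.trim()));
            } else {
                // Tag name is the first word; a trailing '/' (self-closing) is ignored.
                let name = inner
                    .trim_end_matches('/')
                    .split_whitespace()
                    .next()
                    .unwrap_or("");
                tokens.push(Token::OpenTag(name));
            }
        } else {
            let end = rest.find('<').unwrap_or(rest.len());
            tokens.push(Token::Text(&rest[..end]));
            rest = &rest[end..];
        }
    }
    tokens
}

/// Void elements as listed in the HTML standard; they never take a closing tag.
fn is_void(name: &str) -> bool {
    matches!(
        name,
        "area" | "base" | "br" | "col" | "embed" | "hr" | "img"
            | "input" | "link" | "meta" | "source" | "track" | "wbr"
    )
}

/// Tracks nesting with a stack and returns the names of tags left unclosed.
fn unclosed_tags(input: &str) -> Vec<&str> {
    let mut stack: Vec<&str> = Vec::new();
    let mut unclosed = Vec::new();
    for token in lex(input) {
        match token {
            Token::OpenTag(name) if !is_void(name) => stack.push(name),
            Token::CloseTag(name) => {
                // Find the nearest matching open tag; anything above it was never closed.
                if let Some(pos) = stack.iter().rposition(|open| *open == name) {
                    unclosed.extend(stack.drain(pos + 1..));
                    stack.pop();
                }
            }
            _ => {}
        }
    }
    unclosed.extend(stack);
    unclosed
}
```

For the input `<div><p>Hi<br><img src="a.png" /></div>`, `unclosed_tags` returns `["p"]`: `br` and `img` are void elements and are never pushed, while `p` is still open when `</div>` arrives. A closing tag with no matching open tag is silently ignored in this sketch.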

🔗 Advanced Resources

  • Full Examples: Check the examples/ folder in the project root.
  • API Documentation: Run cargo doc --open for detailed type definitions.
  • Test Cases: See tests/ for handling of various HTML5 edge cases and “tag soup.”

Re-exports

pub use crate::ast::HtmlDocument;
pub use crate::builder::HtmlBuilder;
pub use crate::language::HtmlLanguage;
pub use crate::lexer::HtmlLexer;
pub use crate::parser::HtmlParser;
pub use crate::lsp::highlighter::HtmlHighlighter;
pub use crate::lsp::HtmlLanguageService;
pub use lexer::token_type::HtmlTokenType;
pub use parser::element_type::HtmlElementType;

Modules

ast
AST module for HTML nodes.
builder
Builder module for constructing HTML trees.
language
Kind module defining HTML syntax types. Language module for HTML configuration.
lexer
Lexer module for HTML tokenization.
lsp
LSP module for HTML language service features.
mcp
MCP module.
parser
Parser module for HTML syntax analysis.