Oak HTML Parser

High-performance incremental HTML parser for the oak ecosystem with flexible configuration, optimized for web development and document processing.

🎯 Overview

oak-html is an HTML parser designed to handle the full HTML syntax, including modern features. Built on oak-core, it offers both a high-level convenience API and detailed AST generation for web development and document processing.

✨ Features

Complete HTML Syntax: Supports the full HTML syntax, including features from modern specifications
Full AST Generation: Generates comprehensive Abstract Syntax Trees
Lexer Support: Built-in tokenization with proper span information
Error Recovery: Graceful handling of syntax errors with detailed diagnostics

🚀 Quick Start

Basic example:

use oak_html::HtmlParser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parser = HtmlParser::new();
    let html_content = r#"
<!DOCTYPE html>
<html>
<head>
    <title>My Page</title>
</head>
<body>
    <h1>Hello, HTML!</h1>
    <p>This is a paragraph.</p>
</body>
</html>
    "#;
    
    // The leading underscore marks the result as intentionally unused in this example.
    let _document = parser.parse_document(html_content)?;
    println!("Parsed HTML document successfully.");
    Ok(())
}

📋 Parsing Examples

Document Parsing

use oak_html::{HtmlParser, ast::Document};

let parser = HtmlParser::new();
let html_content = r#"
<!DOCTYPE html>
<html>
<head><title>Test</title></head>
<body><h1>Hello</h1></body>
</html>
"#;

let document = parser.parse_document(html_content)?;
println!("Document has {} elements", document.elements.len());

Element Parsing

use oak_html::{HtmlParser, ast::Element};

let parser = HtmlParser::new();
let html_content = r#"
<div class="container" id="main">
    <p>Content</p>
</div>
"#;

let element = parser.parse_element(html_content)?;
println!("Element tag: {}", element.tag_name);

🔧 Advanced Features

Token-Level Parsing

use oak_html::{HtmlParser, lexer::Token};

let parser = HtmlParser::new();
let tokens = parser.tokenize("<div>content</div>")?;
for token in tokens {
    println!("{:?}", token.kind);
}
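
To make "span information" concrete, here is a minimal, self-contained sketch of a tokenizer that records the byte range of every token. It does not use oak-html: the TokenKind, Token, and tokenize names below are hypothetical and only illustrate the idea. The crate's real lexer types may look different and certainly handle far more of the HTML grammar (attributes, comments, doctype, entities).

```rust
use std::ops::Range;

/// Hypothetical token categories; a real HTML lexer distinguishes many more.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    OpenTag,
    CloseTag,
    Text,
}

/// A token plus the byte range it occupies in the source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Range<usize>,
}

/// Splits the input into tag and text tokens, recording a span for each.
pub fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let (kind, len) = if rest.starts_with('<') {
            // A tag runs to the next '>' (or to the end of input if unterminated).
            let len = rest.find('>').map_or(rest.len(), |i| i + 1);
            let kind = if rest.starts_with("</") {
                TokenKind::CloseTag
            } else {
                TokenKind::OpenTag
            };
            (kind, len)
        } else {
            // Text runs up to the next '<' (or to the end of input).
            (TokenKind::Text, rest.find('<').unwrap_or(rest.len()))
        };
        tokens.push(Token { kind, span: pos..pos + len });
        pos += len;
    }
    tokens
}
```

Because each span is a byte range into the original string, a caller can recover the exact source text of a token with &input[token.span.clone()], which is what makes precise diagnostics and editor highlighting possible.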

Error Handling

use oak_html::HtmlParser;

let parser = HtmlParser::new();
let invalid_html = r#"
<html>
<head><title>Test</title>
<body><h1>Hello</h1>
<!-- Missing closing tags -->
"#;

match parser.parse_document(invalid_html) {
    Ok(_document) => println!("Parsed HTML document successfully."),
    Err(e) => {
        println!("Parse error at line {} column {}: {}", 
            e.line(), e.column(), e.message());
        if let Some(context) = e.context() {
            println!("Error context: {}", context);
        }
    }
}

🏗️ AST Structure

The parser generates a comprehensive AST with the following main structures:

Document: Root container for HTML documents
Element: HTML elements with tags and attributes
Attribute: Element attributes with name-value pairs
Text: Text content nodes
Comment: HTML comments
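
As a rough illustration of how these five structures could fit together, the sketch below defines them as plain Rust types and walks the resulting tree. The field names (apart from tag_name, which appears in the element example above), the Node enum, and the helper functions are assumptions made for this sketch, not oak-html's actual definitions.

```rust
/// A name-value pair on an element, e.g. class="container".
#[derive(Debug, Clone, PartialEq)]
pub struct Attribute {
    pub name: String,
    /// None models a bare attribute such as `disabled`.
    pub value: Option<String>,
}

/// Anything that can appear inside a document or an element (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Element(Element),
    Text(String),
    Comment(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub tag_name: String,
    pub attributes: Vec<Attribute>,
    pub children: Vec<Node>,
}

/// Root container for a parsed document.
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub children: Vec<Node>,
}

/// Counts every element in a list of nodes, including nested ones.
pub fn count_elements(nodes: &[Node]) -> usize {
    nodes
        .iter()
        .map(|node| match node {
            Node::Element(el) => 1 + count_elements(&el.children),
            Node::Text(_) | Node::Comment(_) => 0,
        })
        .sum()
}

/// Hand-builds the tree for:
/// <div class="container"><p>Content</p><!-- note --></div>
pub fn sample_document() -> Document {
    let paragraph = Element {
        tag_name: "p".to_string(),
        attributes: Vec::new(),
        children: vec![Node::Text("Content".to_string())],
    };
    let div = Element {
        tag_name: "div".to_string(),
        attributes: vec![Attribute {
            name: "class".to_string(),
            value: Some("container".to_string()),
        }],
        children: vec![
            Node::Element(paragraph),
            Node::Comment(" note ".to_string()),
        ],
    };
    Document { children: vec![Node::Element(div)] }
}
```

Modelling text and comments as variants of one Node enum keeps child lists ordered exactly as in the source, so a traversal such as count_elements can match on the variant it cares about and ignore the rest.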

📊 Performance

Streaming: Parse large HTML files without loading them entirely into memory
Incremental: Re-parse only changed sections
Memory Efficient: Smart AST node allocation
Fast Recovery: Quick error recovery for better IDE integration
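
The "Incremental" point depends on knowing which part of the text changed between two versions. One common way to find it, sketched below, is to strip the shared prefix and suffix of the old and new text and treat what remains as the region to re-parse. This is a generic technique shown under stated assumptions, not a description of oak-html's internals; ChangedRegion and changed_region are hypothetical names.

```rust
use std::ops::Range;

/// Byte ranges that differ between two versions of a document.
#[derive(Debug, PartialEq)]
pub struct ChangedRegion {
    pub old: Range<usize>,
    pub new: Range<usize>,
}

/// Returns the smallest differing middle section, or None if the texts are equal.
pub fn changed_region(old: &str, new: &str) -> Option<ChangedRegion> {
    if old == new {
        return None;
    }
    let (a, b) = (old.as_bytes(), new.as_bytes());

    // Length of the shared prefix, backed up to a UTF-8 character boundary.
    let mut prefix = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    while !old.is_char_boundary(prefix) {
        prefix -= 1;
    }

    // Length of the shared suffix, capped so it never overlaps the prefix.
    let max_suffix = a.len().min(b.len()) - prefix;
    let mut suffix = a
        .iter()
        .rev()
        .zip(b.iter().rev())
        .take(max_suffix)
        .take_while(|(x, y)| x == y)
        .count();
    while !old.is_char_boundary(a.len() - suffix) {
        suffix -= 1;
    }

    Some(ChangedRegion {
        old: prefix..a.len() - suffix,
        new: prefix..b.len() - suffix,
    })
}
```

An incremental parser would typically widen this byte range to the nearest enclosing syntax node before re-parsing, since an edit in the middle of a tag can change how the surrounding text is interpreted.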

🔗 Integration

oak-html integrates with:

Web Scraping: Extract data from HTML documents
Template Engines: Parse and process HTML templates
Static Site Generators: Process HTML content for websites
IDE Support: Language server protocol compatibility
Web Development: HTML parsing for development tools

📚 Examples

See the examples directory for complete examples of:

Complete HTML document parsing
Element and attribute analysis
Code transformation
Integration with development workflows

🤝 Contributing

Contributions are welcome!

Please feel free to submit pull requests at the project repository or open issues.

oak-html 0.0.1