Oak HTML Parser
A high-performance, incremental HTML parser for the Oak ecosystem, with flexible configuration for web development and document processing.
🎯 Overview
oak-html parses the complete HTML syntax, including modern features. It is built on oak-core and offers both a high-level convenience API and detailed AST generation for web development and document processing.
✨ Features
- Complete HTML Syntax: Supports all HTML features including modern specifications
- Full AST Generation: Generates comprehensive Abstract Syntax Trees
- Lexer Support: Built-in tokenization with proper span information
- Error Recovery: Graceful handling of syntax errors with detailed diagnostics
🚀 Quick Start
Basic example:
use HtmlParser;
📋 Parsing Examples
Document Parsing
use ;
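// Assumes a no-argument constructor on the HtmlParser type from the Quick Start.
let parser = HtmlParser::new();
let html_content = r#"
<!DOCTYPE html>
<html>
<head><title>Test</title></head>
<body><h1>Hello</h1></body>
</html>
"#;
// Passing the source string directly is an assumption; check the crate docs for the exact signature.
let document = parser.parse_document(html_content)?;
println!("Parsed HTML document successfully.");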
Element Parsing
use ;
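// Assumes a no-argument constructor on the HtmlParser type from the Quick Start.
let parser = HtmlParser::new();
let html_content = r#"
<div class="container" id="main">
<p>Content</p>
</div>
"#;
// Passing the source string directly is an assumption; check the crate docs for the exact signature.
let element = parser.parse_element(html_content)?;
println!("Parsed HTML element successfully.");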
🔧 Advanced Features
Token-Level Parsing
use ;
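// Assumes a no-argument constructor on the HtmlParser type from the Quick Start.
let parser = HtmlParser::new();
// The input string is illustrative; the argument type is an assumption.
let tokens = parser.tokenize("<p>Hello</p>")?;
for token in tokens {
    // Assumes the token type implements Debug.
    println!("{:?}", token);
}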
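To make "tokens with span information" concrete, here is a minimal standalone sketch of a tokenizer that records a byte range for every token. It is not oak-html's lexer: the `Token`, `TokenKind`, and `tokenize` names are hypothetical, and it only distinguishes tags from text (no comments, no quoted `>` inside attributes).

```rust
use std::ops::Range;

/// Hypothetical token kinds for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Tag,
    Text,
}

/// A token plus the byte range it covers in the source.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Range<usize>,
}

/// Splits `src` into tag and text tokens, recording byte spans.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < src.len() {
        let rest = &src[pos..];
        let (kind, len) = if rest.starts_with('<') {
            // A tag runs through the next '>' (or to end of input if unterminated).
            (TokenKind::Tag, rest.find('>').map_or(rest.len(), |i| i + 1))
        } else {
            // Text runs up to the next '<' (or to end of input).
            (TokenKind::Text, rest.find('<').unwrap_or(rest.len()))
        };
        tokens.push(Token { kind, span: pos..pos + len });
        pos += len;
    }
    tokens
}
```

Because every span is a range into the original string, `&src[token.span.clone()]` recovers the exact source text of a token, which is what diagnostics and editor tooling typically rely on.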
Error Handling
use HtmlParser;
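// Assumes a no-argument constructor on HtmlParser.
let parser = HtmlParser::new();
let invalid_html = r#"
<html>
<head><title>Test</title>
<body><h1>Hello</h1>
<!-- Missing closing tags -->
"#;
// Assumes parse_document returns a Result, consistent with the `?` used in the other examples.
match parser.parse_document(invalid_html) {
    Ok(_) => println!("Parsed HTML document successfully."),
    Err(_) => println!("Failed to parse HTML document."),
}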
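One common way to recover from missing closing tags is to track open elements on a stack and treat whatever remains at end of input as implicitly closed. The sketch below illustrates that idea in isolation. The `unclosed_tags` function is hypothetical and is not part of oak-html, and it takes pre-extracted tag names rather than raw HTML.

```rust
/// Given tag names in document order ("div" opens, "/div" closes), returns the
/// elements still open at end of input, outermost first.
/// Illustrative only: void elements such as `br` are not special-cased.
pub fn unclosed_tags(tags: &[&str]) -> Vec<String> {
    let mut open: Vec<&str> = Vec::new();
    for &tag in tags {
        match tag.strip_prefix('/') {
            // Closing tag: pop back to the matching opener, implicitly closing
            // anything nested inside it. Stray closers are ignored.
            Some(name) => {
                if let Some(i) = open.iter().rposition(|t| *t == name) {
                    open.truncate(i);
                }
            }
            // Opening tag: push it onto the stack.
            None => open.push(tag),
        }
    }
    open.into_iter().map(String::from).collect()
}
```

For the invalid document above, the tag sequence `html, head, title, /title, body, h1, /h1` leaves `html`, `head`, and `body` open, which is the set a recovering parser would report or close automatically.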
🏗️ AST Structure
The parser generates a comprehensive AST with the following main structures:
- Document: Root container for HTML documents
- Element: HTML elements with tags and attributes
- Attribute: Element attributes with name-value pairs
- Text: Text content nodes
- Comment: HTML comments
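The structures listed above could be modeled roughly as follows. This is a hypothetical sketch for orientation, not oak-html's actual type definitions: all field names, the `Node` enum, and the `text_content` helper are assumptions.

```rust
/// Hypothetical node types mirroring the structures listed above.
/// They are illustrative and do not reflect oak-html's real definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct Attribute {
    pub name: String,
    /// `None` for boolean attributes such as `disabled`.
    pub value: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub tag: String,
    pub attributes: Vec<Attribute>,
    pub children: Vec<Node>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Element(Element),
    Text(String),
    Comment(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub children: Vec<Node>,
}

/// Concatenates the text of a node and all of its descendants, skipping comments.
pub fn text_content(node: &Node) -> String {
    match node {
        Node::Text(text) => text.clone(),
        Node::Comment(_) => String::new(),
        Node::Element(element) => element.children.iter().map(text_content).collect(),
    }
}
```

A recursive helper like `text_content` is the typical shape of AST consumers: match on the node kind, then recurse into the children of elements.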
📊 Performance
- Streaming: Parse large HTML files without loading them entirely into memory
- Incremental: Re-parse only changed sections
- Memory Efficient: Smart AST node allocation
- Fast Recovery: Quick error recovery for better IDE integration
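Incremental re-parsing starts by finding which part of the text actually changed. A simple way to do that, sketched below, is to trim the common prefix and suffix of the old and new text. The `changed_range` function is a hypothetical illustration of the technique, not oak-html's implementation, which may work quite differently.

```rust
use std::ops::Range;

/// Returns the differing byte ranges of `old` and `new` (in that order), found
/// by trimming their common prefix and suffix. Returns `None` if they are equal.
pub fn changed_range(old: &str, new: &str) -> Option<(Range<usize>, Range<usize>)> {
    if old == new {
        return None;
    }
    let (a, b) = (old.as_bytes(), new.as_bytes());

    // Length of the common prefix, backed off to a char boundary so the
    // resulting ranges can be used to slice either string safely.
    let mut prefix = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    while !old.is_char_boundary(prefix) {
        prefix -= 1;
    }

    // Length of the common suffix, limited so it cannot overlap the prefix.
    let max_suffix = a.len().min(b.len()) - prefix;
    let mut suffix = a
        .iter()
        .rev()
        .zip(b.iter().rev())
        .take(max_suffix)
        .take_while(|(x, y)| x == y)
        .count();
    while !old.is_char_boundary(a.len() - suffix) {
        suffix -= 1;
    }

    Some((prefix..a.len() - suffix, prefix..b.len() - suffix))
}
```

An incremental parser would then widen this range to the nearest enclosing syntax node and re-parse only that node, reusing the rest of the previous tree.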
🔗 Integration
oak-html fits into the following workflows:
- Web Scraping: Extract data from HTML documents
- Template Engines: Parse and process HTML templates
- Static Site Generators: Process HTML content for websites
- IDE Support: Language server protocol compatibility
- Web Development: HTML parsing for development tools
📚 Examples
The examples directory includes:
- Complete HTML document parsing
- Element and attribute analysis
- Code transformation
- Integration with development workflows
🤝 Contributing
Contributions are welcome!
Submit pull requests or open issues at the project repository.