§fast-html-parser — SIMD-Optimized HTML Parser
A high-performance HTML parser designed for web scraping workloads. It uses SIMD instructions (SSE4.2, AVX2, NEON) for tokenization and builds a cache-line-aligned, arena-based DOM tree for fast traversal.
§Quick Start
use fast_html_parser::HtmlParser;
let doc = HtmlParser::parse("<div><p>Hello</p></div>").unwrap();
assert_eq!(doc.root().text_content(), "Hello");
§Builder Pattern
use fast_html_parser::HtmlParser;
let doc = HtmlParser::builder()
    .max_input_size(64 * 1024 * 1024) // 64 MiB
    .build()
    .parse_str("<div>Hello</div>")
    .unwrap();
§CSS Selectors
use fast_html_parser::prelude::*;
let doc = HtmlParser::parse("<ul><li>one</li><li>two</li></ul>").unwrap();
let items = doc.select("li").unwrap();
assert_eq!(items.len(), 2);
§Streaming
use fast_html_parser::streaming::parse_stream;
let html = b"<div><p>Hello</p></div>";
let doc = parse_stream(html.chunks(8)).unwrap();
assert_eq!(doc.root().text_content(), "Hello");
§Feature Flags
| Feature | Default | Description |
|---|---|---|
| `css-selector` | Yes | CSS selector engine |
| `entity-decode` | Yes | HTML entity decoding |
| `xpath` | No | XPath expression support |
| `encoding` | No | Auto-detect encoding from raw bytes |
| `async-tokio` | No | Async parsing via Tokio |
Re-exports§
pub use fhp_core as core_types;
pub use fhp_tokenizer as tokenizer;
pub use fhp_tree as tree;
Modules§
- async_parser (`async-tokio`): Async parser (requires the `async-tokio` feature).
- encoding (`encoding`): Encoding detection and conversion.
- prelude: Convenience prelude that imports the most commonly used types.
- streaming: Streaming and incremental parsing.
- xpath (`xpath`): XPath types (re-exported from the selector crate).
Structs§
- CompiledSelector (`css-selector` or `xpath`): CSS selector and XPath engine. A pre-compiled CSS selector for reuse across documents and threads.
- Document: Parsed document and node reference. A parsed HTML document backed by an arena.
- DocumentIndex (`css-selector` or `xpath`): CSS selector and XPath engine. Pre-built index for O(1) id, class, and tag lookups.
- HtmlParser: A configured HTML parser instance.
- NodeId: Node identity type. Index into the arena’s node vector.
- NodeRef: Parsed document and node reference. A borrowed reference to a node inside the document.
- ParserBuilder: Configuration builder for the HTML parser.
- Selection (`css-selector` or `xpath`): CSS selector and XPath engine. A collection of matched nodes from a selector query.
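The descriptions above mention an arena-backed document, a `NodeId` that indexes into the arena's node vector, and a pre-built index for tag lookups. The following is a minimal, std-only sketch of that general technique. Every type and method here (`Arena`, `Node`, `push`, `tag_index`, and so on) is a hypothetical stand-in chosen for illustration; none of it is taken from this crate's actual internals or API.

```rust
use std::collections::HashMap;

// Illustrative sketch only: hypothetical types, not fast_html_parser's real layout.

/// Index into the arena's node vector.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

#[derive(Debug)]
struct Node {
    tag: &'static str,
    text: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// All nodes live in one contiguous Vec; links are indices, not pointers.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a node and, if a parent is given, register it as that parent's child.
    pub fn push(&mut self, tag: &'static str, text: &str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node {
            tag,
            text: text.to_string(),
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p.0 as usize].children.push(id);
        }
        id
    }

    /// Parent of a node, if any.
    pub fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0 as usize].parent
    }

    /// Concatenate the text of a node and all of its descendants, depth-first.
    pub fn text_content(&self, id: NodeId) -> String {
        let node = &self.nodes[id.0 as usize];
        let mut out = node.text.clone();
        for &child in &node.children {
            out.push_str(&self.text_content(child));
        }
        out
    }

    /// Build a tag-name index once so later lookups are a single hash probe.
    pub fn tag_index(&self) -> HashMap<&'static str, Vec<NodeId>> {
        let mut index: HashMap<&'static str, Vec<NodeId>> = HashMap::new();
        for (i, node) in self.nodes.iter().enumerate() {
            index.entry(node.tag).or_default().push(NodeId(i as u32));
        }
        index
    }
}
```

In this sketch, a caller would create an `Arena`, `push` a `div` root followed by a `p` child carrying the text, and then call `text_content` on the root. Storing links as small integer ids rather than pointers keeps nodes contiguous in memory and makes ids trivially `Copy`, which is the usual motivation for arena-based trees.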
Enums§
- HtmlError: Parsed document and node reference. Error type for HTML parsing.
- Tag: Interned HTML tag enum. Known HTML tag names interned as a `u8` discriminant.
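To make "interned as a `u8` discriminant" concrete, here is a small sketch of the general pattern using `#[repr(u8)]`. The enum name `TagSketch`, its variants, their numeric values, and the `from_name` helper are all assumptions made for illustration; the crate's real `Tag` enum may differ in every one of these respects.

```rust
// Illustrative sketch only: a hypothetical subset of tags, not the crate's `Tag`.

/// Each known tag name maps to a one-byte discriminant.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TagSketch {
    Unknown = 0,
    Div = 1,
    P = 2,
    Ul = 3,
    Li = 4,
}

impl TagSketch {
    /// Map a tag name to its interned variant (HTML tag names are ASCII case-insensitive).
    pub fn from_name(name: &str) -> Self {
        match name.to_ascii_lowercase().as_str() {
            "div" => Self::Div,
            "p" => Self::P,
            "ul" => Self::Ul,
            "li" => Self::Li,
            _ => Self::Unknown,
        }
    }

    /// The raw one-byte discriminant.
    pub fn as_u8(self) -> u8 {
        self as u8
    }
}
```

With this representation, comparing two tags is a single byte comparison instead of a string comparison, and each node can store its tag in one byte.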
Traits§
- Selectable (`css-selector` or `xpath`): CSS selector and XPath engine. Extension trait that adds CSS selector methods to `Document`.
Functions§
- parse: Parse an HTML string with default settings (convenience alias).
- parse_bytes: Parse raw bytes with default settings, auto-detecting encoding.
- parse_owned: Parse an owned `String` with default settings, transferring the allocation.