Crate fast_html_parser

§fast-html-parser — SIMD-Optimized HTML Parser

A high-performance HTML parser designed for web scraping workloads. It uses SIMD instructions (SSE4.2, AVX2, NEON) for tokenization and builds a cache-line-aligned, arena-based DOM tree for fast traversal.
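
To make "arena-based DOM tree" concrete, here is a minimal, standard-library-only sketch of the general technique: nodes live in one `Vec` and refer to each other by index rather than by pointer. It is not this crate's implementation. The names `Arena`, `Node`, `push`, and `sample` are hypothetical, and the sketch ignores cache-line alignment and SIMD entirely.

```rust
/// Index into the arena's node vector (hypothetical stand-in, not the crate's `NodeId`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub u32);

/// One node stored inline in the arena; links are indices, not pointers.
#[derive(Debug)]
pub struct Node {
    pub name: &'static str,          // element name, or "#text" for text nodes
    pub text: Option<String>,        // payload, set for text nodes only
    pub first_child: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
}

/// A flat arena: every node lives in one contiguous `Vec`.
#[derive(Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Append a node as the last child of `parent` (if any) and return its id.
    pub fn push(&mut self, parent: Option<NodeId>, name: &'static str, text: Option<&str>) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node {
            name,
            text: text.map(str::to_owned),
            first_child: None,
            next_sibling: None,
        });
        if let Some(p) = parent {
            let first = self.nodes[p.0 as usize].first_child;
            match first {
                None => self.nodes[p.0 as usize].first_child = Some(id),
                Some(mut cur) => {
                    // Walk to the end of the sibling chain, then link the new node.
                    while let Some(next) = self.nodes[cur.0 as usize].next_sibling {
                        cur = next;
                    }
                    self.nodes[cur.0 as usize].next_sibling = Some(id);
                }
            }
        }
        id
    }

    /// Concatenate all descendant text in document order.
    pub fn text_content(&self, id: NodeId) -> String {
        let mut out = String::new();
        self.collect_text(id, &mut out);
        out
    }

    fn collect_text(&self, id: NodeId, out: &mut String) {
        let node = &self.nodes[id.0 as usize];
        if let Some(t) = &node.text {
            out.push_str(t);
        }
        let mut child = node.first_child;
        while let Some(c) = child {
            self.collect_text(c, out);
            child = self.nodes[c.0 as usize].next_sibling;
        }
    }
}

/// Build the tree for `<div><p>Hello</p></div>` by hand and return it with its root id.
pub fn sample() -> (Arena, NodeId) {
    let mut arena = Arena::default();
    let div = arena.push(None, "div", None);
    let p = arena.push(Some(div), "p", None);
    arena.push(Some(p), "#text", Some("Hello"));
    (arena, div)
}
```

Because ids are plain integers, they are `Copy`, cheap to store, and remain valid when the backing vector reallocates, which is one common reason to prefer an arena over pointer-linked nodes.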

§Quick Start

use fast_html_parser::HtmlParser;

let doc = HtmlParser::parse("<div><p>Hello</p></div>").unwrap();
assert_eq!(doc.root().text_content(), "Hello");

§Builder Pattern

use fast_html_parser::HtmlParser;

let doc = HtmlParser::builder()
    .max_input_size(64 * 1024 * 1024) // 64 MiB
    .build()
    .parse_str("<div>Hello</div>")
    .unwrap();

§CSS Selectors

use fast_html_parser::prelude::*;

let doc = HtmlParser::parse("<ul><li>one</li><li>two</li></ul>").unwrap();
let items = doc.select("li").unwrap();
assert_eq!(items.len(), 2);

§Streaming

use fast_html_parser::streaming::parse_stream;

let html = b"<div><p>Hello</p></div>";
let doc = parse_stream(html.chunks(8)).unwrap();
assert_eq!(doc.root().text_content(), "Hello");
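
The `html.chunks(8)` call above is ordinary slice chunking from the standard library, so chunk boundaries fall wherever the byte count dictates, including in the middle of a tag. The sketch below uses only `std` to show where the boundaries land for this input; `chunk_lengths` and `reassemble` are hypothetical helper names and are not part of this crate.

```rust
/// Return the length of each fixed-size chunk of `input`.
pub fn chunk_lengths(input: &[u8], size: usize) -> Vec<usize> {
    input.chunks(size).map(|c| c.len()).collect()
}

/// Concatenate chunks back into one buffer. A streaming consumer sees the
/// same bytes as a one-shot parse, just delivered piecewise.
pub fn reassemble<'a>(chunks: impl Iterator<Item = &'a [u8]>) -> Vec<u8> {
    let mut buf = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(chunk);
    }
    buf
}
```

For the 23-byte input above, the chunks are 8, 8, and 7 bytes long, and the second chunk is `Hello</p`, so the closing `</p>` tag is split across two chunks. A streaming parser has to carry such partial tokens over from one chunk to the next.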

§Feature Flags

| Feature | Default | Description |
|---|---|---|
| `css-selector` | Yes | CSS selector engine |
| `entity-decode` | Yes | HTML entity decoding |
| `xpath` | No | XPath expression support |
| `encoding` | No | Auto-detect encoding from raw bytes |
| `async-tokio` | No | Async parsing via Tokio |

Re-exports§

pub use fhp_core as core_types;
pub use fhp_tokenizer as tokenizer;
pub use fhp_tree as tree;

Modules§

async_parser (feature `async-tokio`): Async parser.
encoding (feature `encoding`): Encoding detection and conversion.
prelude: Convenience prelude that imports the most commonly used types.
streaming: Streaming and incremental parsing.
xpath (feature `xpath`): XPath types (re-exported from the selector crate).

Structs§

CompiledSelector (feature `css-selector` or `xpath`): A pre-compiled CSS selector for reuse across documents and threads.
Document: A parsed HTML document backed by an arena.
DocumentIndex (feature `css-selector` or `xpath`): Pre-built index for O(1) id, class, and tag lookups.
HtmlParser: A configured HTML parser instance.
NodeId: Index into the arena’s node vector.
NodeRef: A borrowed reference to a node inside the document.
ParserBuilder: Configuration builder for the HTML parser.
Selection (feature `css-selector` or `xpath`): A collection of matched nodes from a selector query.
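
As an illustration of what a pre-built index for id, class, and tag lookups can look like, here is a minimal sketch built on `std::collections::HashMap`, where each lookup is a single hash probe (O(1) on average). It is not the crate's `DocumentIndex`: the `Element` and `Index` types and their methods are hypothetical, and node positions are plain `usize` indices.

```rust
use std::collections::HashMap;

/// Hypothetical element record holding only the attributes we index.
pub struct Element {
    pub tag: &'static str,
    pub id: Option<&'static str>,
    pub classes: Vec<&'static str>,
}

/// Lookup tables built once per document.
#[derive(Default)]
pub struct Index {
    by_id: HashMap<&'static str, usize>,
    by_class: HashMap<&'static str, Vec<usize>>,
    by_tag: HashMap<&'static str, Vec<usize>>,
}

impl Index {
    /// Scan the elements once and record each element's position.
    pub fn build(elements: &[Element]) -> Self {
        let mut index = Index::default();
        for (pos, el) in elements.iter().enumerate() {
            if let Some(id) = el.id {
                // If an id is duplicated, the first occurrence wins.
                index.by_id.entry(id).or_insert(pos);
            }
            for class in &el.classes {
                index.by_class.entry(*class).or_default().push(pos);
            }
            index.by_tag.entry(el.tag).or_default().push(pos);
        }
        index
    }

    /// Position of the element with this id, if any.
    pub fn by_id(&self, id: &str) -> Option<usize> {
        self.by_id.get(id).copied()
    }

    /// Positions of all elements carrying this class, in document order.
    pub fn by_class(&self, class: &str) -> &[usize] {
        self.by_class.get(class).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Positions of all elements with this tag name, in document order.
    pub fn by_tag(&self, tag: &str) -> &[usize] {
        self.by_tag.get(tag).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

The trade-off is the usual one for indexes: one linear pass and extra memory up front in exchange for lookups that no longer need to walk the tree.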

Enums§

HtmlError: Error type for HTML parsing.
Tag: Known HTML tag names interned as a u8 discriminant.
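
The idea of interning tag names as a `u8` discriminant can be sketched with a `#[repr(u8)]` enum. This is an illustration only: `KnownTag`, its five variants, their numeric values, and `from_name` are hypothetical and do not reflect the crate's actual `Tag` enum.

```rust
/// Hypothetical subset of known tags, each stored as a single byte.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KnownTag {
    Unknown = 0,
    Div = 1,
    P = 2,
    Ul = 3,
    Li = 4,
}

impl KnownTag {
    /// Map a tag name to its interned variant.
    /// HTML tag names are ASCII case-insensitive, so normalize first.
    pub fn from_name(name: &str) -> KnownTag {
        match name.to_ascii_lowercase().as_str() {
            "div" => KnownTag::Div,
            "p" => KnownTag::P,
            "ul" => KnownTag::Ul,
            "li" => KnownTag::Li,
            _ => KnownTag::Unknown,
        }
    }
}
```

With this representation, comparing two tags is a one-byte integer comparison instead of a string comparison, and each node stores one byte for its tag rather than an owned string.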

Traits§

Selectable (feature `css-selector` or `xpath`): Extension trait that adds CSS selector methods to Document.

Functions§

parse: Parse an HTML string with default settings (convenience alias).
parse_bytes: Parse raw bytes with default settings, auto-detecting encoding.
parse_owned: Parse an owned String with default settings, transferring the allocation.
