Crate fast_html_parser

§fast-html-parser — SIMD-Optimized HTML Parser

A high-performance HTML parser designed for web scraping workloads. It uses SIMD instructions (SSE4.2, AVX2, NEON) for tokenization and builds a cache-line-aligned, arena-based DOM tree for fast traversal.
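
To make the arena idea concrete, here is a minimal sketch in plain Rust of a DOM stored in a single `Vec`, with nodes linked by index rather than by pointer. All names (`SketchNodeId`, `SketchNode`, `SketchArena`) are hypothetical, and the 64-byte alignment is an assumption about cache-line size. It illustrates the general technique only and is not the crate's actual layout.

```rust
/// Hypothetical node handle: an index into the arena's `Vec`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchNodeId(pub u32);

/// Hypothetical node record. `align(64)` assumes a 64-byte cache line,
/// so every node starts on a cache-line boundary.
#[repr(align(64))]
#[derive(Debug)]
pub struct SketchNode {
    pub tag: &'static str,
    pub parent: Option<SketchNodeId>,
    pub first_child: Option<SketchNodeId>,
    pub next_sibling: Option<SketchNodeId>,
}

/// All nodes live in one contiguous allocation.
#[derive(Debug, Default)]
pub struct SketchArena {
    nodes: Vec<SketchNode>,
}

impl SketchArena {
    /// Append a node and link it as the last child of `parent`, if any.
    pub fn push(&mut self, tag: &'static str, parent: Option<SketchNodeId>) -> SketchNodeId {
        let id = SketchNodeId(self.nodes.len() as u32);
        self.nodes.push(SketchNode { tag, parent, first_child: None, next_sibling: None });
        if let Some(p) = parent {
            let first = self.nodes[p.0 as usize].first_child;
            match first {
                None => self.nodes[p.0 as usize].first_child = Some(id),
                Some(mut cur) => {
                    // Walk the sibling chain to its last element.
                    while let Some(next) = self.nodes[cur.0 as usize].next_sibling {
                        cur = next;
                    }
                    self.nodes[cur.0 as usize].next_sibling = Some(id);
                }
            }
        }
        id
    }

    /// Resolve a handle to its node.
    pub fn get(&self, id: SketchNodeId) -> &SketchNode {
        &self.nodes[id.0 as usize]
    }

    /// Collect the direct children of `id` in document order.
    pub fn children(&self, id: SketchNodeId) -> Vec<SketchNodeId> {
        let mut out = Vec::new();
        let mut cur = self.get(id).first_child;
        while let Some(child) = cur {
            out.push(child);
            cur = self.get(child).next_sibling;
        }
        out
    }
}
```

Because handles are plain integers, they are `Copy` and carry no lifetime, and dropping the arena frees every node in one deallocation.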

§Quick Start

use fast_html_parser::HtmlParser;

let doc = HtmlParser::parse("<div><p>Hello</p></div>").unwrap();
assert_eq!(doc.root().text_content(), "Hello");

§Builder Pattern

use fast_html_parser::HtmlParser;

let doc = HtmlParser::builder()
    .max_input_size(64 * 1024 * 1024) // 64 MiB
    .build()
    .parse_str("<div>Hello</div>")
    .unwrap();

§CSS Selectors

use fast_html_parser::prelude::*;

let doc = HtmlParser::parse("<ul><li>one</li><li>two</li></ul>").unwrap();
let items = doc.select("li").unwrap();
assert_eq!(items.len(), 2);

§Streaming

use fast_html_parser::streaming::parse_stream;

let html = b"<div><p>Hello</p></div>";
let doc = parse_stream(html.chunks(8)).unwrap();
assert_eq!(doc.root().text_content(), "Hello");
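
To see what the streaming example feeds the parser, the sketch below uses only the standard library's `slice::chunks` to split the same input into 8-byte pieces. The helper `show_chunks` is hypothetical and not part of this crate. It shows that chunk boundaries can fall in the middle of a tag, which is the case an incremental parser has to handle.

```rust
/// Hypothetical helper: split `input` into pieces of at most `size` bytes
/// and render each piece as text for inspection.
pub fn show_chunks(input: &[u8], size: usize) -> Vec<String> {
    input
        .chunks(size) // the last chunk may be shorter than `size`
        .map(|chunk| String::from_utf8_lossy(chunk).into_owned())
        .collect()
}
```

For the 23-byte input above, this yields `"<div><p>"`, `"Hello</p"`, and `"></div>"`: the closing `</p>` tag is split across the second and third chunks.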

§Feature Flags

Feature        Default  Description
css-selector   Yes      CSS selector engine
entity-decode  Yes      HTML entity decoding
xpath          No       XPath expression support
encoding       No       Auto-detect encoding from raw bytes
async-tokio    No       Async parsing via Tokio

§Re-exports

pub use fhp_core as core_types;
pub use fhp_tokenizer as tokenizer;
pub use fhp_tree as tree;

§Modules

async_parser (feature: async-tokio)
Async parser (requires async-tokio feature).
encoding (feature: encoding)
Encoding detection and conversion.
prelude
Convenience prelude that imports the most commonly used types.
streaming
Streaming and incremental parsing.
xpath (feature: xpath)
XPath types (re-exported from the selector crate).

§Structs

CompiledSelector (feature: css-selector or xpath)
A pre-compiled CSS selector for reuse across documents and threads.
Document
A parsed HTML document backed by an arena.
DocumentIndex (feature: css-selector or xpath)
Pre-built index for O(1) id, class, and tag lookups.
HtmlParser
A configured HTML parser instance.
NodeId
Node identity type: an index into the arena’s node vector.
NodeRef
A borrowed reference to a node inside the document.
ParserBuilder
Configuration builder for the HTML parser.
Selection (feature: css-selector or xpath)
A collection of matched nodes from a selector query.
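
As an illustration of what a pre-built index for id, class, and tag lookups might look like, here is a minimal sketch built on `std::collections::HashMap`. The types `ElementInfo` and `SketchIndex` are hypothetical and say nothing about how `DocumentIndex` is implemented. The point is only that one pass over the elements makes each later lookup a hash probe (O(1) on average) instead of a tree walk.

```rust
use std::collections::HashMap;

/// Hypothetical flat description of one element.
pub struct ElementInfo<'a> {
    pub id: Option<&'a str>,
    pub classes: Vec<&'a str>,
    pub tag: &'a str,
}

/// Hypothetical index mapping ids, classes, and tags to element positions.
pub struct SketchIndex<'a> {
    by_id: HashMap<&'a str, usize>,
    by_class: HashMap<&'a str, Vec<usize>>,
    by_tag: HashMap<&'a str, Vec<usize>>,
}

impl<'a> SketchIndex<'a> {
    /// Build all three maps in a single pass over the elements.
    pub fn build(elements: &[ElementInfo<'a>]) -> Self {
        let mut index = SketchIndex {
            by_id: HashMap::new(),
            by_class: HashMap::new(),
            by_tag: HashMap::new(),
        };
        for (pos, el) in elements.iter().enumerate() {
            if let Some(id) = el.id {
                // Keep the first element when an id is duplicated.
                index.by_id.entry(id).or_insert(pos);
            }
            for &class in &el.classes {
                index.by_class.entry(class).or_default().push(pos);
            }
            index.by_tag.entry(el.tag).or_default().push(pos);
        }
        index
    }

    pub fn by_id(&self, id: &str) -> Option<usize> {
        self.by_id.get(id).copied()
    }

    pub fn by_class(&self, class: &str) -> &[usize] {
        self.by_class.get(class).map(Vec::as_slice).unwrap_or(&[])
    }

    pub fn by_tag(&self, tag: &str) -> &[usize] {
        self.by_tag.get(tag).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

The trade-off is the usual one: building the index costs one traversal and some memory, which pays off when many selectors run against the same document.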

§Enums

HtmlError
Error type for HTML parsing.
Tag
Interned HTML tag enum. Known HTML tag names interned as a u8 discriminant.
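
The idea of interning tag names as a `u8` discriminant can be sketched as follows. The enum `SketchTag`, its variants, and their numeric values are hypothetical; the real `Tag` enum may cover different tags and use a different lookup strategy.

```rust
/// Hypothetical interned tag: each known name maps to a one-byte value.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SketchTag {
    Unknown = 0,
    Div = 1,
    P = 2,
    Ul = 3,
    Li = 4,
}

impl SketchTag {
    /// Map a tag name to its interned value, ignoring ASCII case.
    pub fn from_name(name: &str) -> Self {
        match name.to_ascii_lowercase().as_str() {
            "div" => SketchTag::Div,
            "p" => SketchTag::P,
            "ul" => SketchTag::Ul,
            "li" => SketchTag::Li,
            _ => SketchTag::Unknown,
        }
    }
}
```

Once a name is interned, comparing two tags is a single byte comparison rather than a string comparison, and a tag occupies one byte per node.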

§Traits

Selectable (feature: css-selector or xpath)
Extension trait that adds CSS selector methods to Document.

§Functions

parse
Parse an HTML string with default settings (convenience alias).
parse_bytes
Parse raw bytes with default settings, auto-detecting encoding.
parse_owned
Parse an owned String with default settings, transferring the allocation.
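
What "transferring the allocation" can mean in practice is shown by this standard-library-only sketch: `String::into_bytes` hands over the string's heap buffer without copying it. The type `OwnedSource` is hypothetical, and whether `parse_owned` works this way internally is an assumption.

```rust
/// Hypothetical owner of the input buffer.
pub struct OwnedSource {
    bytes: Vec<u8>,
}

impl OwnedSource {
    /// Take over the `String`'s heap buffer; no bytes are copied.
    pub fn from_string(html: String) -> Self {
        OwnedSource { bytes: html.into_bytes() }
    }

    /// Address of the first byte, for checking that the buffer was reused.
    pub fn as_ptr(&self) -> *const u8 {
        self.bytes.as_ptr()
    }

    /// Number of bytes held.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }
}
```

Comparing `as_ptr()` before and after the conversion confirms that the same allocation is reused, which is the saving an owned-input entry point can offer over borrowing and copying.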