fhp-tokenizer 0.1.0

SIMD-accelerated HTML tokenizer with structural indexing
Owner: mehmetcansahin

SIMD-accelerated HTML tokenizer.

Uses a two-stage pipeline inspired by simdjson:

  1. Structural indexing (SIMD): scan input in 64-byte blocks, produce per-delimiter bitmasks, then apply quote-aware masking.
  2. Token extraction (scalar): walk the structural index to emit tokens via a branchless state machine.
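
The two stages above can be illustrated with a minimal, portable sketch. It is not the crate's implementation or API: every name here (`BlockMasks`, `classify_block`, `prefix_xor`, `structural_index`, `Token`, `extract_tokens`) is hypothetical. It also simplifies in several ways: plain scalar loops stand in for the SIMD comparisons, only `<`, `>`, and `"` are tracked, every double quote toggles the quoted state (even outside a tag), and stage 2 uses an ordinary `match` rather than a branchless state machine.

```rust
/// Per-block bitmasks: bit i is set when byte i of the block matches.
#[derive(Default, Clone, Copy)]
struct BlockMasks {
    lt: u64,
    gt: u64,
    quote: u64,
}

/// Scalar stand-in for the SIMD compare step (assumption: a real
/// implementation would build these masks with vector instructions).
fn classify_block(block: &[u8]) -> BlockMasks {
    assert!(block.len() <= 64);
    let mut m = BlockMasks::default();
    for (i, &b) in block.iter().enumerate() {
        let bit = 1u64 << i;
        match b {
            b'<' => m.lt |= bit,
            b'>' => m.gt |= bit,
            b'"' => m.quote |= bit,
            _ => {}
        }
    }
    m
}

/// Prefix XOR: bit i of the result is the XOR of input bits 0..=i.
/// Applied to the quote mask, it marks bytes inside a quoted region.
fn prefix_xor(mut x: u64) -> u64 {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    x
}

/// Stage 1: offsets of '<' and '>' that lie outside double quotes.
fn structural_index(input: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    let mut carry = 0u64; // all ones if the previous block ended inside quotes
    for (block_idx, block) in input.chunks(64).enumerate() {
        let m = classify_block(block);
        let in_quote = prefix_xor(m.quote) ^ carry;
        // Broadcast the top bit so the quoted state crosses the block boundary.
        carry = ((in_quote as i64) >> 63) as u64;
        let mut structural = (m.lt | m.gt) & !in_quote;
        while structural != 0 {
            out.push(block_idx * 64 + structural.trailing_zeros() as usize);
            structural &= structural - 1; // clear the lowest set bit
        }
    }
    out
}

#[derive(Debug, PartialEq)]
enum Token<'a> {
    Tag(&'a str),
    Text(&'a str),
}

/// Stage 2: walk the structural index and slice out tags and text runs.
/// An unterminated tag at the end of input is dropped in this sketch.
fn extract_tokens<'a>(input: &'a str, index: &[usize]) -> Vec<Token<'a>> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut cursor = 0; // start of the pending text run
    let mut tag_start: Option<usize> = None;
    for &pos in index {
        match (bytes[pos], tag_start) {
            (b'<', None) => {
                if pos > cursor {
                    tokens.push(Token::Text(&input[cursor..pos]));
                }
                tag_start = Some(pos);
            }
            (b'>', Some(start)) => {
                tokens.push(Token::Tag(&input[start..=pos]));
                tag_start = None;
                cursor = pos + 1;
            }
            _ => {} // stray delimiter: ignored here
        }
    }
    if tag_start.is_none() && cursor < input.len() {
        tokens.push(Token::Text(&input[cursor..]));
    }
    tokens
}
```

The split keeps the data-parallel work (byte classification and quote masking) separate from the sequential work (deciding what each delimiter means), which is the idea borrowed from simdjson. Under these assumptions, `extract_tokens("<div>hello</div>", &structural_index(b"<div>hello</div>"))` yields a tag, a text run, and a tag.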

Quick Start

use fhp_tokenizer::tokenize;

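// The input has an opening tag, a text run, and a closing tag,
// so at least three tokens are expected.
let tokens = tokenize("<div>hello</div>");
assert!(tokens.len() >= 3);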