fhp-encoding 0.1.0

Encoding detection and conversion for the HTML parser
Owners: mehmetcansahin

Detects the character encoding of raw HTML bytes and converts them to UTF-8. The detection pipeline follows the HTML specification's encoding sniffing algorithm:

  1. BOM (Byte Order Mark) detection
  2. <meta charset="..."> prescan (first 1 KB)
  3. <meta http-equiv="Content-Type" content="...charset=..."> prescan
  4. Fallback to UTF-8
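
The four steps above can be sketched with the standard library alone. This is an illustrative simplification, not the crate's actual implementation: the function names (sniff_bom, sniff_meta, sniff_encoding) are hypothetical, both <meta> forms are handled in a single pass, and the label is returned as a lowercased string rather than resolved to an encoding object.

```rust
/// Step 1: look for a byte order mark at the very start of the input.
fn sniff_bom(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("utf-8")
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some("utf-16be")
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some("utf-16le")
    } else {
        None
    }
}

/// Steps 2 and 3: scan the first 1024 bytes for `charset=` inside a `<meta` tag.
/// Simplification: this accepts both the `charset` attribute and the
/// `content="...charset=..."` form without validating attribute names
/// the way the HTML specification's prescan does.
fn sniff_meta(bytes: &[u8]) -> Option<String> {
    let window = &bytes[..bytes.len().min(1024)];
    // Lowercase ASCII so that tag and attribute matching is case-insensitive.
    let head = String::from_utf8_lossy(window).to_ascii_lowercase();
    let mut rest = head.as_str();

    while let Some(start) = rest.find("<meta") {
        let tag_body = &rest[start + 5..];
        let end = tag_body.find('>').unwrap_or(tag_body.len());
        let tag = &tag_body[..end];

        if let Some(pos) = tag.find("charset") {
            let after = tag[pos + 7..].trim_start();
            if let Some(value) = after.strip_prefix('=') {
                // Skip an optional opening quote, then read up to a delimiter.
                let value = value
                    .trim_start()
                    .trim_start_matches(|c: char| c == '"' || c == '\'');
                let label: String = value
                    .chars()
                    .take_while(|c| !matches!(c, '"' | '\'' | ';' | ' ' | '/' | '>'))
                    .collect();
                if !label.is_empty() {
                    return Some(label);
                }
            }
        }
        // Continue scanning after this tag.
        rest = &tag_body[end..];
    }
    None
}

/// Runs the pipeline in order: BOM, then <meta> prescan, then the UTF-8 fallback (step 4).
fn sniff_encoding(bytes: &[u8]) -> String {
    if let Some(name) = sniff_bom(bytes) {
        return name.to_string();
    }
    sniff_meta(bytes).unwrap_or_else(|| "utf-8".to_string())
}
```

In this sketch a BOM always wins over a <meta> declaration, matching the order of the list. A real implementation would presumably map the extracted label to a concrete encoding (encoding_rs offers Encoding::for_label for that purpose) instead of returning a raw string.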

Decoding itself is delegated to encoding_rs, the SIMD-optimized encoding library from Mozilla/Servo.

Quick Start

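use fhp_encoding::{detect, decode_or_detect};

// Raw HTML bytes that declare their encoding in a <meta charset> tag.
let html = b"<html><head><meta charset=\"utf-8\"></head><body>Hello</body></html>";

// Sniff the encoding without decoding anything.
let encoding = detect(html);
assert_eq!(encoding.name(), "UTF-8");

// Detect and decode in one call; yields the decoded text and the detected encoding.
let (text, _enc) = decode_or_detect(html).unwrap();
assert!(text.contains("Hello"));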