fhp-encoding 0.1.2

Encoding detection and conversion for the HTML parser

Detects the character encoding of raw HTML bytes and converts them to UTF-8. The detection pipeline follows the HTML specification's encoding sniffing algorithm:

1. BOM (Byte Order Mark) detection
2. <meta charset="..."> prescan (first 1 KB)
3. <meta http-equiv="Content-Type" content="...charset=..."> prescan
4. Fallback to UTF-8
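
The order of these steps can be illustrated with a minimal standard-library sketch. This is not the crate's implementation: `sniff_bom`, `sniff_meta`, and `detect_label` are hypothetical names, the labels are returned as plain lowercase strings, and the prescan is deliberately simplified to a search for `charset=` within the first 1 KB, which covers both `<meta>` forms but does not check that the match sits inside a `<meta>` tag as the HTML specification requires.

```rust
/// Hypothetical helper: step 1, return a label if the input starts with a BOM.
fn sniff_bom(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("utf-8")
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some("utf-16be")
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some("utf-16le")
    } else {
        None
    }
}

/// Hypothetical helper: steps 2 and 3, simplified. Looks for `charset=` in the
/// first 1 KB and returns the lowercased label that follows it. This is an
/// assumption-laden shortcut, not the full prescan from the HTML specification.
fn sniff_meta(bytes: &[u8]) -> Option<String> {
    let window = &bytes[..bytes.len().min(1024)];
    // Lowercase a copy so the search is ASCII case-insensitive.
    let lower = window.to_ascii_lowercase();
    let needle = b"charset=";
    let pos = lower.windows(needle.len()).position(|w| w == needle)?;
    let mut rest = &lower[pos + needle.len()..];
    // Skip an optional opening quote around the label.
    if matches!(rest.first().copied(), Some(b'"') | Some(b'\'')) {
        rest = &rest[1..];
    }
    // The label ends at the first quote, space, semicolon, slash, or '>'.
    let end = rest
        .iter()
        .position(|&b| matches!(b, b'"' | b'\'' | b' ' | b';' | b'/' | b'>'))
        .unwrap_or(rest.len());
    let label = &rest[..end];
    if label.is_empty() {
        None
    } else {
        Some(String::from_utf8_lossy(label).into_owned())
    }
}

/// Hypothetical entry point: BOM first, then the meta scan, then step 4,
/// the UTF-8 fallback.
fn detect_label(bytes: &[u8]) -> String {
    if let Some(label) = sniff_bom(bytes) {
        return label.to_string();
    }
    sniff_meta(bytes).unwrap_or_else(|| "utf-8".to_string())
}
```

Under these assumptions, a BOM always takes precedence over a conflicting `<meta>` declaration, and a declaration that appears after the first 1 KB is ignored, so the result falls back to `"utf-8"`.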

The actual decoding is delegated to encoding_rs, the SIMD-optimized encoding library from Mozilla/Servo.

Quick Start

use fhp_encoding::{detect, decode_or_detect};

let html = b"<html><head><meta charset=\"utf-8\"></head><body>Hello</body></html>";

// Sniff the encoding from the raw bytes without decoding them.
let encoding = detect(html);
assert_eq!(encoding.name(), "UTF-8");

// Detect the encoding and decode the bytes to UTF-8 in one call.
let (text, _enc) = decode_or_detect(html).unwrap();
assert!(text.contains("Hello"));