Encoding detection and conversion for the HTML parser.
Detects the character encoding of raw HTML bytes and converts them to UTF-8. The detection pipeline follows the HTML specification’s encoding sniffing algorithm:
- BOM (Byte Order Mark) detection
- <meta charset="..."> prescan (first 1 KB)
- <meta http-equiv="Content-Type" content="...charset=..."> prescan
- Fallback to UTF-8
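The order of these steps can be illustrated with a minimal, std-only sketch. This is not the crate's implementation: `sniff_bom`, `sniff_meta`, and `sniff` are hypothetical names introduced here, and the prescan is simplified on purpose. It searches the first 1 KB for any `charset=` token rather than parsing `<meta>` tags as the HTML specification requires. The sketch returns lowercased labels instead of `Encoding` values.

```rust
/// Hypothetical helper: maps a leading byte order mark to an encoding label.
fn sniff_bom(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("utf-8")
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some("utf-16be")
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some("utf-16le")
    } else {
        None
    }
}

/// Hypothetical helper: simplified prescan of the first 1 KB.
/// Assumption: any `charset=` token counts, which covers both the
/// `<meta charset>` and the `http-equiv` forms but is looser than the spec.
fn sniff_meta(bytes: &[u8]) -> Option<String> {
    let window = &bytes[..bytes.len().min(1024)];
    let lower = window.to_ascii_lowercase();
    let needle = b"charset=";
    let pos = lower.windows(needle.len()).position(|w| w == needle)?;
    let label: String = lower[pos + needle.len()..]
        .iter()
        // Skip an opening quote or stray space before the label.
        .skip_while(|&&b| b == b'"' || b == b'\'' || b == b' ')
        // Collect the characters that can appear in an encoding label.
        .take_while(|&&b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
        .map(|&b| b as char)
        .collect();
    if label.is_empty() { None } else { Some(label) }
}

/// Hypothetical driver: BOM first, then the prescan, then the UTF-8 fallback.
fn sniff(bytes: &[u8]) -> String {
    sniff_bom(bytes)
        .map(str::to_string)
        .or_else(|| sniff_meta(bytes))
        .unwrap_or_else(|| "utf-8".to_string())
}
```

Under these assumptions, a BOM always takes precedence over a conflicting `<meta>` declaration, and input with no hints at all resolves to UTF-8.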
Decoding itself is delegated to encoding_rs, the SIMD-optimized encoding library from Mozilla/Servo.
§Quick Start
use fhp_encoding::{detect, decode_or_detect};
let html = b"<html><head><meta charset=\"utf-8\"></head><body>Hello</body></html>";
let encoding = detect(html);
assert_eq!(encoding.name(), "UTF-8");
let (text, _enc) = decode_or_detect(html).unwrap();
assert!(text.contains("Hello"));
Re-exports§
pub use decode::decode;
pub use decode::decode_or_detect;
pub use detect::detect;
pub use stream::DecodingReader;
Modules§
- decode
- Decoding raw bytes to UTF-8 strings.
- detect
- Encoding detection from raw HTML bytes.
- stream
- Streaming decoder for chunk-based processing.
Structs§
- Encoding
- An encoding as defined in the Encoding Standard.