detect_encoding_with_hint

Function detect_encoding_with_hint 

pub fn detect_encoding_with_hint(
    data: &[u8],
    content_type: Option<&str>,
) -> &'static str

Detects the character encoding of feed data, using an optional HTTP Content-Type hint.

This is the preferred function when parsing feeds from HTTP responses, as it considers the Content-Type charset parameter in addition to BOM and XML declaration detection.

§Priority Order

  1. BOM (Byte Order Mark) - highest priority, treated as authoritative and overrides any declared charset
  2. HTTP Content-Type charset (if provided)
  3. XML declaration encoding attribute
  4. Default to UTF-8
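
The priority order above can be sketched as a chain of early returns. This is a minimal illustration under stated assumptions, not the crate's implementation: `sketch_detect`, `charset_param`, and `xml_decl_encoding` are hypothetical names, the sketch returns an owned `String` rather than `&'static str`, and it returns labels exactly as written instead of normalizing them (so it yields "ISO-8859-1" where the real function reports "windows-1252").

```rust
/// Hypothetical sketch of the four-step priority order.
fn sketch_detect(data: &[u8], content_type: Option<&str>) -> String {
    // 1. BOM: checked first, overrides everything else.
    if data.starts_with(&[0xEF, 0xBB, 0xBF]) {
        return "UTF-8".to_string();
    }
    if data.starts_with(&[0xFF, 0xFE]) {
        return "UTF-16LE".to_string();
    }
    if data.starts_with(&[0xFE, 0xFF]) {
        return "UTF-16BE".to_string();
    }
    // 2. charset parameter of the HTTP Content-Type header, if provided.
    if let Some(charset) = content_type.and_then(charset_param) {
        return charset;
    }
    // 3. encoding attribute of the XML declaration.
    if let Some(encoding) = xml_decl_encoding(data) {
        return encoding;
    }
    // 4. Default.
    "UTF-8".to_string()
}

/// Hypothetical helper: extracts `charset=...` from a Content-Type value.
fn charset_param(content_type: &str) -> Option<String> {
    content_type.split(';').skip(1).find_map(|part| {
        let (key, value) = part.split_once('=')?;
        if !key.trim().eq_ignore_ascii_case("charset") {
            return None;
        }
        let value = value.trim().trim_matches('"');
        if value.is_empty() {
            None
        } else {
            Some(value.to_string())
        }
    })
}

/// Hypothetical helper: reads `encoding="..."` from a leading XML declaration.
/// Only handles declarations written in an ASCII-compatible encoding.
fn xml_decl_encoding(data: &[u8]) -> Option<String> {
    let head = &data[..data.len().min(200)];
    let lossy = String::from_utf8_lossy(head);
    let text: &str = &lossy;
    if !text.starts_with("<?xml") {
        return None;
    }
    let decl = &text[..text.find("?>")?];
    let after_name = &decl[decl.find("encoding")? + "encoding".len()..];
    let after_eq = after_name.trim_start().strip_prefix('=')?.trim_start();
    let quote = after_eq.chars().next()?;
    if quote != '"' && quote != '\'' {
        return None;
    }
    let value = &after_eq[1..];
    let end = value.find(quote)?;
    Some(value[..end].to_string())
}
```

Because each step returns as soon as it finds an answer, a later source is consulted only when every earlier one is absent, which is why a BOM wins even when the Content-Type header names a different charset.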

§Arguments

  • data - Raw bytes of the feed document
  • content_type - Optional HTTP Content-Type header value, for example "text/xml; charset=ISO-8859-1"

§Returns

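The detected encoding name as a &'static str. The name is normalized rather than echoed from the input: in the examples, the label ISO-8859-1 is reported as "windows-1252" and UTF-16 as "UTF-16LE".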
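
The names returned in the examples match the label-to-name mapping of the WHATWG Encoding Standard, where ISO-8859-1 is a label for windows-1252 and UTF-16 is a label for UTF-16LE. Whether this function uses that mapping internally is an assumption. The sketch below uses a hypothetical helper, `canonical_name`, that covers only a handful of labels to show what such normalization looks like.

```rust
/// Hypothetical helper: maps a few charset labels to the encoding names
/// assigned by the WHATWG Encoding Standard. Only a small subset is listed.
fn canonical_name(label: &str) -> Option<&'static str> {
    match label.trim().to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => Some("UTF-8"),
        "iso-8859-1" | "latin1" | "us-ascii" | "ascii" | "windows-1252" => Some("windows-1252"),
        "utf-16" | "utf-16le" => Some("UTF-16LE"),
        "utf-16be" => Some("UTF-16BE"),
        // Labels outside this subset are not recognized by the sketch.
        _ => None,
    }
}
```

Callers comparing the result against a charset they expect should therefore compare against the normalized name, not the label that appeared in the header or XML declaration.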

§Examples

use feedparser_rs::util::encoding::detect_encoding_with_hint;

// BOM takes priority over Content-Type
let data = b"\xEF\xBB\xBF<?xml version=\"1.0\"?>";
assert_eq!(
    detect_encoding_with_hint(data, Some("text/xml; charset=ISO-8859-1")),
    "UTF-8"
);

// Content-Type charset is used when there is no BOM;
// the ISO-8859-1 label is reported as windows-1252
let data = b"<?xml version=\"1.0\"?>";
assert_eq!(
    detect_encoding_with_hint(data, Some("text/xml; charset=ISO-8859-1")),
    "windows-1252"
);

// Falls back to the XML declaration when no Content-Type is given;
// the UTF-16 label is reported as UTF-16LE
let data = b"<?xml version=\"1.0\" encoding=\"UTF-16\"?>";
assert_eq!(detect_encoding_with_hint(data, None), "UTF-16LE");