Function chardetng_c::chardetng_encoding_detector_guess

#[no_mangle]
pub unsafe extern "C" fn chardetng_encoding_detector_guess(
    detector: *const EncodingDetector,
    tld: *const u8,
    tld_len: usize,
    allow_utf8: bool
) -> *const Encoding

Guess the encoding given the bytes pushed to the detector so far (via chardetng_encoding_detector_feed()), the top-level domain name from which the bytes were loaded, and an indication of whether to consider UTF-8 as a permissible guess.

The tld argument takes the rightmost DNS label of the hostname of the host from which the stream was loaded, in lower-case ASCII form. That is, if the label is an internationalized top-level domain name, it must be provided in its Punycode form. If the TLD from which the stream was loaded is unavailable, NULL may be passed instead (with 0 as tld_len), which is equivalent to passing a pointer to "com" as tld and 3 as tld_len.
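The caller is responsible for extracting the label. The following is a minimal sketch of that step, assuming the hostname is already in ASCII (Punycode) form. `tld_from_host` is a hypothetical helper, not part of chardetng_c, and it does not perform Punycode conversion itself.

```rust
/// Hypothetical helper (not part of chardetng_c): derives the `tld` argument
/// from a hostname that is assumed to already be in ASCII (Punycode) form.
/// Returns `None` when no usable label can be produced, in which case a
/// caller could pass NULL and 0 (i.e. "TLD unavailable").
fn tld_from_host(host: &str) -> Option<String> {
    // Ignore a trailing root dot, as in "example.fi.".
    let host = host.strip_suffix('.').unwrap_or(host);
    // The rightmost DNS label is everything after the last period.
    let label = host.rsplit('.').next()?;
    if label.is_empty() || !label.is_ascii() {
        // Empty label, or an internationalized label that was not
        // converted to Punycode first (this sketch does not convert it).
        return None;
    }
    // The function expects the label in lower-case ASCII.
    Some(label.to_ascii_lowercase())
}
```

For example, `tld_from_host("www.Example.FI")` yields `"fi"`. IP-literal hosts are not special-cased here; how a real caller should treat them is outside what this page specifies.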

If the allow_utf8 argument is set to false, the return value of this function won't be UTF_8_ENCODING. When performing detection on text/html on non-file: URLs, Web browsers must pass false, unless the user has taken a specific contextual action to request an override. This way, Web developers cannot start depending on UTF-8 detection. Such reliance would make the Web Platform more brittle.
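As a sketch of how a browser-style caller might choose allow_utf8, the hypothetical `allow_utf8` helper below encodes only the text/html rule stated above. Returning `true` for every other combination (file: URLs, other MIME types) is an assumption of this sketch, not something this page mandates.

```rust
/// Hypothetical policy helper (not part of chardetng_c). Only the
/// text/html-on-non-file: rule comes from the documentation; allowing
/// UTF-8 in all other cases is an assumption.
fn allow_utf8(mime_type: &str, url_scheme: &str, user_requested_override: bool) -> bool {
    let is_html = mime_type.eq_ignore_ascii_case("text/html");
    let is_file = url_scheme.eq_ignore_ascii_case("file");
    if is_html && !is_file {
        // text/html on a non-file: URL: permit a UTF-8 guess only when the
        // user has explicitly asked for an override.
        user_requested_override
    } else {
        true
    }
}
```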

Returns the guessed encoding (never NULL).

Panics

If tld is NULL but tld_len is not zero.

If tld contains non-ASCII bytes, periods, or upper-case letters. (The panic condition is intentionally limited to signs of failing to extract the label correctly, failing to provide it in its Punycode form, or failing to lower-case it. Full DNS label validation is intentionally not performed, to avoid panics when reality doesn't match the specs.)
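A caller may want to check a label before passing it, rather than hitting the panic. This minimal sketch mirrors the three symptoms listed above and nothing more. `check_tld_label` and `assert_tld_label` are hypothetical names, not chardetng_c's actual code.

```rust
/// Hypothetical pre-check mirroring the documented panic conditions.
/// Like the documented behavior, it does not do full DNS label validation.
fn check_tld_label(label: &[u8]) -> Result<(), &'static str> {
    for &b in label {
        if !b.is_ascii() {
            return Err("non-ASCII byte: label not in Punycode form");
        }
        if b == b'.' {
            return Err("period: more than the rightmost label was passed");
        }
        if b.is_ascii_uppercase() {
            return Err("upper-case letter: label not lower-cased");
        }
    }
    Ok(())
}

/// Panicking variant, analogous to what the documentation describes.
fn assert_tld_label(label: &[u8]) {
    if let Err(reason) = check_tld_label(label) {
        panic!("invalid tld: {}", reason);
    }
}
```

Digits and hyphens pass, so Punycode labels such as `xn--p1ai` are accepted.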

Undefined Behavior

UB ensues if

  • detector does not point to a detector that was obtained from chardetng_encoding_detector_new and has not yet been freed with chardetng_encoding_detector_free.
  • tld is non-NULL and tld_len is non-zero but tld and tld_len don't designate a range of memory valid for reading.
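The tld contract above (NULL plus 0 meaning "com", NULL with a non-zero length panicking, and a non-NULL, non-zero pair having to designate readable memory) can be sketched as a raw-pointer-to-slice conversion. `tld_from_raw` is a hypothetical illustration, not chardetng_c's implementation, and how the real function treats a non-NULL pointer with length 0 is not covered by this page.

```rust
/// Hypothetical sketch of converting the raw `tld`/`tld_len` pair.
///
/// # Safety
/// If `tld` is non-NULL and `tld_len` is non-zero, `tld` must point to
/// `tld_len` bytes that are valid for reading for the lifetime `'a`.
unsafe fn tld_from_raw<'a>(tld: *const u8, tld_len: usize) -> &'a [u8] {
    if tld.is_null() {
        // NULL with a non-zero length is a documented panic condition.
        assert_eq!(tld_len, 0, "tld is NULL but tld_len is not zero");
        // NULL with 0 means "TLD unavailable", equivalent to "com".
        return b"com";
    }
    if tld_len == 0 {
        // Readability is only required for a non-zero length, so the
        // pointer is not used at all here.
        return &[];
    }
    // The caller guarantees the range is readable; otherwise this is UB.
    unsafe { std::slice::from_raw_parts(tld, tld_len) }
}
```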