Function chardetng_c::chardetng_encoding_detector_guess
#[no_mangle] pub unsafe extern "C" fn chardetng_encoding_detector_guess(
detector: *const EncodingDetector,
tld: *const u8,
tld_len: usize,
allow_utf8: bool
) -> *const Encoding
Guess the encoding given the bytes pushed to the detector so far
(via chardetng_encoding_detector_feed()), the top-level domain name
from which the bytes were loaded, and an indication of whether to
consider UTF-8 as a permissible guess.
The tld argument takes the rightmost DNS label of the hostname of the
host the stream was loaded from, in lower-case ASCII form. That is, if
the label is an internationalized top-level domain name, it must be
provided in its Punycode form. If the TLD that the stream was loaded
from is unavailable, NULL may be passed instead (with 0 as tld_len),
which is equivalent to passing a pointer to "com" as tld and 3 as
tld_len.
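The label extraction described above can be sketched as follows. This is a minimal sketch, not part of chardetng_c: the helper name tld_label is hypothetical, and it assumes the hostname has already been converted to its Punycode (ASCII) form by the caller and is not an IP address literal.

```rust
/// Hypothetical helper (not part of chardetng_c): returns the rightmost
/// DNS label of `host` in lower-case ASCII, suitable for the `tld` argument.
/// Returns `None` when no usable label exists, in which case the caller
/// could pass NULL and 0 instead.
fn tld_label(host: &str) -> Option<String> {
    // A fully qualified name may end in a dot; ignore it (assumption).
    let host = host.strip_suffix('.').unwrap_or(host);
    // The rightmost label is whatever follows the last period.
    let label = host.rsplit('.').next()?;
    // Non-ASCII input means Punycode conversion was skipped upstream.
    if label.is_empty() || !label.is_ascii() {
        return None;
    }
    Some(label.to_ascii_lowercase())
}
```

The resulting bytes (for example, from as_bytes() on the returned string) and their length would then be passed as tld and tld_len.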
If the allow_utf8 argument is set to false, the return value of
this function won't be UTF_8_ENCODING. When performing detection
on text/html on non-file: URLs, Web browsers must pass false,
unless the user has taken a specific contextual action to request an
override. This way, Web developers cannot start depending on UTF-8
detection. Such reliance would make the Web Platform more brittle.
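One way a caller might encode this rule is sketched below. The function name and its parameters are hypothetical, and returning true for every case the text does not mention (other content types, file: URLs) is an assumption rather than something the documentation requires.

```rust
/// Hypothetical policy helper (not part of chardetng_c): decides the value
/// to pass as `allow_utf8`.
/// `mime` is the content type, `scheme` is the URL scheme without the colon,
/// and `user_override` records an explicit contextual override by the user.
fn allow_utf8_guess(mime: &str, scheme: &str, user_override: bool) -> bool {
    // The documented restriction covers text/html loaded from non-file: URLs.
    let html_from_non_file =
        mime.eq_ignore_ascii_case("text/html") && !scheme.eq_ignore_ascii_case("file");
    // Assumption: UTF-8 guesses are allowed in all cases not covered above.
    !html_from_non_file || user_override
}
```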
Returns the guessed encoding (never NULL).
Panics
- If tld is NULL but tld_len is not zero.
- If tld contains non-ASCII bytes, a period, or upper-case letters. (The
  panic condition is intentionally limited to signs of failing to extract
  the label correctly, failing to provide it in its Punycode form, and
  failing to lower-case it. Full DNS label validation is intentionally not
  performed to avoid panics when the reality doesn't match the specs.)
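A caller that cannot afford a panic could run the same byte-level check before calling. The sketch below mirrors only the second condition as described here; the name tld_would_panic is hypothetical, and the check is inferred from this text rather than copied from the library.

```rust
/// Hypothetical pre-check (not part of chardetng_c): returns true if `tld`
/// contains a byte that the text above says triggers a panic, namely
/// a non-ASCII byte, a period, or an upper-case ASCII letter.
fn tld_would_panic(tld: &[u8]) -> bool {
    tld.iter()
        .any(|&b| !b.is_ascii() || b == b'.' || b.is_ascii_uppercase())
}
```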
Undefined Behavior
UB ensues if:
- detector does not point to a detector obtained from
  chardetng_detector_new but not yet freed with chardetng_detector_free.
- tld is non-NULL and tld_len is non-zero but tld and tld_len don't
  designate a range of memory valid for reading.
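On the Rust side of an FFI boundary, the tld and tld_len pair can be derived from a borrowed slice so that the two always agree. This is a minimal sketch with a hypothetical helper name; the pointer is valid only while the borrowed bytes are alive, and how the function treats a non-NULL pointer with a zero length is not stated here.

```rust
/// Hypothetical helper (not part of chardetng_c): converts an optional TLD
/// into the (tld, tld_len) pair expected by the function.
/// `None` maps to (NULL, 0), the documented "TLD unavailable" form.
fn tld_ptr_len(tld: Option<&[u8]>) -> (*const u8, usize) {
    match tld {
        // Pointer and length come from the same slice, so together they
        // designate readable memory for as long as the borrow lasts.
        Some(bytes) => (bytes.as_ptr(), bytes.len()),
        // NULL must be paired with a zero length to avoid the panic.
        None => (std::ptr::null(), 0),
    }
}
```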