pub struct EncodingDetector { /* private fields */ }
A Web browser-oriented detector for guessing what character encoding a stream of bytes is encoded in.
The bytes are fed to the detector incrementally using the `feed` method. The current guess of the detector can be queried using the `guess` method. The guessing parameters are arguments to the `guess` method rather than arguments to the constructor in order to enable the application to check if the arguments affect the guessing outcome. (The specific use case is to disable UI for re-running the detector with UTF-8 allowed and the top-level domain name ignored if those arguments don’t change the guess.)
Implementations§
impl EncodingDetector
pub fn feed(&mut self, buffer: &[u8], last: bool) -> bool
Inform the detector of a chunk of input.
The byte stream is represented as a sequence of calls to this method such that the concatenation of the arguments to this method forms the byte stream. It does not matter how the application chooses to chunk the stream. It is OK to call this method with a zero-length byte slice.
The end of the stream is indicated by calling this method with `last` set to `true`. In that case, the end of the stream is considered to occur after the last byte of the `buffer` (which may be zero-length) passed in the same call. Once this method has been called with `last` set to `true`, this method must not be called again.
If you want to perform detection on just the prefix of a longer stream, do not pass `last=true` after the prefix if the stream actually still continues.
Returns `true` if after processing `buffer` the stream has contained at least one non-ASCII byte and `false` if only ASCII has been seen so far.
§Panics
If this method has previously been called with `last` set to `true`.
pub fn guess(&self, tld: Option<&[u8]>, allow_utf8: bool) -> &'static Encoding
Guess the encoding given the bytes pushed to the detector so far (via `feed()`), the top-level domain name from which the bytes were loaded, and an indication of whether to consider UTF-8 as a permissible guess.
The `tld` argument takes the rightmost DNS label of the hostname of the host the stream was loaded from in lower-case ASCII form. That is, if the label is an internationalized top-level domain name, it must be provided in its Punycode form. If the TLD that the stream was loaded from is unavailable, `None` may be passed instead, which is equivalent to passing `Some(b"com")`.
If the `allow_utf8` argument is set to `false`, the return value of this method won’t be `encoding_rs::UTF_8`. When performing detection on `text/html` on non-`file:` URLs, Web browsers must pass `false`, unless the user has taken a specific contextual action to request an override. This way, Web developers cannot start depending on UTF-8 detection. Such reliance would make the Web Platform more brittle.
Returns the guessed encoding.
§Panics
If `tld` contains non-ASCII bytes, a period, or upper-case letters. (The panic condition is intentionally limited to signs of failing to extract the label correctly, failing to provide it in its Punycode form, and failing to lower-case it. Full DNS label validation is intentionally not performed to avoid panics when the reality doesn’t match the specs.)
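The requirements on `tld` can be met by normalizing the hostname before calling `guess`. The following is a sketch only, using the standard library: `tld_for_guess` is a hypothetical helper that is not part of this crate, it does no Punycode conversion (non-ASCII labels are rejected rather than converted), and it does not special-case IP address literals.

```rust
/// Hypothetical helper: extracts the rightmost DNS label of `host` as
/// lower-case ASCII bytes, or returns `None` if no usable label is found.
fn tld_for_guess(host: &str) -> Option<Vec<u8>> {
    // Ignore the trailing dot of a fully qualified domain name.
    let host = host.strip_suffix('.').unwrap_or(host);
    // The rightmost label is whatever follows the last period.
    let label = host.rsplit('.').next()?;
    if label.is_empty() || !label.is_ascii() {
        // A non-ASCII label would need Punycode conversion, which this
        // sketch does not attempt.
        return None;
    }
    // Lower-casing avoids the documented panic on upper-case letters.
    Some(label.to_ascii_lowercase().into_bytes())
}
```

The result can then be passed along as `detector.guess(tld.as_deref(), false)`, where a `None` falls back to the documented `Some(b"com")` behavior.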
pub fn guess_assess(&self, tld: Option<&[u8]>, allow_utf8: bool) -> (&'static Encoding, bool)
Same as `guess()`, but also returns a Boolean indicating whether the guessed encoding had a higher score than at least one other candidate. If this method returns `false`, the guessed encoding is likely to be wrong.