Struct chardetng::EncodingDetector
A Web browser-oriented detector for guessing what character encoding a stream of bytes is encoded in.
The bytes are fed to the detector incrementally using the feed method. The current guess of the detector can be queried using the guess method. The guessing parameters are arguments to the guess method rather than arguments to the constructor in order to enable the application to check if the arguments affect the guessing outcome. (The specific use case is to disable UI for re-running the detector with UTF-8 allowed and the top-level domain name ignored if those arguments don't change the guess.)
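The use case in the parenthesis above can be sketched as follows. This is a minimal illustration, not code from the crate: the function name override_changes_guess is hypothetical, the closure parameter stands in for the detector's guess method, and the type parameter E stands in for the returned encoding type.

```rust
/// Decides whether a "re-run detection" UI would change anything.
///
/// `guess` is a stand-in for the detector's `guess` method, taking
/// `(tld, allow_utf8)`; `E` stands in for the returned encoding type.
/// Both are assumptions of this sketch, not part of the crate.
fn override_changes_guess<E, G>(guess: G, tld: Option<&[u8]>) -> bool
where
    E: PartialEq,
    G: Fn(Option<&[u8]>, bool) -> E,
{
    // Guess as for a normal page load: real TLD, UTF-8 not allowed.
    let normal = guess(tld, false);
    // Guess as for the override: TLD ignored, UTF-8 allowed.
    let overridden = guess(None, true);
    // The UI is only worth showing if the two guesses differ.
    normal != overridden
}
```

With the actual detector, the closure would presumably be |tld, allow_utf8| detector.guess(tld, allow_utf8); because guess takes &self, it can be called several times on the same fed data.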
Implementations
impl EncodingDetector
pub fn feed(&mut self, buffer: &[u8], last: bool) -> bool
Inform the detector of a chunk of input.
The byte stream is represented as a sequence of calls to this method such that the concatenation of the arguments to this method forms the byte stream. It does not matter how the application chooses to chunk the stream. It is OK to call this method with a zero-length byte slice.
The end of the stream is indicated by calling this method with last set to true. In that case, the end of the stream is considered to occur after the last byte of the buffer (which may be zero-length) passed in the same call. Once this method has been called with last set to true, this method must not be called again.
If you want to perform detection on just the prefix of a longer stream, do not pass last=true after the prefix if the stream actually still continues.
Returns true if, after processing buffer, the stream has contained at least one non-ASCII byte, and false if only ASCII has been seen so far.
Panics
If this method has previously been called with last set to true.
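The chunking contract described for feed can be sketched as a small driver loop. This is an illustration under stated assumptions: feed_all is a hypothetical helper, the closure stands in for a call such as detector.feed(chunk, last), and the 4096-byte buffer size is arbitrary.

```rust
use std::io::{self, Read};

/// Reads `reader` to its end and forwards every chunk to `feed`.
///
/// `feed` stands in for the detector's `feed(buffer, last)` method
/// (an assumption of this sketch). The end of the stream is signalled
/// exactly once, with a zero-length slice and `last == true`.
/// Returns the value of that final call.
fn feed_all<R, F>(mut reader: R, mut feed: F) -> io::Result<bool>
where
    R: Read,
    F: FnMut(&[u8], bool) -> bool,
{
    let mut buf = [0u8; 4096];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(n) => n,
            // Retry reads that were merely interrupted.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            // End of stream: an empty buffer with `last` set is allowed.
            return Ok(feed(&[], true));
        }
        // How the stream happens to be chunked does not matter.
        feed(&buf[..n], false);
    }
}
```

Signalling the end with a separate empty call keeps the loop simple, because the reader only reveals the end of the stream after the last non-empty chunk has already been forwarded.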
pub fn guess(&self, tld: Option<&[u8]>, allow_utf8: bool) -> &'static Encoding
Guess the encoding given the bytes pushed to the detector so far (via feed()), the top-level domain name from which the bytes were loaded, and an indication of whether to consider UTF-8 as a permissible guess.
The tld argument takes the rightmost DNS label of the hostname of the host the stream was loaded from in lower-case ASCII form. That is, if the label is an internationalized top-level domain name, it must be provided in its Punycode form. If the TLD that the stream was loaded from is unavailable, None may be passed instead, which is equivalent to passing Some(b"com").
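One way to obtain a suitable tld value from a hostname is sketched below. The helper tld_from_host is hypothetical and deliberately conservative: it does not perform Punycode conversion, so a hostname whose last label is non-ASCII yields None, and it also returns None for all-numeric labels, which typically come from IPv4 literals.

```rust
/// Returns the rightmost DNS label of `host` in lower-case ASCII,
/// or `None` if no usable label can be extracted.
/// (Hypothetical helper; not part of the crate.)
fn tld_from_host(host: &str) -> Option<Vec<u8>> {
    // A fully qualified name may end with a trailing dot; ignore it.
    let host = host.strip_suffix('.').unwrap_or(host);
    // `rsplit` always yields at least one item, possibly empty.
    let label = host.rsplit('.').next()?;
    // Accept only letters, digits and hyphens. Non-ASCII labels would
    // need Punycode conversion first, which this sketch does not do.
    let valid = label
        .bytes()
        .all(|b| b.is_ascii_alphanumeric() || b == b'-');
    // An all-digit label most likely belongs to an IPv4 address.
    let numeric = label.bytes().all(|b| b.is_ascii_digit());
    if label.is_empty() || !valid || numeric {
        return None;
    }
    Some(label.to_ascii_lowercase().into_bytes())
}
```

The result could then be handed to guess as tld.as_deref(), which turns the Option<Vec<u8>> into the Option<&[u8]> the method expects.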
If the allow_utf8 argument is set to false, the return value of this method won't be encoding_rs::UTF_8. When performing detection on text/html on non-file: URLs, Web browsers must pass false, unless the user has taken a specific contextual action to request an override. This way, Web developers cannot start depending on UTF-8 detection. Such reliance would make the Web Platform more brittle.
Returns the guessed encoding.
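The rule for allow_utf8 can be written down as a small predicate. This is only a sketch of the requirement stated above: the function name and its three parameters are hypothetical, and how a browser determines the content type, URL scheme, or user override is outside its scope. The documentation constrains only text/html on non-file: URLs, so the sketch treats every other case as unconstrained.

```rust
/// Whether the caller may pass `allow_utf8 = true` under the rule
/// described above. (Hypothetical helper; not part of the crate.)
fn may_allow_utf8(
    content_type: &str,
    url_scheme: &str,
    user_requested_override: bool,
) -> bool {
    // The restricted case: text/html loaded from anything but file:.
    let html_from_non_file = content_type.eq_ignore_ascii_case("text/html")
        && !url_scheme.eq_ignore_ascii_case("file");
    // Only an explicit user action lifts the restriction.
    user_requested_override || !html_from_non_file
}
```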
Panics
If tld contains non-ASCII, period, or upper-case letters. (The panic condition is intentionally limited to signs of failing to extract the label correctly, failing to provide it in its Punycode form, and failure to lower-case it. Full DNS label validation is intentionally not performed to avoid panics when the reality doesn't match the specs.)
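A caller that cannot fully trust its label extraction might check the documented panic conditions up front. The predicate below mirrors the three conditions as worded above; it is a hypothetical helper written from this description, not the crate's actual check.

```rust
/// Returns true if `tld` meets any of the documented panic conditions:
/// a non-ASCII byte, a period, or an upper-case ASCII letter.
/// (Hypothetical helper; not part of the crate.)
fn tld_would_panic(tld: &[u8]) -> bool {
    tld.iter()
        .any(|&b| !b.is_ascii() || b == b'.' || b.is_ascii_uppercase())
}
```

If the predicate returns true, falling back to None is one option, since None is documented as equivalent to passing Some(b"com").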
pub fn new() -> Self
Creates a new instance of the detector.
Auto Trait Implementations
impl RefUnwindSafe for EncodingDetector
impl Send for EncodingDetector
impl Sync for EncodingDetector
impl Unpin for EncodingDetector
impl UnwindSafe for EncodingDetector
Blanket Implementations
impl<T> Any for T where
T: 'static + ?Sized,
impl<T> Borrow<T> for T where
T: ?Sized,
impl<T> BorrowMut<T> for T where
T: ?Sized,
pub fn borrow_mut(&mut self) -> &mut T
impl<T> From<T> for T
impl<T, U> Into<U> for T where
U: From<T>,
impl<T, U> TryFrom<U> for T where
U: Into<T>,
type Error = Infallible
The type returned in the event of a conversion error.
pub fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto<U> for T where
U: TryFrom<T>,