Struct chardetng::EncodingDetector

pub struct EncodingDetector { /* fields omitted */ }

A Web browser-oriented detector for guessing what character encoding a stream of bytes is encoded in.

The bytes are fed to the detector incrementally using the feed method. The current guess of the detector can be queried using the guess method. The guessing parameters are arguments to the guess method rather than arguments to the constructor in order to enable the application to check if the arguments affect the guessing outcome. (The specific use case is to disable UI for re-running the detector with UTF-8 allowed and the top-level domain name ignored if those arguments don't change the guess.)
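The use case in parentheses can be sketched without depending on the crate itself. The helper below is hypothetical (its name and its closure parameter are not part of chardetng); it assumes the caller wraps the detector as `|tld, allow_utf8| detector.guess(tld, allow_utf8)` and reports whether re-running with UTF-8 allowed and the TLD ignored would yield a different guess.

```rust
/// Hypothetical helper, not part of chardetng.
///
/// `guess` stands in for a closure around `EncodingDetector::guess`,
/// for example `|tld, allow_utf8| detector.guess(tld, allow_utf8)`.
/// Returns true if the override (UTF-8 allowed, TLD ignored) would
/// produce a different result than the normal guess.
fn override_would_change_guess<E, G>(guess: G, tld: Option<&[u8]>) -> bool
where
    E: PartialEq,
    G: Fn(Option<&[u8]>, bool) -> E,
{
    // Normal page load: use the TLD and do not allow UTF-8.
    let normal = guess(tld, false);
    // User-requested override: ignore the TLD and allow UTF-8.
    let overridden = guess(None, true);
    normal != overridden
}
```

An application could use the returned boolean to decide whether to enable its "re-detect" UI; if it is false, re-running the detector would not change anything.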

Implementations

impl EncodingDetector

pub fn feed(&mut self, buffer: &[u8], last: bool) -> bool

Inform the detector of a chunk of input.

The byte stream is represented as a sequence of calls to this method such that the concatenation of the arguments to this method forms the byte stream. It does not matter how the application chooses to chunk the stream. It is OK to call this method with a zero-length byte slice.

The end of the stream is indicated by calling this method with last set to true. In that case, the end of the stream is considered to occur after the last byte of the buffer (which may be zero-length) passed in the same call. Once this method has been called with last set to true, it must not be called again.

If you want to perform detection on just the prefix of a longer stream, do not pass last=true after the prefix if the stream actually still continues.

Returns true if after processing buffer the stream has contained at least one non-ASCII byte and false if only ASCII has been seen so far.

Panics

If this method has previously been called with last set to true.
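The calling contract above (arbitrary chunking, `last` set to true exactly once, zero-length buffers allowed) can be illustrated with a small driver. This is a minimal sketch, not part of chardetng: `feed_in_chunks` and its `feed` closure parameter are hypothetical, and the closure is assumed to wrap the real method as `|buf, last| detector.feed(buf, last)`.

```rust
/// Hypothetical driver, not part of chardetng.
///
/// Splits `data` into chunks of at most `chunk_size` bytes and passes each
/// one to `feed`, setting `last` to true only on the final call. `feed`
/// stands in for `|buf, last| detector.feed(buf, last)`.
fn feed_in_chunks<F>(data: &[u8], chunk_size: usize, mut feed: F) -> bool
where
    F: FnMut(&[u8], bool) -> bool,
{
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    if data.is_empty() {
        // A zero-length buffer is allowed, and the end of the stream
        // still has to be signalled exactly once.
        return feed(&[], true);
    }
    let mut non_ascii = false;
    let mut chunks = data.chunks(chunk_size).peekable();
    while let Some(chunk) = chunks.next() {
        // The chunk is the last one when nothing follows it.
        let last = chunks.peek().is_none();
        // The documented return value describes the stream so far, so the
        // value from the final call covers the whole input.
        non_ascii = feed(chunk, last);
    }
    non_ascii
}
```

Because the driver always ends with `last` set to true, it is only appropriate when `data` is the complete stream; for detection on a prefix, the final call should pass false instead.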

pub fn guess(&self, tld: Option<&[u8]>, allow_utf8: bool) -> &'static Encoding

Guess the encoding given the bytes pushed to the detector so far (via feed()), the top-level domain name from which the bytes were loaded, and an indication of whether to consider UTF-8 as a permissible guess.

The tld argument takes the rightmost DNS label of the hostname of the host the stream was loaded from in lower-case ASCII form. That is, if the label is an internationalized top-level domain name, it must be provided in its Punycode form. If the TLD that the stream was loaded from is unavailable, None may be passed instead, which is equivalent to passing Some(b"com").

If the allow_utf8 argument is set to false, the return value of this method won't be encoding_rs::UTF_8. When performing detection on text/html on non-file: URLs, Web browsers must pass false, unless the user has taken a specific contextual action to request an override. This way, Web developers cannot start depending on UTF-8 detection. Such reliance would make the Web Platform more brittle.

Returns the guessed encoding.

Panics

If tld contains non-ASCII bytes, a period, or upper-case letters. (The panic condition is intentionally limited to signs of failing to extract the label correctly, failing to provide it in its Punycode form, and failing to lower-case it. Full DNS label validation is intentionally not performed to avoid panics when the reality doesn't match the specs.)
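One way to stay clear of the panic conditions is to normalize the hostname before calling guess. The helper below is a hedged sketch using only the standard library; `tld_label` is a hypothetical name, and it assumes the hostname has already been converted to its ASCII (Punycode) form by the caller. It does not special-case IP addresses.

```rust
/// Hypothetical helper, not part of chardetng.
///
/// Extracts the rightmost DNS label of `host` as lower-case ASCII bytes.
/// Returns None when no usable label exists or when the label is not
/// ASCII (which suggests the host was not converted to Punycode).
fn tld_label(host: &str) -> Option<Vec<u8>> {
    // Drop the trailing dot(s) of a fully qualified name such as "example.com.".
    let trimmed = host.trim_end_matches('.');
    // The rightmost label is whatever follows the final period, so the
    // result can never contain a period itself.
    let label = trimmed.rsplit('.').next()?;
    if label.is_empty() || !label.is_ascii() {
        return None;
    }
    // Lower-casing removes the remaining documented panic trigger.
    Some(label.to_ascii_lowercase().into_bytes())
}
```

The result can be passed along as `detector.guess(tld_label(host).as_deref(), false)`; a None from the helper then falls back to the documented behavior of treating the TLD as unavailable.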

pub fn new() -> Self

Creates a new instance of the detector.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> From<T> for T

impl<T, U> Into<U> for T where
    U: From<T>, 

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.