Struct utf8::Decoder

pub struct Decoder {
    // some fields omitted
}

A low-level, zero-copy UTF-8 decoder with error handling.

This decoder can process input one chunk at a time. It returns &str slices that borrow from the given &[u8] input, and it stops at each error so the caller can handle it however they choose.

For example, String::from_utf8_lossy (but returning String instead of Cow) can be rewritten as:

fn string_from_utf8_lossy(mut input: &[u8]) -> String {
    let mut decoder = utf8::Decoder::new();
    let mut string = String::new();
    loop {
        let (reconstituted, decoded, result) = decoder.decode(input);
        debug_assert!(reconstituted.is_empty());  // We only have one chunk of input.
        string.push_str(decoded);
        match result {
            utf8::Result::Ok => return string,
            utf8::Result::Incomplete => {
                string.push_str(utf8::REPLACEMENT_CHARACTER);
                return string
            }
            utf8::Result::Error { remaining_input_after_error } => {
                string.push_str(utf8::REPLACEMENT_CHARACTER);
                input = remaining_input_after_error;
            }
        }
    }
}

See also LossyDecoder.

Methods

impl Decoder

fn new() -> Decoder

Create a new decoder.

fn has_incomplete_sequence(&self) -> bool

Return whether the last call to .decode() returned Result::Incomplete. If this is true and there is no more input, this is a decoding error.
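To illustrate why a pending incomplete sequence matters at end of input, here is a minimal sketch that uses only the standard library (std::str::from_utf8 and Utf8Error), not this crate. The type PendingTracker and its feed and finish methods are hypothetical names invented for this example. Unlike Decoder, the sketch copies pending bytes into a Vec instead of being zero-copy.

```rust
use std::str;

/// Hypothetical std-only stand-in for illustration; not this crate's Decoder.
/// It copies the pending bytes, so it is not zero-copy.
#[derive(Default)]
struct PendingTracker {
    /// Bytes of a code point that was cut off at the end of the last chunk.
    pending: Vec<u8>,
}

impl PendingTracker {
    /// True when the last chunk ended in the middle of a code point.
    fn has_incomplete_sequence(&self) -> bool {
        !self.pending.is_empty()
    }

    /// Decode one chunk lossily, appending the result to `out`.
    fn feed(&mut self, chunk: &[u8], out: &mut String) {
        // Prepend whatever was left over from the previous chunk.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        let mut input = &buf[..];
        loop {
            match str::from_utf8(input) {
                Ok(s) => {
                    out.push_str(s);
                    return;
                }
                Err(e) => {
                    let (valid, after) = input.split_at(e.valid_up_to());
                    // valid_up_to() guarantees this prefix is well-formed.
                    out.push_str(str::from_utf8(valid).expect("valid prefix"));
                    match e.error_len() {
                        // Invalid bytes: substitute U+FFFD and keep going.
                        Some(len) => {
                            out.push('\u{FFFD}');
                            input = &after[len..];
                        }
                        // Input ended mid-sequence: remember the tail.
                        None => {
                            self.pending = after.to_vec();
                            return;
                        }
                    }
                }
            }
        }
    }

    /// Call when there is no more input. A pending tail is a decoding error,
    /// reported here as one U+FFFD.
    fn finish(self, out: &mut String) {
        if self.has_incomplete_sequence() {
            out.push('\u{FFFD}');
        }
    }
}
```

Feeding b"\xE2\x82" and then b"\xAC" yields the single character U+20AC, with has_incomplete_sequence() returning true between the two calls. If the input ends while it is still true, finish treats that as an error.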

fn decode<'a>(&mut self, input_chunk: &'a [u8]) -> (InlineString, &'a str, Result<'a>)

Start decoding one chunk of input bytes. The return value is a tuple of:

  • An inline buffer of up to 4 bytes that dereferences to &str. When the length is non-zero (which can only happen when calling Decoder::decode with more input after the previous call returned Result::Incomplete), it represents a single code point that was re-assembled from multiple input chunks.
  • The Unicode slice at the start of the input chunk that is well-formed UTF-8. May be empty, for example when a decoding error occurs immediately after another.
  • Details about the rest of the input chunk. See the documentation of Result.
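The split between a well-formed prefix and "the rest" can be sketched with the standard library alone. This is an illustration under stated assumptions, not this crate's implementation: the enum Rest and the function split_chunk are hypothetical names, and the sketch handles a single chunk only, so it has no counterpart to the re-assembled InlineString.

```rust
use std::str;

/// Hypothetical stand-in for the last tuple element; not this crate's Result.
#[derive(Debug, PartialEq)]
enum Rest<'a> {
    /// The whole chunk was well-formed UTF-8.
    Ok,
    /// The chunk ended in the middle of a code point.
    Incomplete { tail: &'a [u8] },
    /// Invalid bytes were found; decoding can resume at this slice.
    Error { remaining_input_after_error: &'a [u8] },
}

/// Split one chunk into its well-formed prefix and a description of the rest.
/// The returned &str borrows from `input`, so nothing is copied.
fn split_chunk(input: &[u8]) -> (&str, Rest<'_>) {
    match str::from_utf8(input) {
        Ok(s) => (s, Rest::Ok),
        Err(e) => {
            let (valid, after) = input.split_at(e.valid_up_to());
            // valid_up_to() guarantees this prefix is well-formed.
            let decoded = str::from_utf8(valid).expect("valid prefix");
            match e.error_len() {
                // None means the input ended unexpectedly mid-sequence.
                None => (decoded, Rest::Incomplete { tail: after }),
                // Some(len) is the length of the invalid byte sequence.
                Some(len) => (
                    decoded,
                    Rest::Error {
                        remaining_input_after_error: &after[len..],
                    },
                ),
            }
        }
    }
}
```

For the input b"ab\xFFcd" this returns "ab" together with an Error whose remaining slice is b"cd". Calling it again on that slice returns "cd" and Ok. Two invalid bytes in a row would produce an empty prefix on the second call, which is the "May be empty" case described above.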