Enum encode_unicode::error::Utf8ErrorKind

pub enum Utf8ErrorKind {
    TooFewBytes,
    NonUtf8Byte,
    UnexpectedContinuationByte,
    InterruptedSequence,
    OverlongEncoding,
    Utf16ReservedCodepoint,
    TooHighCodepoint,
}

The types of errors that can occur when decoding a UTF-8 codepoint.

The variants are more technical than what an end user is likely interested in, but might be useful for deciding how to handle the error.

They can be grouped into three categories:

- Will happen regularly when decoding chunked or buffered text: TooFewBytes.
- Input might be binary, in a different encoding, or corrupted (a broken UTF-8 sequence): NonUtf8Byte, UnexpectedContinuationByte and InterruptedSequence.
- Less likely to happen accidentally, and might be malicious: OverlongEncoding, Utf16ReservedCodepoint and TooHighCodepoint. Note that similar byte patterns can still come from certain valid Latin-1 strings: "Á©" (b"\xC1\xA9") looks like an overlong two-byte encoding of "i", although its first byte, 0xC1, is one of the bytes reported as NonUtf8Byte.
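A minimal sketch of acting on these categories. Because this page only documents the enum, the code declares a hypothetical local copy of `Utf8ErrorKind` so that it compiles without the crate; `ErrorCategory` and `categorize` are illustrative names, not part of encode_unicode. NonUtf8Byte is grouped with the broken-sequence variants here, following its own description (different encoding, corrupted, or binary input).

```rust
/// Hypothetical local copy of `encode_unicode::error::Utf8ErrorKind`,
/// declared here only so that the sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Utf8ErrorKind {
    TooFewBytes,
    NonUtf8Byte,
    UnexpectedContinuationByte,
    InterruptedSequence,
    OverlongEncoding,
    Utf16ReservedCodepoint,
    TooHighCodepoint,
}

/// Hypothetical grouping of the variants into the three categories.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ErrorCategory {
    /// Expected with chunked or buffered input: wait for more bytes.
    NeedMoreInput,
    /// Binary, differently encoded, or corrupted input.
    BrokenSequence,
    /// Unlikely to be accidental; possibly malicious.
    Suspicious,
}

pub fn categorize(kind: Utf8ErrorKind) -> ErrorCategory {
    match kind {
        Utf8ErrorKind::TooFewBytes => ErrorCategory::NeedMoreInput,
        Utf8ErrorKind::NonUtf8Byte
        | Utf8ErrorKind::UnexpectedContinuationByte
        | Utf8ErrorKind::InterruptedSequence => ErrorCategory::BrokenSequence,
        Utf8ErrorKind::OverlongEncoding
        | Utf8ErrorKind::Utf16ReservedCodepoint
        | Utf8ErrorKind::TooHighCodepoint => ErrorCategory::Suspicious,
    }
}
```

Matching without a `_` arm means that, for this local copy, the compiler will point out any variant left unhandled.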

Variants

`TooFewBytes`

There are too few bytes to decode the codepoint.

This can happen when a slice is empty or too short, or an iterator returned None while in the middle of a codepoint.
This error is never produced by functions accepting fixed-size [u8; 4] arrays.

If decoding text that arrives in chunks (such as buffers passed to Read), the remaining bytes, including the byte this error was produced for, should be carried over into the next chunk or buffer.
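The carry-over strategy can be sketched with the standard library alone. This does not use encode_unicode: it relies on `std::str::from_utf8`, whose `Utf8Error::error_len()` returns `None` when the input merely ends in the middle of a sequence, which plays the same role as TooFewBytes here. `feed_chunk` is a hypothetical helper name.

```rust
use std::str;

/// Hypothetical helper: appends `chunk` to `pending`, moves every complete,
/// valid character into `out`, and leaves an incomplete trailing sequence
/// in `pending` so it is retried together with the next chunk.
/// Returns `Err(n)` if `pending` starts with `n` genuinely invalid bytes.
pub fn feed_chunk(pending: &mut Vec<u8>, chunk: &[u8], out: &mut String) -> Result<(), usize> {
    pending.extend_from_slice(chunk);
    let (valid, error_len) = match str::from_utf8(&pending[..]) {
        Ok(s) => (s.len(), None),
        Err(e) => (e.valid_up_to(), e.error_len()),
    };
    // `pending[..valid]` was just validated, so this cannot fail.
    out.push_str(str::from_utf8(&pending[..valid]).unwrap());
    pending.drain(..valid);
    match error_len {
        // Input ended mid-sequence: keep those bytes for the next chunk.
        None => Ok(()),
        // Real decoding error: report how many bytes are invalid.
        Some(bad) => Err(bad),
    }
}
```

On `Err(n)`, a caller would typically drop or replace the first `n` bytes of `pending` before feeding more data.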

`NonUtf8Byte`

A byte which is never used by well-formed UTF-8 was encountered.

This means that the input uses a different encoding, is corrupted, or is binary.

This error is returned when a byte in one of the following ranges is encountered anywhere in a UTF-8 sequence:

- 192 and 193 (0b1100_000x): would start an overlong encoding of a single-byte (ASCII) character, and should therefore never occur.
- 245..=247 (0b1111_0101..=0b1111_0111): would start a codepoint that is too high (above \u10ffff).
- 248.. (0b1111_1xxx): sequences cannot be longer than 4 bytes.
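The byte ranges above translate directly into a predicate. `is_non_utf8_byte` is a hypothetical name for this sketch, not an encode_unicode function; it simply encodes the three listed ranges.

```rust
/// Hypothetical helper: true for bytes that never occur in well-formed UTF-8.
pub fn is_non_utf8_byte(b: u8) -> bool {
    match b {
        // 0b1100_000x: would start an overlong 2-byte encoding of ASCII.
        0xC0 | 0xC1 => true,
        // 0b1111_0101..=0b1111_0111: would start a codepoint above U+10FFFF.
        0xF5..=0xF7 => true,
        // 0b1111_1xxx: would start a sequence longer than 4 bytes.
        0xF8..=0xFF => true,
        _ => false,
    }
}
```

That makes 13 of the 256 possible byte values that can never appear in valid UTF-8.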

`UnexpectedContinuationByte`

The first byte is not a valid start of a codepoint.

This might happen as a result of slicing into the middle of a codepoint, the input not being UTF-8 encoded or being corrupted. Errors of this type coming right after another error should probably be ignored, unless returned more than three times in a row.

This error is returned when the first byte has a value in the range 128..=191 (0b1000_0000..=0b1011_1111).
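A continuation byte is recognized with a single mask. The resynchronization helper below is a hypothetical sketch of one way to follow the advice about consecutive errors, not the crate's behavior: after an error, skip the offending byte plus at most three following continuation bytes, so a short run of stray continuation bytes yields one error instead of several, while longer runs still produce new errors.

```rust
/// True if `b` matches 0b10xx_xxxx, i.e. lies in 128..=191.
pub fn is_continuation_byte(b: u8) -> bool {
    (b & 0b1100_0000) == 0b1000_0000
}

/// Hypothetical helper: after a decoding error at `bytes[0]`, returns how
/// many bytes to skip before decoding again. Skips the offending byte and
/// then up to three continuation bytes (a codepoint is at most 4 bytes).
pub fn resync_len(bytes: &[u8]) -> usize {
    if bytes.is_empty() {
        return 0;
    }
    1 + bytes[1..]
        .iter()
        .take(3)
        .take_while(|&&b| is_continuation_byte(b))
        .count()
}
```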

`InterruptedSequence`

The byte at index 1..=3 should be a continuation byte, but doesn’t fit the pattern 0b10xx_xxxx.

When the input slice or iterator has too few bytes, TooFewBytes is returned instead.
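The distinction between InterruptedSequence and TooFewBytes can be sketched as a check over the continuation bytes of a sequence whose lead byte announced `len` bytes. `ContinuationCheck` and `check_continuations` are hypothetical names. This sketch examines bytes in order, as an iterator-based decoder would, so a malformed byte that appears before the input runs out is reported as interrupted; the crate's slice functions may prioritize differently.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ContinuationCheck {
    /// All expected continuation bytes are present and well-formed.
    Complete,
    /// Corresponds to TooFewBytes: the input ended before `len` bytes.
    TooFewBytes,
    /// Corresponds to InterruptedSequence: the byte at this index (1..=3)
    /// does not match 0b10xx_xxxx.
    InterruptedAt(usize),
}

/// Hypothetical helper: checks bytes 1..len of a sequence whose lead byte
/// announced a total length of `len` (2..=4).
pub fn check_continuations(bytes: &[u8], len: usize) -> ContinuationCheck {
    for i in 1..len {
        match bytes.get(i) {
            None => return ContinuationCheck::TooFewBytes,
            Some(&b) if (b & 0xC0) != 0x80 => return ContinuationCheck::InterruptedAt(i),
            Some(_) => {}
        }
    }
    ContinuationCheck::Complete
}
```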

`OverlongEncoding`

The encoding of the codepoint has so many leading zeroes that it could be a byte shorter.

Successfully decoding this can present a security issue: doing so could allow an attacker to circumvent input validation that only checks for ASCII characters, and to sneak in characters or strings that would otherwise be rejected, such as /../.

This error is only returned for 3- and 4-byte encodings; NonUtf8Byte is returned for the bytes that would start shorter or longer overlong encodings.
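To illustrate the security concern, the sketch below decodes the 3-byte sequence E0 80 AF, which yields '/' even though '/' needs only one byte, so a filter that searches the raw bytes for 0x2F would miss it. `min_utf8_len`, `decode3` and `is_overlong` are hypothetical helpers; `decode3` only combines payload bits and assumes the byte patterns were checked beforehand.

```rust
/// Length of the shortest UTF-8 encoding of codepoint `cp`.
pub fn min_utf8_len(cp: u32) -> usize {
    match cp {
        0..=0x7F => 1,
        0x80..=0x7FF => 2,
        0x800..=0xFFFF => 3,
        _ => 4,
    }
}

/// Combines the payload bits of a 3-byte sequence (lead byte 0b1110_xxxx,
/// then two 0b10xx_xxxx bytes). The byte patterns are not validated here.
pub fn decode3(b: [u8; 3]) -> u32 {
    ((u32::from(b[0]) & 0x0F) << 12)
        | ((u32::from(b[1]) & 0x3F) << 6)
        | (u32::from(b[2]) & 0x3F)
}

/// An encoding is overlong if the codepoint would fit in fewer bytes.
pub fn is_overlong(cp: u32, encoded_len: usize) -> bool {
    min_utf8_len(cp) < encoded_len
}
```

The standard library's own validator also rejects such sequences, as `std::str::from_utf8(&[0xE0, 0x80, 0xAF])` returns an error.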

`Utf16ReservedCodepoint`

The codepoint is reserved for UTF-16 surrogate pairs.

(Utf8Char cannot be used to work with the WTF-8 encoding for UCS-2 strings.)

This error is returned for codepoints in the range \ud800..=\udfff, which are three bytes long when encoded as UTF-8.
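As a worked example, the bytes ED A0 80 follow the 3-byte pattern but decode to 0xD800, the first surrogate. The sketch uses hypothetical helpers `decode3` (payload bits only, patterns assumed valid) and `is_utf16_reserved`; Rust's `char` refuses such values too.

```rust
/// Combines the payload bits of a 3-byte sequence (lead byte 0b1110_xxxx,
/// then two 0b10xx_xxxx bytes). The byte patterns are not validated here.
pub fn decode3(b: [u8; 3]) -> u32 {
    ((u32::from(b[0]) & 0x0F) << 12)
        | ((u32::from(b[1]) & 0x3F) << 6)
        | (u32::from(b[2]) & 0x3F)
}

/// True for codepoints reserved for UTF-16 surrogate pairs.
pub fn is_utf16_reserved(cp: u32) -> bool {
    (0xD800..=0xDFFF).contains(&cp)
}
```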

`TooHighCodepoint`

The codepoint is higher than \u10ffff, which is the highest codepoint Unicode permits.
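Going by the NonUtf8Byte ranges above, lead bytes 0xF5..=0xF7 are already rejected on their own, so a too-high codepoint can only be detected after decoding a 4-byte sequence that starts with 0xF4. The sketch below shows F4 8F BF BF decoding to the maximum, \u10ffff, and F4 90 80 80 decoding one past it. `decode4`, `MAX_CODEPOINT` and `is_too_high` are hypothetical names, and `decode4` assumes the byte patterns were checked beforehand.

```rust
/// Highest codepoint Unicode permits.
pub const MAX_CODEPOINT: u32 = 0x10FFFF;

/// Combines the payload bits of a 4-byte sequence (lead byte 0b1111_0xxx,
/// then three 0b10xx_xxxx bytes). The byte patterns are not validated here.
pub fn decode4(b: [u8; 4]) -> u32 {
    ((u32::from(b[0]) & 0x07) << 18)
        | ((u32::from(b[1]) & 0x3F) << 12)
        | ((u32::from(b[2]) & 0x3F) << 6)
        | (u32::from(b[3]) & 0x3F)
}

pub fn is_too_high(cp: u32) -> bool {
    cp > MAX_CODEPOINT
}
```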

Trait Implementations

impl Clone for Utf8ErrorKind

fn clone(&self) -> Utf8ErrorKind

fn clone_from(&mut self, source: &Self)

impl Debug for Utf8ErrorKind

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl PartialEq<Utf8Error> for Utf8ErrorKind

fn eq(&self, error: &Utf8Error) -> bool

fn ne(&self, other: &Rhs) -> bool

impl PartialEq<Utf8ErrorKind> for Utf8Error

fn eq(&self, kind: &Utf8ErrorKind) -> bool

fn ne(&self, other: &Rhs) -> bool

impl PartialEq<Utf8ErrorKind> for Utf8ErrorKind

fn eq(&self, other: &Utf8ErrorKind) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Copy for Utf8ErrorKind

impl Eq for Utf8ErrorKind

impl StructuralEq for Utf8ErrorKind

impl StructuralPartialEq for Utf8ErrorKind

Auto Trait Implementations

impl RefUnwindSafe for Utf8ErrorKind

impl Send for Utf8ErrorKind

impl Sync for Utf8ErrorKind

impl Unpin for Utf8ErrorKind

impl UnwindSafe for Utf8ErrorKind

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
