Struct encoding_rs::Decoder

pub struct Decoder {
    // some fields omitted
}

A converter that decodes a byte stream into Unicode according to a character encoding in a streaming (incremental) manner.

The various decode_* methods take an input buffer (src) and an output buffer (dst), both of which are caller-allocated. There are variants for both UTF-8 and UTF-16 output buffers.

A decode_* method decodes bytes from src into Unicode characters stored into dst until one of the following three things happens:

A malformed byte sequence is encountered.
The output buffer has been filled so near capacity that the decoder cannot be sure that processing an additional byte of input wouldn't cause so much output that the output buffer would overflow.
All the input bytes have been processed.

The decode_* method then returns a tuple consisting of: a status indicating which of the three reasons caused the return; the number of input bytes read; the number of output code units written (u8 when decoding into UTF-8 and u16 when decoding into UTF-16), except when decoding into String, whose change in length conveys this; and, for the variants performing replacement, a boolean indicating whether an error was replaced with the REPLACEMENT CHARACTER during the call.

In the case of the *_without_replacement variants, the status is a DecoderResult enumeration (possibilities Malformed, OutputFull and InputEmpty corresponding to the three cases listed above).
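The shape of that return value can be illustrated with a minimal sketch. Everything in it is hypothetical: `ToyResult` and `toy_decode_ascii` are stand-ins written for this example using only the standard library, not encoding_rs types or methods. The toy decoder accepts ASCII only, treats any byte of 0x80 or above as malformed, and counts the offending byte as read (a choice made for this sketch).

```rust
/// Toy stand-in for the status described above; not the encoding_rs type.
#[derive(Debug, PartialEq)]
enum ToyResult {
    Malformed,
    OutputFull,
    InputEmpty,
}

/// Hypothetical ASCII-only decoder: bytes >= 0x80 count as malformed.
/// Returns (status, bytes read from `src`, bytes written to `dst`).
fn toy_decode_ascii(src: &[u8], dst: &mut [u8]) -> (ToyResult, usize, usize) {
    let mut read = 0;
    let mut written = 0;
    while read < src.len() {
        // Stop before reading more input if there is no room for its output.
        if written == dst.len() {
            return (ToyResult::OutputFull, read, written);
        }
        let b = src[read];
        read += 1;
        if b >= 0x80 {
            // The offending byte is counted as consumed (a choice of this sketch).
            return (ToyResult::Malformed, read, written);
        }
        dst[written] = b;
        written += 1;
    }
    (ToyResult::InputEmpty, read, written)
}

fn main() {
    let mut dst = [0u8; 4];

    // All input bytes processed.
    assert_eq!(toy_decode_ascii(b"abc", &mut dst), (ToyResult::InputEmpty, 3, 3));

    // Output buffer exhausted after four bytes; two input bytes remain unread.
    assert_eq!(toy_decode_ascii(b"abcdef", &mut dst), (ToyResult::OutputFull, 4, 4));

    // Malformed byte: one byte written, two bytes read.
    assert_eq!(toy_decode_ascii(b"a\xFFb", &mut dst), (ToyResult::Malformed, 2, 1));
}
```

The caller uses the read count to find where the unconsumed input starts and the written count to find how much of dst holds valid output.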

For methods whose names do not end with _without_replacement, malformed sequences are automatically replaced with the REPLACEMENT CHARACTER and errors do not cause the methods to return early.

When decoding to UTF-8, the output buffer must have at least 4 bytes of space. When decoding to UTF-16, the output buffer must have at least two UTF-16 code units (u16) of space.

When decoding to UTF-8 without replacement, the methods are guaranteed not to return indicating that more output space is needed if the length of the output buffer is at least the length returned by max_utf8_buffer_length_without_replacement(). When decoding to UTF-8 with replacement, the output buffer length that provides the same guarantee is given by max_utf8_buffer_length(). When decoding to UTF-16 with or without replacement, it is given by max_utf16_buffer_length().
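The "query the worst case, allocate once, decode in one call" pattern can be sketched with a toy example. The names `toy_max_utf8_buffer_length` and `decode_in_one_call` are hypothetical and use only the standard library. The bound applies to ISO-8859-1 alone, where each input byte maps to a code point in U+0000..=U+00FF and therefore needs at most two UTF-8 bytes; the real methods depend on the encoding and on decoder state. The sketch returns an `Option` to surface arithmetic overflow, whereas the methods documented on this page return a plain usize.

```rust
/// Hypothetical worst-case bound for a toy ISO-8859-1 decoder:
/// every input byte yields at most 2 bytes of UTF-8.
/// Returns None if the multiplication would overflow.
fn toy_max_utf8_buffer_length(byte_length: usize) -> Option<usize> {
    byte_length.checked_mul(2)
}

/// Decodes ISO-8859-1 into a String whose capacity is reserved up front,
/// so the output never needs to grow while decoding.
fn decode_in_one_call(src: &[u8]) -> Option<String> {
    let cap = toy_max_utf8_buffer_length(src.len())?;
    let mut out = String::with_capacity(cap);
    // char::from(u8) maps a byte to the code point with the same value.
    out.extend(src.iter().map(|&b| char::from(b)));
    debug_assert!(out.len() <= cap);
    Some(out)
}

fn main() {
    let decoded = decode_in_one_call(b"caf\xE9").expect("length fits in usize");
    assert_eq!(decoded, "café");
}
```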

The output written into dst is guaranteed to be valid UTF-8 or UTF-16, and the output after each decode_* call is guaranteed to consist of complete characters. (I.e. the code unit sequence for the last character is guaranteed not to be split across output buffers.)

The boolean argument last indicates that the end of the stream is reached when all the bytes in src have been consumed.

A Decoder object can be used to incrementally decode a byte stream.

During the processing of a single stream, the caller must call decode_* zero or more times with last set to false and then call decode_* at least once with last set to true. If a decode_* call made with last set to true returns InputEmpty, the processing of the stream has ended. Otherwise, the caller must call decode_* again with last set to true (or treat a Malformed result as a fatal error).
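The role of last is easiest to see with a decoder that carries state between calls. The sketch below is a hypothetical stand-in built on the standard library only: `ToyDecoder`, `ToyResult`, and `decode_stream` are invented for illustration and are not encoding_rs items. The toy decodes big-endian 16-bit code units (Basic Multilingual Plane only), remembers a unit split across chunks, and reports a dangling byte as malformed only once last is true. Decoding into a String means there is no output-full case here.

```rust
#[derive(Debug, PartialEq)]
enum ToyResult {
    Malformed,
    InputEmpty,
}

/// Hypothetical stateful decoder for big-endian 16-bit code units.
struct ToyDecoder {
    /// First byte of a code unit that was split across two chunks.
    pending: Option<u8>,
}

impl ToyDecoder {
    fn new() -> Self {
        ToyDecoder { pending: None }
    }

    /// Appends decoded characters to `dst`; returns (status, bytes read).
    fn decode_to_string(&mut self, src: &[u8], dst: &mut String, last: bool) -> (ToyResult, usize) {
        let mut read = 0;
        while read < src.len() {
            let b = src[read];
            read += 1;
            match self.pending.take() {
                None => self.pending = Some(b),
                Some(hi) => {
                    let unit = (u32::from(hi) << 8) | u32::from(b);
                    match char::from_u32(unit) {
                        Some(c) => dst.push(c),
                        // Surrogate code units are not characters on their own.
                        None => return (ToyResult::Malformed, read),
                    }
                }
            }
        }
        // Only at the end of the stream is a half-finished unit an error.
        if last && self.pending.take().is_some() {
            return (ToyResult::Malformed, read);
        }
        (ToyResult::InputEmpty, read)
    }
}

/// Feeds every chunk with last = false, then makes one final call with an
/// empty input and last = true. Malformed input is treated as fatal (None).
fn decode_stream(chunks: &[&[u8]]) -> Option<String> {
    let mut decoder = ToyDecoder::new();
    let mut out = String::new();
    for &chunk in chunks {
        let (result, _read) = decoder.decode_to_string(chunk, &mut out, false);
        if result == ToyResult::Malformed {
            return None;
        }
    }
    let (result, _read) = decoder.decode_to_string(&[], &mut out, true);
    if result == ToyResult::Malformed {
        return None;
    }
    Some(out)
}

fn main() {
    // 'H' and 'i' as big-endian units, with 'i' split across two chunks.
    let chunks: [&[u8]; 2] = [&[0x00, 0x48, 0x00], &[0x69]];
    assert_eq!(decode_stream(&chunks), Some("Hi".to_string()));
}
```

Making the final call with an empty input is a design choice of this sketch: it keeps the chunk loop uniform and guarantees at least one call with last set to true, even when there are no chunks at all.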

Once the stream has ended, the Decoder object must not be used anymore. That is, you need to create another one to process another stream.

When the decoder returns OutputFull or the decoder returns Malformed and the caller does not wish to treat it as a fatal error, the input buffer src may not have been completely consumed. In that case, the caller must pass the unconsumed contents of src to decode_* again upon the next call.
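Re-passing the unconsumed input can be sketched as a loop that advances through src by the reported read count. The code is a hypothetical stand-in using only the standard library: `toy_decode_latin1`, `ToyResult`, and `decode_all` are invented for this example and are not encoding_rs items. The toy decodes ISO-8859-1 and, like the decoders described above, stops conservatively: it reports a full output buffer whenever fewer than two bytes of space remain, because the next input byte might need two bytes of UTF-8.

```rust
#[derive(Debug, PartialEq)]
enum ToyResult {
    OutputFull,
    InputEmpty,
}

/// Hypothetical ISO-8859-1 to UTF-8 decoder.
/// Returns (status, bytes read from `src`, bytes written to `dst`).
fn toy_decode_latin1(src: &[u8], dst: &mut [u8]) -> (ToyResult, usize, usize) {
    let mut read = 0;
    let mut written = 0;
    while read < src.len() {
        // Conservative check: continue only if the widest output (2 bytes) fits.
        if dst.len() - written < 2 {
            return (ToyResult::OutputFull, read, written);
        }
        let c = char::from(src[read]);
        written += c.encode_utf8(&mut dst[written..]).len();
        read += 1;
    }
    (ToyResult::InputEmpty, read, written)
}

/// Decodes all of `src` through a small reusable buffer of `buf_len` bytes.
fn decode_all(src: &[u8], buf_len: usize) -> String {
    // With less than 2 bytes of space the toy decoder could never make progress.
    assert!(buf_len >= 2, "output buffer too small for this toy decoder");
    let mut buf = vec![0u8; buf_len];
    let mut out = String::new();
    let mut total_read = 0;
    loop {
        // Pass only the part of `src` that has not been consumed yet.
        let (result, read, written) = toy_decode_latin1(&src[total_read..], &mut buf);
        total_read += read;
        // Each call emits whole characters, so the chunk is valid UTF-8 by itself.
        out.push_str(std::str::from_utf8(&buf[..written]).expect("toy decoder emits valid UTF-8"));
        if result == ToyResult::InputEmpty {
            return out;
        }
    }
}

fn main() {
    // A 3-byte buffer forces several OutputFull round trips.
    assert_eq!(decode_all(b"caf\xE9 au lait", 3), "café au lait");
}
```

Because each call writes only complete characters, the drained buffer contents can be appended to the result without carrying partial sequences from one iteration to the next.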

Methods

`impl Decoder`

`fn encoding(&self) -> &'static Encoding`

The Encoding this Decoder is for.

BOM sniffing can change the return value of this method during the life of the decoder.

`fn max_utf16_buffer_length(&self, byte_length: usize) -> usize`

Query the worst-case UTF-16 output size (with or without replacement).

Returns the size of the output buffer in UTF-16 code units (u16) that will not overflow given the current state of the decoder and byte_length number of additional input bytes.

Since the REPLACEMENT CHARACTER fits into one UTF-16 code unit, the return value of this method applies also in the _with_replacement case.

Available via the C wrapper.

`fn max_utf8_buffer_length_without_replacement(&self, byte_length: usize) -> usize`

Query the worst-case UTF-8 output size without replacement.

Returns the size of the output buffer in UTF-8 code units (u8) that will not overflow given the current state of the decoder and byte_length number of additional input bytes when decoding without replacement error handling.

Note that this value may be too small for the _with_replacement case. Use max_utf8_buffer_length for that case.

Available via the C wrapper.

`fn max_utf8_buffer_length(&self, byte_length: usize) -> usize`

Query the worst-case UTF-8 output size with replacement.

Returns the size of the output buffer in UTF-8 code units (u8) that will not overflow given the current state of the decoder and byte_length number of additional input bytes when decoding with errors handled by outputting a REPLACEMENT CHARACTER for each malformed sequence.

Available via the C wrapper.