Struct encoding_rs::Decoder
pub struct Decoder { // some fields omitted }
A converter that decodes a byte stream into Unicode according to a character encoding in a streaming (incremental) manner.
The various decode_* methods take an input buffer (src) and an output buffer (dst), both of which are caller-allocated. There are variants for both UTF-8 and UTF-16 output buffers.
A decode_* method decodes bytes from src into Unicode characters stored into dst until one of the following three things happens:
A malformed byte sequence is encountered.
The output buffer has been filled so near capacity that the decoder cannot be sure that processing an additional byte of input wouldn't cause so much output that the output buffer would overflow.
All the input bytes have been processed.
The decode_* method then returns a tuple of a status indicating which one of the three reasons to return happened, how many input bytes were read, how many output code units (u8 when decoding into UTF-8 and u16 when decoding to UTF-16) were written (except when decoding into String, whose length change indicates this), and, in the case of the variants performing replacement, a boolean indicating whether an error was replaced with the REPLACEMENT CHARACTER during the call.
In the case of the *_without_replacement variants, the status is a DecoderResult enumeration (possibilities Malformed, OutputFull and InputEmpty corresponding to the three cases listed above).
In the case of methods whose name does not end with *_without_replacement, the status is a CoderResult, malformed sequences are automatically replaced with the REPLACEMENT CHARACTER and errors do not cause the methods to return early.
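The return contract described above can be illustrated with a toy model. The sketch below is not the crate's implementation: ToyResult and toy_decode_ascii are hypothetical names, the "encoding" is plain US-ASCII, and the output counts as full only when no space remains at all, which is simpler than the near-capacity rule described above. It also assumes that a malformed byte is counted as read.

```rust
/// Hypothetical status type for this sketch; it only mirrors the three
/// cases listed above and is not the crate's DecoderResult.
#[derive(Debug, PartialEq)]
enum ToyResult {
    InputEmpty,
    OutputFull,
    Malformed,
}

/// Toy "decoder" for US-ASCII: bytes below 0x80 are copied to `dst`,
/// anything else counts as malformed. Returns (status, read, written).
fn toy_decode_ascii(src: &[u8], dst: &mut [u8]) -> (ToyResult, usize, usize) {
    let mut read = 0;
    let mut written = 0;
    while read < src.len() {
        if written == dst.len() {
            // No room left for the next character.
            return (ToyResult::OutputFull, read, written);
        }
        let b = src[read];
        read += 1;
        if b >= 0x80 {
            // Assumption of this sketch: the offending byte counts as read,
            // so a caller that tolerates errors can continue after it.
            return (ToyResult::Malformed, read, written);
        }
        dst[written] = b;
        written += 1;
    }
    (ToyResult::InputEmpty, read, written)
}
```

With a four-byte dst, the input ab ends with InputEmpty, abcdef stops with OutputFull after four bytes, and an input containing the byte 0xFF stops with Malformed.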
When decoding to UTF-8, the output buffer must have at least 4 bytes of space. When decoding to UTF-16, the output buffer must have at least two UTF-16 code units (u16) of space.
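These minimums match the largest encoding of a single Unicode scalar value: four bytes in UTF-8 and two code units in UTF-16. The following std-only sketch checks this; units_needed is a hypothetical helper name and is not part of this crate.

```rust
/// Encodes `c` into minimum-size scratch buffers and returns how many
/// UTF-8 bytes and UTF-16 code units were actually used.
fn units_needed(c: char) -> (usize, usize) {
    let mut utf8 = [0u8; 4];
    let mut utf16 = [0u16; 2];
    (c.encode_utf8(&mut utf8).len(), c.encode_utf16(&mut utf16).len())
}
```

An ASCII letter needs one unit in either form, while a character outside the Basic Multilingual Plane such as U+1F600 needs all four bytes and both code units.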
When decoding to UTF-8 without replacement, the methods are guaranteed not to return indicating that more output space is needed if the length of the output buffer is at least the length returned by max_utf8_buffer_length_without_replacement(). When decoding to UTF-8 with replacement, the length of the output buffer that guarantees the methods not to return indicating that more output space is needed is given by max_utf8_buffer_length(). When decoding to UTF-16 with or without replacement, the length of the output buffer that guarantees the methods not to return indicating that more output space is needed is given by max_utf16_buffer_length().
The output written into dst is guaranteed to be valid UTF-8 or UTF-16, and the output after each decode_* call is guaranteed to consist of complete characters. (I.e. the code unit sequence for the last character is guaranteed not to be split across output buffers.)
The boolean argument last indicates that the end of the stream is reached when all the bytes in src have been consumed.
A Decoder object can be used to incrementally decode a byte stream. During the processing of a single stream, the caller must call decode_* zero or more times with last set to false and then call decode_* at least once with last set to true. If a decode_* call with last set to true returns InputEmpty, the processing of the stream has ended. Otherwise, the caller must call decode_* again with last set to true (or treat a Malformed result as a fatal error).
Once the stream has ended, the Decoder object must not be used anymore. That is, you need to create another one to process another stream.
When the decoder returns OutputFull or the decoder returns Malformed and the caller does not wish to treat it as a fatal error, the input buffer src may not have been completely consumed. In that case, the caller must pass the unconsumed contents of src to decode_* again upon the next call.
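The re-submission rule above amounts to a loop that advances through src by the number of bytes read and drains dst between calls. The sketch below shows that loop under stated assumptions: Status, copy_decode and drain_all are hypothetical names, copy_decode is a stand-in that merely copies bytes and only mirrors the (status, read, written) shape of a decode_* method, and the Malformed case is left out for brevity.

```rust
/// Hypothetical two-state status for this sketch (not the crate's type).
#[derive(Debug, PartialEq)]
enum Status {
    InputEmpty,
    OutputFull,
}

/// Stand-in for a decode_* method: copies as many bytes as fit into `dst`.
/// `_last` is accepted only to mirror the argument shape.
fn copy_decode(src: &[u8], dst: &mut [u8], _last: bool) -> (Status, usize, usize) {
    let n = src.len().min(dst.len());
    dst[..n].copy_from_slice(&src[..n]);
    let status = if n == src.len() {
        Status::InputEmpty
    } else {
        Status::OutputFull
    };
    (status, n, n)
}

/// Feeds all of `src` through `decode`, reusing one output buffer of
/// `chunk` code units and re-submitting the unconsumed tail each time.
fn drain_all<F>(mut decode: F, src: &[u8], chunk: usize) -> Vec<u8>
where
    F: FnMut(&[u8], &mut [u8], bool) -> (Status, usize, usize),
{
    assert!(chunk > 0, "the output buffer must have room to make progress");
    let mut out = Vec::new();
    let mut buf = vec![0u8; chunk];
    let mut consumed = 0;
    loop {
        // `src` holds the whole stream here, so `last` is always true.
        let (status, read, written) = decode(&src[consumed..], &mut buf[..], true);
        consumed += read;
        out.extend_from_slice(&buf[..written]);
        if status == Status::InputEmpty {
            return out;
        }
        // OutputFull: `buf` has been drained, so loop with the rest of `src`.
    }
}
```

Calling drain_all(copy_decode, b"hello world", 4) takes three calls to the stand-in and returns the eleven input bytes unchanged. With a real Decoder the same loop shape would apply, with the added choice of how to handle a Malformed status.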
Methods
impl Decoder
fn encoding(&self) -> &'static Encoding
The Encoding this Decoder is for.
BOM sniffing can change the return value of this method during the life of the decoder.
fn max_utf16_buffer_length(&self, byte_length: usize) -> usize
Query the worst-case UTF-16 output size (with or without replacement).
Returns the size of the output buffer in UTF-16 code units (u16) that will not overflow given the current state of the decoder and byte_length number of additional input bytes.
Since the REPLACEMENT CHARACTER fits into one UTF-16 code unit, the return value of this method also applies when decoding with replacement.
Available via the C wrapper.
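The remark about the REPLACEMENT CHARACTER can be checked with the standard library alone. In the sketch below, replacement_sizes is a hypothetical helper name; it reports how many UTF-8 bytes and UTF-16 code units U+FFFD occupies.

```rust
/// Returns (UTF-8 bytes, UTF-16 code units) occupied by U+FFFD.
fn replacement_sizes() -> (usize, usize) {
    let c = char::REPLACEMENT_CHARACTER;
    (c.len_utf8(), c.len_utf16())
}
```

The result is (3, 1): one code unit in UTF-16 but three bytes in UTF-8, which is consistent with UTF-8 having separate worst-case queries for decoding with and without replacement.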
fn max_utf8_buffer_length_without_replacement(&self, byte_length: usize) -> usize
Query the worst-case UTF-8 output size without replacement.
Returns the size of the output buffer in UTF-8 code units (u8) that will not overflow given the current state of the decoder and byte_length number of additional input bytes when decoding without replacement error handling.
Note that this value may be too small when decoding with replacement. Use max_utf8_buffer_length for that case.
Available via the C wrapper.
fn max_utf8_buffer_length(&self, byte_length: usize) -> usize
Query the worst-case UTF-8 output size with replacement.
Returns the size of the output buffer in UTF-8 code units (u8) that will not overflow given the current state of the decoder and byte_length number of additional input bytes when decoding with errors handled by outputting a REPLACEMENT CHARACTER for each malformed sequence.
Available via the C wrapper.
fn decode_to_utf16_without_replacement(&mut self, src: &[u8], dst: &mut [u16], last: bool) -> (DecoderResult, usize, usize)
Incrementally decode a byte stream into UTF-16.
See the documentation of the struct for documentation for decode_* methods collectively.
Available via the C wrapper.
fn decode_to_utf8_without_replacement(&mut self, src: &[u8], dst: &mut [u8], last: bool) -> (DecoderResult, usize, usize)
Incrementally decode a byte stream into UTF-8.
See the documentation of the struct for documentation for decode_* methods collectively.
Available via the C wrapper.
fn decode_to_str_without_replacement(&mut self, src: &[u8], dst: &mut str, last: bool) -> (DecoderResult, usize, usize)
Incrementally decode a byte stream into UTF-8 with type system signaling of UTF-8 validity.
This method calls decode_to_utf8_without_replacement and then zeroes out up to three bytes that aren't logically part of the write in order to retain the UTF-8 validity even for the unwritten part of the buffer.
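Why zeroing is needed: overwriting the start of a valid UTF-8 buffer can cut through a multi-byte character that was already there, leaving stray continuation bytes right after the written region. A minimal sketch of the clean-up step follows; zero_trailing_continuations is a hypothetical helper name and this is not presented as the crate's actual code.

```rust
/// Zeroes the continuation bytes (bit pattern 10xxxxxx) that directly
/// follow the first `written` bytes of `buf`, and returns how many bytes
/// were zeroed. A UTF-8 character has at most three continuation bytes,
/// so at most three bytes are touched if `buf` held valid UTF-8 before
/// its first `written` bytes were overwritten.
fn zero_trailing_continuations(buf: &mut [u8], written: usize) -> usize {
    let mut trail = written;
    while trail < buf.len() && (buf[trail] & 0xC0) == 0x80 {
        buf[trail] = 0;
        trail += 1;
    }
    trail - written
}
```

For example, if a buffer holding a, the three-byte euro sign, and b has its first two bytes overwritten with xy, the two leftover continuation bytes make the buffer invalid until they are replaced with U+0000.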
See the documentation of the struct for documentation for decode_*
methods collectively.
Available to Rust only.
fn decode_to_string_without_replacement(&mut self, src: &[u8], dst: &mut String, last: bool) -> (DecoderResult, usize)
Incrementally decode a byte stream into UTF-8 using a String receiver.
Like the others, this method follows the logic that the output buffer is caller-allocated. This method treats the capacity of the String as the output limit. That is, this method guarantees not to cause a reallocation of the backing buffer of String.
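The idea of treating capacity as the output limit can be sketched with the standard library alone. In the example below, push_if_fits is a hypothetical helper name; it appends a character only when the spare capacity already covers it, so the backing buffer is never reallocated.

```rust
/// Appends `ch` only if it fits in the spare capacity that is already
/// allocated. Returns false, leaving `dst` unchanged, when it does not fit.
fn push_if_fits(dst: &mut String, ch: char) -> bool {
    if dst.capacity() - dst.len() < ch.len_utf8() {
        return false;
    }
    // The capacity check above means this push cannot reallocate.
    dst.push(ch);
    true
}
```

A caller of the real method would therefore reserve capacity up front, for instance with String::with_capacity sized from max_utf8_buffer_length_without_replacement as declared on this page.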
The return value is a pair that contains the DecoderResult and the number of bytes read. The number of bytes written is signaled via the length of the String changing.
See the documentation of the struct for documentation for decode_*
methods collectively.
Available to Rust only.
fn decode_to_utf16(&mut self, src: &[u8], dst: &mut [u16], last: bool) -> (CoderResult, usize, usize, bool)
Incrementally decode a byte stream into UTF-16 with malformed sequences replaced with the REPLACEMENT CHARACTER.
See the documentation of the struct for documentation for decode_*
methods collectively.
Available via the C wrapper.
fn decode_to_utf8(&mut self, src: &[u8], dst: &mut [u8], last: bool) -> (CoderResult, usize, usize, bool)
Incrementally decode a byte stream into UTF-8 with malformed sequences replaced with the REPLACEMENT CHARACTER.
See the documentation of the struct for documentation for decode_*
methods collectively.
Available via the C wrapper.
fn decode_to_str(&mut self, src: &[u8], dst: &mut str, last: bool) -> (CoderResult, usize, usize, bool)
Incrementally decode a byte stream into UTF-8 with malformed sequences replaced with the REPLACEMENT CHARACTER with type system signaling of UTF-8 validity.
This method calls decode_to_utf8 and then zeroes out up to three bytes that aren't logically part of the write in order to retain the UTF-8 validity even for the unwritten part of the buffer.
See the documentation of the struct for documentation for decode_*
methods collectively.
Available to Rust only.
fn decode_to_string(&mut self, src: &[u8], dst: &mut String, last: bool) -> (CoderResult, usize, bool)
Incrementally decode a byte stream into UTF-8 with malformed sequences replaced with the REPLACEMENT CHARACTER using a String receiver.
Like the others, this method follows the logic that the output buffer is caller-allocated. This method treats the capacity of the String as the output limit. That is, this method guarantees not to cause a reallocation of the backing buffer of String.
The return value is a tuple that contains the CoderResult, the number of bytes read and a boolean indicating whether replacements were done. The number of bytes written is signaled via the length of the String changing.
See the documentation of the struct for documentation for decode_*
methods collectively.
Available to Rust only.