Struct encoding_rs::Encoder

pub struct Encoder {
    // some fields omitted
}

A converter that encodes a Unicode stream into bytes according to a character encoding in a streaming (incremental) manner.

The various encode_* methods take an input buffer (src) and an output buffer (dst), both of which are caller-allocated. There are variants for both UTF-8 and UTF-16 input buffers.

An encode_* method encodes characters from src into bytes stored into dst until one of the following three things happens:

An unmappable character is encountered.
The output buffer has been filled so near capacity that the encoder cannot be sure that processing an additional character of input wouldn't cause so much output that the output buffer would overflow.
All the input characters have been processed.

The encode_* method then returns a tuple consisting of a status indicating which of the three reasons caused the return, how many input code units (u8 when encoding from UTF-8 and u16 when encoding from UTF-16) were read, how many output bytes were written (except when encoding into Vec<u8>, whose length change indicates this), and, in the case of the variants that perform replacement, a boolean indicating whether an unmappable character was replaced with a numeric character reference during the call.

In the case of the methods whose name ends with *_without_replacement, the status is an EncoderResult enumeration (possibilities Unmappable, OutputFull and InputEmpty corresponding to the three cases listed above).
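The three outcomes and the returned tuple can be illustrated with a toy model. The sketch below is not the crate's implementation: `ToyResult` and `toy_encode_without_replacement` are hypothetical names, the target "encoding" is plain ASCII, and the `Unmappable` variant carrying the offending character is a choice made for this sketch. Unlike the real encoder, which may report a full buffer while some slack remains, the toy reports `OutputFull` only when `dst` is completely full.

```rust
/// Hypothetical stand-in for the status enumeration described above.
/// Carrying the offending character in `Unmappable` is this sketch's choice.
#[derive(Debug, PartialEq)]
pub enum ToyResult {
    InputEmpty,
    OutputFull,
    Unmappable(char),
}

/// Toy ASCII-only encoding step (an assumption for illustration only).
/// Returns (status, bytes of `src` read, bytes of `dst` written).
pub fn toy_encode_without_replacement(src: &str, dst: &mut [u8]) -> (ToyResult, usize, usize) {
    let mut read = 0;
    let mut written = 0;
    for c in src.chars() {
        if !c.is_ascii() {
            // Count the unmappable character as consumed so that the
            // caller can resume right after it.
            read += c.len_utf8();
            return (ToyResult::Unmappable(c), read, written);
        }
        if written == dst.len() {
            // No room for another byte; `c` stays unconsumed.
            return (ToyResult::OutputFull, read, written);
        }
        dst[written] = c as u8;
        read += 1;
        written += 1;
    }
    (ToyResult::InputEmpty, read, written)
}
```

In this model, encoding "abcdef" into a four-byte buffer reports `OutputFull` with four code units read and four bytes written, leaving "ef" for the next call.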

In the case of methods whose name does not end with *_without_replacement, unmappable characters are automatically replaced with the corresponding numeric character references and unmappable characters do not cause the methods to return early.
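As a rough illustration of replacement, the sketch below substitutes a numeric character reference for every character that a toy ASCII-only target cannot represent. The function names are hypothetical, and the decimal form of the reference (for example `&#8364;` for U+20AC) is an assumption based on how the WHATWG Encoding Standard writes such references, not something this page states.

```rust
/// Appends the decimal numeric character reference for `c`,
/// e.g. `&#8364;` for U+20AC. (Decimal form is an assumption.)
pub fn push_ncr(c: char, out: &mut Vec<u8>) {
    out.extend_from_slice(format!("&#{};", c as u32).as_bytes());
}

/// Toy ASCII-only encoder with replacement (hypothetical helper).
/// Returns the encoded bytes and whether any replacement took place,
/// mirroring the boolean in the tuple described above.
pub fn toy_encode_with_replacement(src: &str) -> (Vec<u8>, bool) {
    let mut out = Vec::with_capacity(src.len());
    let mut had_replacements = false;
    for c in src.chars() {
        if c.is_ascii() {
            out.push(c as u8);
        } else {
            // Unmappable in the toy encoding: replace instead of stopping.
            push_ncr(c, &mut out);
            had_replacements = true;
        }
    }
    (out, had_replacements)
}
```

Note that a replacement expands one input character into several output bytes, which is why the worst-case output size differs between the replacing and non-replacing variants.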

XXX: When decoding to UTF-8 without replacement, the methods are guaranteed not to return indicating that more output space is needed if the length of the output buffer is at least the length returned by max_utf8_buffer_length_without_replacement(). When decoding to UTF-8 with replacement, the length of the output buffer that guarantees the methods not to return indicating that more output space is needed is given by max_utf8_buffer_length(). When decoding to UTF-16 with or without replacement, the length of the output buffer that guarantees the methods not to return indicating that more output space is needed is given by max_utf16_buffer_length().

When encoding from UTF-8, each src buffer must be valid UTF-8. (When calling from Rust, the type system takes care of this.) When encoding from UTF-16, unpaired surrogates in the input are treated as U+FFFD REPLACEMENT CHARACTERs. Therefore, in order for astral characters not to turn into a pair of REPLACEMENT CHARACTERs, the caller must ensure that surrogate pairs are not split across input buffer boundaries.
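The effect of splitting a surrogate pair across buffers can be reproduced with the standard library alone. In the sketch below, `decode_buffer_lossy` and `safe_split` are hypothetical helpers, not part of this crate: the first interprets one UTF-16 buffer in isolation, mapping unpaired surrogates to U+FFFD as described above, and the second shows one possible way for a caller to pick a chunk boundary that keeps a pair together.

```rust
/// Interprets one UTF-16 buffer on its own, mapping each unpaired
/// surrogate to U+FFFD REPLACEMENT CHARACTER.
pub fn decode_buffer_lossy(buf: &[u16]) -> String {
    std::char::decode_utf16(buf.iter().copied())
        .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect()
}

/// Returns a split point no greater than `at` that does not fall
/// between a high surrogate and the low surrogate that follows it.
pub fn safe_split(buf: &[u16], at: usize) -> usize {
    let at = at.min(buf.len());
    let splits_pair = at > 0
        && at < buf.len()
        && (0xD800..=0xDBFF).contains(&buf[at - 1])
        && (0xDC00..=0xDFFF).contains(&buf[at]);
    if splits_pair {
        at - 1
    } else {
        at
    }
}
```

For the code units of "a", U+1F600 and "b", splitting after the second unit yields two REPLACEMENT CHARACTERs between "a" and "b", whereas `safe_split` moves the boundary back by one unit so the astral character survives.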

XXX: Except in the case of ISO-2022-JP, the output of each encode_* call is guaranteed to consist of a valid byte sequence of complete characters. (I.e. the code unit sequence for the last character is guaranteed not to be split across output buffers.)

The boolean argument last indicates that the end of the stream is reached when all the characters in src have been consumed. This argument is needed for ISO-2022-JP and is ignored for other encodings.

An Encoder object can be used to incrementally encode a Unicode stream into bytes.

During the processing of a single stream, the caller must call encode_* zero or more times with last set to false and then call encode_* at least once with last set to true. If a call to encode_* with last set to true returns InputEmpty, the processing of the stream has ended. Otherwise, the caller must call encode_* again with last set to true (or treat an Unmappable result as a fatal error).

Once the stream has ended, the Encoder object must not be used anymore. That is, you need to create another one to process another stream.

When the encoder returns OutputFull or the encoder returns Unmappable and the caller does not wish to treat it as a fatal error, the input buffer src may not have been completely consumed. In that case, the caller must pass the unconsumed contents of src to encode_* again upon the next call.
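The calling protocol described above can be sketched as a driver loop. Everything here is a toy model rather than the crate's API: `ToyResult`, `toy_encode` and `encode_stream` are hypothetical names, the target "encoding" is ASCII, the toy ignores `last`, and the four-byte output buffer is deliberately tiny so that `OutputFull` occurs. The sketch treats `Unmappable` as a fatal error and assumes at least one chunk is supplied, since otherwise no call with `last` set to true would be made.

```rust
/// Hypothetical stand-in for the status enumeration described above.
#[derive(Debug, PartialEq)]
pub enum ToyResult {
    InputEmpty,
    OutputFull,
    Unmappable(char),
}

/// Toy ASCII-only encoding step; returns (status, read, written).
/// `_last` is accepted for shape only and ignored by this toy.
pub fn toy_encode(src: &str, dst: &mut [u8], _last: bool) -> (ToyResult, usize, usize) {
    let (mut read, mut written) = (0, 0);
    for c in src.chars() {
        if !c.is_ascii() {
            return (ToyResult::Unmappable(c), read + c.len_utf8(), written);
        }
        if written == dst.len() {
            return (ToyResult::OutputFull, read, written);
        }
        dst[written] = c as u8;
        read += 1;
        written += 1;
    }
    (ToyResult::InputEmpty, read, written)
}

/// Drives the toy encoder over a stream that arrives in chunks.
/// Assumes `chunks` is non-empty; returns the unmappable character on error.
pub fn encode_stream(chunks: &[&str]) -> Result<Vec<u8>, char> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4];
    for (i, chunk) in chunks.iter().enumerate() {
        // Only the final chunk is passed with `last` set to true.
        let last = i + 1 == chunks.len();
        let mut rest: &str = *chunk;
        loop {
            let (result, read, written) = toy_encode(rest, &mut buf, last);
            out.extend_from_slice(&buf[..written]);
            // Pass only the unconsumed part of the input to the next call.
            rest = &rest[read..];
            match result {
                ToyResult::InputEmpty => break,
                ToyResult::OutputFull => continue,
                ToyResult::Unmappable(c) => return Err(c),
            }
        }
    }
    Ok(out)
}
```

A caller that prefers to recover from unmappable characters could, instead of returning an error, write its own substitute to the output and continue the inner loop with the remaining input.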

Methods

`impl Encoder`

`fn encoding(&self) -> &'static Encoding`

The Encoding this Encoder is for.

`fn max_buffer_length_from_utf16_without_replacement(&self, u16_length: usize) -> usize`

Query the worst-case output size when encoding from UTF-16 without replacement.

Returns the size of the output buffer in bytes that will not overflow given the current state of the encoder and u16_length number of additional input code units.

Available via the C wrapper.

`fn max_buffer_length_from_utf8_without_replacement(&self, byte_length: usize) -> usize`

Query the worst-case output size when encoding from UTF-8 without replacement.

Returns the size of the output buffer in bytes that will not overflow given the current state of the encoder and byte_length number of additional input code units.

Available via the C wrapper.

`fn max_buffer_length_from_utf16_if_no_unmappables(&self, u16_length: usize) -> usize`

Query the worst-case output size when encoding from UTF-16 with replacement.

Returns the size of the output buffer in bytes that will not overflow given the current state of the encoder and u16_length number of additional input code units, provided that the input contains no unmappable characters.

Available via the C wrapper.

`fn max_buffer_length_from_utf8_if_no_unmappables(&self, byte_length: usize) -> usize`

Query the worst-case output size when encoding from UTF-8 with replacement.

Returns the size of the output buffer in bytes that will not overflow given the current state of the encoder and byte_length number of additional input code units, provided that the input contains no unmappable characters.

Available via the C wrapper.