The C API for encoding_rs.
§Mapping from Rust
§Naming convention
The wrapper function for each method has a name that starts with the lower-cased name of the struct, followed by an underscore, and ends with the name of the method.
For example, Encoding::for_label() is wrapped as encoding_for_label().
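The naming rule can be written out mechanically. The sketch below is only an illustration: wrapper_name() is a hypothetical helper and is not part of the C API. It assumes ASCII struct and method names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the C API: builds the wrapper name for a
 * Rust method by lower-casing the struct name and appending "_" plus the
 * method name. Returns the length written (excluding the terminating NUL),
 * or 0 if `out` is too small to hold the name and its NUL. */
size_t wrapper_name(const char* struct_name, const char* method,
                    char* out, size_t out_len) {
    size_t struct_len = strlen(struct_name);
    size_t method_len = strlen(method);
    size_t total = struct_len + 1 + method_len;
    if (total + 1 > out_len) {
        return 0;
    }
    for (size_t i = 0; i < struct_len; i++) {
        out[i] = (char)tolower((unsigned char)struct_name[i]);
    }
    out[struct_len] = '_';
    /* Copies the method name together with its terminating NUL. */
    memcpy(out + struct_len + 1, method, method_len + 1);
    return total;
}
```

Calling wrapper_name("Encoding", "for_label", buf, sizeof buf) with a large enough buf fills it with encoding_for_label.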
§Arguments
Functions that wrap non-static methods take the self object as their first argument.
A slice argument foo is decomposed into a pointer foo and a length foo_len.
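As a rough illustration of the slice rule, a Rust signature of the shape fn f(buffer: &[u8]) -> usize turns into a C function that takes buffer and buffer_len. The function below is a hypothetical stand-in written for this sketch, not the library's implementation; its name and behavior are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in (not the library's code): the Rust slice argument
 * `buffer: &[u8]` is received in C as a pointer `buffer` plus a length
 * `buffer_len`.
 * Returns the index of the first non-ASCII byte, or `buffer_len` if every
 * byte is ASCII. */
size_t example_ascii_valid_up_to(const uint8_t* buffer, size_t buffer_len) {
    for (size_t i = 0; i < buffer_len; i++) {
        if (buffer[i] >= 0x80) {
            return i;
        }
    }
    return buffer_len;
}
```

The caller stays responsible for passing a length that matches the memory behind the pointer, since the C side has no slice type to enforce it.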
§Return values
Multiple return values become out-params. When an out-param is length-related, foo_len for a slice becomes a pointer in order to become an in/out-param.
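The in/out-param pattern can be sketched with a toy function. Everything here is hypothetical: example_copy() and the EXAMPLE_* status values are placeholders for this illustration, and real code should use the INPUT_EMPTY and OUTPUT_FULL constants from the library's header. On entry, *src_len and *dst_len hold the sizes of the two buffers; on return, they hold the number of bytes read and written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder status values for this sketch only. */
#define EXAMPLE_INPUT_EMPTY 0u
#define EXAMPLE_OUTPUT_FULL 0xFFFFFFFFu

/* Hypothetical function shaped like a wrapper for a Rust method returning
 * (result, bytes_read, bytes_written): the result is the C return value,
 * and the two counts come back through the length pointers. */
uint32_t example_copy(const uint8_t* src, size_t* src_len,
                      uint8_t* dst, size_t* dst_len) {
    size_t available = *src_len; /* in: bytes available in src */
    size_t capacity = *dst_len;  /* in: space available in dst */
    size_t n = available < capacity ? available : capacity;
    if (n > 0) {
        memcpy(dst, src, n);
    }
    *src_len = n; /* out: bytes read from src */
    *dst_len = n; /* out: bytes written to dst */
    return n == available ? EXAMPLE_INPUT_EMPTY : EXAMPLE_OUTPUT_FULL;
}
```

Because the lengths are overwritten, a caller that loops over a larger input has to reset them before each call.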
DecoderResult, EncoderResult and CoderResult become uint32_t.
InputEmpty becomes INPUT_EMPTY. OutputFull becomes OUTPUT_FULL.
Unmappable becomes the scalar value of the unmappable character.
Malformed becomes a number whose lowest 8 bits, which can have the decimal value 0, 1, 2 or 3, indicate the number of bytes that were consumed after the malformed sequence and whose next-lowest 8 bits, when shifted right by 8, indicate the length of the malformed byte sequence (possible decimal values 1, 2, 3 or 4). The maximum possible sum of the two is 6.
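The bit layout described for Malformed can be unpacked with a mask and a shift. This is a minimal sketch: the struct and function names are hypothetical, and the caller is assumed to have already ruled out INPUT_EMPTY and OUTPUT_FULL before treating the value as a malformed-sequence report.

```c
#include <stdint.h>

/* Hypothetical holder for the two fields packed into a Malformed result. */
typedef struct {
    uint8_t bytes_after;   /* lowest 8 bits: bytes consumed after the
                              malformed sequence (0 to 3) */
    uint8_t malformed_len; /* next-lowest 8 bits: length of the malformed
                              byte sequence (1 to 4) */
} example_malformed;

/* Splits a uint32_t result that is known to represent Malformed. */
example_malformed example_unpack_malformed(uint32_t result) {
    example_malformed m;
    m.bytes_after = (uint8_t)(result & 0xFF);
    m.malformed_len = (uint8_t)((result >> 8) & 0xFF);
    return m;
}
```

Adding the two fields gives the distance from the start of the malformed sequence to the current read position, which is why the text bounds their sum.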
Structs§
- ConstEncoding - Newtype for *const Encoding in order to be able to implement Sync for it.
Constants§
- ENCODING_NAME_MAX_LENGTH - The minimum length of buffers that may be passed to encoding_name().
- INPUT_EMPTY - Return value for *_decode_* and *_encode_* functions that indicates that the input has been exhausted.
- OUTPUT_FULL - Return value for *_decode_* and *_encode_* functions that indicates that the output space has been exhausted.
Statics§
- BIG5_ENCODING - The Big5 encoding.
- EUC_JP_ENCODING - The EUC-JP encoding.
- EUC_KR_ENCODING - The EUC-KR encoding.
- GB18030_ENCODING - The gb18030 encoding.
- GBK_ENCODING - The GBK encoding.
- IBM866_ENCODING - The IBM866 encoding.
- ISO_2022_JP_ENCODING - The ISO-2022-JP encoding.
- ISO_8859_2_ENCODING - The ISO-8859-2 encoding.
- ISO_8859_3_ENCODING - The ISO-8859-3 encoding.
- ISO_8859_4_ENCODING - The ISO-8859-4 encoding.
- ISO_8859_5_ENCODING - The ISO-8859-5 encoding.
- ISO_8859_6_ENCODING - The ISO-8859-6 encoding.
- ISO_8859_7_ENCODING - The ISO-8859-7 encoding.
- ISO_8859_8_ENCODING - The ISO-8859-8 encoding.
- ISO_8859_8_I_ENCODING - The ISO-8859-8-I encoding.
- ISO_8859_10_ENCODING - The ISO-8859-10 encoding.
- ISO_8859_13_ENCODING - The ISO-8859-13 encoding.
- ISO_8859_14_ENCODING - The ISO-8859-14 encoding.
- ISO_8859_15_ENCODING - The ISO-8859-15 encoding.
- ISO_8859_16_ENCODING - The ISO-8859-16 encoding.
- KOI8_R_ENCODING - The KOI8-R encoding.
- KOI8_U_ENCODING - The KOI8-U encoding.
- MACINTOSH_ENCODING - The macintosh encoding.
- REPLACEMENT_ENCODING - The replacement encoding.
- SHIFT_JIS_ENCODING - The Shift_JIS encoding.
- UTF_8_ENCODING - The UTF-8 encoding.
- UTF_16BE_ENCODING - The UTF-16BE encoding.
- UTF_16LE_ENCODING - The UTF-16LE encoding.
- WINDOWS_874_ENCODING - The windows-874 encoding.
- WINDOWS_1250_ENCODING - The windows-1250 encoding.
- WINDOWS_1251_ENCODING - The windows-1251 encoding.
- WINDOWS_1252_ENCODING - The windows-1252 encoding.
- WINDOWS_1253_ENCODING - The windows-1253 encoding.
- WINDOWS_1254_ENCODING - The windows-1254 encoding.
- WINDOWS_1255_ENCODING - The windows-1255 encoding.
- WINDOWS_1256_ENCODING - The windows-1256 encoding.
- WINDOWS_1257_ENCODING - The windows-1257 encoding.
- WINDOWS_1258_ENCODING - The windows-1258 encoding.
- X_MAC_CYRILLIC_ENCODING - The x-mac-cyrillic encoding.
- X_USER_DEFINED_ENCODING - The x-user-defined encoding.
Functions§
- decoder_decode_to_utf8 ⚠ - Incrementally decode a byte stream into UTF-8 with malformed sequences replaced with the REPLACEMENT CHARACTER.
- decoder_decode_to_utf8_without_replacement ⚠ - Incrementally decode a byte stream into UTF-8 without replacement.
- decoder_decode_to_utf16 ⚠ - Incrementally decode a byte stream into UTF-16 with malformed sequences replaced with the REPLACEMENT CHARACTER.
- decoder_decode_to_utf16_without_replacement ⚠ - Incrementally decode a byte stream into UTF-16 without replacement.
- decoder_encoding ⚠ - The Encoding this Decoder is for.
- decoder_free ⚠ - Deallocates a Decoder previously allocated by encoding_new_decoder().
- decoder_latin1_byte_compatible_up_to ⚠ - Checks for compatibility with storing Unicode scalar values as unsigned bytes, taking into account the state of the decoder.
- decoder_max_utf8_buffer_length ⚠ - Query the worst-case UTF-8 output size with replacement.
- decoder_max_utf8_buffer_length_without_replacement ⚠ - Query the worst-case UTF-8 output size without replacement.
- decoder_max_utf16_buffer_length ⚠ - Query the worst-case UTF-16 output size (with or without replacement).
- encoder_encode_from_utf8 ⚠ - Incrementally encode into byte stream from UTF-8 with unmappable characters replaced with HTML (decimal) numeric character references.
- encoder_encode_from_utf8_without_replacement ⚠ - Incrementally encode into byte stream from UTF-8 without replacement.
- encoder_encode_from_utf16 ⚠ - Incrementally encode into byte stream from UTF-16 with unmappable characters replaced with HTML (decimal) numeric character references.
- encoder_encode_from_utf16_without_replacement ⚠ - Incrementally encode into byte stream from UTF-16 without replacement.
- encoder_encoding ⚠ - The Encoding this Encoder is for.
- encoder_free ⚠ - Deallocates an Encoder previously allocated by encoding_new_encoder().
- encoder_has_pending_state ⚠ - Returns true if this is an ISO-2022-JP encoder that’s not in the ASCII state and false otherwise.
- encoder_max_buffer_length_from_utf8_if_no_unmappables ⚠ - Query the worst-case output size when encoding from UTF-8 with replacement.
- encoder_max_buffer_length_from_utf8_without_replacement ⚠ - Query the worst-case output size when encoding from UTF-8 without replacement.
- encoder_max_buffer_length_from_utf16_if_no_unmappables ⚠ - Query the worst-case output size when encoding from UTF-16 with replacement.
- encoder_max_buffer_length_from_utf16_without_replacement ⚠ - Query the worst-case output size when encoding from UTF-16 without replacement.
- encoding_ascii_valid_up_to ⚠ - Validates ASCII.
- encoding_can_encode_everything ⚠ - Checks whether the output encoding of this encoding can encode every Unicode scalar. (Only true if the output encoding is UTF-8.)
- encoding_for_bom ⚠ - Performs non-incremental BOM sniffing.
- encoding_for_label ⚠ - Implements the get an encoding algorithm.
- encoding_for_label_no_replacement ⚠ - This function behaves the same as encoding_for_label(), except that when encoding_for_label() would return REPLACEMENT_ENCODING, this function returns NULL instead.
- encoding_is_ascii_compatible ⚠ - Checks whether the bytes 0x00…0x7F map exclusively to the characters U+0000…U+007F and vice versa.
- encoding_is_single_byte ⚠ - Checks whether this encoding maps one byte to one Basic Multilingual Plane code point (i.e. byte length equals decoded UTF-16 length) and vice versa (for mappable characters).
- encoding_iso_2022_jp_ascii_valid_up_to ⚠ - Validates ISO-2022-JP ASCII-state data.
- encoding_name ⚠ - Writes the name of the given Encoding to a caller-supplied buffer as ASCII and returns the number of bytes / ASCII characters written.
- encoding_new_decoder ⚠ - Allocates a new Decoder for the given Encoding on the heap with BOM sniffing enabled and returns a pointer to the newly-allocated Decoder.
- encoding_new_decoder_into ⚠ - Allocates a new Decoder for the given Encoding into memory provided by the caller with BOM sniffing enabled. (In practice, the target should likely be a pointer previously returned by encoding_new_decoder().)
- encoding_new_decoder_with_bom_removal ⚠ - Allocates a new Decoder for the given Encoding on the heap with BOM removal and returns a pointer to the newly-allocated Decoder.
- encoding_new_decoder_with_bom_removal_into ⚠ - Allocates a new Decoder for the given Encoding into memory provided by the caller with BOM removal.
- encoding_new_decoder_without_bom_handling ⚠ - Allocates a new Decoder for the given Encoding on the heap with BOM handling disabled and returns a pointer to the newly-allocated Decoder.
- encoding_new_decoder_without_bom_handling_into ⚠ - Allocates a new Decoder for the given Encoding into memory provided by the caller with BOM handling disabled.
- encoding_new_encoder ⚠ - Allocates a new Encoder for the given Encoding on the heap and returns a pointer to the newly-allocated Encoder. (Exception: if the Encoding is replacement, a new Encoder for UTF-8 is instantiated, and that Encoder reports UTF_8 as its Encoding.)
- encoding_new_encoder_into ⚠ - Allocates a new Encoder for the given Encoding into memory provided by the caller. (In practice, the target should likely be a pointer previously returned by encoding_new_encoder().)
- encoding_output_encoding ⚠ - Returns the output encoding of this encoding. This is UTF-8 for UTF-16BE, UTF-16LE and replacement and the encoding itself otherwise.
- encoding_utf8_valid_up_to ⚠ - Validates UTF-8.