A library implementing the CESU-8 compatibility encoding scheme. This is a non-standard variant of UTF-8 that is used internally by some systems that need to represent UTF-16 data as 8-bit characters.
The use of this encoding is discouraged by the Unicode Consortium. It’s OK for working with existing APIs, but it should not be used for data transmission or storage.
§Java and U+0000
Java uses the CESU-8 encoding as described above, but with one difference:
the null character U+0000 is represented as an overlong UTF-8 sequence C0 80. This is supported by JavaStr and JavaString.
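The null-character rule can be illustrated with a minimal, std-only sketch. The function name `encode_nul_java_style` is hypothetical and is not part of this crate's API. It covers only the U+0000 difference and leaves every other byte untouched, so it does not re-encode supplementary characters as surrogate pairs.

```rust
/// Hypothetical helper (not part of this crate): copies the UTF-8 bytes of
/// `s`, replacing each U+0000 with the overlong two-byte sequence C0 80.
///
/// In UTF-8 the byte 0x00 only ever encodes U+0000, so a byte-level
/// replacement is sufficient for this one rule.
fn encode_nul_java_style(s: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(s.len());
    for &b in s.as_bytes() {
        if b == 0x00 {
            // Overlong encoding of U+0000, as used by Java.
            out.extend_from_slice(&[0xC0, 0x80]);
        } else {
            out.push(b);
        }
    }
    out
}
```

For the input `"a\0b"`, this yields the bytes `61 C0 80 62`, so the output never contains a zero byte.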
§Surrogate pairs and UTF-8
The UTF-16 encoding uses “surrogate pairs” to represent Unicode code points in the range from U+10000 to U+10FFFF. Each half of a pair is a 16-bit code unit in the range 0xD800 to 0xDFFF.
CESU-8 encodes these surrogate pairs as a 6-byte sequence consisting of two sets of three bytes.
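The 6-byte layout can be sketched with the standard library alone. The helper names `encode_unit_3` and `cesu8_encode_supplementary` are hypothetical and are not this crate's API. The sketch splits a supplementary code point into its two UTF-16 surrogates, then writes each surrogate using the ordinary three-byte UTF-8 bit pattern.

```rust
/// Hypothetical helper: encodes one 16-bit code unit in 0x0800..=0xFFFF
/// using the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.
fn encode_unit_3(unit: u16) -> [u8; 3] {
    [
        0xE0 | (unit >> 12) as u8,
        0x80 | ((unit >> 6) & 0x3F) as u8,
        0x80 | (unit & 0x3F) as u8,
    ]
}

/// Hypothetical helper: returns the 6-byte CESU-8 form of a code point in
/// U+10000..=U+10FFFF, or `None` if `c` fits in a single UTF-16 code unit.
fn cesu8_encode_supplementary(c: char) -> Option<[u8; 6]> {
    let mut units = [0u16; 2];
    // `encode_utf16` writes one or two code units; two means a surrogate pair.
    if c.encode_utf16(&mut units).len() != 2 {
        return None;
    }
    let hi = encode_unit_3(units[0]); // high surrogate, 0xD800..=0xDBFF
    let lo = encode_unit_3(units[1]); // low surrogate, 0xDC00..=0xDFFF
    Some([hi[0], hi[1], hi[2], lo[0], lo[1], lo[2]])
}
```

For example, U+10400 is the surrogate pair D801 DC00 in UTF-16, which this sketch encodes as `ED A0 81 ED B0 80`. Standard UTF-8 would instead use the four bytes `F0 90 90 80`.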
§Crate features
Alloc - Enables all allocation-related features. This allows use of Cesu8String and JavaString, which offer a similar API to the standard library’s String.
Modules§
Macros§
- cesu8_str - Builds a CesuStr literal at compile time from a string literal.
- java_str - Builds a JavaStr literal at compile time from a string literal.
Structs§
- EncodingError - Errors which can occur when attempting to interpret a sequence of u8 as a string.
- FromVecError - A possible error value when converting a JavaString from a CESU-8 byte vector.