Crate cesu8_str


A library implementing the CESU-8 compatibility encoding scheme. This is a non-standard variant of UTF-8 that is used internally by some systems that need to represent UTF-16 data as 8-bit characters.

The use of this encoding is discouraged by the Unicode Consortium. It’s OK for working with existing APIs, but it should not be used for data transmission or storage.

§Java and U+0000

Java uses the CESU-8 encoding as described above, but with one difference: the null character U+0000 is represented as an overlong UTF-8 sequence C0 80. This is supported by JavaStr and JavaString.
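As a rough illustration of the difference, the sketch below rewrites already-encoded CESU-8 bytes into the Java variant by replacing every 00 byte with C0 80. The function name `cesu8_to_java_bytes` is hypothetical and is not part of this crate's API. The sketch assumes its input is valid CESU-8, in which a 00 byte can only ever stand for U+0000.

```rust
/// Hypothetical helper (not part of this crate): converts valid CESU-8 bytes
/// to the Java variant by encoding U+0000 as the overlong sequence C0 80.
/// Assumes the input is well-formed CESU-8, where a 0x00 byte always
/// represents the null character and never appears inside a longer sequence.
fn cesu8_to_java_bytes(cesu8: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(cesu8.len());
    for &byte in cesu8 {
        if byte == 0x00 {
            // Overlong two-byte form of U+0000.
            out.extend_from_slice(&[0xC0, 0x80]);
        } else {
            // Every other byte is identical in both encodings.
            out.push(byte);
        }
    }
    out
}
```

One consequence of this representation is that the encoded output never contains a 00 byte, so it can be passed to APIs that treat 00 as a string terminator.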

§Surrogate pairs and UTF-8

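The UTF-16 encoding uses “surrogate pairs” to represent Unicode code points in the range from U+10000 to U+10FFFF. Each pair consists of two 16-bit code units in the range 0xD800 to 0xDFFF: a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF).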

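CESU-8 encodes these surrogate pairs as a 6-byte sequence consisting of two sets of three bytes, one set per surrogate. Standard UTF-8, by contrast, encodes the same code points as a single 4-byte sequence.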
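The six-byte form can be sketched using only the standard library: split the character into its UTF-16 surrogates, then write each surrogate with the ordinary three-byte UTF-8 bit layout. The names `encode_three_bytes` and `cesu8_encode_supplementary` are hypothetical helpers for illustration, not functions exported by this crate.

```rust
/// Writes one 16-bit value using the three-byte UTF-8 bit layout
/// (1110xxxx 10xxxxxx 10xxxxxx). Hypothetical helper for illustration.
fn encode_three_bytes(unit: u16) -> [u8; 3] {
    [
        0xE0 | (unit >> 12) as u8,
        0x80 | ((unit >> 6) & 0x3F) as u8,
        0x80 | (unit & 0x3F) as u8,
    ]
}

/// Hypothetical helper (not part of this crate): returns the six CESU-8
/// bytes for a code point in U+10000..=U+10FFFF, or `None` for code points
/// that need no surrogate pair.
fn cesu8_encode_supplementary(c: char) -> Option<[u8; 6]> {
    let mut buf = [0u16; 2];
    let units = c.encode_utf16(&mut buf);
    if units.len() != 2 {
        // Basic Multilingual Plane: CESU-8 matches UTF-8 here.
        return None;
    }
    let high = encode_three_bytes(units[0]);
    let low = encode_three_bytes(units[1]);
    Some([high[0], high[1], high[2], low[0], low[1], low[2]])
}
```

For U+1F600, whose UTF-16 form is D83D DE00, this yields ED A0 BD ED B8 80, whereas UTF-8 encodes the same code point as F0 9F 98 80.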

§Crate features

Alloc - Enables all allocation-related features. This allows the use of Cesu8String and JavaString, which offer a similar API to the standard library’s String.

Modules§

cesu8
java

Macros§

cesu8_str
Builds a CesuStr literal at compile time from a string literal.
java_str
Builds a JavaStr literal at compile time from a string literal.

Structs§

EncodingError
Errors which can occur when attempting to interpret a sequence of u8 as a string.
FromVecError
A possible error value when converting a JavaString from a CESU-8 byte vector.