cesu8-str 1.2.1

CESU-8 and Java CESU-8 string validation and manipulation.
Documentation

CESU-8 Encoder & Decoder

Converts between normal UTF-8 and CESU-8 encodings.

CESU-8 encodes characters outside the Basic Multilingual Plane as two UTF-16 surrogate characters, which are then re-encoded as 3-byte UTF-8 characters. This means that 4-byte UTF-8 sequences become 6-byte CESU-8 sequences.

We also support Java's Modified UTF-8 encoding, which uses a variant of CESU-8 encoding \0 using a two-byte sequence.