A library for converting between CESU-8 and UTF-8.

Unicode code points in the Basic Multilingual Plane (BMP), i.e. code points in the range U+0000 to U+FFFF, are encoded in the same way as in UTF-8.
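
The two encodings differ only above U+FFFF: CESU-8 first splits such a code point into a UTF-16 surrogate pair and then writes each surrogate as a three-byte sequence, giving six bytes where UTF-8 uses four. The following is a minimal sketch of that rule using only the standard library. The helper name `cesu8_encode_char` is hypothetical and is not part of this crate's API; it only illustrates the byte layout.

```rust
/// Encodes a single `char` as CESU-8.
/// Hypothetical helper for illustration; not the crate's implementation.
fn cesu8_encode_char(c: char) -> Vec<u8> {
    if (c as u32) <= 0xFFFF {
        // BMP code point: the CESU-8 bytes are identical to the UTF-8 bytes.
        let mut buf = [0u8; 4];
        return c.encode_utf8(&mut buf).as_bytes().to_vec();
    }

    // Supplementary code point: split into a UTF-16 surrogate pair.
    let mut units = [0u16; 2];
    c.encode_utf16(&mut units);

    let mut out = Vec::with_capacity(6);
    for &unit in units.iter() {
        // Each surrogate is written with the three-byte pattern
        // 1110xxxx 10xxxxxx 10xxxxxx.
        out.push(0xE0 | (unit >> 12) as u8);
        out.push(0x80 | ((unit >> 6) & 0x3F) as u8);
        out.push(0x80 | (unit & 0x3F) as u8);
    }
    out
}
```

For U+10400 the surrogate pair is U+D801, U+DC00, so this sketch yields the bytes `ED A0 81 ED B0 80`.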

If encode or decode only encounters data that is valid as both CESU-8 and UTF-8, no conversion is needed, and the cesu8 crate returns the input unchanged as a borrowed clone-on-write smart pointer (Cow). In that case nothing is copied and no memory is allocated.

Examples

Basic usage:

use alloc::borrow::Cow;

let str = "Hello, world!";
assert_eq!(cesu8::encode(str), Cow::Borrowed(str.as_bytes()));
assert_eq!(cesu8::decode(str.as_bytes())?, Cow::Borrowed(str));

When the data contains code points outside the BMP, it has to be converted, and the result is returned as owned data:

let str = "\u{10400}";
let cesu8_data = &[0xED, 0xA0, 0x81, 0xED, 0xB0, 0x80];
assert_eq!(cesu8::decode(cesu8_data)?, Cow::<str>::Owned(str.to_string()));
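
To show what the conversion in the example above involves, here is a rough sketch of a CESU-8 decoder built only on the standard library. It is an assumption-laden illustration, not the crate's code: the name `cesu8_decode_sketch` is hypothetical, it always allocates, and it returns `Option<String>` rather than the crate's `Cow` and error type. It reads one-, two- and three-byte sequences into UTF-16 code units and lets `String::from_utf16` recombine the surrogate pairs.

```rust
/// Decodes CESU-8 bytes into a `String`, or returns `None` on malformed input.
/// Hypothetical helper for illustration; not the crate's implementation.
fn cesu8_decode_sketch(bytes: &[u8]) -> Option<String> {
    let mut units: Vec<u16> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        // The lead byte determines the sequence length. CESU-8 never uses
        // four-byte sequences, so lead bytes 0xF0 and above are rejected.
        let (len, init) = match b0 {
            0x00..=0x7F => (1, b0 as u16),
            0xC2..=0xDF => (2, (b0 & 0x1F) as u16),
            0xE0..=0xEF => (3, (b0 & 0x0F) as u16),
            _ => return None,
        };
        // Fetch the continuation bytes; `None` if the input is truncated.
        let tail = bytes.get(i + 1..i + len)?;
        let mut unit = init;
        for &b in tail {
            if b & 0xC0 != 0x80 {
                return None; // not a continuation byte
            }
            unit = (unit << 6) | (b & 0x3F) as u16;
        }
        // Reject overlong three-byte forms.
        if len == 3 && unit < 0x0800 {
            return None;
        }
        units.push(unit);
        i += len;
    }
    // `from_utf16` joins surrogate pairs and fails on unpaired surrogates.
    String::from_utf16(&units).ok()
}
```

With this sketch, `cesu8_decode_sketch(&[0xED, 0xA0, 0x81, 0xED, 0xB0, 0x80])` produces the one-character string U+10400, while a four-byte UTF-8 sequence or an unpaired surrogate is rejected.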

Features

Structs

An error returned by decode when the input is not valid CESU-8 data.

Functions

Converts a slice of CESU-8 bytes to a string slice.
Converts a string slice to CESU-8 bytes.
Returns true if a string slice contains UTF-8 data that is also valid CESU-8.
Returns the number of bytes required to encode a string slice as CESU-8.
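
The last two checks follow directly from the encoding rules, and a minimal sketch may make them concrete. The function names below are hypothetical stand-ins, not the crate's actual items, and the code assumes only the standard library. A UTF-8 string is also valid CESU-8 exactly when it contains no four-byte sequences, and each such sequence would take six bytes in CESU-8.

```rust
/// Returns true if the UTF-8 bytes of `s` are also valid CESU-8.
/// Hypothetical helper for illustration; not the crate's implementation.
fn is_valid_cesu8_sketch(s: &str) -> bool {
    // In valid UTF-8, bytes 0xF0 and above occur only as the lead byte of a
    // four-byte sequence, i.e. a code point above U+FFFF.
    s.bytes().all(|b| b < 0xF0)
}

/// Returns how many bytes `s` would occupy when encoded as CESU-8.
/// Hypothetical helper for illustration; not the crate's implementation.
fn cesu8_len_sketch(s: &str) -> usize {
    s.chars()
        .map(|c| match c.len_utf8() {
            // Supplementary code point: two three-byte surrogates.
            4 => 6,
            // BMP code point: same length as in UTF-8.
            n => n,
        })
        .sum()
}
```

For instance, `cesu8_len_sketch("Hello, world!")` equals the string's UTF-8 length of 13, whereas `cesu8_len_sketch("\u{10400}")` is 6 rather than 4.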