Crate cesu8

A library for converting between CESU-8 and UTF-8.

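Unicode code points in the Basic Multilingual Plane (BMP), i.e. code points in the range U+0000 to U+FFFF, are encoded the same way as in UTF-8. Supplementary code points (above U+FFFF) are instead encoded as a surrogate pair, with each surrogate taking three bytes.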

If encode or decode encounters only data that is valid as both CESU-8 and UTF-8, the cesu8 crate returns it borrowed in a clone-on-write smart pointer (Cow), so no conversion work is done and no memory is allocated.

Examples

Basic usage:

use alloc::borrow::Cow;

let str = "Hello, world!";
assert_eq!(cesu8::encode(str), Cow::Borrowed(str.as_bytes()));
assert_eq!(cesu8::decode(str.as_bytes())?, Cow::Borrowed(str));

When the data does need to be encoded or decoded, the functions return the converted result as owned data:

let str = "\u{10400}";
let cesu8_data = &[0xED, 0xA0, 0x81, 0xED, 0xB0, 0x80];
assert_eq!(cesu8::decode(cesu8_data)?, Cow::<str>::Owned(str.to_string()));
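To make the borrowed and owned cases above concrete, here is a minimal sketch of how such an encoder could work, using only the standard library. The names `encode_sketch` and `push_three_bytes` are hypothetical and chosen for illustration; this is not the crate's actual implementation, which may differ.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of the cesu8 crate): writes one UTF-16
/// code unit as a three-byte sequence, the form CESU-8 uses for surrogates.
fn push_three_bytes(out: &mut Vec<u8>, unit: u16) {
    out.push(0xE0 | (unit >> 12) as u8);
    out.push(0x80 | ((unit >> 6) & 0x3F) as u8);
    out.push(0x80 | (unit & 0x3F) as u8);
}

/// Illustrative stand-in for an encoder; the name is an assumption.
fn encode_sketch(s: &str) -> Cow<'_, [u8]> {
    // No supplementary code points: the UTF-8 bytes are already valid
    // CESU-8, so they can be borrowed without allocating.
    if s.chars().all(|c| c.len_utf8() < 4) {
        return Cow::Borrowed(s.as_bytes());
    }
    let mut out = Vec::with_capacity(s.len());
    let mut units = [0u16; 2];
    let mut utf8 = [0u8; 4];
    for c in s.chars() {
        if c.len_utf8() < 4 {
            // BMP code point: same bytes as UTF-8.
            out.extend_from_slice(c.encode_utf8(&mut utf8).as_bytes());
        } else {
            // Supplementary code point: split into a surrogate pair and
            // write each surrogate as three bytes.
            for &unit in c.encode_utf16(&mut units).iter() {
                push_three_bytes(&mut out, unit);
            }
        }
    }
    Cow::Owned(out)
}
```

Under these assumptions, "Hello, world!" comes back borrowed, while "\u{10400}" produces the six owned bytes shown in the example above.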

Features

std: implements std::error::Error for Error. This feature is enabled by default.

Structs

Error

An error returned by decode when the input is invalid CESU-8 data.
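As a rough illustration of when decoding might fail, the following sketch decodes CESU-8 with only the standard library. `DecodeError`, `surrogate_at`, and `decode_sketch` are hypothetical names, not the crate's API. The sketch assumes a strict reading of CESU-8 in which unpaired surrogates and four-byte UTF-8 sequences are both rejected; whether the crate behaves identically is an assumption.

```rust
use std::borrow::Cow;

/// Hypothetical error type for this sketch; the crate's `Error` may differ.
#[derive(Debug, PartialEq)]
struct DecodeError;

/// Reads a three-byte encoded surrogate (ED A0..BF 80..BF) at `i`, if present.
fn surrogate_at(bytes: &[u8], i: usize) -> Option<u16> {
    match bytes.get(i..i + 3)? {
        &[0xED, b1 @ 0xA0..=0xBF, b2 @ 0x80..=0xBF] => {
            Some(0xD000 | (u16::from(b1 & 0x3F) << 6) | u16::from(b2 & 0x3F))
        }
        _ => None,
    }
}

/// Illustrative stand-in for a decoder; the name is an assumption.
fn decode_sketch(bytes: &[u8]) -> Result<Cow<'_, str>, DecodeError> {
    // Fast path: valid UTF-8 without four-byte sequences is also valid
    // CESU-8 and can be borrowed as-is.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return if s.chars().all(|c| c.len_utf8() < 4) {
            Ok(Cow::Borrowed(s))
        } else {
            Err(DecodeError) // four-byte UTF-8 is rejected in this sketch
        };
    }
    let mut out = String::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if let Some(first) = surrogate_at(bytes, i) {
            // A surrogate must be a high one directly followed by a low one.
            let second = surrogate_at(bytes, i + 3).ok_or(DecodeError)?;
            let mut pair = std::char::decode_utf16([first, second]);
            match (pair.next(), pair.next()) {
                (Some(Ok(c)), None) => out.push(c),
                _ => return Err(DecodeError),
            }
            i += 6;
        } else {
            // Copy the run of ordinary UTF-8 up to the next encoded surrogate.
            let end = (i + 1..bytes.len())
                .find(|&j| surrogate_at(bytes, j).is_some())
                .unwrap_or(bytes.len());
            let run = std::str::from_utf8(&bytes[i..end]).map_err(|_| DecodeError)?;
            if run.chars().any(|c| c.len_utf8() == 4) {
                return Err(DecodeError);
            }
            out.push_str(run);
            i = end;
        }
    }
    Ok(Cow::Owned(out))
}
```

In this sketch, a lone surrogate such as the three bytes 0xED, 0xA0, 0x81 yields an error, while the full six-byte pair decodes to a single supplementary code point.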

Functions

decode

Converts a slice of CESU-8 bytes to a string slice.

encode

Converts a string slice to CESU-8 bytes.

is_valid

Returns true if a string slice contains UTF-8 data that is also valid CESU-8.

len

Returns how many bytes in CESU-8 are required to encode a string slice.
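The checks behind is_valid and len can be approximated with the standard library alone. The sketch below uses the hypothetical names `is_valid_sketch` and `len_sketch`; it relies only on the fact that CESU-8 differs from UTF-8 solely for supplementary code points, and is not taken from the crate's source.

```rust
/// Illustrative check: true when `s` holds no supplementary code points,
/// so its UTF-8 bytes are also valid CESU-8.
fn is_valid_sketch(s: &str) -> bool {
    // In valid UTF-8, a byte >= 0xF0 only occurs as the lead byte of a
    // four-byte sequence, i.e. a supplementary code point.
    s.bytes().all(|b| b < 0xF0)
}

/// Illustrative length: number of bytes needed to hold `s` in CESU-8.
fn len_sketch(s: &str) -> usize {
    s.chars()
        .map(|c| match c.len_utf8() {
            4 => 6, // surrogate pair, three bytes per surrogate
            n => n, // BMP code points keep their UTF-8 length
        })
        .sum()
}
```

With these definitions, a string passes `is_valid_sketch` exactly when `len_sketch` equals its UTF-8 byte length.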