Crate cesu8

A library for converting between CESU-8 and UTF-8.

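Unicode code points from the Basic Multilingual Plane (BMP), i.e. code points in the range U+0000 to U+FFFF, are encoded in the same way as in UTF-8. Code points above U+FFFF are first split into a UTF-16 surrogate pair, and each surrogate is then encoded as a three-byte sequence, giving six bytes in total instead of the four that UTF-8 uses.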
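
As a rough illustration of how CESU-8 treats code points above U+FFFF (the code point is split into a UTF-16 surrogate pair, and each surrogate is written as three bytes), here is a minimal sketch that uses only the standard library. The helper encode_supplementary is hypothetical and is not part of the cesu8 crate; it only shows the bit arithmetic involved.

```rust
/// Hypothetical helper (not part of the cesu8 crate): encodes one code point
/// above U+FFFF as the six CESU-8 bytes of its UTF-16 surrogate pair.
/// Returns None for BMP code points, whose CESU-8 bytes equal their UTF-8 bytes.
fn encode_supplementary(c: char) -> Option<[u8; 6]> {
    let code = c as u32;
    if code < 0x10000 {
        return None;
    }
    // Split the 20 significant bits into a high and a low surrogate.
    let offset = code - 0x10000;
    let high = 0xD800 + (offset >> 10);
    let low = 0xDC00 + (offset & 0x3FF);
    // Write a 16-bit value with the ordinary three-byte UTF-8 bit pattern.
    let three = |s: u32| -> [u8; 3] {
        [
            0xE0 | (s >> 12) as u8,
            0x80 | ((s >> 6) & 0x3F) as u8,
            0x80 | (s & 0x3F) as u8,
        ]
    };
    let [h0, h1, h2] = three(high);
    let [l0, l1, l2] = three(low);
    Some([h0, h1, h2, l0, l1, l2])
}
```

For U+10400 this yields the bytes ED A0 81 ED B0 80, which is the six-byte sequence used in the decoding example in this documentation.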

If from_cesu8 or to_cesu8 only encounters data that is valid as both CESU-8 and UTF-8, the cesu8 crate takes advantage of this by returning a clone-on-write smart pointer (Cow) that borrows the input. No conversion is performed and no memory is allocated:

Examples

use std::borrow::Cow;
use cesu8::{from_cesu8, to_cesu8};

let str = "Hello, world!";
assert_eq!(to_cesu8(str), Cow::Borrowed(str.as_bytes()));
assert_eq!(from_cesu8(str.as_bytes()), Cow::Borrowed(str));

When the data does need to be encoded or decoded, the functions behave as one might expect:


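use std::borrow::Cow;
use cesu8::from_cesu8;

let str = "\u{10400}";
let cesu8_data = &[0xED, 0xA0, 0x81, 0xED, 0xB0, 0x80];
// Cow equality compares contents, so this holds even though the decoded
// string cannot borrow from cesu8_data.
assert_eq!(from_cesu8(cesu8_data), Cow::Borrowed(str));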
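
To make the borrow-or-allocate behaviour more concrete, the sketch below shows one way an encoder could return a Cow. It is an assumption about the general technique, not the crate's actual implementation, and to_cesu8_sketch is a hypothetical name. It borrows the input when no code point lies above U+FFFF and builds a new buffer only otherwise.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an encoder such as to_cesu8 (an assumption, not
/// the crate's code): borrows when the input has no code point above U+FFFF
/// and allocates a new buffer only when re-encoding is required.
fn to_cesu8_sketch(s: &str) -> Cow<'_, [u8]> {
    // A four-byte UTF-8 sequence always starts with a byte of the form 11110xxx.
    if !s.bytes().any(|b| b & 0xF8 == 0xF0) {
        return Cow::Borrowed(s.as_bytes());
    }
    let mut out = Vec::with_capacity(s.len() + s.len() / 2);
    for c in s.chars() {
        let code = c as u32;
        if code < 0x10000 {
            // BMP code points keep their UTF-8 bytes.
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        } else {
            // Supplementary code points become two three-byte surrogates.
            let offset = code - 0x10000;
            for s16 in [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)] {
                out.push(0xE0 | (s16 >> 12) as u8);
                out.push(0x80 | ((s16 >> 6) & 0x3F) as u8);
                out.push(0x80 | (s16 & 0x3F) as u8);
            }
        }
    }
    Cow::Owned(out)
}
```

Callers that care whether an allocation happened can match on the result: under this sketch, "Hello, world!" comes back as Cow::Borrowed, while "\u{10400}" comes back as Cow::Owned holding six bytes.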

Functions

cesu8_len

Returns how many bytes in CESU-8 are required to encode a string slice.

from_cesu8

Converts a slice of CESU-8 bytes to a string slice.

is_valid_cesu8

Returns true if a string slice contains UTF-8 data that is also valid CESU-8.

to_cesu8

Converts a string slice to CESU-8 bytes.
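
The length and validity functions listed above can be approximated with the standard library alone. The sketch below is an illustration under the assumption that only code points above U+FFFF differ between the two encodings; cesu8_len_sketch and is_valid_cesu8_sketch are hypothetical names and may not match how the crate implements cesu8_len and is_valid_cesu8.

```rust
/// Hypothetical sketch of a CESU-8 length calculation (not the crate's code):
/// every four-byte UTF-8 sequence grows to six bytes, and all other sequences
/// keep their UTF-8 length.
fn cesu8_len_sketch(s: &str) -> usize {
    s.chars()
        .map(|c| if c.len_utf8() == 4 { 6 } else { c.len_utf8() })
        .sum()
}

/// Hypothetical sketch of the validity check: a UTF-8 string is also valid
/// CESU-8 exactly when it contains no four-byte sequences.
fn is_valid_cesu8_sketch(s: &str) -> bool {
    s.chars().all(|c| c.len_utf8() < 4)
}
```

Under these definitions, a string is valid CESU-8 precisely when its CESU-8 length equals its UTF-8 length, which is also the case in which a conversion can borrow its input instead of allocating.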