A library for converting between MUTF-8 and UTF-8.

MUTF-8 is the same as CESU-8 except for its handling of embedded null characters: a null character (U+0000) is encoded as the two-byte sequence 0xC0 0x80 rather than as a single zero byte. This library builds on top of the residua-cesu8 crate.
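The encoding rules can be sketched by hand. The function below is a hypothetical illustration written against the standard library only; the name encode_mutf8_sketch is an assumption and it is not how this crate or residua-cesu8 is implemented. It always allocates, whereas the crate's encode can borrow its input.

```rust
/// Hypothetical helper (not part of this crate's API): encodes `s` as MUTF-8.
fn encode_mutf8_sketch(s: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(s.len());
    for c in s.chars() {
        let code = c as u32;
        if code == 0 {
            // Embedded null: overlong two-byte form, so the output never
            // contains a 0x00 byte.
            out.extend_from_slice(&[0xC0, 0x80]);
        } else if code < 0x10000 {
            // Basic Multilingual Plane: identical to UTF-8.
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        } else {
            // Supplementary character: split into a UTF-16 surrogate pair and
            // write each surrogate as a 3-byte sequence (6 bytes in total).
            let mut units = [0u16; 2];
            for &unit in c.encode_utf16(&mut units).iter() {
                out.push(0xE0 | (unit >> 12) as u8);
                out.push(0x80 | ((unit >> 6) & 0x3F) as u8);
                out.push(0x80 | (unit & 0x3F) as u8);
            }
        }
    }
    out
}
```

Under these assumptions, encode_mutf8_sketch("\u{10401}") yields the six bytes 0xED 0xA0 0x81 0xED 0xB0 0x81, and encode_mutf8_sketch("\0") yields 0xC0 0x80.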

Examples

Basic usage

use alloc::borrow::Cow;

let str = "Hello, world!";
// Characters in the Basic Multilingual Plane, other than the null character,
// are encoded identically in UTF-8 and MUTF-8:
assert_eq!(mutf8::encode(str), Cow::Borrowed(str.as_bytes()));
assert_eq!(mutf8::decode(str.as_bytes()), Ok(Cow::Borrowed(str)));

let str = "\u{10401}";
let mutf8_data = &[0xED, 0xA0, 0x81, 0xED, 0xB0, 0x81];
// 'mutf8_data' is a byte slice containing a 6-byte surrogate pair which
// becomes a 4-byte UTF-8 character.
assert_eq!(mutf8::decode(mutf8_data), Ok(Cow::Owned(str.to_string())));

let str = "\0";
let mutf8_data = vec![0xC0, 0x80];
// 'str' is a null character which becomes a two-byte MUTF-8 representation.
assert_eq!(mutf8::encode(str), Cow::<[u8]>::Owned(mutf8_data));

Features

  • std: implements std::error::Error for Error. This feature is enabled by default.

Structs

An error returned by decode when the input is invalid MUTF-8 data.

Functions

Converts a slice of bytes to a string slice.
Converts a string slice to MUTF-8 bytes.
Returns true if a string slice contains UTF-8 data that is also valid MUTF-8. This is mainly used to test whether a string slice needs to be explicitly encoded using encode.
Returns the number of bytes required to encode a string slice as MUTF-8.
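The validity check and the length calculation described above both follow from the encoding rules. The sketch below uses hypothetical function names (needs_no_encoding_sketch and mutf8_len_sketch) and only the standard library; it illustrates the idea and is not the crate's own implementation.

```rust
/// Hypothetical helper: true if the UTF-8 bytes of `s` are already valid
/// MUTF-8, i.e. `s` has no null characters and no supplementary characters.
fn needs_no_encoding_sketch(s: &str) -> bool {
    s.chars().all(|c| c != '\0' && (c as u32) < 0x10000)
}

/// Hypothetical helper: number of bytes `s` occupies once encoded as MUTF-8.
fn mutf8_len_sketch(s: &str) -> usize {
    s.chars()
        .map(|c| match c as u32 {
            // Null is written as the two-byte sequence 0xC0 0x80.
            0 => 2,
            // Supplementary characters become two 3-byte surrogates.
            0x10000..=0x10FFFF => 6,
            // Everything else keeps its UTF-8 length (1 to 3 bytes).
            _ => c.len_utf8(),
        })
        .sum()
}
```

When needs_no_encoding_sketch returns true, the MUTF-8 length equals the UTF-8 length, which is the case in which an encoder can return the input bytes without copying.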