Crate utf16string

A UTF-16 little-endian string type.

This crate provides two string types to handle UTF-16 encoded bytes directly as strings: WString and WStr. They are to UTF-16 exactly what String and str are to UTF-8. Some of the concepts and functions here are rather tersely documented. In that case you can look up their equivalents on String or str; the behaviour should be exactly the same, only the underlying byte encoding is different.

Thus WString is a type which owns the bytes containing the string. Just like String and the underlying Vec it is built on, it distinguishes length (WString::len) and capacity (WString::capacity). Here length is the number of bytes used, while capacity is the number of bytes the string can hold without reallocating.
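To make the length/capacity distinction concrete, here is a minimal sketch that uses only the standard library and a plain Vec<u8> as a stand-in for the buffer behind WString. The helper name utf16le_with_capacity is hypothetical and not part of this crate.

```rust
/// Hypothetical helper (not part of utf16string): encodes `s` as UTF-16LE
/// into a byte buffer that was allocated with room for `capacity` bytes.
fn utf16le_with_capacity(s: &str, capacity: usize) -> Vec<u8> {
    let mut buf = Vec::with_capacity(capacity);
    for unit in s.encode_utf16() {
        // Each UTF-16 code unit occupies two bytes.
        buf.extend_from_slice(&unit.to_le_bytes());
    }
    buf
}
```

Calling utf16le_with_capacity("hello", 32) yields a buffer whose length is 10 bytes (five code units of two bytes each), while its capacity is at least 32 bytes. A character outside the Basic Multilingual Plane, such as U+1D11E, needs a surrogate pair and therefore contributes 4 bytes to the length.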

The WStr type does not own any bytes; it can only point to a slice of bytes containing valid UTF-16. As such you will only ever use it as a reference like &WStr, just as you only use str as &str.

The WString type implements Deref<Target = WStr<ByteOrder>>, so the methods of WStr are also available on a WString.

UTF-16 ByteOrder

UTF-16 encodes text as unsigned 16-bit integers (u16), called code units. However, different CPU architectures store these u16 integers using different byte orders: little-endian and big-endian. Thus when handling UTF-16 strings you need to be aware of the byte order of the encoding. The two encoding variants are commonly known as UTF-16LE and UTF-16BE respectively.
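The difference between the two byte orders can be seen with the standard library alone. The sketch below is illustrative only; the function names encode_utf16le and encode_utf16be are hypothetical and not part of this crate.

```rust
/// Hypothetical helper: serialises the UTF-16 code units of `s`
/// with the least significant byte first (UTF-16LE).
fn encode_utf16le(s: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for unit in s.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}

/// Hypothetical helper: serialises the UTF-16 code units of `s`
/// with the most significant byte first (UTF-16BE).
fn encode_utf16be(s: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for unit in s.encode_utf16() {
        out.extend_from_slice(&unit.to_be_bytes());
    }
    out
}
```

For the input "hi", encode_utf16le produces the bytes h, 0x00, i, 0x00, while encode_utf16be produces 0x00, h, 0x00, i. The code units are identical; only the order of the two bytes within each unit differs.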

For this crate this means the types need to be aware of the byte order, which is done using the byteorder::ByteOrder trait as a generic parameter to the types: WString<ByteOrder> and WStr<ByteOrder>, commonly written as WString<E> and WStr<E> where E stands for "endianness".

This crate exports BigEndian, BE, LittleEndian and LE in case you need to denote the type:

use utf16string::{BigEndian, BE, WString};

let s0: WString<BigEndian> = WString::from("hello");
assert_eq!(s0.len(), 10);

let s1: WString<BE> = WString::from("hello");
assert_eq!(s0, s1);

As these types can be a bit cumbersome to write, they can often be inferred, especially with the help of the shorthand constructors like WString::from_utf16le, WString::from_utf16be, WStr::from_utf16le, WStr::from_utf16be and related. For example:

use utf16string::{LE, WStr};

let b = b"h\x00e\x00l\x00l\x00o\x00";

let s0: &WStr<LE> = WStr::from_utf16(b)?;
let s1 = WStr::from_utf16le(b)?;

assert_eq!(s0, s1);
assert_eq!(s0.to_utf8(), "hello");
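The constructors above are fallible because not every byte slice is valid UTF-16. As a rough illustration of the kind of validation involved, here is a standard-library-only sketch that decodes UTF-16LE bytes into a String. It is not this crate's implementation: the name decode_utf16le is hypothetical, and it returns an Option where the crate reports an error value.

```rust
use std::char::decode_utf16;

/// Hypothetical helper (not part of utf16string): decodes UTF-16LE bytes
/// into a `String`, returning `None` when the input is not valid UTF-16.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    // A valid UTF-16 byte slice always has an even length.
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Reassemble the little-endian byte pairs into u16 code units.
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    // `decode_utf16` yields an error for every unpaired surrogate,
    // which makes the whole collection fail.
    decode_utf16(units).collect::<Result<String, _>>().ok()
}
```

With this sketch, the bytes b"h\x00e\x00l\x00l\x00o\x00" decode to "hello", whereas a slice of odd length or one containing an unpaired surrogate such as b"\x00\xd8" is rejected.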



Error for invalid UTF-16 encoded bytes.


A UTF-16 str-like type with little- or big-endian byte order.


Iterator yielding (index, char) tuples from a UTF-16 little-endian encoded byte slice.


Iterator yielding char from a UTF-16 encoded byte slice.


A UTF-16 String-like type with little- or big-endian byte order.





Our own version of std::slice::SliceIndex.

Type Definitions