A UTF-16 little-endian string type.
This crate provides two string types to handle UTF-16 encoded bytes directly as strings: WString and WStr. They are to UTF-16 exactly what String and str are to UTF-8. Some of the concepts and functions here are rather tersely documented; in that case you can look up their equivalents on String or str, and the behaviour should be exactly the same, only the underlying byte encoding is different.
Thus WString is a type which owns the bytes containing the string. Just like String and the underlying Vec it is built on, it distinguishes length (WString::len) and capacity (WString::capacity). Here length is the number of bytes used, while capacity is the number of bytes the string can hold without reallocating.
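The length/capacity distinction can be illustrated with the standard library String, which the text above names as the model for WString. This is only a sketch of the analogous behaviour: the helper name len_and_capacity is made up for illustration, and it does not use this crate's types.

```rust
/// Hypothetical helper: builds a std `String` with reserved space and
/// reports its length and capacity, both measured in bytes.
fn len_and_capacity() -> (usize, usize) {
    // Reserve room for at least 16 bytes up front.
    let mut s = String::with_capacity(16);
    // "hello" is 5 bytes in UTF-8, so this fits without reallocating.
    s.push_str("hello");
    (s.len(), s.capacity())
}
```

Here the length is 5 and the capacity is at least 16. For a WString the same text occupies two bytes per code unit, so its length would be 10.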
The WStr type does not own any bytes; it can only point to a slice of bytes containing valid UTF-16. As such you will only ever use it as a reference like &WStr, just like you only use str as &str.
The WString type implements Deref<Target = WStr<ByteOrder>>, so the methods of WStr are also available on a WString.
§UTF-16 ByteOrder
UTF-16 encodes to unsigned 16-bit integers (u16), denoting code units. However, different CPU architectures store these u16 integers using different byte orders: little-endian and big-endian. Thus when handling UTF-16 strings you need to be aware of the byte order of the encoding; the two variants are commonly known as UTF-16LE and UTF-16BE respectively.
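To make the difference between the two byte orders concrete, the following sketch encodes a string into UTF-16LE and UTF-16BE bytes using only the standard library. The function names encode_utf16le and encode_utf16be are made up for this illustration and are not part of this crate's API.

```rust
/// Hypothetical helper: encode `s` as UTF-16 with each code unit stored
/// least-significant byte first (UTF-16LE).
fn encode_utf16le(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// Hypothetical helper: encode `s` as UTF-16 with each code unit stored
/// most-significant byte first (UTF-16BE).
fn encode_utf16be(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .flat_map(|unit| unit.to_be_bytes())
        .collect()
}
```

For the input "hi" the little-endian form is the bytes 68 00 69 00 (hexadecimal), while the big-endian form is 00 68 00 69. The code units are identical; only the order of the two bytes within each unit differs.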
For this crate this means the types need to be aware of the byte order, which is done using the byteorder::ByteOrder trait as a generic parameter to the types: WString<ByteOrder> and WStr<ByteOrder>, commonly written as WString<E> and WStr<E>, where E stands for “endianness”.
This crate exports BigEndian, BE, LittleEndian and LE in case you need to denote the type:
use utf16string::{BigEndian, BE, WString};
let s0: WString<BigEndian> = WString::from("hello");
assert_eq!(s0.len(), 10);
let s1: WString<BE> = WString::from("hello");
assert_eq!(s0, s1);
As these types can be a bit cumbersome to write, they can often be inferred, especially with the help of the shorthand constructors like WString::from_utf16le, WString::from_utf16be, WStr::from_utf16le, WStr::from_utf16be and related.
For example:
use utf16string::{LE, WStr};
let b = b"h\x00e\x00l\x00l\x00o\x00";
let s0: &WStr<LE> = WStr::from_utf16(b)?;
let s1 = WStr::from_utf16le(b)?;
assert_eq!(s0, s1);
assert_eq!(s0.to_utf8(), "hello");
Structs§
- Utf16Error - Error for invalid UTF-16 encoded bytes.
- WStr - A UTF-16 str-like type with little- or big-endian byte order.
- WStrCharIndices - Iterator yielding (index, char) tuples from a UTF-16 little-endian encoded byte slice.
- WStrChars - Iterator yielding char from a UTF-16 encoded byte slice.
- WString - A UTF-16 String-like type with little- or big-endian byte order.
Enums§
- BigEndian - Defines big-endian serialization.
- LittleEndian - Defines little-endian serialization.
Traits§
- SliceIndex - Our own version of std::slice::SliceIndex.
Type Aliases§
- BE - A type alias for BigEndian.
- LE - A type alias for LittleEndian.