Struct widestring::ustr::U16Str
pub struct U16Str { /* fields omitted */ }
16-bit wide string slice with undefined encoding.
U16Str is to U16String as OsStr is to OsString.
U16Str values are string slices that do not have a defined encoding. While it is sometimes assumed that they contain possibly invalid or ill-formed UTF-16 data, they may be used for any wide encoded string. This is because U16Str is intended to be used with FFI functions, where proper encoding cannot be guaranteed. If you need string slices that are always valid UTF-16 strings, use Utf16Str instead.
Because U16Str does not have a defined encoding, no restrictions are placed on mutating or indexing the slice. This means that even if the string contained properly encoded UTF-16 or other encoding data, mutating or indexing may result in malformed data. Convert to a Utf16Str if retaining proper UTF-16 encoding is desired.
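As a rough illustration of how unrestricted mutation or indexing can break a well-formed string, the sketch below works on plain `u16` slices with the standard library only. The helper names `is_well_formed_utf16` and `overwrite_and_check` are hypothetical and not part of this crate.

```rust
/// Returns true if `units` decodes as well-formed UTF-16.
fn is_well_formed_utf16(units: &[u16]) -> bool {
    std::char::decode_utf16(units.iter().copied()).all(|decoded| decoded.is_ok())
}

/// Overwrites one element in place, as unrestricted mutation allows,
/// and reports whether the buffer is still well-formed UTF-16.
fn overwrite_and_check(units: &mut [u16], index: usize, value: u16) -> bool {
    units[index] = value;
    is_well_formed_utf16(units)
}
```

Overwriting the second half of a surrogate pair, or slicing between the two halves, leaves an unpaired surrogate behind, which is exactly the kind of data U16Str accepts without complaint.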
FFI considerations
U16Str is not aware of nul values and may or may not be nul-terminated. It is intended to be used with FFI functions that directly use string length, where the strings are known to have proper nul-termination already, or where strings are merely being passed through without modification.
U16CStr should be used instead if nul-aware strings are required.
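To make the difference concrete, here is a minimal sketch of the scan a nul-aware consumer would perform and that a length-based slice never does. It uses a plain `u16` slice, and `len_until_nul` is a hypothetical helper, not an API of this crate.

```rust
/// Number of elements before the first nul value, or the full length if
/// the buffer contains no nul at all.
fn len_until_nul(units: &[u16]) -> usize {
    units.iter().position(|&u| u == 0).unwrap_or(units.len())
}
```

A length-based slice over `[0x48, 0x69, 0x0, 0x21]` has four elements, while a nul-terminated reader of the same memory would stop after two.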
Examples
The easiest way to use U16Str outside of FFI is with the u16str! macro to convert string literals into UTF-16 string slices at compile time:
use widestring::u16str;
let hello = u16str!("Hello, world!");
You can also convert any u16 slice directly:
use widestring::{u16str, U16Str};
let sparkle_heart = [0xd83d, 0xdc96];
let sparkle_heart = U16Str::from_slice(&sparkle_heart);
assert_eq!(u16str!("💖"), sparkle_heart);
// This unpaired UTF-16 surrogate is invalid UTF-16, but is perfectly valid in U16Str
let malformed_utf16 = [0x0, 0xd83d]; // Note that nul values are also valid and untouched
let s = U16Str::from_slice(&malformed_utf16);
assert_eq!(s.len(), 2);
When working with an FFI, it is useful to create a U16Str from a pointer and a length:
use widestring::{u16str, U16Str};
let sparkle_heart = [0xd83d, 0xdc96];
let sparkle_heart = unsafe {
    U16Str::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len())
};
assert_eq!(u16str!("💖"), sparkle_heart);
Implementations
Constructs a wide string slice from a pointer and a length.
The len argument is the number of elements, not the number of bytes. No copying or allocation is performed; the resulting value is a direct reference to the pointer bytes.
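Foreign APIs often report sizes in bytes, so a conversion step is usually needed before passing a length here. The following is only a sketch, and `elements_from_bytes` is a hypothetical helper name.

```rust
/// Converts a byte count reported by a foreign API into a `u16` element count.
/// Returns `None` if the byte count is not a whole number of elements.
fn elements_from_bytes(byte_len: usize) -> Option<usize> {
    let unit = std::mem::size_of::<u16>();
    if byte_len % unit == 0 {
        Some(byte_len / unit)
    } else {
        None
    }
}
```

Rejecting odd byte counts up front avoids handing the constructor a length that reads past the end of the buffer.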
Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts. In particular, the returned string reference must not be mutated for the duration of lifetime 'a, except inside an UnsafeCell.
Panics
This function panics if p is null.
Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
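One way to follow this advice is a helper that borrows the host value, so the compiler bounds the result by that borrow. The sketch below uses std::slice::from_raw_parts as a stand-in for the pointer-and-length constructor, and `HostBuffer` is a hypothetical type invented for illustration.

```rust
/// Hypothetical owner of a buffer, standing in for any host value that keeps
/// the pointed-to memory alive.
struct HostBuffer {
    data: Vec<u16>,
}

impl HostBuffer {
    /// Borrowing `&self` ties the returned slice's lifetime to the host value,
    /// instead of leaving it to be inferred from usage.
    fn units(&self) -> &[u16] {
        let (ptr, len) = (self.data.as_ptr(), self.data.len());
        // SAFETY: `ptr` and `len` describe a live Vec that stays borrowed
        // for as long as the returned slice exists.
        unsafe { std::slice::from_raw_parts(ptr, len) }
    }
}
```

With this shape, using the slice after the host has been dropped is a compile error rather than undefined behavior.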
Constructs a mutable wide string slice from a mutable pointer and a length.
The len argument is the number of elements, not the number of bytes. No copying or allocation is performed; the resulting value is a direct reference to the pointer bytes.
Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts_mut.
Panics
This function panics if p is null.
Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
Constructs a wide string slice from a slice of character data.
No checks are performed on the slice. It may be of any encoding and may contain invalid or malformed data for that encoding.
Constructs a mutable wide string slice from a mutable slice of character data.
No checks are performed on the slice. It may be of any encoding and may contain invalid or malformed data for that encoding.
Copies the string reference to a new owned wide string.
Converts to a slice of the underlying elements of the string.
Converts to a mutable slice of the underlying elements of the string.
Returns a raw pointer to the string.
The caller must ensure that the string outlives the pointer this function returns, or else it will end up pointing to garbage.
The caller must also ensure that the memory the pointer (non-transitively) points to is never written to (except inside an UnsafeCell) using this pointer or any pointer derived from it. If you need to mutate the contents of the string, use as_mut_ptr.
Modifying the container referenced by this string may cause its buffer to be reallocated, which would also make any pointers to it invalid.
Returns an unsafe mutable raw pointer to the string.
The caller must ensure that the string outlives the pointer this function returns, or else it will end up pointing to garbage.
Modifying the container referenced by this string may cause its buffer to be reallocated, which would also make any pointers to it invalid.
Returns the two raw pointers spanning the string slice.
The returned range is half-open, which means that the end pointer points one past the last element of the slice. This way, an empty slice is represented by two equal pointers, and the difference between the two pointers represents the size of the slice.
See as_ptr for warnings on using these pointers. The end pointer requires extra caution, as it does not point to a valid element in the slice.
This function is useful for interacting with foreign interfaces which use two pointers to refer to a range of elements in memory, as is common in C++.
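The half-open convention can be sketched with the standard slice method `as_ptr_range`, which returns the same kind of pointer pair for a plain `u16` slice. `len_from_range` is a hypothetical helper showing how the element count falls out of the two pointers.

```rust
/// Recovers the element count from a half-open pointer range.
///
/// # Safety
/// Both pointers must come from the same slice, with `start <= end`.
unsafe fn len_from_range(range: std::ops::Range<*const u16>) -> usize {
    // SAFETY: upheld by the caller, see the safety section above.
    unsafe { range.end.offset_from(range.start) as usize }
}
```

The difference is measured in elements, not bytes, and an empty slice yields two equal pointers and a length of zero.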
Returns the two unsafe mutable pointers spanning the string slice.
The returned range is half-open, which means that the end pointer points one past the last element of the slice. This way, an empty slice is represented by two equal pointers, and the difference between the two pointers represents the size of the slice.
See as_mut_ptr for warnings on using these pointers. The end pointer requires extra caution, as it does not point to a valid element in the slice.
This function is useful for interacting with foreign interfaces which use two pointers to refer to a range of elements in memory, as is common in C++.
Returns the length of the string as number of elements (not number of bytes).
Converts a boxed wide string slice into an owned wide string without copying or allocating.
Returns an object that implements Display for printing strings that may contain non-Unicode data.
This method assumes this string is intended to be UTF-16 encoded, but handles ill-formed UTF-16 sequences lossily. The returned struct implements the Display trait by performing lossy UTF-16 decoding without the heap allocations that methods such as to_string_lossy perform.
By default, invalid Unicode data is replaced with U+FFFD REPLACEMENT CHARACTER (�). If you wish to simply skip any invalid Unicode data and forgo the replacement, you may use the alternate formatting with {:#}.
Examples
Basic usage:
use widestring::U16Str;
// 𝄞mus<invalid>ic<invalid>
let s = U16Str::from_slice(&[
    0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
]);
assert_eq!(format!("{}", s.display()), "𝄞mus�ic�");
Using alternate formatting style to skip invalid values entirely:
use widestring::U16Str;
// 𝄞mus<invalid>ic<invalid>
let s = U16Str::from_slice(&[
    0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
]);
assert_eq!(format!("{:#}", s.display()), "𝄞music");
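For readers curious how an allocation-free lossy Display could work, the sketch below builds one over a plain `u16` slice using only the standard library. `LossyUtf16` is a hypothetical stand-in, and the crate's actual implementation may differ.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for a display adapter: wraps raw `u16` units.
struct LossyUtf16<'a>(&'a [u16]);

impl fmt::Display for LossyUtf16<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Decode one scalar value at a time and write it straight to the
        // formatter, so no intermediate String is built.
        for decoded in std::char::decode_utf16(self.0.iter().copied()) {
            match decoded {
                Ok(c) => f.write_char(c)?,
                // `{:#}` sets the alternate flag: skip invalid units.
                Err(_) if f.alternate() => {}
                Err(_) => f.write_char(std::char::REPLACEMENT_CHARACTER)?,
            }
        }
        Ok(())
    }
}
```

The adapter itself holds only a borrowed slice. Any allocation happens in whatever consumes the formatted output, such as `format!`.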
Returns a subslice of the string.
This is the non-panicking alternative to indexing the string. Returns None whenever the equivalent indexing operation would panic.
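The same contract exists on standard slices, which makes for an easy illustration. This sketch uses the std slice `get` on raw `u16` units, and `checked_sub` is a hypothetical wrapper name.

```rust
use std::ops::Range;

/// Non-panicking range lookup on raw `u16` units via the standard slice `get`.
/// Out-of-bounds or reversed ranges yield `None` instead of panicking.
fn checked_sub(units: &[u16], range: Range<usize>) -> Option<&[u16]> {
    units.get(range)
}
```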
Returns a mutable subslice of the string.
This is the non-panicking alternative to indexing the string. Returns None whenever the equivalent indexing operation would panic.
pub unsafe fn get_unchecked<I>(&self, i: I) -> &Self
where
    I: SliceIndex<[u16], Output = [u16]>,
Returns an unchecked subslice of the string.
This is the unchecked alternative to indexing the string.
Safety
Callers of this function are responsible for ensuring that these preconditions are satisfied:
- The starting index must not exceed the ending index;
- Indexes must be within bounds of the original slice.
Failing that, the returned string slice may reference invalid memory.
pub unsafe fn get_unchecked_mut<I>(&mut self, i: I) -> &mut Self
where
    I: SliceIndex<[u16], Output = [u16]>,
Returns a mutable, unchecked subslice of the string.
This is the unchecked alternative to indexing the string.
Safety
Callers of this function are responsible for ensuring that these preconditions are satisfied:
- The starting index must not exceed the ending index;
- Indexes must be within bounds of the original slice.
Failing that, the returned string slice may reference invalid memory.
Divide one string slice into two at an index.
The argument, mid, should be an offset from the start of the string.
The two slices returned go from the start of the string slice to mid, and from mid to the end of the string slice.
To get mutable string slices instead, see the split_at_mut method.
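Because the split point is an element offset with no encoding checks, it can land between the two halves of a surrogate pair. If the data is known to be UTF-16, a caller might adjust the offset first. The sketch below does so on a plain `u16` slice, and `pair_safe_mid` is a hypothetical helper, not part of this crate.

```rust
/// Moves `mid` back by one element if it would separate a surrogate pair.
fn pair_safe_mid(units: &[u16], mid: usize) -> usize {
    let is_high = |u: u16| (0xD800..=0xDBFF).contains(&u);
    let is_low = |u: u16| (0xDC00..=0xDFFF).contains(&u);
    if mid > 0 && mid < units.len() && is_high(units[mid - 1]) && is_low(units[mid]) {
        mid - 1
    } else {
        mid
    }
}
```

Splitting `[0x0048, 0xd83d, 0xdc96]` at the adjusted offset keeps the pair together in the second half.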
Divide one mutable string slice into two at an index.
The argument, mid, should be an offset from the start of the string.
The two slices returned go from the start of the string slice to mid, and from mid to the end of the string slice.
To get immutable string slices instead, see the split_at method.
Decodes a string reference to an owned OsString.
This makes a string copy of the U16Str. Since U16Str makes no guarantees that its encoding is UTF-16 or that the data is valid UTF-16, there is no guarantee that the resulting OsString will have a valid underlying encoding either.
Note that the encoding of OsString is platform-dependent, so on some platforms this may perform an encoding conversion, while on other platforms (such as Windows) no changes to the string will be made.
Examples
use widestring::U16String;
use std::ffi::OsString;
let s = "MyString";
// Create a wide string from the string
let wstr = U16String::from_str(s);
// Create an OsString from the wide string
let osstr = wstr.to_os_string();
assert_eq!(osstr, OsString::from(s));
Decodes this string to a String if it contains valid UTF-16 data.
This method assumes this string is encoded as UTF-16 and attempts to decode it as such.
Failures
Returns an error if the string contains any invalid UTF-16 data.
Examples
use widestring::U16String;
let s = "MyString";
// Create a wide string from the string
let wstr = U16String::from_str(s);
// Create a regular string from the wide string
let s2 = wstr.to_string().unwrap();
assert_eq!(s2, s);
Decodes the string to a String even if it is invalid UTF-16 data.
This method assumes this string is encoded as UTF-16 and attempts to decode it as such. Any invalid sequences are replaced with U+FFFD REPLACEMENT CHARACTER, which looks like this: �
Examples
use widestring::U16String;
let s = "MyString";
// Create a wide string from the string
let wstr = U16String::from_str(s);
// Create a regular string from the wide string
let lossy = wstr.to_string_lossy();
assert_eq!(lossy, s);
pub fn chars(&self) -> CharsUtf16<'_>
Returns an iterator over the chars of a string slice.
As this string has no defined encoding, this method assumes the string is UTF-16. Since it may consist of invalid UTF-16, the iterator returned by this method is an iterator over Result<char, DecodeUtf16Error> instead of chars directly. If you would like a lossy iterator over chars directly, use chars_lossy instead.
It’s important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be what you actually want. That functionality is not provided by this crate.
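The standard library's `decode_utf16` yields the same kind of item, so it can illustrate how a caller might inspect failures. This is a sketch over a plain `u16` slice, and `first_unpaired_surrogate` is a hypothetical helper.

```rust
/// Returns the first unpaired surrogate found while decoding, if any.
fn first_unpaired_surrogate(units: &[u16]) -> Option<u16> {
    std::char::decode_utf16(units.iter().copied())
        .find_map(|decoded| decoded.err().map(|e| e.unpaired_surrogate()))
}
```

Each `Err` item carries the offending code unit, so callers can report or repair it instead of losing it to a replacement character.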
pub fn chars_lossy(&self) -> CharsLossyUtf16<'_>
Returns a lossy iterator over the chars of a string slice.
As this string has no defined encoding, this method assumes the string is UTF-16. Since it may consist of invalid UTF-16, the iterator returned by this method will replace unpaired surrogates with U+FFFD REPLACEMENT CHARACTER (�). This is a lossy version of chars.
It’s important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be what you actually want. That functionality is not provided by this crate.
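The distinction between elements, scalar values, and perceived characters can be seen with a small std-only sketch. `lossy_chars` is a hypothetical helper that mirrors the lossy behavior described above on a plain `u16` slice.

```rust
/// Decodes `units` as UTF-16, replacing each unpaired surrogate with U+FFFD.
fn lossy_chars(units: &[u16]) -> Vec<char> {
    std::char::decode_utf16(units.iter().copied())
        .map(|decoded| decoded.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect()
}
```

A surrogate pair is two elements but one `char`, while an "e" followed by a combining acute accent (U+0301) is two `char`s that most readers would call a single character.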
pub fn char_indices(&self) -> CharIndicesUtf16<'_>
Returns an iterator over the chars of a string slice, and their positions.
As this string has no defined encoding, this method assumes the string is UTF-16. Since it may consist of invalid UTF-16, the iterator returned by this method yields Result<char, DecodeUtf16Error> values together with their positions, instead of chars directly. If you would like a lossy indices iterator over chars directly, use char_indices_lossy instead.
The iterator yields tuples. The position is first, the char is second.
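The sketch below shows one way such (position, char) tuples could be produced from a plain `u16` slice. It assumes positions are offsets counted in `u16` elements, which the text above does not state explicitly, and `char_positions` is a hypothetical helper that reports the unpaired surrogate value in place of the crate's error type.

```rust
/// Pairs each decoded item with its starting offset, counted in `u16` elements.
fn char_positions(units: &[u16]) -> Vec<(usize, Result<char, u16>)> {
    let mut pos = 0;
    std::char::decode_utf16(units.iter().copied())
        .map(|decoded| {
            let start = pos;
            match decoded {
                // A decoded char spans one or two elements.
                Ok(c) => {
                    pos += c.len_utf16();
                    (start, Ok(c))
                }
                // An unpaired surrogate always spans exactly one element.
                Err(e) => {
                    pos += 1;
                    (start, Err(e.unpaired_surrogate()))
                }
            }
        })
        .collect()
}
```

Under that assumption, positions skip by two after a surrogate pair, so they can be used directly as indices into the underlying slice.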
pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf16<'_>
Returns a lossy iterator over the chars of a string slice, and their positions.
As this string slice may consist of invalid UTF-16, the iterator returned by this method replaces unpaired surrogates with U+FFFD REPLACEMENT CHARACTER (�) and yields the position of each character. This is a lossy version of char_indices.
The iterator yields tuples. The position is first, the char is second.
Trait Implementations
Performs the += operation.
Mutably borrows from an owned value.
Extends a collection with the contents of an iterator.
Extends a collection with exactly one element.
Reserves capacity in a collection for the given number of additional elements.
Creates a value from an iterator.
This method returns an ordering between self and other values if one exists.
This method tests less than (for self and other) and is used by the < operator.
This method tests less than or equal to (for self and other) and is used by the <= operator.
This method tests greater than (for self and other) and is used by the > operator.