char-ranges
Similar to the standard library's .char_indices(), but instead of only
producing the start byte position, this library implements .char_ranges(),
which produces both the start and end byte positions.
Note that simply using .char_indices() and creating a range by mapping the
returned index i to i..(i + 1) is not guaranteed to be valid, given that
a single character can occupy up to 4 bytes in UTF-8.
| Char | Bytes | Range |
|---|---|---|
| 'O' | 1 | 0..1 |
| 'Ø' | 2 | 0..2 |
| '∈' | 3 | 0..3 |
| '🌏' | 4 | 0..4 |
The byte counts and ranges above assume the text is encoded in UTF-8.
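To see why i..(i + 1) breaks, here is a minimal sketch that uses only the standard library. The helper `ranges_via_char_indices` is a hypothetical name chosen for illustration and is not part of this crate; it builds each range from the start index plus `char::len_utf8()`.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of the crate): builds byte ranges from
/// `char_indices()` by adding each char's UTF-8 length to its start index.
fn ranges_via_char_indices(text: &str) -> Vec<(Range<usize>, char)> {
    text.char_indices()
        .map(|(i, c)| (i..i + c.len_utf8(), c))
        .collect()
}

fn main() {
    let text = "OØ∈🌏";

    for (range, c) in ranges_via_char_indices(text) {
        // A correct range always starts and ends on a char boundary,
        // so slicing with it yields exactly that one character.
        assert_eq!(&text[range], c.to_string());
    }

    // The naive `i..(i + 1)` range for 'Ø' (which starts at byte 1) ends
    // inside the character, so `str::get` rejects it.
    assert_eq!(text.get(1..2), None);
    assert_eq!(text.get(1..3), Some("Ø"));
}
```

Indexing with the invalid range (`&text[1..2]`) would panic instead of returning `None`, which is the failure the ranges produced by this library are meant to avoid.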
The implementation specializes last(), nth(), next_back(),
and nth_back(), so that the length of intermediate characters is
not wastefully calculated.
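The idea behind such specializations can be illustrated with the standard library alone. This is only a sketch under assumptions, not the crate's actual implementation: `nth_char_range` and `last_char_range` are hypothetical helpers that compute an end position for the one character that is returned, rather than for every character that is skipped.

```rust
use std::ops::Range;

/// Hypothetical helper: range of the `n`-th char. The skipped chars are
/// stepped over by `char_indices().nth(n)`; `len_utf8()` is only called
/// for the char that is actually returned.
fn nth_char_range(text: &str, n: usize) -> Option<(Range<usize>, char)> {
    let (start, c) = text.char_indices().nth(n)?;
    Some((start..start + c.len_utf8(), c))
}

/// Hypothetical helper: range of the last char. Its end is simply the
/// length of the text, so no per-char length is needed at all.
fn last_char_range(text: &str) -> Option<(Range<usize>, char)> {
    let (start, c) = text.char_indices().next_back()?;
    Some((start..text.len(), c))
}

fn main() {
    let text = "Hello 🗻∈🌏";

    assert_eq!(nth_char_range(text, 7), Some((10..13, '∈')));
    assert_eq!(last_char_range(text), Some((13..17, '🌏')));

    // There are only 9 chars, so index 9 is out of bounds.
    assert_eq!(nth_char_range(text, 9), None);
}
```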
Example
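use char_ranges::CharRangesExt;

let text = "Hello 🗻∈🌏";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");

assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));

// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");

assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);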
DoubleEndedIterator
CharRanges also implements DoubleEndedIterator, making it possible to iterate backwards.
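use char_ranges::CharRangesExt;

let text = "ABCDE";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");

assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");

assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");

assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");

assert_eq!(chars.next(), None);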
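Backward iteration can also be sketched with the standard library, since `char_indices()` is itself a `DoubleEndedIterator`. The helper `char_ranges_rev` below is a hypothetical name and makes no claim about how this crate works internally; it shows that when walking backwards, the end of each range is just the start of the previously yielded character.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of the crate): yields char ranges from the
/// back of the text to the front.
fn char_ranges_rev(text: &str) -> Vec<(Range<usize>, char)> {
    // The last char ends where the text ends.
    let mut end = text.len();
    text.char_indices()
        .rev()
        .map(|(start, c)| {
            let range = start..end;
            // The next (earlier) char ends where this one starts.
            end = start;
            (range, c)
        })
        .collect()
}

fn main() {
    let ranges = char_ranges_rev("a∈🌏");
    assert_eq!(ranges, vec![(4..8, '🌏'), (1..4, '∈'), (0..1, 'a')]);
}
```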
Offset Ranges
If the input text is a substring of some original text, and the produced
ranges should be offset so that they index into the original text, then
use .char_ranges_offset(offset) or .char_ranges().offset(offset) instead
of .char_ranges().
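use char_ranges::CharRangesExt;

let text = "Hello 👋 World 🌏";

let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"

let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);

assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));

assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes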
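The effect of an offset can be checked with a small standard-library sketch. The function `char_ranges_with_offset` is a hypothetical stand-in written for illustration, not the crate's API; it shifts every range by `offset`, so the shifted ranges can be used to slice the original text rather than the substring.

```rust
use std::ops::Range;

/// Hypothetical stand-in (not the crate's API): char ranges of `text`,
/// shifted by `offset` bytes.
fn char_ranges_with_offset(
    text: &str,
    offset: usize,
) -> impl Iterator<Item = (Range<usize>, char)> + '_ {
    text.char_indices().map(move |(i, c)| {
        let start = offset + i;
        (start..start + c.len_utf8(), c)
    })
}

fn main() {
    let original = "Hello 👋 World 🌏";

    let start = 11; // Start index of 'W'
    let sub = &original[start..]; // "World 🌏"

    for (range, c) in char_ranges_with_offset(sub, start) {
        // Each offset range points at the same char in the original text.
        assert_eq!(&original[range], c.to_string());
    }
}
```

Without the offset, the ranges would start at 0 and would point at "Hello" when applied to the original text.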