Crate char_ranges

Similar to the standard library’s .char_indices(), but instead of producing only the start byte position of each char, this library implements .char_ranges(), which produces both the start and end byte positions.

Note that simply using .char_indices() and mapping each returned index i to the range i..(i + 1) is not guaranteed to produce a valid range, because a UTF-8 encoded char can be 1 to 4 bytes long.
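The following std-only sketch makes the problem concrete. It does not use this crate, and the helper names naive_range and correct_range are hypothetical. For a multi-byte char, i..(i + 1) does not end on a char boundary, so str::get returns None. Extending the range by char::len_utf8 covers the whole char.

```rust
use std::ops::Range;

/// Hypothetical helper: the naive range, assuming every char is exactly 1 byte.
fn naive_range(i: usize, _c: char) -> Range<usize> {
    i..(i + 1)
}

/// Hypothetical helper: extends the start index by the char's UTF-8 length.
fn correct_range(i: usize, c: char) -> Range<usize> {
    i..(i + c.len_utf8())
}

fn main() {
    let text = "aØ";
    for (i, c) in text.char_indices() {
        // `str::get` returns None when a range does not start and end on char boundaries.
        let naive = text.get(naive_range(i, c));
        let correct = text.get(correct_range(i, c));
        println!("{:?}: naive {:?}, correct {:?}", c, naive, correct);
    }
}
```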

Char   Bytes   Range
'O'    1       0..1
'Ø'    2       0..2
'∈'    3       0..3
'🌏'   4       0..4
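The byte counts in the table match what char::len_utf8 and char::encode_utf8 report for each char. Here is a quick std-only check, where first_char_range is a hypothetical helper giving the range a char occupies when it is the first char of a string:

```rust
use std::ops::Range;

/// Hypothetical helper: the byte range of `c` if it starts at index 0.
fn first_char_range(c: char) -> Range<usize> {
    0..c.len_utf8()
}

fn main() {
    for c in ['O', 'Ø', '∈', '🌏'] {
        // `encode_utf8` writes the char's bytes into a buffer of up to 4 bytes.
        let mut buf = [0u8; 4];
        let bytes = c.encode_utf8(&mut buf).len();
        println!("{:?}: {} bytes, range {:?}", c, bytes, first_char_range(c));
    }
}
```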

Byte positions assume the text is encoded in UTF-8.

The implementation specializes last(), nth(), next_back(), and nth_back(), so that the lengths of intermediate chars are not needlessly calculated.
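The sketch below illustrates the idea behind these specializations. It uses a hypothetical Ranges type that wraps the standard CharIndices and is not this crate's implementation. The overridden methods jump to the requested char via CharIndices::nth, next_back, or nth_back. They call char::len_utf8 only for the char that is returned, rather than computing an end position for every skipped char.

```rust
use std::ops::Range;
use std::str::CharIndices;

/// Hypothetical minimal iterator (not this crate's type) yielding
/// (byte range, char) pairs by wrapping the standard `CharIndices`.
struct Ranges<'a> {
    inner: CharIndices<'a>,
}

impl<'a> Ranges<'a> {
    fn new(text: &'a str) -> Self {
        Ranges { inner: text.char_indices() }
    }
}

/// Turns a (start, char) pair into a (start..end, char) pair.
fn to_range((start, c): (usize, char)) -> (Range<usize>, char) {
    (start..start + c.len_utf8(), c)
}

impl<'a> Iterator for Ranges<'a> {
    type Item = (Range<usize>, char);

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next().map(to_range)
    }

    // Skip `n` chars via `CharIndices::nth`; only the returned char's end is computed.
    fn nth(&mut self, n: usize) -> Option<Self::Item> {
        self.inner.nth(n).map(to_range)
    }

    // Take the final char from the back instead of walking forward through all chars.
    fn last(mut self) -> Option<Self::Item> {
        self.inner.next_back().map(to_range)
    }
}

impl<'a> DoubleEndedIterator for Ranges<'a> {
    fn next_back(&mut self) -> Option<Self::Item> {
        self.inner.next_back().map(to_range)
    }

    fn nth_back(&mut self, n: usize) -> Option<Self::Item> {
        self.inner.nth_back(n).map(to_range)
    }
}

fn main() {
    let text = "Hello 🗻∈🌏";
    println!("{:?}", Ranges::new(text).nth(7)); // Some((10..13, '∈'))
    println!("{:?}", Ranges::new(text).last()); // Some((13..17, '🌏'))
    println!("{:?}", Ranges::new(text).nth_back(2)); // Some((6..10, '🗻'))
}
```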

Example

use char_ranges::CharRangesExt;

let text = "Hello 🗻∈🌏";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");

assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));

// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");

assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);
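Each produced range spans exactly one char, so slicing the original text with it yields that char again. The following std-only sketch checks this property. It uses a hypothetical char_ranges_std helper built on char_indices() and char::len_utf8 in place of this crate:

```rust
use std::ops::Range;

/// Hypothetical std-only stand-in for `.char_ranges()`.
fn char_ranges_std(text: &str) -> impl Iterator<Item = (Range<usize>, char)> + '_ {
    text.char_indices().map(|(i, c)| (i..i + c.len_utf8(), c))
}

/// Returns true if every range slices out exactly its own char.
fn ranges_round_trip(text: &str) -> bool {
    char_ranges_std(text).all(|(r, c)| {
        let mut buf = [0u8; 4];
        let encoded: &str = c.encode_utf8(&mut buf);
        &text[r] == encoded
    })
}

fn main() {
    let text = "Hello 🗻∈🌏";
    for (r, c) in char_ranges_std(text) {
        println!("{:?} {:?} {:?}", r.clone(), c, &text[r]);
    }
    println!("round trip: {}", ranges_round_trip(text));
}
```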

DoubleEndedIterator

CharRanges also implements DoubleEndedIterator, making it possible to iterate backwards.

use char_ranges::CharRangesExt;

let text = "ABCDE";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");

assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");

assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");

assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");

assert_eq!(chars.next(), None);
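The standard CharIndices is also double-ended, so a std-only equivalent can be reversed with .rev(). The sketch below uses hypothetical helpers char_ranges_std and reversed, not this crate's API, and collects the (range, char) pairs from last to first:

```rust
use std::ops::Range;

/// Hypothetical std-only stand-in for `.char_ranges()`; mapping a
/// double-ended iterator keeps it double-ended.
fn char_ranges_std(text: &str) -> impl DoubleEndedIterator<Item = (Range<usize>, char)> + '_ {
    text.char_indices().map(|(i, c)| (i..i + c.len_utf8(), c))
}

/// Collects the (range, char) pairs from the last char to the first.
fn reversed(text: &str) -> Vec<(Range<usize>, char)> {
    char_ranges_std(text).rev().collect()
}

fn main() {
    for (r, c) in reversed("AØ∈") {
        println!("{:?} {:?}", r, c);
    }
}
```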

Offset Ranges

If the input text is a substring of some original text, and the produced ranges should be relative to the original text rather than the substring, then use .char_ranges_offset(offset) or .char_ranges().offset(offset) instead of .char_ranges().

use char_ranges::CharRangesExt;

let text = "Hello πŸ‘‹ World 🌏";

let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"

let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);

assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));

assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes
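An offset range is simply the substring-relative range with offset added to both ends. The following std-only sketch shows this. It uses a hypothetical char_ranges_offset_std helper instead of this crate and checks that every shifted range indexes the same char in the original text:

```rust
use std::ops::Range;

/// Hypothetical std-only stand-in for `.char_ranges_offset(offset)`:
/// every range is shifted by `offset` bytes.
fn char_ranges_offset_std(
    text: &str,
    offset: usize,
) -> impl DoubleEndedIterator<Item = (Range<usize>, char)> + '_ {
    text.char_indices()
        .map(move |(i, c)| (offset + i..offset + i + c.len_utf8(), c))
}

fn main() {
    let original = "Hello 👋 World 🌏";
    let start = 11; // Start index of 'W'
    let sub = &original[start..];

    // Offset ranges index into `original`, not into `sub`.
    for (r, c) in char_ranges_offset_std(sub, start) {
        assert_eq!(original[r].chars().next(), Some(c));
    }
    println!("{:?}", char_ranges_offset_std(sub, start).next_back()); // Some((17..21, '🌏'))
}
```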

Structs

CharRanges
An iterator over chars and their start and end byte positions.
CharRangesOffset
An iterator over chars and their start and end byte positions, with an offset applied to all positions.

Traits

CharRangesExt