Function count_chars_utf8 

pub fn count_chars_utf8(data: &[u8]) -> u64

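Counts UTF-8 characters by counting non-continuation bytes. A continuation byte has the bit pattern 10xxxxxx (0x80..=0xBF). Every other byte starts a new character: an ASCII byte, a multi-byte leader, or an invalid byte. The input is not validated, so each invalid non-continuation byte counts as one character.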
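The counting rule can be illustrated with a minimal scalar sketch. The name `count_chars_scalar` is hypothetical and is not part of this crate; it only shows the bit test, not the actual implementation.

```rust
/// Hypothetical scalar reference (not this crate's implementation):
/// counts every byte that is NOT a UTF-8 continuation byte.
pub fn count_chars_scalar(data: &[u8]) -> u64 {
    data.iter()
        // Masking with 0xC0 keeps the top two bits of the byte.
        // A result of 0x80 means the byte matches 10xxxxxx.
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count() as u64
}
```

For valid UTF-8 the result should match `str::chars().count()`; for arbitrary bytes it simply reports how many bytes fall outside the 10xxxxxx pattern.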

Processes the input in 64-byte blocks and uses popcount, for roughly 4x the throughput of a byte-at-a-time scalar loop.
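One way block processing with popcount might look is sketched below, using plain `u64` arithmetic rather than SIMD intrinsics. This is an assumption-laden illustration: the names `count_chars_blocked` and `continuation_bytes_in_word` are hypothetical, and the real implementation may be structured differently.

```rust
/// One set bit at the bottom of each of the eight byte lanes in a u64.
const LANE_LSB: u64 = 0x0101_0101_0101_0101;

/// Counts the continuation bytes (10xxxxxx) packed into one 8-byte word.
fn continuation_bytes_in_word(w: u64) -> u32 {
    // After the shifts, the lowest bit of each lane holds that byte's
    // bit 7 (w >> 7) and bit 6 (w >> 6). A continuation byte has
    // bit 7 set and bit 6 clear.
    ((w >> 7) & !(w >> 6) & LANE_LSB).count_ones()
}

/// Hypothetical blocked counter (a sketch, not this crate's code).
pub fn count_chars_blocked(data: &[u8]) -> u64 {
    let mut blocks = data.chunks_exact(64);
    let mut count = 0u64;

    for block in &mut blocks {
        let mut continuation = 0u32;
        // Each 64-byte block is handled as eight 8-byte words.
        for word in block.chunks_exact(8) {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(word);
            // Byte order is irrelevant here: only the number of set
            // bits matters, not their position.
            continuation += continuation_bytes_in_word(u64::from_le_bytes(buf));
        }
        // Every byte in the block that is not a continuation byte
        // starts a character.
        count += 64 - u64::from(continuation);
    }

    // The tail (fewer than 64 bytes) falls back to a per-byte test.
    count += blocks
        .remainder()
        .iter()
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count() as u64;
    count
}
```

Counting continuation bytes and subtracting from the block size keeps the inner loop to two shifts, two bitwise operations, and one popcount per word. Whether this reaches the quoted speedup depends on the target CPU and compiler, so it should be benchmarked rather than assumed.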