Function count_chars_utf8 

pub fn count_chars_utf8(data: &[u8]) -> u64

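Count UTF-8 characters by counting non-continuation bytes. A continuation byte has the bit pattern 10xxxxxx (0x80..=0xBF). Every other byte is counted as the start of a new character (ASCII, multi-byte leader, or invalid). The input is not validated, so for malformed data the result is a count of non-continuation bytes rather than of well-formed characters.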
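The counting rule can be illustrated with a scalar reference version. This is a minimal sketch for clarity only: the name count_chars_scalar is hypothetical and is not part of the documented API, and the real function is described as using SIMD rather than a byte-by-byte loop.

```rust
/// Scalar reference sketch (hypothetical helper, not the crate's implementation).
/// Counts every byte that is NOT a UTF-8 continuation byte.
pub fn count_chars_scalar(data: &[u8]) -> u64 {
    data.iter()
        // A continuation byte matches 10xxxxxx, i.e. (b & 0xC0) == 0x80.
        // Anything else (ASCII, leader, or invalid byte) is counted.
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count() as u64
}
```

For valid UTF-8 this matches the number of Unicode scalar values, so it should agree with str::chars().count() on the same input.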

On x86_64, uses AVX2 SIMD for a throughput of roughly 32 bytes per cycle. On other architectures, falls back to processing the input in 64-byte blocks with popcount.
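One way a portable block-plus-popcount path could look is sketched below. This is an assumption about the general technique, not the crate's actual fallback: the name count_chars_blocks, the use of 8-byte words inside each 64-byte block, and the handling of the tail are all illustrative choices.

```rust
/// Hypothetical portable sketch; the crate's actual fallback may differ.
/// Counts continuation bytes per 64-byte block with popcount, then
/// subtracts them from the total length.
pub fn count_chars_blocks(data: &[u8]) -> u64 {
    // Bit 7 of every byte in a 64-bit word.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let mut continuation = 0u64;
    let mut blocks = data.chunks_exact(64);
    for block in &mut blocks {
        for word in block.chunks_exact(8) {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(word);
            let w = u64::from_le_bytes(buf);
            // (w << 1) moves bit 6 of each byte into bit 7 of the same byte,
            // so w & !(w << 1) keeps bit 7 only where bit 6 is clear: 10xxxxxx.
            continuation += u64::from((w & !(w << 1) & HIGH_BITS).count_ones());
        }
    }

    // Handle the final partial block (fewer than 64 bytes) byte by byte.
    continuation += blocks
        .remainder()
        .iter()
        .filter(|&&b| (b & 0xC0) == 0x80)
        .count() as u64;

    data.len() as u64 - continuation
}
```

Counting continuation bytes and subtracting from the length gives the same result as counting non-continuation bytes directly, and lets each 8-byte word be reduced with a single count_ones call.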