Parallel counting of all metrics at once.
Splits at newline boundaries so that words and max line length can be counted safely in parallel.
Each chunk computes all of its metrics together in one traversal group, maximizing cache reuse.
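A minimal sketch of the chunking idea described above, not the actual implementation. The names `Totals`, `split_at_newlines`, `count_chunk`, and `count_all_parallel` are hypothetical. It assumes ASCII-only whitespace detection, measures line length in characters without tab expansion, and uses a single loop to stand in for the traversal group. The byte count is simply `data.len()` and needs no pass.

```rust
use std::thread;

/// Hypothetical totals type; the fields mirror the metrics named above.
#[derive(Debug, Default, PartialEq, Clone, Copy)]
struct Totals {
    lines: usize,
    words: usize,
    chars: usize,
    max_line_len: usize,
}

/// Splits `data` into roughly `parts` chunks, each ending just after a
/// newline (or at the end of the input), so no line spans two chunks.
fn split_at_newlines(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let target = (data.len() / parts.max(1)).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        if end < data.len() {
            // Extend the chunk to just past the next newline.
            end = match data[end..].iter().position(|&b| b == b'\n') {
                Some(off) => end + off + 1,
                None => data.len(),
            };
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Computes every metric for one chunk in a single loop.
fn count_chunk(chunk: &[u8]) -> Totals {
    let mut t = Totals::default();
    let (mut in_word, mut line_len) = (false, 0usize);
    for &b in chunk {
        let starts_char = (b & 0xC0) != 0x80; // not a UTF-8 continuation byte
        if starts_char {
            t.chars += 1;
        }
        if b == b'\n' {
            t.lines += 1;
            t.max_line_len = t.max_line_len.max(line_len);
            line_len = 0;
        } else if starts_char {
            line_len += 1;
        }
        // Assumption: only ASCII whitespace separates words.
        let is_space = matches!(b, b' ' | b'\t' | b'\n' | 0x0B | 0x0C | b'\r');
        if is_space {
            in_word = false;
        } else if !in_word {
            in_word = true;
            t.words += 1;
        }
    }
    // A final line without a trailing newline still counts toward the max.
    t.max_line_len = t.max_line_len.max(line_len);
    t
}

/// Counts each chunk on its own thread, then sums counts and takes the max.
fn count_all_parallel(data: &[u8], parts: usize) -> Totals {
    let chunks = split_at_newlines(data, parts);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| s.spawn(move || count_chunk(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(Totals::default(), |a, b| Totals {
                lines: a.lines + b.lines,
                words: a.words + b.words,
                chars: a.chars + b.chars,
                max_line_len: a.max_line_len.max(b.max_line_len),
            })
    })
}
```

Because every chunk ends on a newline, no word or line straddles two chunks, so the per-chunk results can be merged with plain sums and a max.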
Count UTF-8 characters by counting non-continuation bytes.
A continuation byte has the bit pattern 10xxxxxx (0x80..0xBF).
Every other byte starts a new character (ASCII, multi-byte leader, or invalid).
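The rule above can be sketched in a few lines. The function name `count_chars_utf8` is hypothetical, and this is a plain scalar version rather than whatever optimized form the real code uses.

```rust
/// Counts UTF-8 characters as the number of bytes that are NOT
/// continuation bytes (bit pattern 10xxxxxx, i.e. 0x80..=0xBF).
fn count_chars_utf8(data: &[u8]) -> usize {
    data.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}
```

For example, `"日本語"` is 9 bytes but yields 3, since each character has one leader byte followed by two continuation bytes. An invalid byte such as `0xFF` is counted as one character, matching the rule that every non-continuation byte starts a new character.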
Count lines and words using optimized strategies per locale.
UTF-8: fused single-pass for lines+words to avoid extra data traversal.
C locale: AVX2 SIMD fused counter when available, scalar fallback otherwise.
Count lines + words + bytes in a single fused pass (the default wc mode).
Avoids separate passes entirely by combining newline counting with word detection.
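A scalar sketch of such a fused pass, assuming C locale semantics in which only ASCII whitespace separates words and every other byte is treated as a word byte. The name `count_lines_words_bytes` is hypothetical. A real implementation may classify non-printable bytes differently.

```rust
/// Returns (lines, words, bytes) from one traversal of `data`.
fn count_lines_words_bytes(data: &[u8]) -> (usize, usize, usize) {
    let mut lines = 0;
    let mut words = 0;
    let mut in_word = false;
    for &b in data {
        // Newline counting shares the loop with word detection.
        if b == b'\n' {
            lines += 1;
        }
        // Assumption: space, \t, \n, \v, \f and \r are the only separators.
        let is_space = matches!(b, b' ' | b'\t' | b'\n' | 0x0B | 0x0C | b'\r');
        if is_space {
            in_word = false;
        } else if !in_word {
            // Transition from whitespace to non-whitespace starts a word.
            in_word = true;
            words += 1;
        }
    }
    // The byte count needs no pass at all: it is the slice length.
    (lines, words, data.len())
}
```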
Parallel counting of lines + words + bytes only (no chars).
Optimized for the default wc mode: avoids unnecessary char-counting pass.
C locale: single fused pass per chunk counts BOTH lines and words.
UTF-8: if the input is pure ASCII, takes the C locale fast path; otherwise splits
at newline boundaries so that UTF-8 words can be counted safely in parallel.
Combined parallel counting of lines + words + chars.
UTF-8: splits at newline boundaries for fused lines+words+chars per chunk.
C locale: fused parallel lines+words, with an adjustment at chunk boundaries, plus parallel chars.
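One plausible reading of the boundary adjustment, shown as a sketch under stated assumptions: when chunks are cut at arbitrary byte offsets, a word that straddles a cut is counted once in each chunk, so the merge step subtracts one for every such cut. The names `ChunkCounts`, `count_chunk_c`, and `lines_words_with_boundary_fix` are hypothetical. Chunks are processed sequentially here for brevity, where the real code would count them in parallel.

```rust
/// Hypothetical per-chunk result, including the word state at both edges.
struct ChunkCounts {
    lines: usize,
    words: usize,
    starts_in_word: bool,
    ends_in_word: bool,
}

fn is_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | 0x0B | 0x0C | b'\r')
}

/// Counts one chunk as if it were a standalone input.
fn count_chunk_c(chunk: &[u8]) -> ChunkCounts {
    let (mut lines, mut words, mut in_word) = (0, 0, false);
    for &b in chunk {
        if b == b'\n' {
            lines += 1;
        }
        if is_space(b) {
            in_word = false;
        } else if !in_word {
            in_word = true;
            words += 1;
        }
    }
    ChunkCounts {
        lines,
        words,
        starts_in_word: chunk.first().map_or(false, |&b| !is_space(b)),
        ends_in_word: in_word,
    }
}

/// Cuts `data` every `chunk_size` bytes and merges the per-chunk counts.
fn lines_words_with_boundary_fix(data: &[u8], chunk_size: usize) -> (usize, usize) {
    let (mut lines, mut words) = (0, 0);
    let mut prev_ends_in_word = false;
    for chunk in data.chunks(chunk_size.max(1)) {
        let r = count_chunk_c(chunk);
        lines += r.lines;
        words += r.words;
        // A word continuing across the cut was counted in both chunks.
        if prev_ends_in_word && r.starts_in_word {
            words -= 1;
        }
        prev_ends_in_word = r.ends_in_word;
    }
    (lines, words)
}
```

Newline counts need no adjustment, since a newline is a single byte and cannot straddle a cut.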
Parallel max line length computation.
Splits at newline boundaries so each chunk independently computes correct
max line width (since newlines reset position tracking).
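A sketch of why newline-aligned chunks make this safe, assuming one column per byte (no tab expansion and no wide characters, both of which a real `wc -L` would have to handle). The names `newline_chunks`, `max_line_len`, and `parallel_max_line_len` are hypothetical.

```rust
use std::thread;

/// Splits `data` into roughly `parts` chunks that each end just after a newline.
fn newline_chunks(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let target = (data.len() / parts.max(1)).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        if end < data.len() {
            end = match data[end..].iter().position(|&b| b == b'\n') {
                Some(off) => end + off + 1,
                None => data.len(),
            };
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Longest line in one chunk; the position counter resets at each newline.
fn max_line_len(chunk: &[u8]) -> usize {
    let (mut max, mut cur) = (0usize, 0usize);
    for &b in chunk {
        if b == b'\n' {
            max = max.max(cur);
            cur = 0;
        } else {
            cur += 1;
        }
    }
    // Covers a final line that has no trailing newline.
    max.max(cur)
}

/// Each chunk starts at column 0, so the global max is the max of chunk maxima.
fn parallel_max_line_len(data: &[u8], parts: usize) -> usize {
    let chunks = newline_chunks(data, parts);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| s.spawn(move || max_line_len(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .max()
            .unwrap_or(0)
    })
}
```

If chunks were cut mid-line instead, a long line split across two chunks would be reported as two shorter ones, which is the failure the newline alignment avoids.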