pub fn count_lines_parallel(data: &[u8]) -> u64
Count newlines in parallel using SIMD memchr + rayon. Each thread gets at least 1MB (to amortize rayon scheduling overhead).