Translate bytes from an mmap’d byte slice — zero syscall reads.
Uses SIMD AVX2 for range-delta patterns (e.g., a-z → A-Z).
For large inputs, translates in parallel using rayon for maximum throughput.
Translate + squeeze from mmap’d byte slice.
Single buffer: translate into buffer, then squeeze in-place (wp <= i always holds).
Eliminates second buffer allocation and reduces memory traffic.