Translate bytes from an mmap’d byte slice — zero syscall reads.
Uses SIMD AVX2 for range-delta patterns (e.g., a-z → A-Z).
For large inputs, translates in parallel using rayon for maximum throughput.
Translate + squeeze from mmap’d byte slice.
Uses a two-pass approach for the common case where few bytes are squeezed:
translate first (using SIMD when possible), then squeeze.