Maximum input size, in bytes, for the single-allocation translate approach.
Translate bytes from an mmap’d byte slice.
Detects single-range translations (e.g., a-z to A-Z) and uses SIMD vectorized
arithmetic (AVX2: 32 bytes/iter, SSE2: 16 bytes/iter) for those cases.
Falls back to scalar 256-byte table lookup for general translations.
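
The single-range detection and the table fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `detect_single_range` and `translate` are hypothetical names, and the range path uses plain scalar wrapping arithmetic (which a compiler may auto-vectorize) rather than explicit AVX2/SSE2 intrinsics.

```rust
/// Hypothetical helper: returns `(lo, hi, delta)` when `table` differs from
/// the identity mapping only on one contiguous range `[lo, hi]` and every
/// byte in that range is shifted by the same wrapping `delta`.
fn detect_single_range(table: &[u8; 256]) -> Option<(u8, u8, u8)> {
    let mut lo: Option<usize> = None;
    let mut hi = 0usize;
    for (i, &t) in table.iter().enumerate() {
        if t != i as u8 {
            if lo.is_none() {
                lo = Some(i);
            }
            hi = i;
        }
    }
    let lo = lo?; // identity table: nothing to detect
    let delta = table[lo].wrapping_sub(lo as u8);
    // Every byte in [lo, hi] must move by exactly `delta`.
    for i in lo..=hi {
        if table[i] != (i as u8).wrapping_add(delta) {
            return None;
        }
    }
    Some((lo as u8, hi as u8, delta))
}

/// Hypothetical driver: range arithmetic when possible, table lookup otherwise.
fn translate(src: &[u8], dst: &mut [u8], table: &[u8; 256]) {
    assert_eq!(src.len(), dst.len());
    match detect_single_range(table) {
        Some((lo, hi, delta)) => {
            let span = hi - lo;
            for (d, &s) in dst.iter_mut().zip(src) {
                // `s - lo <= span` (unsigned, wrapping) tests lo <= s <= hi
                // with a single comparison.
                *d = if s.wrapping_sub(lo) <= span {
                    s.wrapping_add(delta)
                } else {
                    s
                };
            }
        }
        None => {
            // General case: one lookup per byte in the 256-entry table.
            for (d, &s) in dst.iter_mut().zip(src) {
                *d = table[s as usize];
            }
        }
    }
}
```

For a-z to A-Z the detected delta is 224, which is -32 modulo 256, so adding it with wrapping arithmetic maps `b'a'` (97) to `b'A'` (65).
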
Translate bytes in-place on a mutable buffer (e.g., MAP_PRIVATE mmap).
Eliminates the output buffer allocation entirely: under the kernel’s copy-on-write (COW)
semantics, only modified pages are physically copied.
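
A minimal sketch of the in-place idea, assuming a plain `&mut [u8]` stands in for the MAP_PRIVATE mapping. The function name `translate_in_place` is hypothetical. The sketch stores a byte only when its value actually changes, since a page that is never written is never copied; whether the real code skips unchanged bytes this way is an assumption.

```rust
/// Hypothetical helper: translate `buf` in place through `table`.
/// Bytes whose value is unchanged are not written, so a page with no
/// affected bytes is never dirtied. Returns the number of bytes rewritten.
fn translate_in_place(buf: &mut [u8], table: &[u8; 256]) -> usize {
    let mut changed = 0;
    for b in buf.iter_mut() {
        let t = table[*b as usize];
        if t != *b {
            *b = t; // the only store; triggers COW on a private mapping
            changed += 1;
        }
    }
    changed
}
```

The conditional store costs a branch per byte, so this shape favors inputs where most bytes are left untouched.
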
Translate from a read-only mmap (or any byte slice) to a separate output buffer.
Avoids MAP_PRIVATE COW page faults by reading from the original data and
writing to a freshly allocated heap buffer.
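
The separate-buffer variant might look like the sketch below. The name `translate_to_writer` and the single up-front allocation are assumptions for illustration. The source slice is only read, so a read-only mapping is never written to.

```rust
use std::io::{self, Write};

/// Hypothetical helper: read from `src` (for example a read-only mapping),
/// translate into a freshly allocated heap buffer, then write it out.
fn translate_to_writer<W: Write>(
    src: &[u8],
    table: &[u8; 256],
    out: &mut W,
) -> io::Result<()> {
    // One allocation sized to the input; `src` itself is never modified.
    let mut buf = Vec::with_capacity(src.len());
    buf.extend(src.iter().map(|&b| table[b as usize]));
    out.write_all(&buf)
}
```
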
Translate bytes in-place on an owned buffer, then write.
For piped stdin where we own the data, this avoids the separate output buffer
allocation needed by translate_mmap. Uses parallel in-place SIMD for large data.
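
As a rough illustration of the owned-buffer path, the sketch below translates a `Vec<u8>` in place and splits large inputs across scoped threads. The names `translate_owned_and_write` and `PARALLEL_THRESHOLD`, the threshold value, and the use of `std::thread::scope` are all assumptions. The per-chunk loop is a scalar table lookup standing in for the SIMD kernel.

```rust
use std::io::{self, Write};
use std::thread;

/// Hypothetical cutoff below which threads are not worth spawning.
const PARALLEL_THRESHOLD: usize = 1 << 20;

/// Scalar stand-in for the per-chunk SIMD kernel.
fn translate_chunk(chunk: &mut [u8], table: &[u8; 256]) {
    for b in chunk.iter_mut() {
        *b = table[*b as usize];
    }
}

/// Hypothetical helper: translate an owned buffer in place, then write it.
fn translate_owned_and_write<W: Write>(
    mut data: Vec<u8>,
    table: &[u8; 256],
    workers: usize,
    out: &mut W,
) -> io::Result<()> {
    if data.len() < PARALLEL_THRESHOLD || workers <= 1 {
        translate_chunk(&mut data, table);
    } else {
        // Disjoint mutable chunks, one scoped thread per chunk.
        let chunk_len = (data.len() + workers - 1) / workers;
        thread::scope(|s| {
            for chunk in data.chunks_mut(chunk_len) {
                s.spawn(move || translate_chunk(chunk, table));
            }
        });
    }
    // No second buffer: the translated input is the output.
    out.write_all(&data)
}
```

Because `chunks_mut` hands out non-overlapping slices, the threads need no locking, and the scope guarantees they all finish before the buffer is written.
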