In-place field 1 extraction: modifies the data buffer directly and returns the new length.
The output is never longer than the input (everything from the first delimiter to the end of each line is removed).
Avoids an intermediate Vec allocation and a BufWriter copy, saving roughly 10MB of
memory bandwidth for a 10MB input. Requires owned, mutable data (not an mmap).
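A minimal sketch of the in-place compaction described above, assuming a single-byte delimiter and `\n` line endings. The name `cut_field1_in_place` and its signature are hypothetical, chosen for illustration, and are not taken from the actual implementation. The write cursor never passes the read cursor, which is why the buffer can be rewritten without a second allocation.

```rust
/// Hypothetical helper: keeps only field 1 of every line, compacting `data`
/// in place. Returns the number of valid bytes at the front of `data`.
fn cut_field1_in_place(data: &mut [u8], delim: u8) -> usize {
    let len = data.len();
    let mut read = 0; // start of the line currently being examined
    let mut write = 0; // next output position; invariant: write <= read

    while read < len {
        // End of the current line: index of '\n', or `len` if the last line is unterminated.
        let line_end = data[read..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(len, |i| read + i);

        // Field 1 ends at the first delimiter, or at the line end if there is none.
        let field_end = data[read..line_end]
            .iter()
            .position(|&b| b == delim)
            .map_or(line_end, |i| read + i);

        // Slide field 1 down to the write cursor (overlapping ranges are fine).
        data.copy_within(read..field_end, write);
        write += field_end - read;

        if line_end < len {
            // Re-emit the newline, then move on to the next line.
            data[write] = b'\n';
            write += 1;
            read = line_end + 1;
        } else {
            read = len;
        }
    }
    write
}
```

A caller would truncate or slice the buffer to the returned length, for example `let n = cut_field1_in_place(&mut buf, b'\t'); out.write_all(&buf[..n])?;`. Lines without a delimiter are kept whole in this sketch.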
Process input from a reader (for stdin).
Uses batch reading: reads large chunks (16MB), then processes each chunk
with the same fast paths used for mmap'd input, avoiding per-line read_until call overhead.
With 16MB chunks, a 10MB piped input is consumed in a single batch.
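The batch-reading loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `process_reader` and `process_batch` are hypothetical names, the chunk size is passed as a parameter, and `process_batch` is a simple stand-in for the fast slice-based paths. The sketch fills the buffer, processes everything up to the last newline, and carries any trailing partial line into the next batch.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for the fast slice-based path: writes field 1 of each line.
fn process_batch<W: Write>(batch: &[u8], delim: u8, out: &mut W) -> io::Result<()> {
    for line in batch.split_inclusive(|&b| b == b'\n') {
        let (body, has_nl) = if line.last() == Some(&b'\n') {
            (&line[..line.len() - 1], true)
        } else {
            (line, false)
        };
        let end = body.iter().position(|&b| b == delim).unwrap_or(body.len());
        out.write_all(&body[..end])?;
        if has_nl {
            out.write_all(b"\n")?;
        }
    }
    Ok(())
}

/// Hypothetical sketch of batch reading: fill a large buffer, process whole
/// lines, and carry the trailing partial line over to the next batch.
fn process_reader<R: Read, W: Write>(
    mut reader: R,
    out: &mut W,
    chunk: usize,
    delim: u8,
) -> io::Result<()> {
    let mut buf = vec![0u8; chunk.max(1)];
    let mut filled = 0; // number of valid bytes at the front of `buf`
    loop {
        // Fill the buffer as far as possible; a pipe may return short reads.
        let mut eof = false;
        while filled < buf.len() {
            match reader.read(&mut buf[filled..]) {
                Ok(0) => {
                    eof = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        // At EOF process everything; otherwise stop after the last complete line.
        let end = if eof {
            filled
        } else {
            match buf[..filled].iter().rposition(|&b| b == b'\n') {
                Some(i) => i + 1,
                None => {
                    // A single line is longer than the buffer: grow it and keep reading.
                    let new_len = buf.len() * 2;
                    buf.resize(new_len, 0);
                    continue;
                }
            }
        };
        process_batch(&buf[..end], delim, out)?;
        // Move the unprocessed tail (a partial line) to the front of the buffer.
        buf.copy_within(end..filled, 0);
        filled -= end;
        if eof {
            return Ok(());
        }
    }
}
```

With the 16MB chunk size mentioned above, a call might look like `process_reader(io::stdin().lock(), &mut out, 16 * 1024 * 1024, b'\t')`. An input smaller than the chunk reaches EOF before the buffer fills, so it is handled by a single `process_batch` call.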