Stream nl output directly to a file descriptor in batched writes.
This is the preferred entry point for the binary: it avoids building the entire
output in memory and instead flushes chunks of roughly 1 MB. For large files
this keeps memory usage low and amortizes write() syscall overhead across many
lines.
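
The batching described above can be illustrated with a minimal sketch. It is not the actual implementation: the names `stream_numbered`, `_write_all`, and `CHUNK_SIZE` are hypothetical, and the formatting (a width-6 number, a tab, then the line, with every line numbered) is a simplifying assumption loosely modeled on nl's defaults.

```python
import os

# Flush threshold of about 1 MiB, taken from the description above.
CHUNK_SIZE = 1 << 20


def _write_all(fd, data):
    """Write all of `data` to `fd`, retrying on short writes."""
    written = 0
    with memoryview(data) as view:
        while written < len(view):
            written += os.write(fd, view[written:])
    return written


def stream_numbered(lines, fd, chunk_size=CHUNK_SIZE):
    """Number each line and write it to `fd` in batches.

    `lines` is an iterable of bytes objects that keep their trailing
    newline. Returns the total number of bytes written.
    """
    buf = bytearray()
    total = 0
    for number, line in enumerate(lines, start=1):
        # Assumed format: right-aligned width-6 number, tab, line content.
        buf += b"%6d\t" % number
        buf += line
        if len(buf) >= chunk_size:
            # One write per full chunk instead of one per line.
            total += _write_all(fd, buf)
            buf = bytearray()
    if buf:
        # Flush whatever is left after the last full chunk.
        total += _write_all(fd, buf)
    return total
```

A caller could pass `sys.stdout.fileno()` as `fd` and an open binary file as `lines`. Only one chunk is held in memory at a time, so the buffer stays near `chunk_size` regardless of input size.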