Stream-decode from a reader to a writer. Used for stdin processing.
Fused single pass: read a chunk -> strip whitespace -> decode immediately.
Uses a 16MB read buffer to reduce syscalls, and memchr2-based SIMD whitespace
stripping for the common case where \n and \r are the only whitespace in the stream.
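The fused loop can be sketched with the standard library alone. This is a minimal illustration, not the actual implementation: `decode_stream_sketch`, `decode_groups`, and `sextet` are hypothetical names, a plain iterator filter stands in for the memchr2-based SIMD stripping, and the chunk size is a parameter rather than a fixed 16MB. The one subtlety it shows is that a stripped chunk rarely ends on a 4-character boundary, so the incomplete tail is carried into the next iteration.

```rust
use std::io::{self, Read, Write};

/// Map one base64 alphabet byte to its 6-bit value.
fn sextet(b: u8) -> Option<u8> {
    match b {
        b'A'..=b'Z' => Some(b - b'A'),
        b'a'..=b'z' => Some(b - b'a' + 26),
        b'0'..=b'9' => Some(b - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode complete 4-character groups. This sketch accepts up to two
/// trailing '=' in any group, which is more lenient than a strict decoder.
fn decode_groups(input: &[u8], out: &mut Vec<u8>) -> io::Result<()> {
    let bad = || io::Error::new(io::ErrorKind::InvalidData, "invalid base64 input");
    for group in input.chunks(4) {
        if group.len() != 4 {
            return Err(bad());
        }
        let pad = group.iter().rev().take_while(|&&b| b == b'=').count();
        if pad > 2 {
            return Err(bad());
        }
        let mut n: u32 = 0;
        for &b in &group[..4 - pad] {
            n = (n << 6) | sextet(b).ok_or_else(bad)? as u32;
        }
        n <<= 6 * pad as u32; // left-align to 24 bits
        let bytes = [(n >> 16) as u8, (n >> 8) as u8, n as u8];
        out.extend_from_slice(&bytes[..3 - pad]);
    }
    Ok(())
}

/// Single pass: read a chunk, strip \n and \r, decode every complete
/// 4-character group, and carry the incomplete tail to the next chunk.
fn decode_stream_sketch<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    chunk_size: usize,
) -> io::Result<()> {
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut clean: Vec<u8> = Vec::new(); // stripped bytes, including the carry
    let mut out: Vec<u8> = Vec::new();
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Scalar stand-in for the SIMD whitespace stripping.
        clean.extend(buf[..n].iter().copied().filter(|&b| b != b'\n' && b != b'\r'));
        let usable = clean.len() - clean.len() % 4;
        out.clear();
        decode_groups(&clean[..usable], &mut out)?;
        writer.write_all(&out)?;
        clean.drain(..usable); // keep only the incomplete tail
    }
    if !clean.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated base64 input"));
    }
    writer.flush()
}
```

Passing a small `chunk_size` (for example 3) forces the carry path on almost every iteration, which is a convenient way to exercise it.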
Decode base64 data and write the result to the output. Borrows the input and
allocates a separate buffer for the cleaned copy.
When ignore_garbage is true, strip all non-base64 characters.
When false, only strip whitespace (standard behavior).
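The two cleaning modes can be illustrated as follows. This is a sketch under assumptions: `clean_input` and `is_base64_byte` are hypothetical names, the '=' padding character is assumed to count as a base64 character, and "whitespace" is taken to mean ASCII whitespace as defined by `u8::is_ascii_whitespace`.

```rust
/// True for bytes of the standard base64 alphabet plus '=' padding.
/// (Assumption: padding is kept when garbage is ignored.)
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

/// Borrow the input and return a newly allocated, cleaned copy.
/// ignore_garbage = true: keep only base64 characters.
/// ignore_garbage = false: drop ASCII whitespace and keep everything else,
/// so invalid characters still reach the decoder and are reported there.
fn clean_input(data: &[u8], ignore_garbage: bool) -> Vec<u8> {
    let mut clean = Vec::with_capacity(data.len());
    if ignore_garbage {
        clean.extend(data.iter().copied().filter(|&b| is_base64_byte(b)));
    } else {
        clean.extend(data.iter().copied().filter(|b| !b.is_ascii_whitespace()));
    }
    clean
}
```

With `ignore_garbage` set to false, a stray `*` survives cleaning on purpose: the decoder, not the cleaner, is the place where it should be rejected.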
Stream-encode from a reader to a writer. Used for stdin processing.
Uses 3MB read chunks. The chunk size is a multiple of 3 bytes, so every chunk
except the last encodes to whole 4-character groups with no '=' padding mid-stream.
3MB suits piped input: large enough for good throughput, small enough
that read_full() fills the buffer quickly from pipes (3 reads at 1MB pipe size).
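A rough sketch of chunked encoding with 3-byte alignment follows. It is illustrative only: `read_full_sketch` is a guess at what read_full() does (keep reading until the buffer is full or EOF), `encode_chunk` and `encode_stream_sketch` are hypothetical names, the chunk size is a parameter, and output line wrapping is left out.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Read until the buffer is full or EOF; returns the number of bytes read.
fn read_full_sketch<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Encode one chunk; '=' padding appears only if the length is not a multiple of 3.
fn encode_chunk(input: &[u8], out: &mut Vec<u8>) {
    for group in input.chunks(3) {
        let b = [group[0], *group.get(1).unwrap_or(&0), *group.get(2).unwrap_or(&0)];
        let n = (b[0] as u32) << 16 | (b[1] as u32) << 8 | b[2] as u32;
        out.push(ALPHABET[(n >> 18) as usize & 63]);
        out.push(ALPHABET[(n >> 12) as usize & 63]);
        out.push(if group.len() > 1 { ALPHABET[(n >> 6) as usize & 63] } else { b'=' });
        out.push(if group.len() > 2 { ALPHABET[n as usize & 63] } else { b'=' });
    }
}

/// Encode a stream in chunks whose size is a multiple of 3, so only the
/// final (possibly short) chunk can produce padding.
fn encode_stream_sketch<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    chunk_size: usize,
) -> io::Result<()> {
    let chunk_size = (chunk_size / 3).max(1) * 3; // round down to a multiple of 3
    let mut buf = vec![0u8; chunk_size];
    let mut out = Vec::with_capacity(chunk_size / 3 * 4);
    loop {
        let n = read_full_sketch(&mut reader, &mut buf)?;
        if n == 0 {
            break;
        }
        out.clear();
        encode_chunk(&buf[..n], &mut out);
        writer.write_all(&out)?;
        if n < buf.len() {
            break; // a short fill means EOF was reached
        }
    }
    writer.flush()
}
```

Calling it with `3 * 1024 * 1024` as `chunk_size` matches the 3MB figure above. Filling the buffer completely before encoding is what makes the alignment hold: a plain `read()` on a pipe may return a length that is not a multiple of 3.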