pub fn process_chunks(len: usize, lanes: usize) -> (usize, usize)
Processes a slice of `len` elements in SIMD-width chunks of `lanes` elements each, calling a function on each full chunk and handling the leftover elements as a scalar remainder.
Returns the index where the remainder begins.
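
The signature takes only a length and a lane count and returns a pair, so the body and the meaning of the two values are not given here. As a minimal sketch, assuming the pair is (number of full chunks, index where the remainder begins), the split could be computed as below. The function body and the `sum_chunked` caller are hypothetical illustrations, not the original implementation.

```rust
/// Assumed reading of the signature: returns (full_chunks, remainder_start).
/// The body is a guess; the source shows only the signature.
pub fn process_chunks(len: usize, lanes: usize) -> (usize, usize) {
    assert!(lanes > 0, "lanes must be non-zero");
    let full_chunks = len / lanes;
    // The remainder starts right after the last full chunk.
    (full_chunks, full_chunks * lanes)
}

/// Hypothetical caller: sums a slice chunk by chunk, then adds the scalar tail.
pub fn sum_chunked(data: &[f32], lanes: usize) -> f32 {
    let (_full_chunks, tail_start) = process_chunks(data.len(), lanes);
    let (body, tail) = data.split_at(tail_start);

    let mut total = 0.0f32;
    // Full chunks: a real SIMD kernel would replace this inner sum.
    for chunk in body.chunks_exact(lanes) {
        total += chunk.iter().sum::<f32>();
    }
    // Scalar remainder: fewer than `lanes` elements are left.
    for &x in tail {
        total += x;
    }
    total
}
```

Under this reading, a slice of 10 elements with 4 lanes yields two full chunks and a remainder starting at index 8, so elements 8 and 9 go through the scalar path.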