| Coalesce the N inputs into N outputs
| and a single coalesced output blob.
|
| This allows operations that operate
| over multiple small kernels (e.g. biases
| in a deep CNN) to be coalesced into a single
| larger operation, amortizing the kernel
| launch overhead, synchronization
| costs for distributed computation,
| etc.
|
| The operator:
|
| - computes the total size of the coalesced
| blob by summing the input sizes
|
| - allocates the coalesced output blob
| as the total size
|
| - copies the input vectors into the coalesced
| blob, at the correct offset.
|
| - aliases each Output(i) to- point into
| the coalesced blob, at the corresponding
| offset for Input(i).
|
| This is ‘unsafe’ as the output vectors
| are aliased, so use with caution.
|