The Conv2D operator computes a 2D convolution
over an input blob $(X)$, using a filter blob
$(filter)$ and a bias blob $(bias)$, and
produces a single output blob $(Y)$.

Although several storage orders are supported,
the convention is that the input $(X)$ is a
blob of shape $(N,C_{in},H_{in},W_{in})$
and the output $(Y)$ is a blob of shape
$(N,C_{out},H_{out},W_{out})$.
Here, $N$ is the batch size, $C$ is the
number of channels, $H$ is the spatial
height, and $W$ is the spatial width.
For example, if your input data is a
batch of five 100x120-pixel RGB images
(100 wide, 120 high), $X$ has shape
$(5,3,120,100)$.

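To make the layout concrete, here is a minimal NumPy sketch of such an input blob (the zero values are placeholders):

```python
import numpy as np

# NCHW layout: a batch of five 100x120-pixel RGB images.
# N=5 images, C_in=3 channels (RGB), H_in=120 rows, W_in=100 columns.
X = np.zeros((5, 3, 120, 100), dtype=np.float32)

N, C_in, H_in, W_in = X.shape
assert (N, C_in, H_in, W_in) == (5, 3, 120, 100)
```
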
The $filter$ input blob may contain
multiple filters and has shape $(M,
C_{in}, K_H, K_W)$.

Here, $M$ is the number of individual
filters contained in the blob, $C_{in}$
is the number of channels of each filter
(by convention in 2D convolution it
is the same as the number of channels
in the input), $K_H$ is the spatial height
of the kernel, and $K_W$ is the spatial
width of the kernel.

The $bias$ blob is a vector of length
$M$, where there is one bias for each
filter in the $filter$ blob.

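The filter and bias shapes can be sketched the same way; the sizes below ($M=8$ filters, a 5x5 kernel) are hypothetical:

```python
import numpy as np

# Hypothetical sizes: M=8 filters over C_in=3 input channels,
# each with a 5x5 kernel.
M, C_in, K_H, K_W = 8, 3, 5, 5

filter_blob = np.random.randn(M, C_in, K_H, K_W).astype(np.float32)
bias_blob = np.random.randn(M).astype(np.float32)  # one bias per filter

# The bias vector length matches the number of filters.
assert bias_blob.shape == (filter_blob.shape[0],)
```
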
Given the shape of the input blob and
the filter blob, we can calculate the
shape of the output blob as follows.
The batch size $N$ stays the same. The
number of channels in the output equals
the number of filters in the filter blob,
so $C_{out} = M$. With stride and pad
defined below, the spatial height and
width of the output ($H_{out}$ and
$W_{out}$) are calculated as

$$H_{out} = \left\lfloor \frac{H_{in} - K_H + 2 \cdot \text{pad}}{\text{stride}} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} - K_W + 2 \cdot \text{pad}}{\text{stride}} + 1 \right\rfloor$$

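The formulas above translate directly into a small helper function; the function name and the example sizes are illustrative, not part of the operator's API:

```python
import math

def conv2d_out_size(in_size, kernel, pad, stride):
    # floor((in - K + 2*pad)/stride + 1), per the formula above;
    # assumes the same pad on both sides of the dimension.
    return math.floor((in_size - kernel + 2 * pad) / stride + 1)

# A 120x100 input with a 5x5 kernel, pad=2, stride=2:
H_out = conv2d_out_size(120, 5, 2, 2)  # floor(119/2 + 1) = 60
W_out = conv2d_out_size(100, 5, 2, 2)  # floor(99/2 + 1) = 50
```
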
GitHub Links:

- https://github.com/pytorch/pytorch/blob/master/caffe2/operators/conv_op.h

- https://github.com/pytorch/pytorch/blob/master/caffe2/operators/conv_op.cc

- https://github.com/pytorch/pytorch/blob/master/caffe2/operators/conv_pool_op_base.h

The ConvTranspose op takes an input data tensor
$X$, an input weight tensor $filter$, and
optionally an input bias tensor $bias$.

It then computes the transposed convolution,
sometimes referred to as deconvolution, and
produces a single output tensor $Y$. The
hyperparameters of the op, such as kernel size,
stride, and padding, are specified as args.

At each stride, the filter is deconvolved with
a subset of $X$ and the $bias$ is added; this is
repeated across the input data until the output
computation is complete.

The output shapes are computed as follows. The
number of channels in the output feature map is
the number of filters specified in the filter
blob.

The spatial height and width are computed as

$$H_{out} = (H_{in}-1) \cdot \text{strides}[0] - 2 \cdot \text{pads}[0] + \text{kernels}[0]$$

$$W_{out} = (W_{in}-1) \cdot \text{strides}[1] - 2 \cdot \text{pads}[1] + \text{kernels}[1]$$

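As with Conv2D, the shape arithmetic is easy to sketch in code; the helper name and the example sizes are illustrative only:

```python
def conv_transpose_out_size(in_size, kernel, pad, stride):
    # (in - 1)*stride - 2*pad + kernel, per the formulas above.
    return (in_size - 1) * stride - 2 * pad + kernel

# Feeding back a 60x50 map such as a Conv2D with kernel=5,
# pad=2, stride=2 would produce:
H_out = conv_transpose_out_size(60, 5, 2, 2)  # (60-1)*2 - 4 + 5 = 119
W_out = conv_transpose_out_size(50, 5, 2, 2)  # (50-1)*2 - 4 + 5 = 99
```

Note that this round trip yields 119x99 rather than the original 120x100: with stride greater than 1 the formula cannot always invert the Conv2D shape formula exactly, which is why transposed-convolution implementations typically accept an extra output-adjustment argument.
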
Note on the implementation layout:
conv_transpose_op_impl.h contains the templated
implementation of the operator declared in
conv_transpose_op.h, which is why they are
separate files. Also, in the implementation this
operator inherits from the
ConvTransposeUnpoolOpBase operator.

GitHub Links:
- https://github.com/pytorch/pytorch/tree/master/caffe2/operators/conv_transpose_op.h
- https://github.com/pytorch/pytorch/tree/master/caffe2/operators/conv_transpose_op.cc
- https://github.com/pytorch/pytorch/tree/master/caffe2/operators/conv_transpose_unpool_op_base.h
