 Applies spatial batch normalization
 to the input tensor as described in the
 original paper, [Batch Normalization:
 Accelerating Deep Network Training by
 Reducing Internal Covariate
 Shift](https://arxiv.org/abs/1502.03167).

 Note that this operator produces a
 different set of outputs depending on
 the value of *is_test*. According to the
 paper, the primary operation of spatial
 batch normalization is:

 $$Y = \frac{X - \mu_x}{\sqrt{\sigma^2_{x}
 + \epsilon}} \cdot \gamma + b$$

 In the equation, $\mu_x$ is the mean,
 $X$ is the input data, $\sigma^2_{x}$
 is the variance, $\epsilon$ is a small
 constant added for numerical stability,
 $\gamma$ is the scale, $b$ is the bias,
 and $Y$ is the output data.
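
As an illustration of the formula, here is a minimal NumPy sketch. It is not the operator's actual implementation: the function name `spatial_bn_forward`, the NCHW layout, and the default `epsilon=1e-5` are assumptions made for this example, and the statistics are computed per channel over the batch and spatial axes.

```python
import numpy as np

def spatial_bn_forward(x, scale, bias, mean, var, epsilon=1e-5):
    """Normalize an NCHW tensor per channel, then scale and shift.

    Hypothetical helper for illustration; `epsilon` default is an assumption.
    """
    # Reshape per-channel vectors to (1, C, 1, 1) so they broadcast over N, H, W.
    shape = (1, -1, 1, 1)
    x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + epsilon)
    return x_hat * scale.reshape(shape) + bias.reshape(shape)

# Example input: batch of 4, 3 channels, 5x5 spatial extent.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 3, 5, 5))

# Per-channel batch statistics, reduced over N, H, and W.
batch_mean = x.mean(axis=(0, 2, 3))
batch_var = x.var(axis=(0, 2, 3))

# With scale = 1 and bias = 0, each output channel is approximately
# zero-mean with unit variance.
y = spatial_bn_forward(x, np.ones(3), np.zeros(3), batch_mean, batch_var)
```

Passing the current batch statistics corresponds to train mode; passing stored running statistics for `mean` and `var` would correspond to test mode.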

 The momentum argument does not change
 the normalization itself; it controls
 how the running mean and variance are
 updated from the statistics of each
 batch:

 $$running\_mean = running\_mean *
 momentum + mean * (1 - momentum)$$

 $$running\_var = running\_var * momentum
 + var * (1 - momentum)$$
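
The update rule can be sketched in a few lines of NumPy. The helper name `update_running_stats` and the value `momentum=0.9` are assumptions chosen for this example, not taken from the operator's source.

```python
import numpy as np

def update_running_stats(running_mean, running_var, mean, var, momentum=0.9):
    """Blend the current batch statistics into the running statistics.

    Hypothetical helper for illustration; `momentum` default is an assumption.
    """
    # A momentum close to 1 keeps most of the history; a momentum close
    # to 0 follows the latest batch almost entirely.
    new_mean = running_mean * momentum + mean * (1.0 - momentum)
    new_var = running_var * momentum + var * (1.0 - momentum)
    return new_mean, new_var

# Two channels, starting from zero mean and unit variance.
running_mean = np.zeros(2)
running_var = np.ones(2)
batch_mean = np.array([1.0, -2.0])
batch_var = np.array([4.0, 0.5])

running_mean, running_var = update_running_stats(
    running_mean, running_var, batch_mean, batch_var, momentum=0.9
)
# running_mean is now approximately [0.1, -0.2]
# running_var is now approximately [1.3, 0.95]
```

Each call moves the running statistics a fraction `1 - momentum` of the way toward the current batch statistics.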

 Output when is_test = 0 (train mode):
 Y, mean, var, saved_mean, saved_var

 Output when is_test = 1 (test mode):
 Y

 GitHub Links:

  https://github.com/pytorch/pytorch/blob/master/caffe2/operators/spatial_batch_norm_op.cc

  https://github.com/pytorch/pytorch/blob/master/caffe2/operators/spatial_batch_norm_op.h
