The CrossEntropy operator computes the cross entropy between an $NxD$ dimensional input data tensor $X$ and an $NxD$ dimensional input label tensor $label$.

The op produces a single length $N$ output tensor $Y$. Here, $N$ is considered the batch size and $D$ is the size of each element in the batch. In practice, the op is most commonly used at the end of models as a part of the loss computation, after the SoftMax operator and before the AveragedLoss operator. The cross entropy operation is defined as follows

$$Y_i = -\sum_j label_{ij} \cdot \log(X_{ij})$$

where $i$ indexes the example in the batch and $j$ indexes the class, so $X_{ij}$ is the predicted probability that example $i$ belongs to class $j$ and $label_{ij}$ is the corresponding target probability. The argument of each log is clamped to a small positive lower limit for numerical stability.
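
The per-example loss can be reproduced with plain NumPy. This is a minimal sketch of the formula above, not the operator's implementation; the clamping threshold used here is an assumed placeholder for the op's internal lower limit.

```python
import numpy as np

def cross_entropy(X, label, log_threshold=1e-20):
    # X: (N, D) predicted probabilities; label: (N, D) target probabilities.
    # log_threshold is an assumed stand-in for the op's internal lower limit.
    X_clamped = np.maximum(X, log_threshold)            # lower-limit the log argument
    return -np.sum(label * np.log(X_clamped), axis=1)   # shape (N,)

X = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]], dtype=np.float32)
label = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]], dtype=np.float32)
print(cross_entropy(X, label))  # one loss per example, length N
```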

Github Links:

- https://github.com/caffe2/caffe2/blob/master/caffe2/operators/cross_entropy_op.h
- https://github.com/caffe2/caffe2/blob/master/caffe2/operators/cross_entropy_op.cc

The LabelCrossEntropy operator computes the cross entropy between an $NxD$ dimensional input data tensor $X$ and a one-dimensional input label tensor $label$. The op produces a single length $N$ output tensor $Y$. Here, $N$ is considered the batch size and $D$ is the size of each element in the batch. In practice, it is most commonly used at the end of models as a part of the loss computation, after the SoftMax operator and before the AveragedLoss operator. The cross entropy operation is defined as follows

$$Y_i = -\log(X_{i,\,label_i})$$

where $i$ indexes the example in the batch and $label_i$ is the index of the correct class for that example. The argument of each log is clamped to a small positive lower limit for numerical stability.
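
The same loss with integer labels, again as a minimal NumPy sketch of the formula rather than the operator's implementation (the clamping threshold is an assumed placeholder):

```python
import numpy as np

def label_cross_entropy(X, label, log_threshold=1e-20):
    # X: (N, D) predicted probabilities; label: (N,) integer class indices.
    picked = X[np.arange(X.shape[0]), label]            # X[i, label_i]
    return -np.log(np.maximum(picked, log_threshold))   # shape (N,)

X = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]], dtype=np.float32)
label = np.array([0, 1], dtype=np.int32)
print(label_cross_entropy(X, label))
```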

The difference between LabelCrossEntropy and CrossEntropy is how the labels are specified. Here, the labels are a length $N$ list of integer class indices, whereas in CrossEntropy the labels are an $NxD$ dimensional matrix of one-hot label vectors. When the one-hot labels encode the same classes as the integer labels, the two operators produce the same results, as the sketch below demonstrates.
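
A sketch of that equivalence using the Caffe2 Python frontend, assuming the standard core/workspace helpers are available; the integer labels are one-hot encoded by hand before being fed to CrossEntropy:

```python
import numpy as np
from caffe2.python import core, workspace

X = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]], dtype=np.float32)
label_int = np.array([0, 1], dtype=np.int32)            # integer class indices
label_onehot = np.eye(3, dtype=np.float32)[label_int]   # same labels, one-hot encoded

workspace.FeedBlob("X", X)
workspace.FeedBlob("label_int", label_int)
workspace.FeedBlob("label_onehot", label_onehot)

workspace.RunOperatorOnce(
    core.CreateOperator("CrossEntropy", ["X", "label_onehot"], ["Y_ce"]))
workspace.RunOperatorOnce(
    core.CreateOperator("LabelCrossEntropy", ["X", "label_int"], ["Y_lce"]))

# Both outputs should contain the same per-example losses.
print(workspace.FetchBlob("Y_ce"))
print(workspace.FetchBlob("Y_lce"))
```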

Github Links:

- https://github.com/caffe2/caffe2/blob/master/caffe2/operators/cross_entropy_op.h
- https://github.com/caffe2/caffe2/blob/master/caffe2/operators/cross_entropy_op.cc

Given a vector of probabilities, this operator transforms it into a 2-column matrix with complementary probabilities for binary classification. In explicit terms, given the vector $X$, the output $Y$ satisfies $Y[:, 0] = 1 - X$ and $Y[:, 1] = X$.

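In NumPy terms, the transformation amounts to stacking the complementary probabilities as columns (a sketch of the math, not of the operator itself):

```python
import numpy as np

def make_two_class(X):
    # X: (N,) probabilities of the positive class.
    # Returns an (N, 2) matrix: column 0 holds 1 - X, column 1 holds X.
    return np.stack([1.0 - X, X], axis=1)

X = np.array([0.9, 0.3, 0.5], dtype=np.float32)
print(make_two_class(X))
# approximately:
# [[0.1 0.9]
#  [0.7 0.3]
#  [0.5 0.5]]
```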

Given two matrices, logits and targets, of the same shape (batch_size, num_classes), this operator computes the sigmoid cross entropy between the two.

It returns a tensor of shape (batch_size,) containing the loss for each example.
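
A NumPy sketch of the element-wise sigmoid cross entropy in its numerically stable form, reduced to one loss per example. The per-row average used here mirrors the weighted variant described below and is an assumption about this op's reduction:

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, targets):
    # Numerically stable form of crossentropy(sigmoid(logit), target):
    #   max(logit, 0) - logit * target + log(1 + exp(-|logit|))
    elementwise = (np.maximum(logits, 0.0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))
    return elementwise.mean(axis=1)  # shape (batch_size,)

logits = np.array([[2.0, -1.0], [0.5, 0.0]], dtype=np.float32)
targets = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
print(sigmoid_cross_entropy_with_logits(logits, targets))
```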

Given three matrices, logits, targets, and weights, all of the same shape (batch_size, num_classes), this operator computes the weighted sigmoid cross entropy between logits and targets. Specifically, at each position (r, c) it computes weights[r, c] * crossentropy(sigmoid(logits[r, c]), targets[r, c]), and then averages over each row.

It returns a tensor of shape (batch_size,) containing the loss for each example.
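
The weighted variant only scales each element-wise term before the per-row average; a minimal NumPy sketch of the formula stated above:

```python
import numpy as np

def weighted_sigmoid_cross_entropy_with_logits(logits, targets, weights):
    # weights[r, c] * crossentropy(sigmoid(logits[r, c]), targets[r, c]),
    # averaged over each row.
    elementwise = (np.maximum(logits, 0.0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))
    return (weights * elementwise).mean(axis=1)  # shape (batch_size,)
```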