| Adds batch sub-tensors elementwise
| to output. Stripe is the stripe length
| and N is the number of elements to add
| (size of Y).
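|
| A minimal sketch of one plausible reading,
| where the i-th of batch sub-tensors starts
| at x + i * stripe (names and loop structure
| are assumptions from this comment, not the
| library source):
|
|   void AddStripedBatch(int N, const float* x, float* y,
|                        int stripe, int batch) {
|     for (int i = 0; i < batch; ++i) {
|       const float* sub = x + i * stripe;  // i-th sub-tensor
|       for (int j = 0; j < N; ++j) {
|         y[j] += sub[j];  // elementwise add into the output
|       }
|     }
|   }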
|
| Different from the Axpy function above, if
| alpha is passed in as a pointer, we will
| assume that it lives on the Context device,
| for example on GPU.
| Applies a per-channel bias value to
| each channel of the input image. image_size
| is H * W
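|
| A minimal sketch for NCHW data, assuming a
| contiguous N*C*H*W layout (the function name
| and signature are illustrative, not the
| library API):
|
|   void AddBiasNCHW(int N, int C, int image_size,
|                    const float* bias, float* image) {
|     for (int n = 0; n < N; ++n) {
|       for (int c = 0; c < C; ++c) {
|         float* plane = image + (n * C + c) * image_size;  // one H*W plane
|         for (int i = 0; i < image_size; ++i) {
|           plane[i] += bias[c];  // same bias for every pixel of channel c
|         }
|       }
|     }
|   }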
|
| Common math functions used in Caffe that do
| not have a BLAS or MKL equivalent. For all
| of these functions, we simply implement them
| either via Eigen or via custom code.
| groups must be 1 for GPU
|
| For NHWC order with groups > 1, the result
| will be laid out in NHW G RS C/G order so
| that data within the same group is
| contiguous.
|
| For NCHW order, groups makes no difference
| because we do Im2Col for each N, and C is
| the slowest-moving dimension among CHW.
| The layout of the result is N H W G R S C/G.
|
| Note that groups are pulled out to an outer
| dimension so that we can use GEMMs efficiently.
|
| pad_p - previous frame
| pad_t - top
| pad_l - left
| pad_n - next frame
| pad_b - bottom
| pad_r - right
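|
| For intuition, a hypothetical index helper
| for the NHW G RS C/G column layout described
| above (dimensions ordered slowest to fastest:
| N, H, W, G, R, S, C/G; all names are
| illustrative):
|
|   #include <cstdint>
|
|   int64_t ColOffsetNHWC(int64_t n, int64_t oh, int64_t ow,  // output pixel
|                         int64_t g, int64_t r, int64_t s,    // group, kernel pos
|                         int64_t c,                          // channel in group
|                         int64_t out_h, int64_t out_w,
|                         int64_t G, int64_t R, int64_t S,
|                         int64_t C_per_G) {
|     return ((((((n * out_h + oh) * out_w + ow) * G + g)
|               * R + r) * S + s) * C_per_G) + c;
|   }
|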
| groups must be 1 for GPU
|
| For NHWC order with groups > 1, the result
| will be laid out in NHW G RS C/G order so
| that data within the same group is
| contiguous.
|
| For NCHW order, groups makes no difference
| because we do Im2Col for each N, and C is
| the slowest-moving dimension among CHW.
| Compute the column-wise max of an N*D
| matrix X, and write it to a D-dimensional
| vector y.
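|
| An illustrative plain-loop version (the
| library versions are typically Eigen-backed):
|
|   #include <algorithm>
|
|   void ColwiseMax(int N, int D, const float* X, float* y) {
|     for (int j = 0; j < D; ++j) {
|       y[j] = X[j];  // row 0 seeds the running max
|     }
|     for (int i = 1; i < N; ++i) {
|       for (int j = 0; j < D; ++j) {
|         y[j] = std::max(y[j], X[i * D + j]);
|       }
|     }
|   }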
|
| Calculates ceil(a / b). User must be
| careful to ensure that there is no overflow
| or underflow in the calculation.
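|
| The usual formulation, shown here as a
| sketch (the overflow caveat comes from the
| a + b - 1 term):
|
|   template <typename T>
|   constexpr T divUp(T a, T b) {
|     return (a + b - 1) / b;  // a + b - 1 can overflow near max(T)
|   }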
|
| Dot product of vectors a and b, writing
| the result to a single value y.
|
| Note: the caffe2_use_eigen_for_blas and
| caffe2_use_mkl switches may be set up
| incorrectly here, and the macro invocations
| should probably also be guarded by those
| switches.
| Decaf gemm provides a simpler interface
| to the gemm functions, with the limitation
| that the data has to be contiguous in
| memory.
|
| GemmBatched provides a simple abstraction
| over batched library routines.
|
| We also provide a gemm that has explicit lda,
| ldb and ldc specified.
|
| In most cases you probably want to use the
| function above, though.
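|
| For intuition, naive row-major loops with
| explicit leading dimensions (a sketch, not
| the BLAS-backed implementation); with
| lda = K, ldb = N, ldc = N this reduces to
| the contiguous Gemm case above:
|
|   // No-transpose case only: C (M*N) = alpha * A (M*K) * B (K*N) + beta * C.
|   void GemmEx(int M, int N, int K, float alpha,
|               const float* A, int lda, const float* B, int ldb,
|               float beta, float* C, int ldc) {
|     for (int i = 0; i < M; ++i) {
|       for (int j = 0; j < N; ++j) {
|         float acc = 0.f;
|         for (int k = 0; k < K; ++k) {
|           acc += A[i * lda + k] * B[k * ldb + j];
|         }
|         C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
|       }
|     }
|   }
|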
| Gemv always takes in an M*N matrix A, and
| depending on whether we set TransA to
| CblasTrans, the output is:
|
| CblasNoTrans: x is an N dim vector and y is an M dim vector.
| CblasTrans: x is an M dim vector and y is an N dim vector.
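|
| Reference loops for the two cases (a plain
| sketch of the shape conventions, not the
| BLAS-backed implementation):
|
|   // CblasNoTrans: y (M) = A (M*N) * x (N)
|   void GemvNoTrans(int M, int N, const float* A,
|                    const float* x, float* y) {
|     for (int i = 0; i < M; ++i) {
|       float acc = 0.f;
|       for (int j = 0; j < N; ++j) acc += A[i * N + j] * x[j];
|       y[i] = acc;
|     }
|   }
|
|   // CblasTrans: y (N) = A^T (N*M) * x (M)
|   void GemvTrans(int M, int N, const float* A,
|                  const float* x, float* y) {
|     for (int j = 0; j < N; ++j) {
|       float acc = 0.f;
|       for (int i = 0; i < M; ++i) acc += A[i * N + j] * x[i];
|       y[j] = acc;
|     }
|   }
|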
| groups must be 1 for GPU
|
| For NHWC order with groups > 1, the result
| will be laid out in NHW G RS C/G order so
| that data within the same group is
| contiguous.
|
| For NCHW order, groups makes no difference
| because we do Im2Col for each N, and C is
| the slowest-moving dimension among CHW.
| The layout of the result is N H W G R S C/G.
|
| Note that groups are pulled out to an outer
| dimension so that we can use GEMMs efficiently.
|
| pad_p - previous frame
| pad_t - top
| pad_l - left
| pad_n - next frame
| pad_b - bottom
| pad_r - right
| groups must be 1 for GPU
| incrementIfNotMax increments the
| number if the value is not the max for that
| datatype. This ensures that the value
| never overflows.
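|
| A plausible implementation (a sketch, not
| the library source):
|
|   #include <limits>
|
|   template <typename T>
|   T incrementIfNotMax(T a) {
|     if (a == std::numeric_limits<T>::max()) {
|       return a;  // already at the maximum; do not overflow
|     }
|     return a + 1;
|   }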
|
| Returns log2(n) for a positive integer type
| Returns the next highest power-of-2
| for an integer type
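|
| Illustrative bit-twiddling sketches for the
| two helpers above; whether the library
| rounds log2 down and whether an exact power
| of 2 maps to itself are assumptions here:
|
|   #include <cstdint>
|
|   int integerLog2Floor(uint64_t n) {  // assumes n > 0
|     int log = 0;
|     while (n >>= 1) ++log;  // count how many times n halves
|     return log;
|   }
|
|   uint64_t nextHighestPowerOf2(uint64_t v) {
|     --v;  // so that an exact power of 2 maps to itself
|     v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
|     v |= v >> 8;  v |= v >> 16; v |= v >> 32;
|     return v + 1;  // smear bits down, then round up
|   }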
|
| The function uses a cast from int to
| unsigned to compare whether the value of
| parameter a is greater than or equal to
| zero and lower than the value of
| parameter b.
|
| The parameter b is of signed type and is
| always positive, so its value is always
| lower than 0x800…, whereas casting a
| negative value of a converts it to a
| value higher than 0x800….
|
| The casting allows one condition to be
| used instead of two.
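|
| The one-comparison trick, as it is commonly
| written (the function name is illustrative):
|
|   bool IsAGeZeroAndALtB(int a, int b) {
|     // For b > 0, casting to unsigned maps a negative a above 0x800...,
|     // so a single unsigned '<' covers both a >= 0 and a < b.
|     return static_cast<unsigned int>(a) < static_cast<unsigned int>(b);
|   }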
|
| Checks if the input permutation is an
| identity permutation.
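|
| A minimal sketch, assuming the permutation
| is stored as an index array:
|
|   bool IsIdentityPermutation(int n, const int* perm) {
|     for (int i = 0; i < n; ++i) {
|       if (perm[i] != i) return false;  // element i moved
|     }
|     return true;
|   }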
|
| Element-wise maximum of vector x and scalar
| alpha: y[i] = max(x[i], alpha).
|
| Computes mean and variance over axes.
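|
| A minimal sketch for the all-axes case
| (per-axis reduction follows the same dims
| convention as the Reduce* functions below;
| the signature here is illustrative):
|
|   #include <cstdint>
|
|   void Moments(int64_t size, const float* X, float* mean, float* var) {
|     double sum = 0.0, sum_sq = 0.0;
|     for (int64_t i = 0; i < size; ++i) {
|       sum += X[i];
|       sum_sq += static_cast<double>(X[i]) * X[i];
|     }
|     const double m = sum / size;
|     *mean = static_cast<float>(m);
|     *var = static_cast<float>(sum_sq / size - m * m);  // E[X^2] - E[X]^2
|   }
|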
| Generate n values that sum up to a fixed
| sum, subject to the restriction a <= x <= b
| for each generated x.
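|
| One way to realize the constraint (not
| necessarily the library's algorithm): start
| from the feasible uniform split sum/n, then
| apply random pairwise transfers that keep
| every value inside [a, b] and preserve the
| total:
|
|   #include <algorithm>
|   #include <random>
|   #include <vector>
|
|   std::vector<double> GenerateFixedSum(int n, double a, double b,
|                                        double sum, std::mt19937& rng) {
|     std::vector<double> x(n, sum / n);  // feasible iff n*a <= sum <= n*b
|     std::uniform_int_distribution<int> pick(0, n - 1);
|     std::uniform_real_distribution<double> unit(0.0, 1.0);
|     for (int iter = 0; iter < 10 * n; ++iter) {
|       const int i = pick(rng), j = pick(rng);
|       if (i == j) continue;
|       // Largest amount movable from x[i] to x[j] without leaving [a, b].
|       const double room = std::min(x[i] - a, b - x[j]);
|       const double delta = unit(rng) * room;
|       x[i] -= delta;
|       x[j] += delta;
|     }
|     return x;
|   }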
|
| Generate n values from a synthetic data
| distribution defined by unique accesses
| and stack distances.
|
| Y = alpha * ReduceL1(X)
| Y = alpha * ReduceL2(X)
| Y = alpha * ReduceMax(X)
| Y = alpha * ReduceMean(X)
| In all of the reduce functions, X_dims and
| Y_dims should have ndim elements.
|
| Each dimension of Y_dims must match the
| corresponding dimension of X_dims or must be
| equal to 1. The dimensions equal to 1 indicate
| the dimensions of X to be reduced.
| Y = alpha * ReduceSum(X)
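|
| A concrete instance of this convention:
| reducing axis 1 of a {2, 3, 4} tensor into
| {2, 1, 4} with ReduceSum (plain loops,
| illustrative only):
|
|   // X_dims = {2, 3, 4}, Y_dims = {2, 1, 4}: sum over axis 1.
|   void ReduceSumAxis1(const float* X, float* Y, float alpha) {
|     const int D0 = 2, D1 = 3, D2 = 4;
|     for (int i = 0; i < D0; ++i) {
|       for (int k = 0; k < D2; ++k) {
|         float acc = 0.f;
|         for (int j = 0; j < D1; ++j) {
|           acc += X[(i * D1 + j) * D2 + k];
|         }
|         Y[i * D2 + k] = alpha * acc;  // reduced axis collapses to size 1
|       }
|     }
|   }
|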
| Rounds a up to the next highest multiple of
| b. User must be careful to ensure that there
| is no overflow or underflow in the calculation
| of divUp.
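|
| In terms of divUp (see the sketch earlier),
| this is:
|
|   template <typename T>
|   constexpr T roundUp(T a, T b) {
|     return divUp(a, b) * b;  // same overflow caveats as divUp
|   }
|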
| Compute the row-wise max of an N*D matrix
| X, and write it to an N-dimensional vector
| y.
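|
| An illustrative one-pass version using the
| standard library:
|
|   #include <algorithm>
|
|   void RowwiseMax(int N, int D, const float* X, float* y) {
|     for (int i = 0; i < N; ++i) {
|       y[i] = *std::max_element(X + i * D, X + (i + 1) * D);  // max of row i
|     }
|   }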
|
| Different from the Scale function above, if
| alpha is passed in as a pointer, we will
| assume that it lives on the Context device,
| for example on GPU.
| Select does index selection of the rows of
| an N*D matrix x, and gives the N-dimensional
| vector y that contains the selected
| data.
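|
| A minimal sketch, assuming the indices come
| in as one column index per row:
|
|   void Select(int N, int D, const float* x, const int* idx, float* y) {
|     for (int i = 0; i < N; ++i) {
|       y[i] = x[i * D + idx[i]];  // pick one element from each row
|     }
|   }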
|
| Computes the sum of vector x, writing the
| result to a single value y.
|
| Computes the sum of squares of vector x,
| writing the result to a single value y.
|
| Transposes tensor X, whose shape is given
| by dims, according to axes, and writes the
| result to tensor Y.
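|
| A generic strided sketch: walk Y linearly,
| decompose each index into coordinates, and
| gather from X, where the Y coordinate along
| dim i maps to the X coordinate along dim
| axes[i] (illustrative, not the library
| implementation):
|
|   #include <cstdint>
|   #include <vector>
|
|   void Transpose(const std::vector<int64_t>& dims,  // shape of X
|                  const std::vector<int>& axes,      // Y dim i = X dim axes[i]
|                  const float* X, float* Y) {
|     const int ndim = static_cast<int>(dims.size());
|     int64_t size = 1;
|     for (const int64_t d : dims) size *= d;
|     std::vector<int64_t> x_strides(ndim, 1);  // row-major strides of X
|     for (int i = ndim - 2; i >= 0; --i) {
|       x_strides[i] = x_strides[i + 1] * dims[i + 1];
|     }
|     std::vector<int64_t> y_dims(ndim);
|     for (int i = 0; i < ndim; ++i) y_dims[i] = dims[axes[i]];
|     for (int64_t y_idx = 0; y_idx < size; ++y_idx) {
|       int64_t rem = y_idx, x_idx = 0;
|       for (int i = ndim - 1; i >= 0; --i) {  // fastest dim first
|         x_idx += (rem % y_dims[i]) * x_strides[axes[i]];
|         rem /= y_dims[i];
|       }
|       Y[y_idx] = X[x_idx];
|     }
|   }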
|