| Adds batch sub-tensors elementwise
| to output. Stripe is the stripe length
| and N is the number of elements to add
| (size of Y).
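|
| A minimal sketch of one plausible reading,
| where the i-th of batch sub-tensors starts
| at x + i * stripe (names and loop structure
| are assumptions from this comment, not the
| library source):
|
|   void AddStripedBatch(int N, const float* x, float* y,
|                        int stripe, int batch) {
|     for (int i = 0; i < batch; ++i) {
|       const float* sub = x + i * stripe;  // i-th sub-tensor
|       for (int j = 0; j < N; ++j) {
|         y[j] += sub[j];  // elementwise add into the output
|       }
|     }
|   }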
|
| Different from the Axpy function above, if
| alpha is passed in as a pointer, we will
| assume that it lives on the Context device,
| for example on GPU.
| Applies a per-channel bias value to
| each channel of the input image. image_size
| is H * W
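|
| A minimal sketch for NCHW data, assuming a
| contiguous N*C*H*W layout (the function name
| and signature are illustrative, not the
| library API):
|
|   void AddBiasNCHW(int N, int C, int image_size,
|                    const float* bias, float* image) {
|     for (int n = 0; n < N; ++n) {
|       for (int c = 0; c < C; ++c) {
|         float* plane = image + (n * C + c) * image_size;  // one H*W plane
|         for (int i = 0; i < image_size; ++i) {
|           plane[i] += bias[c];  // same bias for every pixel of channel c
|         }
|       }
|     }
|   }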
|
| Common math functions used in Caffe that do
| not have a BLAS or MKL equivalent. For all
| of these functions, we simply implement them
| either via Eigen or via custom code.
| groups must be 1 for GPU
|
| For NHWC order with groups > 1, the result
| will be laid out in NHW G RS C/G order so
| that data within the same group is
| contiguous.
|
| For NCHW order, groups makes no difference
| because we do Im2Col for each N, and C is
| the slowest-moving dimension among CHW.
| The layout of the result is N H W G R S C/G.
|
| Note that groups are pulled out to an outer
| dimension so that we can use GEMMs efficiently.
|
| pad_p - previous frame
| pad_t - top
| pad_l - left
| pad_n - next frame
| pad_b - bottom
| pad_r - right
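|
| For intuition, a hypothetical index helper
| for the NHW G RS C/G column layout described
| above (dimensions ordered slowest to fastest:
| N, H, W, G, R, S, C/G; all names are
| illustrative):
|
|   #include <cstdint>
|
|   int64_t ColOffsetNHWC(int64_t n, int64_t oh, int64_t ow,  // output pixel
|                         int64_t g, int64_t r, int64_t s,    // group, kernel pos
|                         int64_t c,                          // channel in group
|                         int64_t out_h, int64_t out_w,
|                         int64_t G, int64_t R, int64_t S,
|                         int64_t C_per_G) {
|     return ((((((n * out_h + oh) * out_w + ow) * G + g)
|               * R + r) * S + s) * C_per_G) + c;
|   }
|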
| groups must be 1 for GPU
|
| For NHWC order with groups > 1, the result
| will be laid out in NHW G RS C/G order so
| that data within the same group is
| contiguous.
|
| For NCHW order, groups makes no difference
| because we do Im2Col for each N, and C is
| the slowest-moving dimension among CHW.
| Compute the column-wise max of an N*D
| matrix X, and write it to a D-dimensional
| vector y.
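|
| An illustrative plain-loop version (the
| library versions are typically Eigen-backed):
|
|   #include <algorithm>
|
|   void ColwiseMax(int N, int D, const float* X, float* y) {
|     for (int j = 0; j < D; ++j) {
|       y[j] = X[j];  // row 0 seeds the running max
|     }
|     for (int i = 1; i < N; ++i) {
|       for (int j = 0; j < D; ++j) {
|         y[j] = std::max(y[j], X[i * D + j]);
|       }
|     }
|   }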
|
| Calculates ceil(a / b). User must be
| careful to ensure that there is no overflow
| or underflow in the calculation.
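|
| The usual formulation, shown here as a
| sketch (the overflow caveat comes from the
| a + b - 1 term):
|
|   template <typename T>
|   constexpr T divUp(T a, T b) {
|     return (a + b - 1) / b;  // a + b - 1 can overflow near max(T)
|   }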
|
| Dot product of vectors a and b, writing
| the result to a single value y.
|
| Note: the caffe2_use_eigen_for_blas and
| caffe2_use_mkl switches may be set up
| incorrectly here, and the macro invocations
| should probably also be guarded by those
| switches.
| Decaf gemm provides a simpler interface
| to the gemm functions, with the limitation
| that the data has to be contiguous in
| memory.
|
| GemmBatched provides a simple abstraction
| over batched library routines.
|
| We also provide a gemm that has explicit lda,
| ldb and ldc specified.
|
| In most cases you probably want to use the
| function above, though.
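|
| For intuition, naive row-major loops with
| explicit leading dimensions (a sketch, not
| the BLAS-backed implementation); with
| lda = K, ldb = N, ldc = N this reduces to
| the contiguous Gemm case above:
|
|   // No-transpose case only: C (M*N) = alpha * A (M*K) * B (K*N) + beta * C.
|   void GemmEx(int M, int N, int K, float alpha,
|               const float* A, int lda, const float* B, int ldb,
|               float beta, float* C, int ldc) {
|     for (int i = 0; i < M; ++i) {
|       for (int j = 0; j < N; ++j) {
|         float acc = 0.f;
|         for (int k = 0; k < K; ++k) {
|           acc += A[i * lda + k] * B[k * ldb + j];
|         }
|         C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
|       }
|     }
|   }
|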
| Gemv always takes in an M*N matrix A, and
| depending on whether we set TransA to
| CblasTrans, the output is:
|
| CblasNoTrans: x is an N dim vector and y is an M dim vector.
| CblasTrans: x is an M dim vector and y is an N dim vector.
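|
| Reference loops for the two cases (a plain
| sketch of the shape conventions, not the
| BLAS-backed implementation):
|
|   // CblasNoTrans: y (M) = A (M*N) * x (N)
|   void GemvNoTrans(int M, int N, const float* A,
|                    const float* x, float* y) {
|     for (int i = 0; i < M; ++i) {
|       float acc = 0.f;
|       for (int j = 0; j < N; ++j) acc += A[i * N + j] * x[j];
|       y[i] = acc;
|     }
|   }
|
|   // CblasTrans: y (N) = A^T (N*M) * x (M)
|   void GemvTrans(int M, int N, const float* A,
|                  const float* x, float* y) {
|     for (int j = 0; j < N; ++j) {
|       float acc = 0.f;
|       for (int i = 0; i < M; ++i) acc += A[i * N + j] * x[i];
|       y[j] = acc;
|     }
|   }
|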
| groups must be 1 for GPU
|
| For NHWC order with groups > 1, the result
| will be laid out in NHW G RS C/G order so
| that data within the same group is
| contiguous.
|
| For NCHW order, groups makes no difference
| because we do Im2Col for each N, and C is
| the slowest-moving dimension among CHW.
| The layout of the result is N H W G R S C/G.
|
| Note that groups are pulled out to an outer
| dimension so that we can use GEMMs efficiently.
|
| pad_p - previous frame
| pad_t - top
| pad_l - left
| pad_n - next frame
| pad_b - bottom
| pad_r - right
| groups must be 1 for GPU
| incrementIfNotMax increments the
| number if the value is not the max for that
| datatype. This ensures that the value
| never overflows.
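|
| A plausible implementation (a sketch, not
| the library source):
|
|   #include <limits>
|
|   template <typename T>
|   T incrementIfNotMax(T a) {
|     if (a == std::numeric_limits<T>::max()) {
|       return a;  // already at the maximum; do not overflow
|     }
|     return a + 1;
|   }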
|
| Returns log2(n) for a positive integer type
| Returns the next highest power-of-2
| for an integer type
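|
| Illustrative bit-twiddling sketches for the
| two helpers above; whether the library
| rounds log2 down and whether an exact power
| of 2 maps to itself are assumptions here:
|
|   #include <cstdint>
|
|   int integerLog2Floor(uint64_t n) {  // assumes n > 0
|     int log = 0;
|     while (n >>= 1) ++log;  // count how many times n halves
|     return log;
|   }
|
|   uint64_t nextHighestPowerOf2(uint64_t v) {
|     --v;  // so that an exact power of 2 maps to itself
|     v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
|     v |= v >> 8;  v |= v >> 16; v |= v >> 32;
|     return v + 1;  // smear bits down, then round up
|   }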
|
| The function uses a cast from int to
| unsigned to compare whether the value of
| parameter a is greater than or equal to
| zero and lower than the value of
| parameter b.
|
| The parameter b is of signed type and is
| always positive, so its value is always
| lower than 0x800…, whereas casting a
| negative value of a converts it to a
| value higher than 0x800….
|
| The casting allows one condition to be
| used instead of two.
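|
| The one-comparison trick, as it is commonly
| written (the function name is illustrative):
|
|   bool IsAGeZeroAndALtB(int a, int b) {
|     // For b > 0, casting to unsigned maps a negative a above 0x800...,
|     // so a single unsigned '<' covers both a >= 0 and a < b.
|     return static_cast<unsigned int>(a) < static_cast<unsigned int>(b);
|   }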
|
| Checks if the input permutation is an
| identity permutation.
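|
| A minimal sketch, assuming the permutation
| is stored as an index array:
|
|   bool IsIdentityPermutation(int n, const int* perm) {
|     for (int i = 0; i < n; ++i) {
|       if (perm[i] != i) return false;  // element i moved
|     }
|     return true;
|   }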
|
| Element-wise maximum of vector x and scalar
| alpha: y[i] = max(x[i], alpha).
|
| Computes mean and variance over axes.
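|
| A minimal sketch for the all-axes case
| (per-axis reduction follows the same dims
| convention as the Reduce* functions below;
| the signature here is illustrative):
|
|   #include <cstdint>
|
|   void Moments(int64_t size, const float* X, float* mean, float* var) {
|     double sum = 0.0, sum_sq = 0.0;
|     for (int64_t i = 0; i < size; ++i) {
|       sum += X[i];
|       sum_sq += static_cast<double>(X[i]) * X[i];
|     }
|     const double m = sum / size;
|     *mean = static_cast<float>(m);
|     *var = static_cast<float>(sum_sq / size - m * m);  // E[X^2] - E[X]^2
|   }
|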
| Generate n values that sum up to a fixed
| sum, subject to the restriction a <= x <= b
| for each generated x.
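|
| One way to realize the constraint (not
| necessarily the library's algorithm): start
| from the feasible uniform split sum/n, then
| apply random pairwise transfers that keep
| every value inside [a, b] and preserve the
| total:
|
|   #include <algorithm>
|   #include <random>
|   #include <vector>
|
|   std::vector<double> GenerateFixedSum(int n, double a, double b,
|                                        double sum, std::mt19937& rng) {
|     std::vector<double> x(n, sum / n);  // feasible iff n*a <= sum <= n*b
|     std::uniform_int_distribution<int> pick(0, n - 1);
|     std::uniform_real_distribution<double> unit(0.0, 1.0);
|     for (int iter = 0; iter < 10 * n; ++iter) {
|       const int i = pick(rng), j = pick(rng);
|       if (i == j) continue;
|       // Largest amount movable from x[i] to x[j] without leaving [a, b].
|       const double room = std::min(x[i] - a, b - x[j]);
|       const double delta = unit(rng) * room;
|       x[i] -= delta;
|       x[j] += delta;
|     }
|     return x;
|   }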
|
| Generate n values from a synthetic data
| distribution defined by unique accesses
| and stack distances.
|
| Y = alpha * ReduceL1(X)
| Y = alpha * ReduceL2(X)
| Y = alpha * ReduceMax(X)
| Y = alpha * ReduceMean(X)
| In all of the reduce functions, X_dims and
| Y_dims should have ndim elements.
|
| Each dimension of Y_dims must match the
| corresponding dimension of X_dims or must be
| equal to 1. The dimensions equal to 1 indicate
| the dimensions of X to be reduced.
| Y = alpha * ReduceSum(X)
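|
| A concrete instance of this convention:
| reducing axis 1 of a {2, 3, 4} tensor into
| {2, 1, 4} with ReduceSum (plain loops,
| illustrative only):
|
|   // X_dims = {2, 3, 4}, Y_dims = {2, 1, 4}: sum over axis 1.
|   void ReduceSumAxis1(const float* X, float* Y, float alpha) {
|     const int D0 = 2, D1 = 3, D2 = 4;
|     for (int i = 0; i < D0; ++i) {
|       for (int k = 0; k < D2; ++k) {
|         float acc = 0.f;
|         for (int j = 0; j < D1; ++j) {
|           acc += X[(i * D1 + j) * D2 + k];
|         }
|         Y[i * D2 + k] = alpha * acc;  // reduced axis collapses to size 1
|       }
|     }
|   }
|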
| Rounds a up to the next highest multiple of
| b. User must be careful to ensure that there
| is no overflow or underflow in the calculation
| of divUp.
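|
| In terms of divUp (see the sketch earlier),
| this is:
|
|   template <typename T>
|   constexpr T roundUp(T a, T b) {
|     return divUp(a, b) * b;  // same overflow caveats as divUp
|   }
|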
| Compute the row-wise max of an N*D matrix
| X, and write it to an N-dimensional vector
| y.
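|
| An illustrative one-pass version using the
| standard library:
|
|   #include <algorithm>
|
|   void RowwiseMax(int N, int D, const float* X, float* y) {
|     for (int i = 0; i < N; ++i) {
|       y[i] = *std::max_element(X + i * D, X + (i + 1) * D);  // max of row i
|     }
|   }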
|
| Different from the Scale function above, if
| alpha is passed in as a pointer, we will
| assume that it lives on the Context device,
| for example on GPU.
| Select does index selection of the rows of
| an N*D matrix x, and gives the N-dimensional
| vector y that contains the selected
| data.
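|
| A minimal sketch, assuming the indices come
| in as one column index per row:
|
|   void Select(int N, int D, const float* x, const int* idx, float* y) {
|     for (int i = 0; i < N; ++i) {
|       y[i] = x[i * D + idx[i]];  // pick one element from each row
|     }
|   }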
|
| Computes the sum of vector x, writing the
| result to a single value y.
|
| Computes the sum of squares of vector x,
| writing the result to a single value y.
|
| Transposes tensor X, whose shape is given
| by dims, according to axes, and writes the
| result to tensor Y.
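|
| A generic strided sketch: walk Y linearly,
| decompose each index into coordinates, and
| gather from X, where the Y coordinate along
| dim i maps to the X coordinate along dim
| axes[i] (illustrative, not the library
| implementation):
|
|   #include <cstdint>
|   #include <vector>
|
|   void Transpose(const std::vector<int64_t>& dims,  // shape of X
|                  const std::vector<int>& axes,      // Y dim i = X dim axes[i]
|                  const float* X, float* Y) {
|     const int ndim = static_cast<int>(dims.size());
|     int64_t size = 1;
|     for (const int64_t d : dims) size *= d;
|     std::vector<int64_t> x_strides(ndim, 1);  // row-major strides of X
|     for (int i = ndim - 2; i >= 0; --i) {
|       x_strides[i] = x_strides[i + 1] * dims[i + 1];
|     }
|     std::vector<int64_t> y_dims(ndim);
|     for (int i = 0; i < ndim; ++i) y_dims[i] = dims[axes[i]];
|     for (int64_t y_idx = 0; y_idx < size; ++y_idx) {
|       int64_t rem = y_idx, x_idx = 0;
|       for (int i = ndim - 1; i >= 0; --i) {  // fastest dim first
|         x_idx += (rem % y_dims[i]) * x_strides[axes[i]];
|         rem /= y_dims[i];
|       }
|       Y[y_idx] = X[x_idx];
|     }
|   }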
|