Crate caffe2op_bisect

source ·

Structs

  • | This operator is to map raw feature values | into the percentile representations based on | Bisection for more than one feature. | | The input is the bath of input feature values, | with the size of (batch_size, num_feature), | where num_feature = F (F >= 1). | | For each feature, we also need additional | information regarding the feature value | distribution. | | There are several vectors to keep data to | percentile mappping information as arguments | (context): | | -1. feature raw values (R) | -2. feature percentile mapping (P) | -3. feature percentile lower bound (L) | -4. feature percentile upper bound (U) | | A toy example: | Suppose the sampled data distribution is as follows: | 1, 1, 2, 2, 2, 2, 2, 2, 3, 4 | We have the mapping vectors as follows: | R = [1, 2, 3, 4] | P = [0.15, 0.55, 0.9, 1.0] | L = [0.1, 0.3, 0.9, 1.0] | U = [0.2, 0.8, 0.9, 1.0] | Where P is computed as (L + U) / 2. | | For a given list of feature values, X = [x_0, | x_1, …, x_i, …], for each feature value | (x_i) we first apply bisection to find the right | index (t), such that R[t] <= x_i < R[t+1]. | | If x_i = R[t], P[t] is returned; | | otherwise, the interpolation is apply by | (R[t], R[t+1]) and (U[t] and L[t]). | | As there are F features (F >= 1), we concate | all the R_f, P_f, L_f, and U_f for each | feature f and use an additional input length to | keep track of the number of points for each set of | raw feature value to percentile mapping. | | For example, there are two features: | | R_1 =[0.1, 0.4, 0.5]; | R_2 = [0.3, 1.2]; | | We will build R = [0.1, 0.4, 0.5, 0.3, 1.2]; | besides, we have lengths = [3, 2] | | to indicate the boundaries of the percentile | information.

Constants