bench_diff 1.1.0

Reliably compare the latencies of two functions/closures.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
# A Model of Time-Dependent Random Noise

Following is a simple mathematical model of time-dependent random noise. This model is useful for reasoning about time-dependent noise and helps in understanding why `bench_diff` is more effective than traditional benchmarking for the comparison of latencies between two functions.

Nonetheless, the model's fit with reality is not necessary to validate `bench_diff` as the test benchmarks discussed previously provide independent validation of the library.

## The Model

The first section defines the model. Subsequent sections develop estimates for some relevant statistics and parameters.

**Definitions and Assumptions**

This section defines the model per se.

1. Let **ln(x)** be the natural logarithm of **x**.
2. Let **L(f, t)** be the latency of function **f** at time **t**.
3. Let **λ<sub>1</sub>** be the baseline (ideal) latency of function **f<sub>1</sub>** in the absence of noise; respectively, **λ<sub>2</sub>** for **f<sub>2</sub>**.
4. Given a random variable **χ**, let **E(χ)** and **Stdev(χ)** be the expectation and standard deviation of **χ**, respectively.
5. Assume time-dependent noise is **ν(t) =<sub>def</sub> α(t) * β(t)**, where:
   - **α(t)** is a differentiable deterministic function of **t**, such that there are positive constants **A<sub>L</sub>**, **A<sub>U</sub>**, and **A<sub>D</sub>** for which **A<sub>L</sub> ≤ α(t) ≤ A<sub>U</sub>** and **|α'(t)| ≤ A<sub>D</sub>**, for all **t**.
   - **β(t)** is a family of mutually independent log-normal random variables indexed by **t**, such that  **E(ln(β(t))) = 0** and **Stdev(ln(β(t))) = σ**, where **σ** is a constant that does not depend on **t**.

6. Assume **L(f<sub>1</sub>, t) = λ<sub>1</sub> * ν(t)** and **L(f<sub>2</sub>, t) = λ<sub>2</sub> * ν(t)** for all **t**.  

   A couple of points to notice about this model:  

   - λ<sub>1</sub> and λ<sub>2</sub> can't be determined independently of α. If we multiply λ<sub>1</sub> and λ<sub>2</sub> by a constant factor r and divide α by the same factor, we end up with an exactly equivalent model. Nonetheless, the ratio λ<sub>1</sub>/λ<sub>2</sub> remains the same regardless of the multiplier r, so the model is useful to reason about the ratio of the function latencies or, equivalently, the difference of the logarithm of the latencies.
     - This spurious degree of freedom in the model can be removed, without loss of generality, by assuming the lowest value of α(t) is 1 = A<sub>L</sub>.
   - As a log-normally distributed random variable, β(t) can take arbitrarily small positive values, which implies that the function latencies can take arbitrarily small values. That is clearly not 100% realistic. It is, however, a reasonable approximation of reality, especially when σ is close to 0, in which case the values of β(t) are clustered around 1.

7. Let **mean_diff_ln** be the sample mean difference between the natural logarithms of the observed latencies.

8. When we measure f<sub>1</sub>'s latency at a time **t<sub>1</sub>**, getting L(f<sub>1</sub>, t<sub>1</sub>), and right after we measure f<sub>2</sub>'s latency, the measurement for f<sub>2</sub> occurs at a time **t<sub>2</sub>** that is very close to L(f<sub>1</sub>, t<sub>1</sub>). We assume that **t<sub>2</sub> = t<sub>1</sub> + L(f<sub>1</sub>, t<sub>1</sub>)**.

**Game Plan**

The goal is to obtain a bound on how closely mean_diff_ln estimates ln(λ<sub>1</sub>/λ<sub>2</sub>):

- Obtain an upper bound BE on |E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)|.
- Obtain an upper bound BA on E(|mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)|).
- mean_diff_ln is approximately normal, so its median and mean are approximately the same.
- The median minimizes the mean absolute deviation, so E(|mean_diff_ln - E(mean_diff_ln))|) ≤ BA.
- A normal distribution with a mean absolute deviation from its mean ≤ BA has stdev ≤ BA * √(π/2).
- The 99% confidence interval around the mean for a normal distribution is stdev * 2.58.
- |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)| ≤ |mean_diff_ln - E(mean_diff_ln)| + |E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)|.
- With 99% confidence, |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)| ≤ BA * √(π/2) * 2.58 + BE.

**Expansion of mean_diff_ln**

This section expands **mean_diff_ln** (see *Definition 7*) to facilitate subsequent upper bound calculations.

1. When f<sub>1</sub> is executed at time t<sub>1</sub> and f<sub>2</sub> is executed right after at time t<sub>2</sub> = t<sub>1</sub> + Δt<sub>1</sub>, using *Assumptions 5, 6, 8*:

   - L(f<sub>1</sub>, t<sub>1</sub>) = λ<sub>1</sub> * α(t<sub>1</sub>) * β(t<sub>1</sub>)
   - L(f<sub>2</sub>, t<sub>2</sub>) = λ<sub>2</sub> * α(t<sub>1</sub> + Δt<sub>1</sub>) * β(t<sub>2</sub>)  **[** _where Δt<sub>1</sub> = L(f<sub>1</sub>, t<sub>1</sub>)_ **]**

2. Applying natural logarithms on *Point 2*:

   - ln(L(f<sub>1</sub>, t<sub>1</sub>)) = ln(λ<sub>1</sub>) + ln(α(t<sub>1</sub>)) + ln(β(t<sub>1</sub>))
   - ln(L(f<sub>2</sub>, t<sub>2</sub>)) = ln(λ<sub>2</sub>) + ln(α(t<sub>1</sub> + Δt<sub>1</sub>)) + ln(β(t<sub>2</sub>))

3. By Lagrange's mean value theorem for ln(α(t)) and the bounds on α(t) and α'(t) from *Assumption 5*:

   - ln(α(t<sub>1</sub> + Δt<sub>1</sub>))  

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * α'(t<sub>p</sub>)/α(t<sub>p</sub>)  **[** _for some t<sub>p</sub> between t<sub>1</sub> and t<sub>1</sub> + Δt<sub>1</sub>_ **]**  

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * α'(t<sub>p</sub>)/α(t<sub>1</sub>) * α(t<sub>1</sub>)/α(t<sub>p</sub>)  

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * α'(t<sub>p</sub>)/α(t<sub>1</sub>) * (1 + α(t<sub>1</sub>)/α(t<sub>p</sub>) - α(t<sub>1</sub>)/α(t<sub>1</sub>))  

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * α'(t<sub>p</sub>)/α(t<sub>1</sub>) * (1 + α(t<sub>1</sub>) * 1/(α(t<sub>p</sub>) - 1/α(t<sub>1</sub>)))  

     By Lagrange's mean value theorem for 1/α(t):

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * α'(t<sub>p</sub>)/α(t<sub>1</sub>) * (1 + α(t<sub>1</sub>) * 1/(α(t<sub>p</sub>) - 1/α(t<sub>1</sub>)))  

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * α'(t<sub>p</sub>)/α(t<sub>1</sub>) * (1 + α(t<sub>1</sub>) * -α'(t<sub>q</sub>)/α(t<sub>q</sub>)<sup>2</sup> * (t<sub>p</sub> - t<sub>1</sub>))  **[** _for some t<sub>p</sub> between t<sub>1</sub> and t<sub>p</sub>_ **]**  

     = ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * ε<sub>1</sub>(t<sub>1</sub>)/α(t<sub>1</sub>)  

     **[** _where ε<sub>1</sub>(t<sub>1</sub>) =<sub>def</sub> α'(t<sub>p</sub>)\*(1\-α(t<sub>1</sub>)\*α'(t<sub>q</sub>)/α(t<sub>q</sub>)<sup>2</sup>\*(t<sub>p</sub>\-t<sub>1</sub>)), and thus |ε<sub>1</sub>(t<sub>1</sub>)| ≤ A<sub>D</sub>\*(1\+A<sub>D</sub>\*A<sub>U</sub>/A<sub>L</sub><sup>2</sup>\*Δt<sub>1</sub>)_ **]**

4. Applying *Point 3* to *Point 2*:

   - ln(L(f<sub>1</sub>, t<sub>1</sub>)) = ln(λ<sub>1</sub>) + ln(α(t<sub>1</sub>)) + ln(β(t<sub>1</sub>))

   - ln(L(f<sub>2</sub>, t<sub>2</sub>))  

     = ln(λ<sub>2</sub>) + ln(α(t<sub>1</sub>)) + Δt<sub>1</sub> * ε<sub>1</sub>(t<sub>1</sub>)/α(t<sub>1</sub>) + ln(β(t<sub>2</sub>))  

     Since Δt<sub>1</sub> = L(f<sub>1</sub>, t<sub>1</sub>):  

     = ln(λ<sub>2</sub>) + ln(α(t<sub>1</sub>)) + L(f<sub>1</sub>, t<sub>1</sub>) * ε<sub>1</sub>(t<sub>1</sub>)/α(t<sub>1</sub>) + ln(β(t<sub>2</sub>))  

     Using *Assumptions 5 and 6* to expand L(f<sub>1</sub>, t<sub>1</sub>):  

     = ln(λ<sub>2</sub>) + ln(α(t<sub>1</sub>)) + (λ<sub>1</sub> * α(t<sub>1</sub>) * β(t<sub>1</sub>)) * ε<sub>1</sub>(t<sub>1</sub>)/α(t<sub>1</sub>) + ln(β(t<sub>2</sub>))  

     = ln(λ<sub>2</sub>) + ln(α(t<sub>1</sub>)) + λ<sub>1</sub> * β(t<sub>1</sub>) * ε<sub>1</sub>(t<sub>1</sub>) + ln(β(t<sub>2</sub>))

5. Subtracting the second equation from the first in *Point 5*:

   - ln(L(f<sub>1</sub>, t<sub>1</sub>) - L(f<sub>2</sub>, t<sub>2</sub>))  

     = ln(λ<sub>1</sub>) + ln(α(t<sub>1</sub>)) + ln(β(t<sub>1</sub>)) - ln(λ<sub>2</sub>) - ln(α(t<sub>1</sub>)) - λ<sub>1</sub> * β(t<sub>1</sub>) * ε<sub>1</sub>(t<sub>1</sub>) - ln(β(t<sub>2</sub>))  

     = ln(λ<sub>1</sub>/λ<sub>2</sub>) - λ<sub>1</sub> * β(t<sub>1</sub>) * ε<sub>1</sub>(t<sub>1</sub>) + ln(β(t<sub>1</sub>)) - ln(β(t<sub>2</sub>))

6. Using *Assumptions 5, 6, 8* to expand L(f<sub>1</sub>, t<sub>1</sub>) and rewrite the bound on |ε<sub>1</sub>(t<sub>1</sub>)|:

   - |ε<sub>1</sub>(t<sub>1</sub>)|  

     ≤ A<sub>D</sub>\*(1\+A<sub>D</sub>\*A<sub>U</sub>/A<sub>L</sub><sup>2</sup>\*Δt<sub>1</sub>)  

     = A<sub>D</sub>\*(1\+A<sub>D</sub>\*A<sub>U</sub>/A<sub>L</sub><sup>2</sup>\*L(f<sub>1</sub>, t<sub>1</sub>))  

     = A<sub>D</sub>\*(1\+A<sub>D</sub>\*A<sub>U</sub>/A<sub>L</sub><sup>2</sup>\*(λ<sub>1</sub> * α(t<sub>1</sub>) * β(t<sub>1</sub>)))  

     ≤ A<sub>D</sub>\*(1\+A<sub>D</sub>\*A<sub>U</sub>/A<sub>L</sub><sup>2</sup>\*(λ<sub>1</sub> * A<sub>U</sub> * β(t<sub>1</sub>)))  

     = A<sub>D</sub>\*(1\+λ<sub>1</sub>\*A<sub>D</sub>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*β(t<sub>1</sub>))

7. With `bench_diff`, measurements are done pairs, with one half of the pairs having **f<sub>1</sub>** followed by **f<sub>2</sub>** and the other half having **f<sub>2</sub>** followed by **f<sub>1</sub>**. The equation in *Point 5* above pertains to the first case. The analogous equation for the second case is:  

   - ln(L(f<sub>2</sub>, t<sub>2'</sub>) - ln(L(f<sub>1</sub>, t<sub>1'</sub>)) = ln(λ<sub>2</sub>/λ<sub>1</sub>) - λ<sub>2</sub> * β(t<sub>2'</sub>) * ε<sub>2</sub>(t<sub>2'</sub>) + ln(β(t<sub>1'</sub>)) - ln(β(t<sub>2'</sub>))  

   Or, equivalently:

   - ln(L(f<sub>1</sub>, t<sub>1'</sub>)) - ln(L(f<sub>2</sub>, t<sub>2'</sub>) = ln(λ<sub>1</sub>/λ<sub>2</sub>) + λ<sub>2</sub> * β(t<sub>2'</sub>) * ε<sub>2</sub>(t<sub>2'</sub>) - ln(β(t<sub>1'</sub>)) + ln(β(t<sub>2'</sub>))  

     **[** _where ε<sub>2</sub>(t<sub>2'</sub>) =<sub>def</sub> α'(t<sub>p'</sub>)\*(1\-α(t<sub>2'</sub>)\*α'(t<sub>q'</sub>)/α(t<sub>q'</sub>)<sup>2</sup>\*(t<sub>p'</sub>\-t<sub>2'</sub>)), and thus |ε<sub>2</sub>(t<sub>2'</sub>)| ≤ A<sub>D</sub>\*(1\+λ<sub>2</sub>\*A<sub>D</sub>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*β(t<sub>2'</sub>))) (see Point 6)_ **]**

8. Assuming the number of latency observations for each function is **n** and considering the two cases as described in *Point 7*, we can calculate the sample mean difference between the natural logarithms of the observed latencies (see *Definition 7*):  

   - mean_diff_ln = (1/n) * ∑<sub>i=1..n</sub> (ln(L(f<sub>1</sub>, t<sub>1,i</sub>)) - ln(L(f<sub>2</sub>, t<sub>2,i</sub>)))  

     = (1/n) * (∑<sub>i:odd</sub> (ln(L(f<sub>1</sub>, t<sub>1,i</sub>)) - ln(L(f<sub>2</sub>, t<sub>2,i</sub>))) + ∑<sub>i:even</sub> (ln(L(f<sub>1</sub>, t<sub>1,i</sub>)) - ln(L(f<sub>2</sub>, t<sub>2,i</sub>))))  

     = (1/n) * ∑<sub>i:odd</sub> (ln(λ<sub>1</sub>/λ<sub>2</sub>) - λ<sub>1</sub> * β(t<sub>1,i</sub>) * ε<sub>1</sub>(t<sub>1,i</sub>) + ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) +  

        (1/n) * ∑<sub>i:even</sub> (ln(λ<sub>1</sub>/λ<sub>2</sub>) + λ<sub>2</sub> * β(t<sub>2,i</sub>) * ε<sub>2</sub>(t<sub>2,i</sub>) - ln(β(t<sub>1,i</sub>)) + ln(β(t<sub>2,i</sub>)))  

     = ln(λ<sub>1</sub>/λ<sub>2</sub>) +  

        (1/n) * ∑<sub>i:odd</sub> (-λ<sub>1</sub> * β(t<sub>1,i</sub>) * ε<sub>1</sub>(t<sub>1,i</sub>) + ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) +  

        (1/n) * ∑<sub>i:even</sub> (λ<sub>2</sub> * β(t<sub>2,i</sub>) * ε<sub>2</sub>(t<sub>2,i</sub>) - ln(β(t<sub>1,i</sub>)) + ln(β(t<sub>2,i</sub>)))

**Expectation of mean_diff_ln**

This section calculates a bound on |E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)| (see *Definition 7*).

1. From *Expansion Point 8*:

   - E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)  

      Due to the linearity of E() and the fact that E(β(t)) = 0:

      = (1/n) * -λ<sub>1</sub> * ∑<sub>i:odd</sub>  E(β(t<sub>1,i</sub>) * ε<sub>1</sub>(t<sub>1,i</sub>)) + (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> E(β(t<sub>2,i</sub>) * ε<sub>2</sub>(t<sub>2,i</sub>))

2. From *Point 1* and the bounds on ε<sub>1</sub>(t<sub>1,i</sub>) and ε<sub>2</sub>(t<sub>2,i</sub>) from *Expansion Points 6 and 7*:

   - |E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)|  

      ≤ (1/n) * λ<sub>1</sub> * ∑<sub>i:odd</sub>  E(β(t<sub>1,i</sub>) * |ε<sub>1</sub>(t<sub>1,i</sub>)|) + (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> E(β(t<sub>2,i</sub>) * |ε<sub>2</sub>(t<sub>2,i</sub>)|)  

      ≤ (1/n) * λ<sub>1</sub> * ∑<sub>i:odd</sub>  E(β(t<sub>1,i</sub>) * A<sub>D</sub>\*(1\+λ<sub>1</sub>\*A<sub>D</sub>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*β(t<sub>1,i</sub>))) +  
      (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> E(β(t<sub>2,i</sub>) * A<sub>D</sub>\*(1\+λ<sub>2</sub>\*A<sub>D</sub>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*β(t<sub>2,i</sub>)))  

      = (1/n) * λ<sub>1</sub> * ∑<sub>i:odd</sub>  (A<sub>D</sub>\*E(β(t<sub>1,i</sub>) + λ<sub>1</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*E(β(t<sub>1,i</sub>)<sup>2</sup>)) + 
      (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> (A<sub>D</sub>\*E(β(t<sub>2,i</sub>) + λ<sub>2</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*E(β(t<sub>2,i</sub>)<sup>2</sup>))  

      By the expectation of log-normal distributions:  

      = (1/n) * λ<sub>1</sub> * ∑<sub>i:odd</sub>  (A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + λ<sub>1</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp((2\*σ)<sup>2</sup>/2)) +  
      (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> (A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + λ<sub>2</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp((2\*σ)<sup>2</sup>/2))  

      = (1/n) * λ<sub>1</sub> * ∑<sub>i:odd</sub>  (A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + λ<sub>1</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>)) +  
      (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> (A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + λ<sub>2</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>))  

      = λ<sub>1</sub> / 2 * (A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + λ<sub>1</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>)) +  
      λ<sub>2</sub> / 2 * (A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + λ<sub>2</sub>\*A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>))  

      = (λ<sub>1</sub>\+λ<sub>2</sub>)/2 * A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + (λ<sub>1</sub><sup>2</sup>+λ<sub>2</sub><sup>2</sup>)/2 * A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>)

3. Define **BE** as the right-hand-side of the immediately above equality. Thus:

   - |E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)| ≤ BE.

**Mean Absolute Error of mean_diff_ln**

This section calculates the expected absolute error of **mean_diff_ln** (see *Definition 7*) as an estimator of **ln(λ<sub>1</sub>/λ<sub>2</sub>)**.

1. From *Expansion Point 8*::

   - **absolute_error(mean_diff_ln)** =<sub>def</sub> |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)|  

     = |(1/n) * ∑<sub>i:odd</sub> (-λ<sub>1</sub> * β(t<sub>1,i</sub>) * ε<sub>1</sub>(t<sub>1,i</sub>) + ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) +  

     ​     (1/n) * ∑<sub>i:even</sub> (λ<sub>2</sub> * β(t<sub>2,i</sub>) * ε<sub>2</sub>(t<sub>2,i</sub>) - ln(β(t<sub>1,i</sub>)) + ln(β(t<sub>2,i</sub>)))|

     = |(1/n) * ∑<sub>i:odd</sub> -λ<sub>1</sub> * β(t<sub>1,i</sub>) * ε<sub>1</sub>(t<sub>1,i</sub>) + (1/n) * ∑<sub>i:even</sub> λ<sub>2</sub> * β(t<sub>2,i</sub>) * ε<sub>2</sub>(t<sub>2,i</sub>) +  

     ​     (1/n) * ∑<sub>i:odd</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) - ∑<sub>i:even</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>)))|  

     ≤ (1/n) * |∑<sub>i:odd</sub> -λ<sub>1</sub> * β(t<sub>1,i</sub>) * ε<sub>1</sub>(t<sub>1,i</sub>) + (1/n) * ∑<sub>i:even</sub> λ<sub>2</sub> * β(t<sub>2,i</sub>) * ε<sub>2</sub>(t<sub>2,i</sub>)| +  

        (1/n) * |∑<sub>i:odd</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) - ∑<sub>i:even</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>)))|

2. From the above point, defining the mean absolute error:

   - **mean_absolute_error(mean_diff_ln)** =<sub>def</sub> E(absolute_error(mean_diff_ln))  

       ≤ (1/n) * λ<sub>1</sub> * ∑<sub>i:odd</sub>  E(β(t<sub>1,i</sub>) * |ε<sub>1</sub>(t<sub>1,i</sub>)|) + (1/n) * λ<sub>2</sub> * ∑<sub>i:even</sub> E(β(t<sub>2,i</sub>) * |ε<sub>2</sub>(t<sub>2,i</sub>)|) +  
      (1/n) * E(|∑<sub>i:odd</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) - ∑<sub>i:even</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>)))|)  
      
      By *Expectation Point 2*, the first line on the right hand side of the above inequality is bounded by the last line of *Expectation Point 2*. Thus:  
       
       ≤ (λ<sub>1</sub>\+λ<sub>2</sub>)/2 * A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + (λ<sub>1</sub><sup>2</sup>+λ<sub>2</sub><sup>2</sup>)/2 * A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>) + 
       
      (1/n) * E(|∑<sub>i:odd</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) - ∑<sub>i:even</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>)))|)  
       
       By *Expectation Point 3*:  
       
       = BE +   
      (1/n) * E(|∑<sub>i:odd</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>))) - ∑<sub>i:even</sub> (ln(β(t<sub>1,i</sub>)) - ln(β(t<sub>2,i</sub>)))|)  
       
       On the line immediately above, term inside the absolute value is the sum of 2*n independently distributed normal random variables with mean 0 and standard deviation σ. It is, therefore, a normal distribution with mean 0 and standard deviation σ\*√(2\*n). From the formula for the expectation of the absolute value of a normal random variable, we get:  
       
       = BE + (1/n) * σ\*√(2\*n) * √(2/π)  
       
       = BE + 2 * σ / √(n\*π) 

3. Define **BA** as the right-hand-side of the immediately above equality. Thus:

   - mean_absolute_error(mean_diff_ln) ≤ BA.

4. For a large sample size n, the above upper bound becomes, approximately:

    - BA  

      ≈ BE  

      = (λ<sub>1</sub>\+λ<sub>2</sub>)/2 * A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + (λ<sub>1</sub><sup>2</sup>+λ<sub>2</sub><sup>2</sup>)/2 * A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>)

**Confidence Interval**

1. mean_diff_ln is approximately normal, so its median and mean are approximately the same.

2. The median minimizes the mean absolute deviation. From *Point 1* and *Mean Absolute Error Point 3*:

   - E(|mean_diff_ln - E(mean_diff_ln))|)  

     ≈ E(|mean_diff_ln - median(mean_diff_ln))|)

     ≤ mean_absolute_error(mean_diff_ln)  

     ≤ BA

3. A normal distribution with a mean absolute deviation from its mean ≤ BA has stdev ≤ BA * √(π/2). Thus, from *Points 1 and 2*:

   - Stdev(mean_diff_ln) ≤ BA * √(π/2)

4. The 99% confidence interval around the mean for a normal distribution is stdev * 2.58. Thus:

   - |mean_diff_ln - E(mean_diff_ln))| ≤ BA * √(π/2) * 2.58  **[** _with 99% confidence_ **]**

5. 99% confidence bound for |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)|:

   - |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)|  

     ≤ |mean_diff_ln - E(mean_diff_ln)| + |E(mean_diff_ln) - ln(λ<sub>1</sub>/λ<sub>2</sub>)|  

     From *Point 4* and *Expectation Point 3*:

     ≤ BA * √(π/2) * 2.58 + BE **[** _with 99% confidence_ **]**  

     ≤ (BE + 2 * σ / √(n\*π)) * √(π/2) * 2.58 + BE  **[** _with 99% confidence_ **]**  

     = BE * (√(π/2) * 2.58 + 1) + 2 * σ / √(n\*π) * √(π/2) * 2.58  **[** _with 99% confidence_ **]**  

     = BE * (√(π/2) * 2.58 + 1) + √(2/n) * σ * 2.58  **[** _with 99% confidence_ **]**

6. Define **BC** as the right-hand-side of the immediately above equality. Thus:

   - |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)| ≤ BC  **[** _with 99% confidence_ **]**

7. For a large sample size n, the above confidence bound becomes, approximately:

   - BC  

     ≈ BE * (√(π/2) * 2.58 + 1) 

     = ((λ<sub>1</sub>\+λ<sub>2</sub>)/2 * A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + (λ<sub>1</sub><sup>2</sup>+λ<sub>2</sub><sup>2</sup>)/2 * A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>)) * (√(π/2) * 2.58 + 1)

**Reasonable Parameter Values**

Assume, without loss of generality, that the lowest value of α(t) is 1 = A<sub>L</sub>.

In practice, one could reasonably expect the time-dependent random noise parameters to be no greater (and likely substantially lower) than the values below:

- Rate of change of α(t) < 5% per second, which means A<sub>D</sub> < .00005 when time is measured in milliseconds and A<sub>D</sub> < .00000005 when time is measured in microseconds.
- A<sub>U</sub>/A<sub>L</sub> < 2
- σ < 0.3

From the above, given λ<sub>1</sub> and λ<sub>2</sub>, we can compute BC (assume a large sample size):

- λ<sub>1</sub> = λ<sub>2</sub> = 12 ⇒ BC < 0.003
- λ<sub>1</sub> = λ<sub>2</sub> = 120 ⇒ BC < 0.028

Notice that while the variability of the statistic mean_diff_ln depends on the sample size (exec_count), the BC upper bound does not, provided that the sample size is large enough (see *Confidence Interval Item 7*).

## Comparative Example

We will define an example of the above model and compare how `bench_diff` and the *traditional* benchmarking method fare with respect to the model. The example is admittedly contrived in order to facilitate approximate calculations and also to highlight the potential disparity of results between the two benchmarking methods.

**Model parameters**

- The two functions, **f<sub>1</sub>** and **f<sub>2</sub>** are identical (call it **f**), with **λ<sub>1</sub> = λ<sub>2</sub> = λ = 12 ms**.

- The number of executions of each function is **exec_count = 2500**. So, the total execution time, ignoring warm-up, is 1 minute.

- **α(t) = 1.5 + 1/2 * sin(t * 2*π / 60000)**, where **t** is the number of milliseconds elapsed since the start of the benchmark.  

  - α'(t)  

    = 1/2 * 2\*π / 60000 * cos(t * 2\*π / 60000)  

    = π / 60000 * cos(t * 2\*π / 60000)

  Therefore:

  - |α'(t)| ≤ π / 60000.

  And we have the following bounds for α(t):

  - A<sub>L</sub> = 1
  - A<sub>U</sub> = 2
  - A<sub>D</sub> = π / 60000

- **β(t)** has **σ = 0.28**.

  - Therefore, E(β(t)) = exp(σ<sup>2</sup>/2) ≈ 1.0400.

- The **baseline median latency** of f is the median latency when α(t) = 1. Its value is **λ = λ<sub>1</sub> = λ<sub>2</sub>** = 12 ms.

- The **baseline mean latency** of f is the mean latency when α(t) = 1. Its value  **μ = μ1 = μ2** = λ * exp(σ<sup>2</sup>/2) ≈ 12 * 1.0400 = 12.48 ms.

- The ratio of baseline medians is always equal to the ratio of baseline means even when f<sub>1</sub> ≠ f<sub>2</sub> because σ does not depend on f<sub>1</sub> or f<sub>2</sub>.

**`bench_diff` calculations**

- Given exec_count = 2500, the large sample size approximation for the BC bound can be used:

  - BC ≈ ((λ<sub>1</sub>\+λ<sub>2</sub>)/2 * A<sub>D</sub>\*exp(σ<sup>2</sup>/2) + (λ<sub>1</sub><sup>2</sup>+λ<sub>2</sub><sup>2</sup>)/2 * A<sub>D</sub><sup>2</sup>\*A<sub>U</sub><sup>2</sup>/A<sub>L</sub><sup>2</sup>\*exp(2\*σ<sup>2</sup>)) * (√(π/2) * 2.58 + 1)  

    ​      ≈ 0.00277

- From the model (*Confidence Interval Point 7*):

  - |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)| ≤ 0.00277  **[** _with 99% confidence_ **]**

- So, the multiplicative error on the estimate of λ<sub>1</sub>/λ<sub>2</sub> (= λ/λ = 1 in our example) is at most, with 99% confidence:

  - exp(0.00277) ≈ 1.00278, i.e., less than 3/10 of 1% error.

- Recall that the error bound does not depend on the number of executions, so it is the same with only half the number of executions. Also, given the high exec_count assumed, the `bench_diff` results during the first half should be very close to those obtained during the second half.

- If instead of λ = λ<sub>1</sub> = λ<sub>2</sub> = 12 ms and exec_count = 2500 we assume λ = λ<sub>1</sub> = λ<sub>2</sub> = 120 ms and exec_count = 250, then, with 99% confidence:

  - |mean_diff_ln - ln(λ<sub>1</sub>/λ<sub>2</sub>)| ≤ 0.0284
  - Multiplicative error ≤ 1.0289, i.e., about 3% error.


***Traditional* method calculations**

- With the traditional method, we benchmark f<sub>1</sub> with exec_count = 2500 and then benchmark f<sub>2</sub> (which is the same as f<sub>1</sub> in our example) with exec_count = 2500.

- The first benchmark of f takes place during the first 30 seconds.

- The calculated sample mean latency is approximately:

  - μ * 1/30000 * **∫**<sub>0</sub><sup>30000</sup> α(t) dt  

    = μ / 30000 * (45000 + 1/2 * (-cos(30000 * 2\*π / 60000) + cos(0)) / (2\*π / 60000))  

    = 1.5\*μ + μ/30000 * 1/2 * (-cos(π) + cos(0)) / (2\*π / 60000)  

    = 1.5\*μ + μ * (-cos(π) + cos(0)) / (2\*π)  

    = 1.5\*μ + μ * (1 + 1) / (2\*π)  

    = μ * (1.5 + 1/π)  

    ≈ μ * 1.8183

- The second benchmark of f takes place during the second 30 seconds.

- The calculated sample mean latency is approximately:

  - μ * 1/30000 * **∫**<sub>30000</sub><sup>60000</sup> α(t) dt  

    = μ / 30000 * (45000 + 1/2 * (-cos(60000 * 2\*π / 60000) + cos(30000 * 2\*π / 60000)) / (2\*π / 60000))  

    = 1.5\*μ + μ/30000 * 1/2 * (-cos(2\*π) + cos(π)) / (2\*π / 60000)  

    = 1.5\*μ + μ * (-cos(2\*π) + cos(π)) / (2\*π)  

    = 1.5\*μ + μ * (-1 + -1) / (2\*π)  

    = μ * (1.5 - 1/π)  

    ≈ μ * 1.1817

- The estimated ratio of mean latencies is approximately 1.8183 / 1.1817 ≈ 1.5387, an estimating error of 54% in comparison with the baseline ratio of means μ1/μ2 = 1 (which is also equal to the baseline ratio of medians λ<sub>1</sub>/λ<sub>2</sub>).

- If instead of λ = λ<sub>1</sub> = λ<sub>2</sub> = 12 ms and exec_count = 2500 we assume λ = λ<sub>1</sub> = λ<sub>2</sub> = 120 ms and exec_count = 250, then the results are roughly the same, i.e., an estimating error of 54%.


**Closing comments for the example**

- In this example, the estimate of λ<sub>1</sub>/λ<sub>2</sub> (or equivalently μ1/μ2) provided by `bench_diff` is accurate to within 3/10 of 1% of the true value of 1. By contrast, the estimate of μ1/μ2 provided by the traditional method is inflated by 54%.
- If we "run" `bench_diff` during the model's first 30 seconds, we still get a very accurate estimate of λ<sub>1</sub>/λ<sub>2</sub> but the individual (sample mean) estimates of μ1 and μ2 are both close to μ * 1.8183, just as with the *traditional* method. Likewise, if we "run" `bench_diff` during the model's last 30 seconds, we still get a very accurate estimate of λ<sub>1</sub>/λ<sub>2</sub> but the individual (sample mean) estimates of μ1 and μ2 are both close to μ * 1.1817, just as with the *traditional* method.
- The key point of `bench_diff` is to repeatedly run both functions in close time proximity to each other so that the *ratios* of the two functions' latencies are close to the baseline even if the individual latencies themselves are distorted by time-dependent noise.
- If instead of λ = λ<sub>1</sub> = λ<sub>2</sub> = 12 ms and exec_count = 2500 we assume λ = λ<sub>1</sub> = λ<sub>2</sub> = 120 ms and exec_count = 250, then the accuracy of the `bench_diff` results is worse, but it is still much better (at around a 3% error) than the *traditional* result (which remains about the same at around a 54% error).