1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
// Copyright (c) Meta Platforms, Inc. and affiliates.
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2.
//! # SCX Load Calculator
//!
//! A crate providing abstractions for calculating load between scheduling
//! domains.
//!
//! Load Balancing Primer
//! ---------------------
//!
//! Let's begin by doing a brief overview of load balancing. In order to
//! understand load balancing, we'll introduce the relevant terms, and explain
//! what load balancing is from there:
//!
//! *Weight*
//!
//! A positive integer value representing a task's priority in the system. The
//! meaning of weight is ultimately up to the caller to determine and apply, but
//! conceptually speaking weight is a linear scaling factor of a task's
//! perceived load (defined below) in the system.
//!
//! *Duty Cycle*
//!
//! A task's duty cycle is the proportion of time that it could have used a CPU
//! in some time window. For example, say that a task does nothing but spin in a
//! tight loop for 10ms, and then sleep for 10ms. Then in a 20ms time window,
//! its duty cycle would be 0.5.
//!
//! Note that this is not the same thing as the proportion of time it's
//! _runnable_. For example, if you had two such tasks sharing a CPU, one of
//! them may wait for 10ms to use the core while the other one runs, and then it
//! would run for its 10ms duty cycle. It was runnable for 20ms, but could only
//! actually use the CPU for 10ms, so its duty cycle was 10ms / 20ms = 0.5.
//!
//! *Load*
//!
//! A scheduling entity's load l_x is simply the product of its weight and duty
//! cycle:
//!
//! l_x = w_x * d_x
//!
//! *Infeasible Weights*
//!
//! At a conceptual level, the goal of a load balancer is of course to balance
//! load across the system. If a scheduling entity has load l_x, and the total
//! sum of all entities' loads on the system is L, then entity x should receive
//! l_x / L proportion of the available CPU capacity on the system.
//!
//! This formulation works fine in many cases, but can break down when an
//! entity's proportion of load (and thus allotted CPU capacity) exceeds the
//! amount of CPU it can consume.
//!
//! For example, say you were on a system with 2 sets of 2-core groups (for a
//! total of 4 cores on the system), eight tasks with the following duty cycle
//! and weight:
//!
//! ID Weight Duty Cycle Load
//!
//! 0 1 1.0 1
//! 1 1 1.0 1
//! 2 1 1.0 1
//! 3 1 1.0 1
//! 4 1 1.0 1
//! 5 1 1.0 1
//! 6 1 1.0 1
//! 7 10000 1.0 10000
//!
//!
//! The load sum L of the system is 1 * 7 + 1 * 10000 = 10007. This means that
//! tasks 0-6 have a load proportion of 1 / 10007 ~= 0.0001, and task 7 has a
//! load proportion of 0.9993. In other words, tasks 0-6 are entitled to
//! 0.0001 * 4 = 0.0004 CPUs worth of capacity, and task 7 is entitled to
//! 0.9993 * 4 = 3.9972 CPUs worth of capacity.
//!
//! Task 7 can only consume at most 1.0 CPU due to its duty cycle being 1.0, so
//! its weight is "infeasible" as the amount of CPU capacity it would be
//! afforded exceeds what it can actually use.
//!
//! What we instead want is to find an adjusted weight w' that we can assign to
//! Task 7 such that it gets its full duty cycle of 1.0, but allows the
//! remaining 3 cores worth of compute to be equally spread amongst the
//! remaining tasks. The task of finding this weight w' is known as the
//! "infeasible weights problem", and is solved by this crate.
//!
//! More details on load balancing and the infeasible weights problem are
//! provided in the following Google Drive document:
//!
//! <https://drive.google.com/file/d/1fAoWUlmW-HTp6akuATVpMxpUpvWcGSAv>
//!
//! Using the Crate
//! ---------------
//!
//! This crate has two primary sets of APIs:
//!
//! (1) APIs for aggregating domain loads specified by the caller
//! (2) APIs for querying those loads, after being adjusted for infeasibility
//!
//! LoadAggregator
//! --------------
//!
//! The object that addresses (1) is the LoadAggregator. This object receives
//! load passed by the user, and once all load has been received, runs the
//! numbers to create load sums and weights that are adjusted for infeasibility
//! as described above. LoadAggregator objects can be created and used as
//! follows:
//!
//! Assume we're on a 16-core (32-CPU) host with two core complexes:
//!
//!```rust
//! use scx_utils::LoadAggregator;
//! use log::info;
//!
//! // Create a LoadAggregator object, specifying the number of CPUs on the
//! // system, and whether it should only aggregate duty cycle.
//! let mut aggregator = LoadAggregator::new(32, false);
//!
//! // Domain 0, weight 1, has duty cycle 1.0.
//! aggregator.record_dom_load(0, 1, 1.0);
//!
//! // Domain 1, weight 1, has duty cycle 1.0.
//! aggregator.record_dom_load(1, 1, 1.0);
//! // ...
//! aggregator.record_dom_load(31, 1, 1.0);
//! // ...
//! aggregator.record_dom_load(63, 1, 1.0);
//!
//! // Domain 64, weight 10000, has duty cycle 1.0.
//! aggregator.record_dom_load(64, 10000, 1.0);
//!
//! // Note that it is allowed to record load for a domain more than once.
//! // For a given domain you may only record load for a specific weight
//! // once, but you may record loads for multiple different weights for a
//! // single domain.
//!
//! // Create the LoadLedger object
//! let ledger = aggregator.calculate();
//!
//! // Outputs: 66.06451612903226
//! info!("{}", ledger.global_load_sum());
//!```
//!
//! In the above example, we have 65 domains, all with duty cycle 1.0. 64 of
//! them have weight 1, and one of them has weight 10000. If there were multiple
//! tasks per domain, we would record different or additional values. For
//! example, if we had two tasks with weight 1 in domain 0, and an additional
//! task with weight 100 in domain 1, we would record their loads as follows:
//!
//!```rust
//! use scx_utils::LoadAggregator;
//! let mut aggregator = LoadAggregator::new(32, false);
//!
//! // In this version, domain 0 has 2 tasks with weight 1.0 and duty cycle
//! // 1.0.
//! aggregator.record_dom_load(0, 1, 2.0);
//!
//! // In this version, domain 1 also has a task with weight 100 and duty
//! // cycle 1.0.
//! aggregator.record_dom_load(1, 100, 1.0);
//!```
//!
//! Note that the abstractions here are meant to be generic across schedulers.
//! LoadAggregator assumes nothing about the scheduling domains being passed to
//! it. For example, they may span layers defined in a scheduler, domains
//! specified by the user via cpumask strings, or domains that span L3 caches.
//!
//! LoadLedger
//! ----------
//!
//! Once you have fed all load values to the LoadAggregator, you can use it to
//! calculate load sums (including adjusting for weight infeasibility), and
//! create a read-only LoadLedger object that can be used to query the values as
//! described in (2).
//!
//! There are various values of interest that can be queried from a LoadLedger
//! object. For example, you may ask for the sum of load (adjusted for
//! infeasibility) for the whole system, or the sum of duty cycle for the whole
//! system, or the sum of load for each domain (adjusted for infeasibility):
//!
//! ```rust
//! use scx_utils::LoadAggregator;
//! use log::info;
//! let mut aggregator = LoadAggregator::new(32, false);
//! aggregator.record_dom_load(0, 1, 1.0);
//! // ...
//! aggregator.record_dom_load(63, 1, 1.0);
//! aggregator.record_dom_load(64, 10000, 1.0);
//!
//! let ledger = aggregator.calculate();
//!
//! info!("Global load sum: {}", ledger.global_load_sum());
//! info!("Global duty cycle sum: {}", ledger.global_dcycle_sum());
//!
//! let dom_dcycles = ledger.dom_dcycle_sums();
//! let dom_loads = ledger.dom_load_sums();
//! let effective_max_weight = ledger.effective_max_weight();
//!
//! // ...
//! ```
use bail;
use Result;
use BTreeMap;
const MIN_WEIGHT: usize = 1;