1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
use derive_more::Display;
/// How work is distributed among processors during a benchmark run.
///
/// The work is redistributed for each benchmark iteration, ensuring that hardware-specific
/// performance anomalies are averaged out (e.g. if some processors have worse thermals and
/// throttle more often).
#[derive(Clone, Copy, Debug, Display, Eq, PartialEq)]
#[non_exhaustive]
pub enum WorkDistribution {
/// One worker pair is spawned for each numerically neighboring memory region pair.
///
/// For example, with 3 memory regions, we would have 3 pairs of workers: (0, 1), (1, 2), (2, 0).
///
/// Each pair will work together, processing one payload per pair. In total,
/// there will be two workers per memory region (one working with the "previous"
/// memory region and one working with the "next" one).
///
/// Each worker is pinned to a specific processor.
///
/// Different memory regions may be a different distance apart, so this allows us to average
/// out any differences - some pairs are faster, some are slower, we just want to average it
/// out so every benchmark run is consistent (instead of picking two random memory regions).
///
/// This option can only be used if there are at least two memory regions. Benchmark runs with
/// this distribution will be skipped if the system only has a single memory region.
PinnedMemoryRegionPairs,
/// Each worker in a pair is spawned in the same memory region.
///
/// Each pair will work together, processing one payload between the two members. Different
/// pairs may be in different memory regions.
///
/// Each worker is pinned to a specific processor.
///
/// The number of pairs will match the number that would have been used with
/// `PinnedMemoryRegionPairs`, for optimal comparability. There will be a minimum of one pair.
PinnedSameMemoryRegion,
/// Both workers in each pair are spawned on the same processor, picked arbitrarily from
/// each memory region. One pair from each memory region, on the same processor.
///
/// Each pair will work together, processing one payload between the two members. Different
/// pairs may be in different memory regions.
///
/// This can occasionally be insightful when it surprises you by showing that two threads on
/// the same processor do not need twice as long to get twice as much work done. Not useful
/// with most scenarios, though - best to skip unless probing specifically for this effect.
///
/// It may also showcase effects of memory regions when compared to `UnpinnedSameProcessor`.
///
/// The number of pairs will match the number that would have been used with
/// `PinnedMemoryRegionPairs`, for optimal comparability. There will be a minimum of one pair.
PinnedSameProcessor,
/// All workers are spawned without regard to memory region or processor, randomly picking
/// processors for each iteration.
///
/// Each worker is given back its own payload - while we still spawn the same number of workers
/// as in the paired scenarios, each member of the pair operates independently and processes
/// its own payload.
///
/// Note that this requires the benchmark scenario logic to be capable of handling its own data
/// set. If the benchmark logic requires two collaborating workers, you cannot use this work
/// distribution as it would likely end in a deadlock due to lack of a partner.
///
/// This mode is also unlikely to be informative if the scenario setup does not distinguish
/// between the two workers (they either do the same thing or pick dynamically who does what).
PinnedSelf,
/// Like `PinnedMemoryRegionPairs` but each worker is allowed to float among all the processors
/// in the memory region, based on the operating system's scheduling decisions.
///
/// We still have the same total number of workers to keep total system load equivalent.
UnpinnedMemoryRegionPairs,
/// Like `PinnedSameMemoryRegion` but each worker is allowed to float among half the processors
/// in the memory region, based on the operating system's scheduling decisions. Each member of
/// the pair gets one half of the processors in the memory region.
///
/// We still have the same total number of workers to keep total system load equivalent.
ConstrainedSameMemoryRegion,
/// All workers are spawned without regard to memory region or processor, leaving it up
/// to the operating system to decide where to run them. Note that, typically, this will still
/// result in them running in the same memory region, as that tends to be the default behavior.
///
/// Each worker is given back its own payload - while we still spawn the same number of workers
/// as in the paired scenarios, each member of the pair operates independently and processes
/// its own payload.
///
/// Note that this requires the benchmark scenario logic to be capable of handling its own data
/// set. If the benchmark logic requires two collaborating workers, you cannot use this work
/// distribution as it would likely end in a deadlock due to lack of a partner.
///
/// This mode is also unlikely to be informative if the scenario setup does not distinguish
/// between the two workers (they either do the same thing or pick dynamically who does what).
UnpinnedSelf,
/// All workers are spawned without regard to processor but in specific memory regions, matching
/// the memory regions used in `PinnedMemoryRegionPairs`. Each worker is allowed to float among
/// all the processors in the memory region, based on the operating system's scheduling
/// decisions.
///
/// Each worker is given back its own payload - while we still spawn the same number of workers
/// as in the paired scenarios, each member of the pair operates independently and processes
/// its own payload.
///
/// Note that this requires the benchmark scenario logic to be capable of handling its own data
/// set. If the benchmark logic requires two collaborating workers, you cannot use this work
/// distribution as it would likely end in a deadlock due to lack of a partner.
UnpinnedPerMemoryRegionSelf,
}
impl WorkDistribution {
/// All the work distribution modes.
#[must_use]
pub fn all() -> &'static [Self] {
&[
Self::PinnedMemoryRegionPairs,
Self::PinnedSameMemoryRegion,
Self::PinnedSameProcessor,
Self::PinnedSelf,
Self::UnpinnedMemoryRegionPairs,
Self::ConstrainedSameMemoryRegion,
Self::UnpinnedSelf,
Self::UnpinnedPerMemoryRegionSelf,
]
}
/// All the work distribution modes that exchange payloads between processors
/// before starting the benchmark.
#[must_use]
pub fn all_without_self() -> &'static [Self] {
&[
Self::PinnedMemoryRegionPairs,
Self::PinnedSameMemoryRegion,
Self::PinnedSameProcessor,
Self::UnpinnedMemoryRegionPairs,
Self::ConstrainedSameMemoryRegion,
]
}
/// All the work distribution modes that allow every worker to be placed
/// on a different processor (either explicitly or by allowing them to float
/// based on OS scheduling decisions).
#[must_use]
pub fn all_with_unique_processors() -> &'static [Self] {
&[
Self::PinnedMemoryRegionPairs,
Self::PinnedSameMemoryRegion,
Self::PinnedSelf,
Self::UnpinnedMemoryRegionPairs,
Self::ConstrainedSameMemoryRegion,
Self::UnpinnedSelf,
Self::UnpinnedPerMemoryRegionSelf,
]
}
/// All the work distribution modes that allow every worker to be placed
/// on a different processor (either explicitly or by allowing them to float
/// based on OS scheduling decisions) and exchange payloads between workers
/// before starting the benchmark.
#[must_use]
pub fn all_with_unique_processors_without_self() -> &'static [Self] {
&[
Self::PinnedMemoryRegionPairs,
Self::PinnedSameMemoryRegion,
Self::UnpinnedMemoryRegionPairs,
Self::ConstrainedSameMemoryRegion,
]
}
}
#[cfg(test)]
#[cfg_attr(coverage_nightly, coverage(off))]
mod tests {
use std::panic::{RefUnwindSafe, UnwindSafe};
use static_assertions::assert_impl_all;
use super::*;
assert_impl_all!(WorkDistribution: UnwindSafe, RefUnwindSafe);
}