//! Working on many-processor systems with 100+ logical processors can require you to pay extra
//! attention to the specifics of the hardware to make optimal use of available compute capacity
//! and extract the most performance out of the system.
//!
//! This is part of the [Folo project](https://github.com/folo-rs/folo) that provides mechanisms for
//! high-performance hardware-aware programming in Rust.
//!
//! # Why should one care?
//!
//! Modern operating systems try to distribute work fairly between all processors. Typical Rust
//! sync and async task runtimes such as Rayon and Tokio likewise try to keep all processors busy
//! with work, even moving work between processors if one risks becoming idle. This is fine, but
//! we can do better.
//!
//! Taking direct control over the placement of work on specific processors can yield superior
//! performance by taking advantage of factors under the service author's control, which are not known
//! to general-purpose tasking runtimes:
//!
//! 1. A key insight we can use is that most service apps exist to process requests or execute
//!    jobs - each unit of work being done is related to a specific data set. We can ensure we
//!    only process the data associated with a specific HTTP/gRPC request on a single processor
//!    to ensure optimal data locality. This means the data related to the request is likely to
//!    be in the caches of that processor, speeding up all operations related to that request by
//!    avoiding expensive memory accesses.
//! 1. Even when data is intentionally shared across processors (e.g. because one processor is
//!    not capable enough to do the work and parallelization is required), performance differences
//!    exist between different pairs of processors because different processors can be connected
//!    to different physical memory modules. Access to non-cached data is optimal when that data
//!    is in the same memory region as the current processor (i.e. on the physical memory modules
//!    directly wired to the current processor).
//!
//! # How does this package help?
//!
//! The `many_cpus` package provides mechanisms to schedule threads on specific processors and in specific
//! memory regions, ensuring that work assigned to those threads remains on the same hardware and that
//! data shared between threads is local to the same memory region, enabling you to achieve high data
//! locality and processor cache efficiency.
//!
//! In addition to thread spawning, this package enables app logic to observe what processor the current
//! thread is executing on and in which memory region this processor is located, even if the thread is
//! not bound to a specific processor. This can be a building block for efficiency improvements even
//! outside directly controlled work scheduling.
//!
//! Other packages from the [Folo project](https://github.com/folo-rs/folo) build upon this
//! hardware-awareness functionality to provide higher-level primitives such as thread pools,
//! work schedulers, region-local cells and more.
//!
//! # Quick start
//!
//! The simplest scenario is when you want to start a thread on every processor in the default
//! processor set:
//!
//! ```rust
//! // examples/spawn_on_all_processors.rs
//! # use many_cpus::SystemHardware;
//! let threads = SystemHardware::current()
//!     .processors()
//!     .spawn_threads(|processor| {
//!         println!("Spawned thread on processor {}", processor.id());
//!
//!         // In a real service, you would start some work handler here, e.g. to read
//!         // and process messages from a channel or to spawn a web handler.
//!     });
//! # for thread in threads {
//! #     thread.join().unwrap();
//! # }
//! ```
//!
//! If there are no operating system enforced constraints active, the default processor set
//! includes all processors.
//!
//! # Selection criteria
//!
//! Depending on the specific circumstances, you may want to filter the set of processors.
//! For example, you may want to use only two processors but ensure that they are high-performance
//! processors that are connected to the same physical memory modules so they can cooperatively
//! perform some processing on a shared data set:
//!
//! ```rust
//! // examples/spawn_on_selected_processors.rs
//! # use many_cpus::SystemHardware;
//! # use new_zealand::nz;
//! let hardware = SystemHardware::current();
//!
//! let selected_processors = hardware
//!     .processors()
//!     .to_builder()
//!     .same_memory_region()
//!     .performance_processors_only()
//!     .take(nz!(2))
//!     // If we do not have what we want, we fall back to the default set.
//!     .unwrap_or_else(|| hardware.processors());
//!
//! let threads = selected_processors.spawn_threads(|processor| {
//!     println!("Spawned thread on processor {}", processor.id());
//!
//!     // In a real service, you would start some work handler here, e.g. to read
//!     // and process messages from a channel or to spawn a web handler.
//! });
//! # for thread in threads {
//! #     thread.join().unwrap();
//! # }
//! ```
//!
//! # Inspecting the hardware environment
//!
//! Functions are provided to easily inspect the current hardware environment:
//!
//! ```rust
//! // examples/observe_processor.rs
//! # use many_cpus::SystemHardware;
//! # use std::{thread, time::Duration};
//! let hardware = SystemHardware::current();
//!
//! let max_processors = hardware.max_processor_count();
//! let max_memory_regions = hardware.max_memory_region_count();
//! println!(
//!     "This system can support up to {max_processors} processors in {max_memory_regions} memory regions"
//! );
//!
//! loop {
//!     let current_processor_id = hardware.current_processor_id();
//!     let current_memory_region_id = hardware.current_memory_region_id();
//!
//!     println!(
//!         "Thread executing on processor {current_processor_id} in memory region {current_memory_region_id}"
//!     );
//!
//!     # #[cfg(doc)] // Skip the sleep when testing.
//!     thread::sleep(Duration::from_secs(1));
//!     # #[cfg(not(doc))] // Break after first iteration when testing.
//!     # break;
//! }
//! ```
//!
//! Note that the current processor may change at any time if you are not using threads pinned to
//! specific processors (such as those spawned via `ProcessorSet::spawn_threads()`). Example output:
//!
//! ```text
//! This system can support up to 32 processors in 1 memory regions
//! Thread executing on processor 4 in memory region 0
//! Thread executing on processor 4 in memory region 0
//! Thread executing on processor 12 in memory region 0
//! Thread executing on processor 2 in memory region 0
//! Thread executing on processor 12 in memory region 0
//! Thread executing on processor 0 in memory region 0
//! Thread executing on processor 4 in memory region 0
//! Thread executing on processor 4 in memory region 0
//! ```
//!
//! # External constraints
//!
//! The operating system may define constraints that prohibit the application from using all
//! the available processors (e.g. when the app is containerized and provided limited
//! hardware resources).
//!
//! This package treats platform constraints as follows:
//!
//! * Hard limits on which processors are allowed are respected - forbidden processors are mostly
//!   ignored by this package and cannot be used to spawn threads, though such processors are
//!   still accounted for when inspecting hardware information such as "max processor ID".
//!   The mechanisms for defining such limits are cgroups on Linux and job objects on Windows.
//!   See `examples/obey_job_affinity_limits_windows.rs` for a Windows-specific example.
//! * Soft limits on which processors are allowed are ignored by default - specifying a processor
//!   affinity via `taskset` on Linux, `start.exe /affinity 0xff` on Windows or similar mechanisms
//!   does not affect the set of processors this package will use by default, though you can opt
//!   in to this via [`.where_available_for_current_thread()`][crate::ProcessorSetBuilder::where_available_for_current_thread].
//! * Any operating system enforced processor time quota is taken as the upper bound for the
//!   processor count of the processor set returned by [`SystemHardware::processors()`].
//! * Any other processor set can opt in to quota limiting when it is built, for example by
//!   calling `SystemHardware::current().all_processors().to_builder().enforce_resource_quota().take_all()`.
//!
//! See `examples/obey_job_resource_quota_limits_windows.rs` for a Windows-specific example of processor
//! time quota enforcement.
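//!
//! Written out as a block, the opt-in quota limiting mentioned above looks as follows. This is a
//! sketch assembled from the builder calls named in this section, not a verified recipe:
//!
//! ```rust
//! # use many_cpus::SystemHardware;
//! // Opt in to quota limiting when building a set from all processors.
//! let quota_limited = SystemHardware::current()
//!     .all_processors()
//!     .to_builder()
//!     .enforce_resource_quota()
//!     .take_all()
//!     .expect("expected at least one processor to remain under the quota");
//! ```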
//!
//! # Avoiding operating system quota penalties
//!
//! If a process exceeds the processor time limit, the operating system will delay executing the
//! process further until the "debt is paid off". This is undesirable for most workloads because:
//!
//! 1. There will be random latency spikes from when the operating system decides to apply a delay.
//! 1. The delay may not be evenly applied across all threads of the process, leading to
//!    unbalanced load between worker threads.
//!
//! For predictable behavior that does not suffer from delay side-effects, it is important that the
//! process does not exceed the processor time limit. To keep out of trouble,
//! follow these guidelines:
//!
//! * Ensure that all your concurrently executing thread pools are derived from the same
//!   processor set, so there is a single set of processors (up to the resource quota) that all
//!   work of the process will be executed on. Any new processor sets you create should be
//!   subsets of this set, thereby ensuring that all worker threads combined do not exceed the
//!   quota.
//! * Ensure that the original processor set is constructed while obeying the resource quota
//!   (which is enabled by default).
//!
//! If your resource constraints are already applied on process startup, you can use
//! `SystemHardware::current().processors()` as the master set from which all other
//! processor sets are derived using either `.take()` or `.to_builder()`. This will ensure the
//! processor time quota is obeyed because `processors()` is size-limited to the quota.
//!
//! ```rust
//! # use many_cpus::SystemHardware;
//! # use new_zealand::nz;
//! let hw = SystemHardware::current();
//!
//! // By taking both senders and receivers from the same original processor set, we
//! // guarantee that all worker threads combined cannot exceed the processor time quota.
//! let mail_senders = hw
//!     .processors()
//!     .take(nz!(2))
//!     .expect("need at least 2 processors for mail workers")
//!     .spawn_threads(|_| send_mail());
//!
//! let mail_receivers = hw
//!     .processors()
//!     .take(nz!(2))
//!     .expect("need at least 2 processors for mail workers")
//!     .spawn_threads(|_| receive_mail());
//! # fn send_mail() {}
//! # fn receive_mail() {}
//! ```
//!
//! # Changes at runtime
//!
//! It is possible that a system will have processors added or removed at runtime, or for
//! constraints enforced by the operating system to change over time. Such changes will not be
//! represented in an existing processor set - once created, a processor set is static.
//!
//! Changes to resource quotas can be applied by creating a new processor set (e.g. if the
//! processor time quota is lowered, building a new set will by default use the new quota).
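//!
//! For example, building a fresh set picks up the current quota, while sets built earlier stay
//! as they were. A minimal sketch, using only calls shown elsewhere in this document:
//!
//! ```rust
//! # use many_cpus::SystemHardware;
//! // An existing processor set is static; to observe a changed quota, build a new one.
//! let refreshed = SystemHardware::current().processors();
//! println!("Refreshed set contains {} processors", refreshed.len());
//! ```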
//!
//! This package will not detect more fundamental changes such as added or removed processors.
//! Operations attempted on removed processors may fail with an error, panic, or silently
//! misbehave (e.g. threads never starting). Added processors will not be considered a member of
//! any set.
//!
//! # Inheriting soft limits on allowed processors
//!
//! While the package does not by default obey soft limits, you can opt in to these limits by
//! inheriting the allowed processor set in the `main()` entrypoint thread:
//!
//! ```rust
//! // examples/spawn_on_inherited_processors.rs
//! # use std::{thread, time::Duration};
//! # use many_cpus::SystemHardware;
//! let hardware = SystemHardware::current();
//!
//! // The set of processors used here can be adjusted via OS mechanisms.
//! //
//! // For example, to select only processors 0 and 1:
//! // Linux: `taskset 0x3 target/debug/examples/spawn_on_inherited_processors`
//! // Windows: `start /affinity 0x3 target/debug/examples/spawn_on_inherited_processors.exe`
//! let inherited_processors = hardware
//!     .processors()
//!     .to_builder()
//!     // This causes soft limits on processor affinity to be respected.
//!     .where_available_for_current_thread()
//!     .take_all()
//!     .expect(
//!         "found no processors usable by the current thread; \
//!          this is impossible because the thread is currently running on one",
//!     );
//!
//! println!(
//!     "After applying soft limits, we are allowed to use {} processors.",
//!     inherited_processors.len()
//! );
//!
//! let threads = inherited_processors.spawn_threads(|processor| {
//!     println!("Spawned thread on processor {}", processor.id());
//!
//!     // In a real service, you would start some work handler here, e.g. to read
//!     // and process messages from a channel or to spawn a web handler.
//! });
//! # for thread in threads {
//! #     thread.join().unwrap();
//! # }
//! ```
//!
//! # Testing with fake hardware
//!
//! The `many_cpus` package provides a fake hardware capability for testing code that depends on
//! hardware configuration. This is available when the `test-util` Cargo feature is enabled.
//!
//! To make your code testable with fake hardware, accept `SystemHardware` as a value (typically
//! as a function parameter or struct field) instead of always calling `SystemHardware::current()`.
//! This allows tests to substitute fake hardware while production code uses real hardware.
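//!
//! For example, the dependency-injection pattern described above might look like this.
//! `configure_worker_count` is a hypothetical function introduced here purely for illustration:
//!
//! ```rust
//! # use many_cpus::SystemHardware;
//! // Accept hardware as a value so tests can pass in fake hardware instead.
//! fn configure_worker_count(hardware: SystemHardware) -> usize {
//!     // Hypothetical sizing logic: one worker per processor in the set.
//!     hardware.processors().len()
//! }
//!
//! let count = configure_worker_count(SystemHardware::current());
//! assert!(count >= 1);
//! ```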
//!
//! See the [`fake`] module for detailed examples and API documentation.
//!
//! # Operating system compatibility
//!
//! This package is tested on the following operating systems:
//!
//! * Windows 11 and newer
//! * Windows Server 2022 and newer
//! * Ubuntu 24.04 and newer
//!
//! The functionality may also work on other operating systems if they offer compatible platform
//! APIs, but this is not actively tested.
//!
//! ## Unsupported platforms
//!
//! On operating systems without native support (such as macOS, BSD variants, etc.), this package
//! provides a fallback implementation that allows code to compile and run with graceful degradation:
//!
//! * Processor count is determined via `std::thread::available_parallelism()`
//! * All processors are simulated as being in a single memory region (region 0)
//! * All processors are marked as Performance class
//! * Thread pinning operations succeed but do not actually pin threads to processors
//! * Current processor tracking uses stable thread-local IDs derived from thread IDs
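//!
//! The processor count on such platforms therefore comes from the standard library. The query
//! itself can be illustrated with plain `std`, with no crate APIs involved:
//!
//! ```rust
//! use std::thread;
//!
//! // This is the standard library call the fallback derives its processor count from.
//! let parallelism = thread::available_parallelism()
//!     .map(|n| n.get())
//!     .unwrap_or(1); // Assume a single processor if the count cannot be determined.
//! assert!(parallelism >= 1);
//! ```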
//!
//! While this fallback behavior maintains API compatibility and allows applications to function,
//! it does not provide the performance benefits of actual processor pinning and topology awareness.
//! Applications running on unsupported platforms will not see performance improvements from using
//! this package but will still function correctly.
//!
//! ## Miri
//!
//! When running under [Miri](https://github.com/rust-lang/miri), this package uses the same
//! fallback implementation as unsupported platforms because Miri cannot execute the
//! platform-specific system calls used by the native implementations. This means
//! `SystemHardware::current()` returns fallback hardware under Miri, enabling dependent crates
//! to run their test suites under Miri without any special handling, as long as the test logic
//! is compatible with the fallback behavior.
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
// No documented public API but we have benchmarks that reach in via undocumented private API.