region_local/lib.rs
//! On many-processor systems with multiple memory regions, there is an extra cost associated with
//! accessing data in physical memory modules that are in a different memory region than the current
//! processor:
//!
//! * Cross-memory-region loads have higher latency (e.g. 100 ns local versus 200 ns remote).
//! * Cross-memory-region loads have lower throughput (e.g. 200 GBps local versus 100 GBps remote).
//!
//! This crate enables you to create values that maintain separate storage per memory region.
//!
//! Region-local storage may be useful in circumstances where state needs to be shared but
//! it is fine to do so only within each memory region (e.g. because you intentionally want
//! to avoid the overhead of cross-memory-region transfers and want to isolate the data sets).
//!
#![doc = mermaid!("../doc/region_local.mermaid")]
//!
//! Think of this as an equivalent of [`thread_local!`][2], except operating on the memory
//! region boundary instead of the thread boundary.
//!
//! This is part of the [Folo project](https://github.com/folo-rs/folo) that provides mechanisms for
//! high-performance hardware-aware programming in Rust.
//!
//! # Applicability
//!
//! A positive performance impact may be seen if all of the following conditions are true:
//!
//! 1. The system has multiple memory regions.
//! 2. A shared data set is accessed from processors in different memory regions.
//! 3. The data set is large enough to make it unlikely to be resident in local processor caches.
//! 4. There is sufficient memory capacity to clone the data set into every memory region.
//! 5. Callers in different memory regions do not need to see each other's writes.
//!
//! As with all performance and efficiency questions, you should use a profiler to measure real impact.
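//!
//! Before profiling, a back-of-envelope estimate can indicate whether the conditions above are
//! worth investigating. The sketch below is a minimal illustration that assumes the example
//! figures from the introduction (100 ns versus 200 ns, 200 GBps versus 100 GBps). The constants
//! and helper functions are hypothetical and are not part of this crate.
//!
//! ```rust
//! // Illustrative figures taken from the introduction above, not measurements.
//! const LOCAL_LATENCY_NS: f64 = 100.0;
//! const REMOTE_LATENCY_NS: f64 = 200.0;
//! const LOCAL_THROUGHPUT_GBPS: f64 = 200.0;
//! const REMOTE_THROUGHPUT_GBPS: f64 = 100.0;
//!
//! // Milliseconds spent on `count` dependent loads that each miss the processor caches.
//! fn dependent_loads_ms(count: u64, latency_ns: f64) -> f64 {
//!     count as f64 * latency_ns / 1_000_000.0
//! }
//!
//! // Milliseconds spent streaming `gigabytes` of data at `gbps` gigabytes per second.
//! fn streaming_read_ms(gigabytes: f64, gbps: f64) -> f64 {
//!     gigabytes * 1000.0 / gbps
//! }
//!
//! fn main() {
//!     let loads = 1_000_000;
//!
//!     println!(
//!         "{loads} dependent loads: {} ms local, {} ms remote",
//!         dependent_loads_ms(loads, LOCAL_LATENCY_NS),
//!         dependent_loads_ms(loads, REMOTE_LATENCY_NS)
//!     );
//!
//!     println!(
//!         "Streaming 1 GB: {} ms local, {} ms remote",
//!         streaming_read_ms(1.0, LOCAL_THROUGHPUT_GBPS),
//!         streaming_read_ms(1.0, REMOTE_THROUGHPUT_GBPS)
//!     );
//! }
//! ```
//!
//! Real hardware overlaps memory accesses and serves some of them from caches, so treat the
//! result as an order-of-magnitude guide rather than a prediction.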
//!
//! # Usage
//!
//! There are two ways to create region-local values:
//!
//! 1. Define a static variable in a [`region_local!`][3] block.
//! 2. Use the [`RegionLocal`][4] type inside a [`linked::InstancePerThread<T>`][6]
//!    or [`linked::InstancePerThreadSync<T>`][7] wrapper.
//!
//! The difference is only a question of convenience - static variables are easier to use but come
//! with language-driven limitations, such as needing to know in advance how many you need and
//! defining them in the code.
//!
//! In contrast, `linked::InstancePerThread<RegionLocal<T>>` is more flexible and you can create
//! any number of instances at runtime, at a cost of having to manually deliver instances to
//! the right place in the code.
//!
//! ## Usage via static variables
//!
//! This crate provides the [`region_local!`][3] macro that enhances static variables with
//! region-local storage and provides interior mutability via weakly consistent
//! writes within the same memory region.
//!
//! ```rust
//! // RegionLocalExt provides required extension methods on region-local
//! // static variables, such as `with_local()` and `set_local()`.
//! use region_local::{region_local, RegionLocalExt};
//!
//! region_local!(static FAVORITE_COLOR: String = "blue".to_string());
//!
//! FAVORITE_COLOR.with_local(|color| {
//!     println!("My favorite color is {color}");
//! });
//!
//! FAVORITE_COLOR.set_local("red".to_string());
//! ```
//!
//! ## Usage via `InstancePerThreadSync<RegionLocal<T>>`
//!
//! There exist situations where a static variable is not suitable. For example, the number of
//! different region-local objects may be determined at runtime (e.g. a separate value
//! for each log source loaded from configuration).
//!
//! In this case, you can directly use the [`RegionLocal`][4] type which underpins the mechanisms
//! exposed by the macro. This type is implemented using the [linked object pattern][linked] and
//! can be manually used via the [`InstancePerThread<T>`][6] or
//! [`InstancePerThreadSync<T>`][7] wrapper type, as `InstancePerThreadSync<RegionLocal<T>>`.
//!
//! ```rust
//! use linked::InstancePerThreadSync;
//! use region_local::RegionLocal;
//!
//! let favorite_color_regional = InstancePerThreadSync::new(RegionLocal::new(|| "blue".to_string()));
//!
//! // This localizes the object to the current thread. Reuse this value when possible.
//! let favorite_color = favorite_color_regional.acquire();
//!
//! favorite_color.with_local(|color| {
//!     println!("My favorite color is {color}");
//! });
//!
//! favorite_color.set_local("red".to_string());
//! ```
//!
//! See the documentation of the [`linked`][linked] crate for more details on the mechanisms
//! offered by the linked object pattern. Additional capabilities exist beyond those described here.
//!
//! # Consistency guarantees
//!
//! [consistency-guarantees]: #consistency-guarantees
//!
//! Writes are weakly consistent within the same memory region, with an undefined order of resolving
//! from different threads. Writes from the same thread become visible sequentially on all threads in
//! the same memory region.
//!
//! Writes are immediately visible from the originating thread, with the caveats that:
//!
//! 1. Writes from other threads may be applied at any time, such as between
//!    a write and an immediately following read.
//! 2. A thread, if not pinned, may migrate to a new memory region between the write and read
//!    operations, which invalidates any link between the two operations and will read from
//!    the storage of the new memory region.
//!
//! In general, you can only have firm expectations about the sequencing of data produced by read
//! operations if the writes are always performed from a single thread per memory region and the
//! thread is pinned to processors of only a single memory region.
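//!
//! The isolation between memory regions can be pictured with a toy model. This is a minimal
//! sketch and not how this crate is implemented: the `PerRegion` type is hypothetical, the
//! region index is passed explicitly instead of being derived from the current processor, and
//! the lock gives stronger ordering than the weak consistency described above.
//!
//! ```rust
//! use std::sync::{Arc, RwLock};
//!
//! // Hypothetical toy model: one independently locked slot per memory region.
//! struct PerRegion<T> {
//!     slots: Vec<RwLock<Arc<T>>>,
//! }
//!
//! impl<T> PerRegion<T> {
//!     // Every region starts with its own value produced by `init`.
//!     fn new(region_count: usize, init: impl Fn() -> T) -> Self {
//!         Self {
//!             slots: (0..region_count)
//!                 .map(|_| RwLock::new(Arc::new(init())))
//!                 .collect(),
//!         }
//!     }
//!
//!     // Reads only the slot of the given region.
//!     fn with_local<R>(&self, region: usize, f: impl FnOnce(&T) -> R) -> R {
//!         let guard = self.slots[region].read().unwrap();
//!         let value: Arc<T> = Arc::clone(&*guard);
//!         // Release the lock before running the caller's closure.
//!         drop(guard);
//!         f(&value)
//!     }
//!
//!     // Replaces only the slot of the given region; the last writer wins.
//!     fn set_local(&self, region: usize, value: T) {
//!         *self.slots[region].write().unwrap() = Arc::new(value);
//!     }
//! }
//!
//! fn main() {
//!     let color = PerRegion::new(2, || "blue".to_string());
//!
//!     // A write performed "in region 0" is visible to later reads in region 0...
//!     color.set_local(0, "red".to_string());
//!     assert_eq!(color.with_local(0, |c| c.clone()), "red");
//!
//!     // ...but region 1 keeps its own value.
//!     assert_eq!(color.with_local(1, |c| c.clone()), "blue");
//! }
//! ```
//!
//! In the model, a thread that migrates between regions corresponds to calling `set_local` with
//! one region index and `with_local` with another, which is why the read may not observe the write.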
//!
//! # Operating system compatibility
//!
//! This crate relies on the collaboration between the Rust global allocator and the operating
//! system to map virtual memory pages to the correct memory region. The default configuration
//! in operating systems tends to encourage region-local mapping but this is not guaranteed.
//!
//! Some evidence suggests that on Windows, region-local mapping is only enabled when the threads
//! are pinned to specific processors in specific memory regions. A similar requirement is not known
//! for Linux (at least Ubuntu 24) but this may differ based on the specific OS and configuration.
//! Perform your own measurements to identify the behavior of your system and adjust the application
//! structure accordingly.
//!
//! Example of using this crate with processor-pinned threads (`examples/region_local_1gb.rs`):
//!
//! ```
//! # use std::{hint::black_box, thread, time::Duration};
//! # use many_cpus::ProcessorSet;
//! # use region_local::{RegionLocalExt, region_local};
//! region_local! {
//!     // We allocate a 1 GB object in every memory region.
//!     // With 4 memory regions, you should see a total of 4 GB allocated.
//!     static DATA: Vec<u8> = vec![50; 1024 * 1024 * 1024];
//! }
//!
//! fn main() {
//!     let processor_set = ProcessorSet::default();
//!
//!     processor_set
//!         .spawn_threads(|_| DATA.with_local(|data| _ = black_box(data.len())))
//!         .into_iter()
//!         .for_each(|x| x.join().unwrap());
//!
//!     println!(
//!         "All {} threads have accessed the region-local data. Terminating in 60 seconds.",
//!         processor_set.len()
//!     );
//!
//! # #[cfg(doc)] // Only for show, do not run when testing.
//!     thread::sleep(Duration::from_secs(60));
//! }
//! ```
//!
//! # Cross-region visibility
//!
//! The [`region_cached`][5] crate provides a similar mechanism that also publishes the value to all
//! memory regions instead of keeping it region-local. This may be a useful alternative if you do
//! not need to have separate variables per memory region but still want the efficiency benefits
//! of reading from local memory.
//!
//! [1]: crate::RegionLocalExt
//! [2]: https://doc.rust-lang.org/std/macro.thread_local.html
//! [3]: crate::region_local
//! [4]: crate::RegionLocal
//! [5]: https://docs.rs/region_cached/latest/region_cached/
//! [6]: linked::InstancePerThread
//! [7]: linked::InstancePerThreadSync

use simple_mermaid::mermaid;

mod clients;
mod macros;
mod region_local;
mod region_local_ext;

pub(crate) use clients::*;
pub use region_local::*;
pub use region_local_ext::*;

/// Macros require these things to be public but they are not part of the public API.
#[doc(hidden)]
pub mod __private;