This library provides simple and easy-to-use alternatives to the std::thread_local macro and the thread_local crate.
Two main types are provided, ThreadMap and ThreadMapX, that have identical APIs but slightly different implementations that may be more or less efficient depending on the use case (see type ThreadMapX docs).
§Typical Usage Workflow
These are the steps typically followed when using this library:
1. Instantiate either ThreadMap or ThreadMapX, wrap the instance in Arc, and name it tm, for example.
2. Spawn threads that enclose a clone of tm. If scoped threads are used, Arc is not required in the previous step and a regular reference &tm can be used in the thread instead.
3. Within each thread, read and/or modify the thread-local value by calling API methods on the tm clone or reference.
4. Optionally, from the main thread, before the spawned threads terminate, inspect the thread-local values using the API.
5. Optionally, from the main thread, once the spawned threads have terminated, inspect or extract all the thread-local values using the API.
§How It Differs From std::thread_local! and thread_local::ThreadLocal
While std::thread_local! and thread_local::ThreadLocal are optimized for efficiency, their usage can be more cumbersome in many cases. In particular, steps 4 and 5 above are not straightforward to do with these other thread-local approaches (but see thread_local_collect::tlm and thread_local_collect::tlcr for ways to do it).
Although it may seem that thread_local::ThreadLocal’s iter method provides a simple way to do items 4 and 5 above when the type parameter is Sync, it is important to note that ThreadLocal reuses its internal thread IDs for new threads when threads terminate (it does not use std::thread::ThreadId). Therefore, the thread-local values for some threads may not be preserved.
See below an example comparing the usage of std::thread_local! and ThreadMap.
§Depending on This Library
Add dependency in Cargo.toml:
[dependencies]
thread_map = "1"
§Usage Examples
See ThreadMap and ThreadMapX.
§Example Comparison With std::thread_local!
//! This example provides a direct comparison of the usage of [`ThreadMap`] and the
//! `std::thread_local!` macro.
//!
//! Lines that are specific to `ThreadMap` usage are preceded by
//! ```
//! //**ThreadMap**
//! ```
//!
//! and lines that are specific to `std::thread_local!` usage are preceded by
//! ```
//! //**std::thread_local**
//! ```
use std::{
    cell::Cell,
    sync::Arc,
    thread::{self},
    time::Duration,
};
use thread_map::ThreadMap;

const NTHREADS: i32 = 20;
const NITER: i32 = 10;
const SLEEP_MICROS: u64 = 10;

//**std::thread_local**
thread_local! {
    static TL: Cell<i32> = const { Cell::new(0) };
}

fn main() {
    //**ThreadMap**
    // There is no real need to wrap in `Arc` here because references can be used in scoped threads
    // instead of clones, but the `Arc` wrapper would be required if non-scoped threads were used.
    let tm = Arc::new(ThreadMap::default());

    thread::scope(|s| {
        for i in 0..NTHREADS {
            //**ThreadMap**
            let tm = tm.clone();
            s.spawn(move || {
                for _ in 0..NITER {
                    thread::sleep(Duration::from_micros(SLEEP_MICROS));
                    //**ThreadMap**
                    tm.with_mut(move |i0: &mut i32| *i0 += i);
                    //**std::thread_local**
                    TL.with(move |i0: &Cell<i32>| i0.replace(i0.get() + i));
                }

                //**ThreadMap**
                {
                    let value = tm.get();
                    assert_eq!(i * NITER, value);
                }
                //**std::thread_local**
                {
                    let value = TL.with(Cell::get);
                    assert_eq!(i * NITER, value);
                }
            });
        }

        //**ThreadMap**
        {
            // Snapshot before the thread-local value in the main thread is updated.
            let probed = tm.probe().unwrap();
            println!("probed={probed:?}");
        }
        //**std::thread_local**
        {
            // Can't do something similar to the above block.
        }

        //**ThreadMap**
        for _ in 0..NITER {
            tm.with_mut(|i0: &mut i32| *i0 += NTHREADS)
        }
        //**std::thread_local**
        for _ in 0..NITER {
            TL.with(|i0: &Cell<i32>| i0.replace(i0.get() + NTHREADS));
        }

        //**ThreadMap**
        {
            // Snapshot before all scoped threads terminate.
            let probed = tm.probe().unwrap();
            println!("\nprobed={probed:?}");
        }
        //**std::thread_local**
        {
            // Can't do something similar to the above block.
        }
    });

    //**ThreadMap**
    {
        // Snapshot after all scoped threads terminate.
        let probed = tm.probe().unwrap();
        println!("\nprobed={probed:?}");

        let expected_sum = (0..=NTHREADS).map(|i| i * NITER).sum::<i32>();
        let sum = tm.fold_values(0, |z, v| z + v).unwrap();
        assert_eq!(expected_sum, sum);

        // Extracted values after all scoped threads terminate.
        let dumped = tm.drain().unwrap();
        println!("\ndumped={dumped:?}");
    }
    //**std::thread_local**
    {
        // Can't do something similar to the above block.
    }
}
§Benchmarks
The benchmarks use the bench_diff crate, which supports reliable latency comparisons between closures.
A fairly typical scenario was defined for the comparisons. The scenario defines a closure that spawns 5 threads and executes methods of the struct, as follows:
- There were two scenario variants.
- The following parameters were common to the two variants:
- Number of threads: 5.
- Number of iterations within each thread: 2000.
- One write operation and one read operation per iteration.
- The two variants differed as follows:
- Variant 1 – The sum of the values in all thread buckets was computed on each thread every 100 iterations.
- Variant 2 – The sum of the values in all thread buckets was computed on each thread every 50 iterations.
§ThreadMap vs. ThreadMapX
- Each of the scenario variants was run 100 times, each with a sample size of 1000 executions of each closure.
- For Variant 1, there was a mix of results. One would conclude that the latency of the ThreadMap closure tended to be somewhat lower, but not consistently statistically significantly lower, than that of the ThreadMapX closure. (bench_diff substantially mitigates the problem of time-dependent noise in latency comparisons, but it does not eliminate the issue.) Latencies were around 3ms.
- For Variant 2, the situation was reversed, with ThreadMap tending to be slower (higher latencies) than ThreadMapX. Latencies were slightly higher but still around 3ms.
- One can conclude that the frequency of sweep operations (operations that combine data from each thread) has an impact on the relative efficiency of the two structs: ThreadMap tends to be faster than ThreadMapX when sweep operations are infrequent, and slower when they are frequent.
§ThreadMap vs. ThreadLocal
- As above, each of the scenario variants was run 100 times, each with a sample size of 1000 executions of each closure.
- As discussed in an earlier section, ThreadLocal is optimized for speed, but its use requires care because its internal thread IDs are reused (unlike Rust's standard ThreadId).
- In all cases, the latency of the ThreadMap closure was substantially higher, approximately 5-6 times as high as that of the ThreadLocal closure. As before, the latencies for the ThreadMap runs were around 3ms; for the ThreadLocal runs, they were around 500μs.
- One can conclude that:
  - For performance-sensitive applications, where the data structure is accessed frequently on many threads, ThreadLocal would be a good choice, with the caveat (discussed earlier) about the impact of its reuse of internal thread IDs.
  - For applications where the data structure is not as heavily accessed, ThreadMap or ThreadMapX can provide a convenient, more ergonomic alternative.
For an alternative that takes advantage of the efficiency of ThreadLocal while addressing the above-mentioned caveat, consider using crate thread_local_collect.
Modules§
- thread_map — For backward compatibility only; it may eventually be deprecated. The library's structs are now available directly at the top level.
Structs§
- ThreadMap — This type encapsulates the association of ThreadIds to values of type V. It is a simple and easy-to-use alternative to the std::thread_local macro and the thread_local crate.
- ThreadMapLockError — Error emitted by some ThreadMap and ThreadMapX methods when the object-level internal lock is poisoned.
- ThreadMapX — Like ThreadMap, this type encapsulates the association of ThreadIds to values of type V and is a simple and easy-to-use alternative to the std::thread_local macro and the thread_local crate. It differs from ThreadMap in that it contains a Mutex for each value, allowing the methods Self::fold, Self::fold_values, and Self::probe to run more efficiently when there are concurrent calls to the per-thread methods (Self::with, Self::with_mut, Self::get, Self::set), by using fine-grained per-thread locking instead of acquiring an object-level write lock. On the other hand, the per-thread methods may run a bit slower as they require acquisition of the per-thread lock.