Crate thread_map


This library provides simple and easy-to-use alternatives to the std::thread_local macro and the thread_local crate.

Two main types are provided, ThreadMap and ThreadMapX. They have identical APIs but slightly different implementations that may be more or less efficient depending on the use case (see the ThreadMapX docs).

§Typical Usage Workflow

These are the steps typically followed when using this library:

  1. Instantiate either ThreadMap or ThreadMapX, wrap the instance in Arc, and name it, say, tm.
  2. Spawn threads that capture a clone of tm. If scoped threads are used, the Arc wrapper in the previous step is not required and a regular reference &tm can be used in the thread instead.
  3. Within each thread, read and/or modify the thread-local value by calling API methods on the tm clone or reference.
  4. Optionally, from the main thread, before the spawned threads terminate, inspect the thread-local values using the API.
  5. Optionally, from the main thread, once the spawned threads have terminated, inspect or extract all the thread-local values using the API.

§How It Differs From std::thread_local! and thread_local::ThreadLocal

While std::thread_local! and thread_local::ThreadLocal are optimized for efficiency, their usage can be more cumbersome in many cases. In particular, steps 4 and 5 above are not straightforward to do with these other thread-local approaches (but see thread_local_collect::tlm and thread_local_collect::tlcr for ways to do it).

Although it may seem that thread_local::ThreadLocal’s iter method provides a simple way to perform steps 4 and 5 above when the type parameter is Sync, it is important to note that ThreadLocal reuses its internal thread IDs for new threads as old threads terminate (it does not use std::thread::ThreadId). As a result, the thread-local values of some threads may not be preserved.
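The following std-only snippet illustrates the contrast: std::thread::ThreadId values are never reused within a process, even after their threads terminate, which is exactly the guarantee that ThreadLocal's recycled internal IDs lack. The helper collect_ids is hypothetical:

```rust
use std::collections::HashSet;
use std::thread;

// Spawn threads one after another, letting each terminate before the next
// starts, and record each thread's std::thread::ThreadId.
fn collect_ids(n: usize) -> HashSet<thread::ThreadId> {
    let mut ids = HashSet::new();
    for _ in 0..n {
        let id = thread::spawn(|| thread::current().id()).join().unwrap();
        ids.insert(id);
    }
    ids
}

fn main() {
    // Even though each earlier thread has terminated before the next one is
    // spawned, every ThreadId is distinct: std never recycles a ThreadId
    // for the life of the process.
    assert_eq!(collect_ids(10).len(), 10);
}
```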

See the example below comparing the usage of std::thread_local! and ThreadMap.

§Depending on This Library

Add dependency in Cargo.toml:

[dependencies]
thread_map = "1"

§Usage Examples

See ThreadMap and ThreadMapX.

§Example Comparison With std::thread_local!

This example provides a direct comparison of the usage of ThreadMap and the std::thread_local! macro. Lines that are specific to ThreadMap usage are preceded by the comment //**ThreadMap**, and lines that are specific to std::thread_local! usage are preceded by the comment //**std::thread_local**.

use std::{
    cell::Cell,
    sync::Arc,
    thread::{self},
    time::Duration,
};
use thread_map::ThreadMap;

const NTHREADS: i32 = 20;
const NITER: i32 = 10;
const SLEEP_MICROS: u64 = 10;

//**std::thread_local**
thread_local! {
    static TL: Cell<i32> = const {Cell::new(0)};
}

fn main() {
    //**ThreadMap**
    // There is no real need to wrap in `Arc` here because references can be used in scoped threads instead
    // of clones, but the `Arc` wrapper would be required if non-scoped threads were used.
    let tm = Arc::new(ThreadMap::default());

    thread::scope(|s| {
        for i in 0..NTHREADS {
            //**ThreadMap**
            let tm = tm.clone();

            s.spawn(move || {
                for _ in 0..NITER {
                    thread::sleep(Duration::from_micros(SLEEP_MICROS));

                    //**ThreadMap**
                    tm.with_mut(move |i0: &mut i32| *i0 += i);

                    //**std::thread_local**
                    TL.with(move |i0: &Cell<i32>| i0.replace(i0.get() + i));
                }

                //**ThreadMap**
                {
                    let value = tm.get();
                    assert_eq!(i * NITER, value);
                }

                //**std::thread_local**
                {
                    let value = TL.with(Cell::get);
                    assert_eq!(i * NITER, value);
                }
            });
        }

        //**ThreadMap**
        {
            // Snapshot before thread-local value in main thread is updated.
            let probed = tm.probe().unwrap();
            println!("probed={probed:?}");
        }

        //**std::thread_local**
        {
            // Can't do something similar to the above block
        }

        //**ThreadMap**
        for _ in 0..NITER {
            tm.with_mut(|i0: &mut i32| *i0 += NTHREADS)
        }

        //**std::thread_local**
        for _ in 0..NITER {
            TL.with(|i0: &Cell<i32>| i0.replace(i0.get() + NTHREADS));
        }

        //**ThreadMap**
        {
            // Snapshot before all scoped threads terminate.
            let probed = tm.probe().unwrap();
            println!("\nprobed={probed:?}");
        }

        //**std::thread_local**
        {
            // Can't do something similar to the above block
        }
    });

    //**ThreadMap**
    {
        // Snapshot after all scoped threads terminate.
        let probed = tm.probe().unwrap();
        println!("\nprobed={probed:?}");

        let expected_sum = (0..=NTHREADS).map(|i| i * NITER).sum::<i32>();
        let sum = tm.fold_values(0, |z, v| z + v).unwrap();
        assert_eq!(expected_sum, sum);

        // Extracted values after all scoped threads terminate.
        let dumped = tm.drain().unwrap();
        println!("\ndumped={dumped:?}");
    }

    //**std::thread_local**
    {
        // Can't do something similar to the above block
    }
}

§Benchmarks

The benchmarks use the bench_diff crate, which supports reliable latency comparisons between closures.

A fairly typical scenario was defined for the comparisons. The scenario defines a closure that spawns several threads and executes methods of the struct under test, as follows:

  • There were two scenario variants.
  • The following parameters were common to the two variants:
    • Number of threads: 5.
    • Number of iterations within each thread: 2000.
    • One write operation and one read operation per iteration.
  • The two variants differed as follows:
    • Variant 1 – The sum of the values in all thread buckets was computed on each thread every 100 iterations.
    • Variant 2 – The sum of the values in all thread buckets was computed on each thread every 50 iterations.

ThreadMap vs. ThreadMapX

  • Each of the scenario variants was run 100 times, each with a sample size of 1000 executions of each closure.
  • For Variant 1, there was a mix of results. One would conclude that the latency of the ThreadMap closure tended to be somewhat lower than that of the ThreadMapX closure, but not consistently lower with statistical significance. (bench_diff substantially mitigates the problem of time-dependent noise in latency comparisons, but it does not eliminate the issue.) Latencies were around 3ms.
  • For Variant 2, the situation was reversed, with ThreadMap tending to be slower (higher latencies) than ThreadMapX. Latencies were slightly higher but still around 3ms.
  • One can conclude that the frequency of sweep operations (operations that combine data from all threads) has an impact on the relative efficiency of the two structs. ThreadMap tends to be faster than ThreadMapX when sweep operations are infrequent, and slower when sweep operations are frequent.

ThreadMap vs. ThreadLocal

  • As above, each of the scenario variants was run 100 times, each with a sample size of 1000 executions of each closure.
  • As discussed in an earlier section, ThreadLocal is optimized for speed but its use requires care as its internal thread IDs are reused (unlike Rust’s standard ThreadId).
  • In all cases, the latency of the ThreadMap closure was substantially higher, approximately 5-6 times as high as the latency of the ThreadLocal closure. As before, the latencies for the ThreadMap runs were around 3ms. For the ThreadLocal runs, the latencies were around 500μs.
  • One can conclude that:
    • For performance-sensitive applications, where the data structure is accessed frequently on many threads, ThreadLocal would be a good choice, with the caveat (discussed earlier) about the impact of its reuse of internal thread IDs.
    • For applications where the data structure is not as heavily accessed, ThreadMap or ThreadMapX can provide a convenient, more ergonomic alternative.

For an alternative that takes advantage of the efficiency of ThreadLocal while addressing the above-mentioned caveat, consider using crate thread_local_collect.

Modules§

thread_map
Retained for backward compatibility only; it may eventually be deprecated. The library’s structs are now available directly at the top level.

Structs§

ThreadMap
This type encapsulates the association of ThreadIds to values of type V. It is a simple and easy-to-use alternative to the std::thread_local macro and the thread_local crate.
ThreadMapLockError
Error emitted by some ThreadMap and ThreadMapX methods when the object-level internal lock is poisoned.
ThreadMapX
Like ThreadMap, this type encapsulates the association of ThreadIds to values of type V and is a simple and easy-to-use alternative to the std::thread_local macro and the thread_local crate. It differs from ThreadMap in that it contains a Mutex for each value, allowing the methods Self::fold, Self::fold_values, and Self::probe to run more efficiently when there are concurrent calls to the per-thread methods (Self::with, Self::with_mut, Self::get, Self::set) by using fine-grained per-thread locking instead of acquiring an object-level write lock. On the other hand, the per-thread methods may run a bit slower as they require the acquision of the per-thread lock.