hurdles 0.1.1

Counter-based thread barrier
Documentation

hurdles

Crates.io Documentation Build Status

A thread barrier like std::sync::Barrier with a more scalable implementation than the one provided by the standard library (which uses a Mutex). A barrier enables multiple threads to synchronize the beginning of some computation.

Examples

use hurdles::Barrier;
use std::thread;

let mut handles = Vec::with_capacity(10);
let mut barrier = Barrier::new(10);
for _ in 0..10 {
    let mut c = barrier.clone();
    // The same messages will be printed together.
    // You will NOT see any interleaving.
    handles.push(thread::spawn(move|| {
        println!("before wait");
        c.wait();
        println!("after wait");
    }));
}
// Wait for other threads to finish.
for handle in handles {
    handle.join().unwrap();
}

Implementation

This crate currently implements a counter-based linear barrier as described in "3.1 Centralized Barriers" in Mellor-Crummey and Scott’s paper Algorithms for scalable synchronization on shared-memory multiprocessors from 1991. For a higher-level explanation, see Lars-Dominik Braun's Introduction to barrier algorithms.

Numbers

Modern laptop with 2-core (4HT) Intel Core i7-5600U @ 2.60GHz:

test tests::ours_2 ... bench:         190 ns/iter (+/- 24)
test tests::std_2  ... bench:       2,054 ns/iter (+/- 822)
test tests::ours_4 ... bench:         236 ns/iter (+/- 2)
test tests::std_4  ... bench:      11,913 ns/iter (+/- 60)

Dell server with 2x 10-core (20HT) Intel Xeon E5-2660 v3 @ 2.60GHz across two NUMA nodes:

test tests::ours_4  ... bench:         689 ns/iter (+/- 9)
test tests::std_4   ... bench:       4,762 ns/iter (+/- 151)
test tests::ours_8  ... bench:       1,380 ns/iter (+/- 13)
test tests::std_8   ... bench:      17,545 ns/iter (+/- 288)
test tests::ours_16 ... bench:       2,970 ns/iter (+/- 33)
test tests::std_16  ... bench:      38,215 ns/iter (+/- 469)
test tests::ours_32 ... bench:       3,838 ns/iter (+/- 129)
test tests::std_32  ... bench:      94,266 ns/iter (+/- 12,243)

Or, in plot form:

Barrier time as the number of threads grow