Crate burst_pool

A thread pool optimised for bursts of activity.

Consider the following use-case: a single thread produces work which must then be performed by a pool of worker threads. The work is produced infrequently, but in bursts. Under normal operation, therefore, the threads in the pool sleep until some event requires many of them to be woken at once. Those threads perform some work before going back to sleep again.

Most thread pools schedule work based on thread readiness. This invariably means that work is pushed onto a single queue which is shared by all the workers, and the workers steal work from the queue when they're ready to accept it. This usually works very well, but not in our use-case: since many threads become runnable at the same time, they immediately contend to read from the queue.

Instead, we use a round-robin scheduling strategy, in which the work-producing thread sends work to specific workers. This eliminates contention (since each worker has its own queue). The trade-off is that it performs badly when utilisation is high and the workloads are uneven.
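To make the contrast concrete, here is a minimal sketch of the round-robin idea using plain std::sync::mpsc channels. This is an illustration only, not burst_pool's internals: each worker blocks on a private queue that only it reads, so a burst of wake-ups causes no contention over a shared queue.

use std::sync::mpsc;
use std::thread;

// An illustration of the scheduling idea (not burst_pool's internals):
// one private queue per worker, with the producer dispatching round-robin.
let n_workers = 4;
let mut senders = Vec::new();
let mut handles = Vec::new();
for id in 0..n_workers {
    let (tx, rx) = mpsc::channel::<u32>();
    senders.push(tx);
    handles.push(thread::spawn(move || {
        // Each worker blocks on its own queue, so a burst of wake-ups
        // causes no contention between workers.
        for x in rx {
            println!("worker {} got {}", id, x);
        }
    }));
}
// The producer rotates through the workers' private queues.
for (i, work) in (0..8u32).enumerate() {
    senders[i % n_workers].send(work).unwrap();
}
drop(senders); // closing the channels lets the workers exit
for h in handles {
    h.join().unwrap();
}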

Usage

Normally a thread pool will spawn a fixed number of worker threads; once the threads are running, you send closures to the pool which are then executed by one of the workers. BurstPool's API works a bit differently.

A BurstPool is parameterised over the kind of data which will be sent to the workers. (This data could be boxed closures, if you want to mimic the normal API.) The user provides work by calling BurstPool::send(), and one of the workers is chosen to receive it. Each worker has its own function which it uses to handle work sent to it. This function is provided when the thread is spawned.

use burst_pool::BurstPool;

let mut pool = BurstPool::new();

// Spawn a worker in the pool
pool.spawn(|x| println!("Received {}!", x));

// Send some work to the worker
pool.send(36).unwrap();
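
As noted above, the payload can be a boxed closure if you want the conventional closure-sending API. A minimal sketch, assuming the spawn()/send() signatures from the example above:

use burst_pool::BurstPool;

// Sketch: the payload type is a boxed closure, and each worker simply
// calls whatever closure it receives.
let mut pool: BurstPool<Box<dyn FnOnce() + Send>> = BurstPool::new();
pool.spawn(|f| f());
pool.send(Box::new(|| println!("Hello from a worker!"))).unwrap();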

Performance

I'm using the following benchmark:

  1. Create a pool with 10 workers.
  2. Send 10 pieces of work to the pool.
  3. When a thread receives some work it reads the clock, records its latency, and goes back to sleep. (A sketch of this measurement follows the list.)
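
The following is a hypothetical reconstruction of that measurement, not the actual benchmark code: the payload is the Instant at which the work was sent, and each worker derives its wake-up latency from it.

use std::time::Instant;
use burst_pool::BurstPool;

let mut pool: BurstPool<Instant> = BurstPool::new();

// Spawn 10 workers, each of which records how long it took to be woken
// and handed the payload.
for _ in 0..10 {
    pool.spawn(|sent_at: Instant| {
        println!("wake-up latency: {:?}", sent_at.elapsed());
    });
}

// Send 10 pieces of work, timestamped at the moment of sending.
for _ in 0..10 {
    pool.send(Instant::now()).unwrap();
}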

For comparison, the benchmark was replicated using some of the other thread pools on crates.io (which are not optimised for this use-case), and I also benchmarked the time taken to suddenly unpark 10 parked threads.

crate              mean latency  20 %ile  40 %ile   60 %ile   80 %ile
burst_pool         8.3 μs        <3.4 μs  <5.1 μs   <7.6 μs   <7.6 μs
threadpool         17.4 μs       <7.6 μs  <11.4 μs  <17.1 μs  <17.1 μs
scoped_threadpool  18.7 μs       <7.6 μs  <11.4 μs  <17.1 μs  <17.1 μs
(unpark only)      7.7 μs        <3.4 μs  <5.1 μs   <7.6 μs   <7.6 μs

  • The profile of BurstPool shows that it adds very little latency over simply calling unpark(); almost all of the time goes to the Linux scheduler.
  • The mean latency is heavily skewed by outliers lying above the 80th percentile. You can expect latencies better than 7.6 μs most of the time, with the occasional long wait.
  • Results as good as these depend heavily on setting max_cstate = 0, which makes a huge difference to thread wake-up times.

You can run the benchmarks yourself with cargo bench. If your results are significantly worse than those above, your kernel might be powering down CPU cores too eagerly. If you care about latency more than battery life, consider setting max_cstate = 0 (e.g. via the processor.max_cstate=0 kernel boot parameter).

Structs

BurstPool

A thread pool optimised for bursts of activity.

Enums

BurstError