Zero-Pool: Consistent High-Performance Thread Pool
When microseconds matter and allocation is the enemy.
This is an experimental thread pool implementation focused on exploring lock-free MPMC queue techniques and zero-allocation task dispatch. Consider this a performance playground rather than a production-ready library.
Key Features:
- 16 bytes per task - minimal memory footprint per work item
- Zero locks - lock-free MPMC queue
- Zero queue limit - unbounded task queue
- Zero virtual dispatch - function-pointer dispatch avoids vtable lookups
- Zero core spinning - event-based worker wakeup
- Zero result-transport cost - tasks write directly into caller-provided memory
- Zero per-worker queues - a single global queue structure
- Zero external dependencies - standard library and stable Rust only
Workers receive only 16 bytes per work item: a function pointer and a pointer to a parameter struct. The result-via-parameters pattern has workers write results directly into caller-provided memory, eliminating result-transport overhead between threads. A single global queue gives natural load balancing without the complexity of work-stealing or load-redistribution algorithms.
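The 16-byte figure corresponds to a pair of 8-byte pointers on a 64-bit target. The following is an illustrative sketch of such a layout, not Zero-Pool's actual internal type; the `WorkItem` name and its fields are assumptions:

```rust
use std::mem::size_of;

// Hypothetical 16-byte work item: one function pointer plus one
// type-erased pointer to a caller-owned parameter struct.
#[allow(dead_code)]
struct WorkItem {
    func: fn(*const ()),  // 8 bytes: task entry point
    params: *const (),    // 8 bytes: caller-provided parameter struct
}

fn main() {
    // On 64-bit targets the pair occupies exactly 16 bytes.
    assert_eq!(size_of::<WorkItem>(), 16);
    println!("WorkItem is {} bytes", size_of::<WorkItem>());
}
```

Because the item is just two raw pointers, enqueueing a task never allocates; all task state lives in memory the caller already owns.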
Since the library uses raw pointers, you must ensure that parameter structs remain valid until TaskFuture::wait() completes, that result pointers remain valid until task completion, and that your task functions are thread-safe. The library provides type-safe methods like submit_task and submit_batch_uniform for convenient usage.
Benchmarks
```text
test bench_heavy_compute_rayon                    ... bench: 4,844,119.25 ns/iter
test bench_heavy_compute_rayon_optimised          ... bench: 4,935,556.95 ns/iter
test bench_heavy_compute_zeropool                 ... bench: 4,390,880.40 ns/iter
test bench_heavy_compute_zeropool_optimised       ... bench: 4,407,382.45 ns/iter
test bench_indexed_computation_rayon              ... bench:    39,135.11 ns/iter
test bench_indexed_computation_rayon_optimised    ... bench:    34,639.97 ns/iter
test bench_indexed_computation_zeropool           ... bench:    50,064.12 ns/iter
test bench_indexed_computation_zeropool_optimised ... bench:    40,170.21 ns/iter
test bench_task_overhead_rayon                    ... bench:    39,940.40 ns/iter
test bench_task_overhead_rayon_optimised          ... bench:    40,994.87 ns/iter
test bench_task_overhead_zeropool                 ... bench:    50,517.70 ns/iter
test bench_task_overhead_zeropool_optimised       ... bench:    45,036.93 ns/iter
```
Example Usage
Recommended: use the type-erasing macros for safe and convenient task submission. The snippets below sketch typical usage; field and function names are illustrative.
zp_task_params!
Creates a task parameter struct with an automatic constructor.
```rust
use zero_pool::zp_task_params; // crate path assumed

// Field names here are illustrative.
zp_task_params! {
    SumParams {
        n: u64,
        result: *mut u64,
    }
}
// Usage: SumParams::new(1000, &mut result)
```
zp_define_task_fn!
Defines a task function that safely dereferences the parameter struct.
```rust
use zero_pool::zp_define_task_fn; // crate path assumed

// `SumParams` as declared above. The generated function receives the
// raw parameter pointer and dereferences it back into the typed struct.
zp_define_task_fn! {
    sum_task, SumParams, |params| {
        let sum: u64 = (1..=params.n).sum();
        unsafe { *params.result = sum; }
    }
}
```
zp_write!
Optional macro that eliminates explicit unsafe blocks when writing to the params struct.
```rust
use zero_pool::{zp_define_task_fn, zp_write}; // crate path assumed

// Same task as above, with zp_write! hiding the raw pointer write.
zp_define_task_fn! {
    sum_task, SumParams, |params| {
        let sum: u64 = (1..=params.n).sum();
        zp_write!(params.result, sum);
    }
}
```
zp_write_indexed!
Safely writes a value to a specific index in a Vec or array via raw pointer, useful for batch processing where each task writes to a different index.
```rust
use zero_pool::{zp_task_params, zp_define_task_fn, zp_write_indexed}; // crate path assumed

zp_task_params! {
    IndexedParams {
        index: usize,
        results: *mut u64,
    }
}

zp_define_task_fn! {
    indexed_task, IndexedParams, |params| {
        // Each task writes to its own slot, so no two tasks alias.
        zp_write_indexed!(params.results, params.index, params.index as u64 * 2);
    }
}

// Usage with a pre-allocated vector
let pool = ZeroPool::new(); // pool type name assumed
let mut results = vec![0u64; 100];
let tasks: Vec<IndexedParams> = (0..100)
    .map(|i| IndexedParams::new(i, results.as_mut_ptr()))
    .collect();
let future = pool.submit_batch_uniform(indexed_task, &tasks);
future.wait();
```
Submitting a Single Task
```rust
use zero_pool::{zp_task_params, zp_define_task_fn}; // crate path assumed

zp_task_params! {
    SumParams {
        n: u64,
        result: *mut u64,
    }
}

zp_define_task_fn! {
    sum_task, SumParams, |params| {
        let sum: u64 = (1..=params.n).sum();
        unsafe { *params.result = sum; }
    }
}

let pool = ZeroPool::new(); // pool type name assumed
let mut result = 0u64;
let task = SumParams::new(1000, &mut result);
let future = pool.submit_task(sum_task, &task);
future.wait();
println!("sum = {}", result);
```
Submitting Uniform Batches
Submits multiple tasks of the same type to the thread pool.
```rust
use zero_pool::{zp_task_params, zp_define_task_fn}; // crate path assumed

zp_task_params! {
    FillParams {
        index: usize,
        slot: *mut u64,
    }
}

zp_define_task_fn! {
    fill_task, FillParams, |params| {
        unsafe { *params.slot = params.index as u64 * 2; }
    }
}

let pool = ZeroPool::new(); // pool type name assumed
let mut results = vec![0u64; 100];
let tasks: Vec<FillParams> = results
    .iter_mut()
    .enumerate()
    .map(|(i, slot)| FillParams::new(i, slot))
    .collect();
let future = pool.submit_batch_uniform(fill_task, &tasks);
future.wait();
println!("first result = {}", results[0]);
```
zp_submit_batch_mixed!
Submits multiple tasks of different types to the thread pool.
```rust
use zero_pool::{zp_task_params, zp_define_task_fn, zp_submit_batch_mixed}; // crate path assumed

// First task type
zp_task_params! { AddParams { a: u64, b: u64, result: *mut u64 } }
zp_define_task_fn! {
    add_task, AddParams, |p| unsafe { *p.result = p.a + p.b; }
}

// Second task type
zp_task_params! { MulParams { a: u64, b: u64, result: *mut u64 } }
zp_define_task_fn! {
    mul_task, MulParams, |p| unsafe { *p.result = p.a * p.b; }
}

let pool = ZeroPool::new(); // pool type name assumed
let mut add_result = 0u64;
let mut multiply_result = 0u64;
let add = AddParams::new(2, 3, &mut add_result);
let multiply = MulParams::new(2, 3, &mut multiply_result);
// Macro invocation syntax assumed: (task_fn, &params) pairs.
let future = zp_submit_batch_mixed!(pool, (add_task, &add), (mul_task, &multiply));
future.wait();
println!("add = {}", add_result);
println!("multiply = {}", multiply_result);
```
Submitting Multiple Independent Tasks
You can submit individual tasks, uniform batches, and mixed batches in parallel:
```rust
use zero_pool::{zp_task_params, zp_define_task_fn, zp_submit_batch_mixed}; // crate path assumed

// Task types as defined in the previous examples
// (sum_task/SumParams, fill_task/FillParams, add_task/AddParams,
//  mul_task/MulParams).

let pool = ZeroPool::new(); // pool type name assumed

// Individual task - its own memory location
let mut single_result = 0u64;
let single_task = SumParams::new(1000, &mut single_result);

// Uniform batch - separate memory from above
let mut batch_results = vec![0u64; 100];
let batch_tasks: Vec<FillParams> = batch_results
    .iter_mut()
    .enumerate()
    .map(|(i, slot)| FillParams::new(i, slot))
    .collect();

// Mixed batch - separate memory from above
let mut add_result = 0u64;
let mut multiply_result = 0u64;
let add = AddParams::new(2, 3, &mut add_result);
let multiply = MulParams::new(2, 3, &mut multiply_result);

// Submit all work immediately (all start executing in parallel)
let future1 = pool.submit_task(sum_task, &single_task);
let future2 = pool.submit_batch_uniform(fill_task, &batch_tasks);
let future3 = zp_submit_batch_mixed!(pool, (add_task, &add), (mul_task, &multiply));

// Wait on them in any order - they're all running independently
future1.wait();
future2.wait();
future3.wait();

println!("single = {}", single_result);
println!("batch[0] = {}", batch_results[0]);
println!("mixed = {} / {}", add_result, multiply_result);
```
Performance Optimization: Pre-converted Tasks
For hot paths where you submit the same tasks repeatedly, you can pre-convert tasks to avoid repeated pointer conversions:
```rust
use zero_pool::uniform_tasks_to_pointers; // crate path assumed

// fill_task / FillParams as in the uniform-batch example.
let pool = ZeroPool::new(); // pool type name assumed
let mut results = vec![0u64; 100];
let tasks: Vec<FillParams> = results
    .iter_mut()
    .enumerate()
    .map(|(i, slot)| FillParams::new(i, slot))
    .collect();

// Convert once, reuse multiple times
let tasks_converted = uniform_tasks_to_pointers(fill_task, &tasks);

// Submit multiple batches with zero conversion overhead
// (converted-submit method name assumed; see the crate docs)
for _ in 0..3 {
    pool.submit_batch_converted(&tasks_converted).wait();
}
```