Parallel execution support for multi-threaded FFT computation.
Provides a threading abstraction layer with optional Rayon integration for parallelizing FFT operations across multiple CPU cores.
§Overview
OxiFFT’s threading system provides:
- Pluggable thread pools via the ThreadPool trait
- Automatic parallelization of batch and multi-dimensional FFTs
- Zero-cost abstraction when threading is disabled
§Feature Flag
Threading is controlled by the threading Cargo feature:
[dependencies]
oxifft = { version = "0.1", features = ["threading"] }

When disabled, all parallel operations fall back to single-threaded execution with minimal overhead.
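As a rough illustration of how a Cargo feature can select the execution path at compile time, here is a minimal sketch. The function `active_backend` is hypothetical and is not part of OxiFFT; only the feature name `threading` comes from the text above.

```rust
/// Hypothetical helper (not an OxiFFT function): reports which execution
/// path this build would take.
pub fn active_backend() -> &'static str {
    // `cfg!` expands to a compile-time `true` or `false`, so the branch
    // that is not taken can be removed entirely by the optimizer.
    if cfg!(feature = "threading") {
        "parallel"
    } else {
        "serial"
    }
}
```

Built without the feature, the function returns `"serial"`; the parallel branch then costs nothing at run time.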
§Thread Pool Implementations
| Pool | Threads | Feature | Use Case |
|---|---|---|---|
| SerialPool | 1 | Always | Debug, deterministic execution |
| RayonPool | N | threading | Production, maximum throughput |
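To make the "pluggable pool" idea concrete, the sketch below defines a small trait with a serial and a multi-threaded implementation. All names here (`MiniPool`, `Serial`, `Scoped`) are hypothetical; the real `ThreadPool` trait may have different methods and bounds, and the threaded variant uses `std::thread::scope` as a stand-in for Rayon.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for a pluggable pool trait.
pub trait MiniPool {
    fn num_threads(&self) -> usize;
    /// Calls `f(i)` once for every `i` in `0..count`.
    fn parallel_for(&self, count: usize, f: &(dyn Fn(usize) + Sync));
}

/// Runs everything on the calling thread, in index order.
pub struct Serial;

impl MiniPool for Serial {
    fn num_threads(&self) -> usize {
        1
    }
    fn parallel_for(&self, count: usize, f: &(dyn Fn(usize) + Sync)) {
        for i in 0..count {
            f(i);
        }
    }
}

/// Spawns scoped std threads per call (assumption: stands in for Rayon).
pub struct Scoped {
    pub threads: usize,
}

impl MiniPool for Scoped {
    fn num_threads(&self) -> usize {
        self.threads
    }
    fn parallel_for(&self, count: usize, f: &(dyn Fn(usize) + Sync)) {
        // Shared counter: each worker claims the next unprocessed index.
        let next = AtomicUsize::new(0);
        thread::scope(|s| {
            for _ in 0..self.threads.max(1) {
                s.spawn(|| loop {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= count {
                        break;
                    }
                    f(i);
                });
            }
        });
    }
}
```

Code written against `&dyn MiniPool` runs unchanged on either implementation, which is what makes the serial pool useful for debugging and deterministic runs.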
§Configuration
§Using Default Pool
The simplest approach uses get_default_pool():
use oxifft::threading::{ThreadPool, get_default_pool};
let pool = get_default_pool();
println!("Using {} threads", pool.num_threads());
§Explicit Thread Count
Use pool_with_threads() or PoolConfig for explicit control:
use oxifft::threading::{PoolConfig, pool_with_threads};
// Method 1: Direct function
let pool = pool_with_threads(4);
// Method 2: Builder pattern
let pool = PoolConfig::new()
    .threads(8)
    .build();
§Thread Count Guidelines
| Scenario | Recommended Threads |
|---|---|
| Large 1D FFT (>1M points) | CPU cores |
| Batch FFT | CPU cores |
| 2D/3D FFT | CPU cores |
| Small FFT (<4K points) | 1 (overhead dominates) |
| Memory-bound workloads | CPU cores / 2 |
A value of 0 in PoolConfig::threads() uses the system default (typically
the number of logical CPU cores).
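The guidelines above can be expressed as a small helper. This is a sketch only: `effective_threads` and `suggested_threads` are hypothetical names, "4K" is assumed to mean 4096 points, and the system default is assumed to come from `std::thread::available_parallelism`.

```rust
use std::thread;

/// Hypothetical helper: resolves a requested thread count, treating 0 as
/// "use the system default".
pub fn effective_threads(requested: usize) -> usize {
    if requested == 0 {
        // Falls back to 1 if the platform cannot report parallelism.
        thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
    } else {
        requested
    }
}

/// Hypothetical heuristic that encodes the guideline table.
pub fn suggested_threads(fft_len: usize, cores: usize, memory_bound: bool) -> usize {
    let cores = cores.max(1);
    if fft_len < 4096 {
        1 // small transform: threading overhead dominates
    } else if memory_bound {
        (cores / 2).max(1) // leave memory bandwidth headroom
    } else {
        cores
    }
}
```

For example, `suggested_threads(1 << 20, 8, true)` yields 4, matching the "CPU cores / 2" row for memory-bound workloads.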
§Parallelization Strategies
OxiFFT parallelizes FFTs using two strategies:
§1. Batch Parallelism
For batch transforms, each FFT in the batch can run independently:
// 1000 independent 1024-point FFTs
pool.parallel_for(1000, |batch_idx| {
    compute_single_fft(batch_idx);
});
§2. Dimensional Parallelism
For multi-dimensional FFTs, rows/columns can be processed in parallel:
// 2D FFT: parallelize over rows
pool.parallel_for(height, |row| {
    fft_1d(&mut data[row * width..(row + 1) * width]);
});
// Then parallelize over columns
pool.parallel_for(width, |col| {
    fft_1d_strided(&mut data, col, height);
});
§Thread Pool Methods
The ThreadPool trait provides several parallel primitives:
| Method | Description | Use Case |
|---|---|---|
| parallel_for | Execute over 0..count | Batch processing |
| parallel_for_chunks | Chunked iteration | Cache-friendly access |
| parallel_split | Recursive divide-and-conquer | Tree algorithms |
| join | Fork-join two tasks | Divide and conquer |
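To show what chunked iteration and fork-join mean in practice, here is a sketch built on `std::thread::scope`. The functions `for_each_chunk_mut` and `parallel_sum` are hypothetical analogues, not the trait's real methods, whose signatures are not shown in this page.

```rust
use std::thread;

/// Chunked iteration: every contiguous chunk is handed to its own scoped
/// thread, so each thread touches one cache-friendly region exclusively.
/// (A real pool would reuse threads instead of spawning one per chunk.)
pub fn for_each_chunk_mut<F>(data: &mut [f64], chunk_len: usize, f: F)
where
    F: Fn(usize, &mut [f64]) + Sync,
{
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    thread::scope(|s| {
        for (idx, chunk) in data.chunks_mut(chunk_len).enumerate() {
            let f = &f;
            s.spawn(move || f(idx, chunk));
        }
    });
}

/// Fork-join divide and conquer: split the slice in half, sum the left half
/// on a new thread and the right half on the current one, then combine.
pub fn parallel_sum(data: &[f64], grain: usize) -> f64 {
    if data.len() <= grain.max(1) {
        return data.iter().sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let handle = s.spawn(|| parallel_sum(left, grain));
        let right_sum = parallel_sum(right, grain);
        handle.join().expect("worker thread panicked") + right_sum
    })
}
```

For a row-major 2D array, calling `for_each_chunk_mut` with `chunk_len` equal to the row width gives one row per task, which corresponds to the row pass of a 2D FFT.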
§Performance Considerations
§Overhead
Threading introduces overhead from:
- Thread synchronization barriers
- Work stealing (with Rayon)
- Cache coherency traffic
For small FFTs (<4K points), this overhead can exceed the parallel speedup. OxiFFT automatically falls back to serial execution for small transforms.
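A size-based fallback of this kind might look like the sketch below. The cutoff constant, `should_parallelize`, and `run_batch` are all hypothetical; OxiFFT's actual threshold and dispatch logic are not documented here, and the kernel is a placeholder for a real transform.

```rust
use std::thread;

/// Assumed cutoff taken from the "<4K points" guideline above.
const SERIAL_THRESHOLD: usize = 4096;

/// Hypothetical predicate: is threading likely to pay off?
pub fn should_parallelize(fft_len: usize, threads: usize) -> bool {
    threads > 1 && fft_len >= SERIAL_THRESHOLD
}

/// Applies `kernel` to every buffer, serially for small transforms and
/// across scoped threads otherwise.
pub fn run_batch<F>(items: &mut [Vec<f64>], threads: usize, kernel: F)
where
    F: Fn(&mut [f64]) + Sync,
{
    let len = items.first().map_or(0, |v| v.len());
    if !should_parallelize(len, threads) {
        // Serial path: no spawning, no synchronization.
        for item in items.iter_mut() {
            kernel(item);
        }
        return;
    }
    // Parallel path: split the batch into one contiguous group per thread.
    let per_thread = (items.len() + threads - 1) / threads;
    thread::scope(|s| {
        for group in items.chunks_mut(per_thread) {
            let kernel = &kernel;
            s.spawn(move || {
                for item in group {
                    kernel(item);
                }
            });
        }
    });
}
```

Both paths produce the same result; only the scheduling differs, so callers do not need to know which one ran.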
§Scaling
Expected parallel efficiency:
- Batch FFT: Near-linear scaling up to 8-16 cores
- Large 1D FFT: 2-4x speedup with 8 cores (memory-bound)
- 2D/3D FFT: 4-8x speedup with 8 cores
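The speedup figures can be converted to parallel efficiency with the usual definition, efficiency equals speedup divided by core count. As a quick worked check for the 8-core figures (the batch case is omitted because no speedup number is given for it):

```latex
E = \frac{S}{p}, \qquad
E_{\text{1D}} = \frac{2}{8} \text{ to } \frac{4}{8} = 25\% \text{ to } 50\%, \qquad
E_{\text{2D/3D}} = \frac{4}{8} \text{ to } \frac{8}{8} = 50\% \text{ to } 100\%
```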
§Memory Bandwidth
FFTs are often limited by memory bandwidth rather than compute. On systems with little memory bandwidth per core, using fewer threads than cores can improve performance.
§Example: Parallel Batch FFT
use oxifft::threading::{ThreadPool, get_default_pool};
use oxifft::{Complex, fft};
let pool = get_default_pool();
let batch_size = 1000;
let fft_size = 1024;
// Allocate batch data
let mut batches: Vec<Vec<Complex<f64>>> = (0..batch_size)
    .map(|_| vec![Complex::new(0.0, 0.0); fft_size])
    .collect();
// Process in parallel
pool.parallel_for(batch_size, |i| {
    fft(&mut batches[i]);
});
§Thread Safety
All OxiFFT types implement Send and Sync where appropriate:
- Plans can be shared across threads (read-only execution)
- Input/output buffers must have exclusive access per FFT
- The global wisdom cache uses interior locking
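The first two rules can be illustrated with a sketch in which one read-only plan is shared by reference while each thread receives its own mutable buffer. `ScalePlan` and `run_shared` are hypothetical; the plan just scales values and stands in for a real FFT plan.

```rust
use std::thread;

/// Hypothetical read-only "plan"; not an OxiFFT type.
pub struct ScalePlan {
    pub factor: f64,
}

impl ScalePlan {
    /// Takes `&self`, so any number of threads may execute concurrently.
    pub fn execute(&self, buf: &mut [f64]) {
        for x in buf {
            *x *= self.factor;
        }
    }
}

/// Shares one plan across threads while giving each thread exclusive
/// access to exactly one buffer.
pub fn run_shared(plan: &ScalePlan, buffers: &mut [Vec<f64>]) {
    thread::scope(|s| {
        // `iter_mut` yields non-overlapping `&mut` borrows, so the compiler
        // can prove that no two threads touch the same buffer.
        for buf in buffers.iter_mut() {
            s.spawn(move || plan.execute(buf));
        }
    });
}
```

Handing out buffers through `iter_mut` (or `chunks_mut`) rather than indexing a shared collection inside the closure is what lets the borrow checker enforce the exclusive-access rule.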
Structs§
- PoolConfig - Configuration for creating thread pools.
- RayonPool - Rayon-based thread pool.
- SerialPool - Single-threaded “pool” for serial execution.
Traits§
- ThreadPool - Thread pool trait for parallel execution.
Functions§
- get_default_pool - Get the default thread pool based on available features.
- pool_with_threads - Get a thread pool with the specified number of threads.