Module threading

Parallel execution support for multi-threaded FFT computation.

Provides a threading abstraction layer with optional Rayon integration for parallelizing FFT operations across multiple CPU cores.

§Overview

OxiFFT’s threading system provides:

  • Pluggable thread pools via the ThreadPool trait
  • Automatic parallelization of batch and multi-dimensional FFTs
  • Zero-cost abstraction when threading is disabled

§Feature Flag

Threading is controlled by the threading Cargo feature:

[dependencies]
oxifft = { version = "0.1", features = ["threading"] }

When disabled, all parallel operations fall back to single-threaded execution with minimal overhead.

§Thread Pool Implementations

Pool         Threads   Feature     Use Case
SerialPool   1         Always      Debug, deterministic execution
RayonPool    N         threading   Production, maximum throughput
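
To make the idea of a pluggable pool concrete, here is a minimal sketch using only the standard library. The names `SketchPool`, `SerialSketch`, and `ScopedSketch` are hypothetical stand-ins, and the method signatures are assumptions for illustration; they are not OxiFFT's actual ThreadPool definition.

```rust
use std::thread;

/// Hypothetical stand-in for a pluggable pool trait (not OxiFFT's real trait).
pub trait SketchPool {
    /// Number of worker threads this pool uses.
    fn num_threads(&self) -> usize;
    /// Call `f(i)` once for every `i` in `0..count`.
    fn parallel_for<F>(&self, count: usize, f: F)
    where
        F: Fn(usize) + Sync;
}

/// Runs every index on the calling thread, in order.
pub struct SerialSketch;

impl SketchPool for SerialSketch {
    fn num_threads(&self) -> usize {
        1
    }

    fn parallel_for<F>(&self, count: usize, f: F)
    where
        F: Fn(usize) + Sync,
    {
        for i in 0..count {
            f(i);
        }
    }
}

/// Splits `0..count` into contiguous ranges, one per scoped thread.
pub struct ScopedSketch {
    pub threads: usize,
}

impl SketchPool for ScopedSketch {
    fn num_threads(&self) -> usize {
        self.threads.max(1)
    }

    fn parallel_for<F>(&self, count: usize, f: F)
    where
        F: Fn(usize) + Sync,
    {
        let workers = self.num_threads().min(count.max(1));
        let per_worker = count.div_ceil(workers);
        let f = &f;
        thread::scope(|s| {
            for w in 0..workers {
                let start = w * per_worker;
                let end = ((w + 1) * per_worker).min(count);
                // Each thread walks its own index range; an empty range does nothing.
                s.spawn(move || {
                    for i in start..end {
                        f(i);
                    }
                });
            }
        });
    }
}
```

Code written against the trait does not need to know which pool it received, which is what lets a serial implementation serve as a drop-in for debugging.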

§Configuration

§Using Default Pool

The simplest approach uses get_default_pool():

use oxifft::threading::{ThreadPool, get_default_pool};

let pool = get_default_pool();
println!("Using {} threads", pool.num_threads());

§Explicit Thread Count

Use pool_with_threads() or PoolConfig for explicit control:

use oxifft::threading::{PoolConfig, pool_with_threads};

// Method 1: Direct function
let pool = pool_with_threads(4);

// Method 2: Builder pattern
let pool = PoolConfig::new()
    .threads(8)
    .build();

§Thread Count Guidelines

Scenario                     Recommended Threads
Large 1D FFT (>1M points)    CPU cores
Batch FFT                    CPU cores
2D/3D FFT                    CPU cores
Small FFT (<4K points)       1 (overhead dominates)
Memory-bound workloads       CPU cores / 2

A value of 0 in PoolConfig::threads() uses the system default (typically the number of logical CPU cores).
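
The guidelines above can be folded into a small helper. This is a sketch only: `suggested_threads` and `logical_cores` are hypothetical names, the 4096-point threshold is taken from the table rather than from OxiFFT's source, and the right values for a given machine are best found by measurement.

```rust
use std::thread;

/// Hypothetical helper mirroring the guideline table above.
/// `cores` is the number of logical CPU cores available.
pub fn suggested_threads(fft_len: usize, memory_bound: bool, cores: usize) -> usize {
    let cores = cores.max(1);
    if fft_len < 4096 {
        // Small transforms: threading overhead dominates.
        1
    } else if memory_bound {
        // Memory-bound workloads: roughly half the cores, but at least one.
        (cores / 2).max(1)
    } else {
        cores
    }
}

/// Logical core count reported by the OS, or 1 if it cannot be determined.
pub fn logical_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

The result could then be passed to pool_with_threads() or PoolConfig::threads() as shown earlier.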

§Parallelization Strategies

OxiFFT parallelizes FFTs using two strategies:

§1. Batch Parallelism

For batch transforms, each FFT in the batch can run independently:

// 1000 independent 1024-point FFTs
pool.parallel_for(1000, |batch_idx| {
    compute_single_fft(batch_idx);
});
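
For readers who want to see the mechanics without the library, the following sketch spreads a batch over scoped threads from the standard library. `for_each_batch` is a hypothetical helper, and `transform` is a placeholder for a real in-place FFT; nothing here reflects how OxiFFT implements parallel_for internally.

```rust
use std::thread;

/// Apply `transform` to every buffer in `batches`, spreading the buffers
/// over at most `threads` scoped threads.
pub fn for_each_batch<T, F>(batches: &mut [Vec<T>], threads: usize, transform: F)
where
    T: Send,
    F: Fn(&mut [T]) + Sync,
{
    if batches.is_empty() {
        return;
    }
    // Number of buffers handed to each thread (at least one).
    let per_thread = batches.len().div_ceil(threads.max(1));
    let transform = &transform;
    thread::scope(|s| {
        // chunks_mut gives each thread a disjoint, exclusive group of buffers,
        // so no locking is needed while the transforms run.
        for group in batches.chunks_mut(per_thread) {
            s.spawn(move || {
                for buffer in group {
                    transform(buffer.as_mut_slice());
                }
            });
        }
    });
}
```

Because every buffer belongs to exactly one group, the borrow checker can verify that no two threads touch the same FFT's data.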

§2. Dimensional Parallelism

For multi-dimensional FFTs, rows/columns can be processed in parallel:

// 2D FFT: parallelize over rows
pool.parallel_for(height, |row| {
    fft_1d(&mut data[row * width..(row + 1) * width]);
});
// Then parallelize over columns
pool.parallel_for(width, |col| {
    fft_1d_strided(&mut data, col, height);
});
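
The row-then-column pattern can be sketched end to end with the standard library. Everything in this block is an assumption made for illustration: the `(f64, f64)` pair standing in for a complex type, the naive O(n^2) `dft` used as a placeholder for a real 1D FFT, and spawning one thread per row or column, which keeps the code short but would be wasteful for large inputs.

```rust
use std::f64::consts::PI;
use std::thread;

/// (re, im) pair standing in for a complex number type.
type C = (f64, f64);

/// Naive O(n^2) forward DFT, used only as a placeholder 1D transform.
fn dft(input: &[C]) -> Vec<C> {
    let n = input.len();
    (0..n)
        .map(|k| {
            input.iter().enumerate().fold((0.0, 0.0), |(re, im), (j, &(a, b))| {
                let angle = -2.0 * PI * ((j * k) % n) as f64 / n as f64;
                let (sin, cos) = angle.sin_cos();
                (re + a * cos - b * sin, im + a * sin + b * cos)
            })
        })
        .collect()
}

/// Row-column 2D DFT over row-major `data` holding `height * width` elements.
pub fn dft_2d(data: &mut [C], width: usize, height: usize) {
    assert_eq!(data.len(), width * height);
    if data.is_empty() {
        return;
    }
    // Pass 1: rows are contiguous and disjoint, so each thread owns one row.
    thread::scope(|s| {
        for row in data.chunks_mut(width) {
            s.spawn(move || {
                let out = dft(row);
                row.copy_from_slice(&out);
            });
        }
    });
    // Pass 2: columns are strided, so gather each one into its own buffer,
    // transform the buffers in parallel, then scatter the results back.
    let shared: &[C] = &*data;
    let columns: Vec<Vec<C>> = thread::scope(|s| {
        let handles: Vec<_> = (0..width)
            .map(|col| {
                s.spawn(move || {
                    let column: Vec<C> =
                        (0..height).map(|r| shared[r * width + col]).collect();
                    dft(&column)
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    for (col, column) in columns.iter().enumerate() {
        for (r, value) in column.iter().enumerate() {
            data[r * width + col] = *value;
        }
    }
}
```

The gather and scatter steps in the column pass are one way to avoid handing overlapping mutable slices to different threads; a strided in-place routine, as in the snippet above, is the alternative.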

§Thread Pool Methods

The ThreadPool trait provides several parallel primitives:

Method                Description                    Use Case
parallel_for          Execute over 0..count          Batch processing
parallel_for_chunks   Chunked iteration              Cache-friendly access
parallel_split        Recursive divide-and-conquer   Tree algorithms
join                  Fork-join two tasks            Divide and conquer
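
As an illustration of the fork-join primitive, here is a standard-library sketch. The free function `join` and the `parallel_sum` example are hypothetical; the signature of the trait's own join method is not shown in this page and may differ.

```rust
use std::thread;

/// Run `a` on a scoped thread and `b` on the calling thread; return both results.
pub fn join<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB,
    RA: Send,
{
    thread::scope(|s| {
        let handle = s.spawn(a);
        let rb = b();
        (handle.join().expect("task `a` panicked"), rb)
    })
}

/// Divide-and-conquer sum: halve the slice until it is at most `cutoff` long,
/// then sum each piece serially.
pub fn parallel_sum(values: &[u64], cutoff: usize) -> u64 {
    if values.len() <= cutoff.max(1) {
        return values.iter().sum();
    }
    let (left, right) = values.split_at(values.len() / 2);
    let (l, r) = join(|| parallel_sum(left, cutoff), || parallel_sum(right, cutoff));
    l + r
}
```

The cutoff plays the same role as the small-FFT threshold discussed below: beneath it, spawning work costs more than it saves. A work-stealing pool such as Rayon's avoids creating a new thread per fork, which this sketch does not.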

§Performance Considerations

§Overhead

Threading introduces overhead from:

  • Thread synchronization barriers
  • Work stealing (with Rayon)
  • Cache coherency traffic

For small FFTs (<4K points), this overhead can exceed the parallel speedup. OxiFFT automatically falls back to serial execution for small transforms.

§Scaling

Expected parallel efficiency:

  • Batch FFT: Near-linear scaling up to 8-16 cores
  • Large 1D FFT: 2-4x speedup with 8 cores (memory-bound)
  • 2D/3D FFT: 4-8x speedup with 8 cores
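
These figures can be restated as parallel efficiency, meaning speedup divided by core count. The helper below is a hypothetical one-liner for that arithmetic: a 2-4x speedup on 8 cores corresponds to 25-50% efficiency, and 4-8x on 8 cores to 50-100%.

```rust
/// Parallel efficiency: achieved speedup divided by the number of cores used.
/// A value of 1.0 means perfectly linear scaling.
pub fn parallel_efficiency(speedup: f64, cores: usize) -> f64 {
    speedup / cores.max(1) as f64
}
```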

§Memory Bandwidth

FFTs are often limited by memory bandwidth. On systems with little memory bandwidth per core, using fewer threads can improve performance.

§Example: Parallel Batch FFT

use oxifft::threading::{ThreadPool, get_default_pool};
use oxifft::{Complex, fft};

let pool = get_default_pool();
let batch_size = 1000;
let fft_size = 1024;

// Allocate batch data
let mut batches: Vec<Vec<Complex<f64>>> = (0..batch_size)
    .map(|_| vec![Complex::new(0.0, 0.0); fft_size])
    .collect();

// Process in parallel
pool.parallel_for(batch_size, |i| {
    fft(&mut batches[i]);
});

§Thread Safety

OxiFFT types implement Send and Sync where it is safe to do so:

  • Plans can be shared across threads (read-only execution)
  • Input/output buffers must have exclusive access per FFT
  • The global wisdom cache uses interior locking
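
The first two rules can be illustrated with a standard-library sketch. `SketchPlan` and `execute_all` are hypothetical: the plan here just stores scale factors in place of real precomputed FFT data, and OxiFFT's plan types and execution methods may look different.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical immutable plan holding precomputed per-element factors.
pub struct SketchPlan {
    factors: Vec<f64>,
}

impl SketchPlan {
    pub fn new(len: usize) -> Self {
        Self {
            factors: (0..len).map(|i| (i + 1) as f64).collect(),
        }
    }

    /// Read-only execution: `&self` never mutates the plan, so many threads
    /// may call this at once, each with its own exclusive buffer.
    pub fn execute(&self, buffer: &mut [f64]) {
        for (x, f) in buffer.iter_mut().zip(&self.factors) {
            *x *= f;
        }
    }
}

/// Run one shared plan over several owned buffers, one thread per buffer.
pub fn execute_all(plan: Arc<SketchPlan>, buffers: Vec<Vec<f64>>) -> Vec<Vec<f64>> {
    let handles: Vec<_> = buffers
        .into_iter()
        .map(|mut buffer| {
            // Cloning the Arc shares the plan; the buffer is moved into the thread.
            let plan = Arc::clone(&plan);
            thread::spawn(move || {
                plan.execute(&mut buffer);
                buffer
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Sharing works because the plan is only ever read, while each buffer has exactly one owner at a time, so the compiler can check both rules.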

§Structs

  • PoolConfig: Configuration for creating thread pools.
  • RayonPool: Rayon-based thread pool.
  • SerialPool: Single-threaded “pool” for serial execution.

§Traits

  • ThreadPool: Thread pool trait for parallel execution.

§Functions

  • get_default_pool: Get the default thread pool based on available features.
  • pool_with_threads: Get a thread pool with the specified number of threads.