Skip to main content

parallel_scan

Function parallel_scan 

Source
pub fn parallel_scan<T, F>(
    input: &[T],
    identity: T,
    op: F,
    mode: ScanMode,
) -> Result<Vec<T>>
where T: Num + Copy + Send + Sync, F: Fn(T, T) -> T + Send + Sync + Copy,
Expand description

Performs a parallel prefix scan using the Blelloch algorithm.

The Blelloch parallel prefix scan (Blelloch, 1990) is a work-efficient algorithm that computes all prefix reductions of an array using a binary associative operation. It operates in two phases that mirror the up-sweep/down-sweep structure of the original algorithm, adapted for practical multi-core CPU execution:

§Algorithm

§Phase 1: Up-Sweep (Reduce)

The input array is divided into p chunks (where p is bounded by available parallelism). Each chunk computes a local inclusive prefix scan independently and in parallel. The last element of each scanned chunk becomes the chunk total, representing the partial reduction for that segment. This corresponds to the bottom-up reduce phase of the Blelloch binary tree, where leaf-level partial sums propagate upward.

§Phase 2: Down-Sweep (Distribute)

An exclusive prefix scan of the chunk totals produces global prefixes for each chunk. Each global prefix represents the cumulative reduction of all preceding chunks. These prefixes are then distributed back to every element within each chunk in parallel, adjusting local scan values into globally correct prefix values. This corresponds to the top-down distribute phase of the Blelloch tree.

§Complexity

  • Work: O(n) total operations across all threads — each element is touched exactly twice (once during local scan, once during prefix adjustment).
  • Depth: O(n/p + p) for p parallel threads, where n/p is the local scan depth and p is the sequential chunk-total scan depth.
  • Space: O(n) for the output array + O(p) for chunk metadata.

§Sequential Fallback

For arrays smaller than 8192 elements, a sequential scan is used automatically because the thread synchronization overhead would exceed the computation savings.

§Type Parameters

  • T - A numeric type satisfying num_traits::Num + Copy + Send + Sync
  • F - A binary associative operation closure

§Arguments

  • input - The input slice to scan
  • identity - The identity element for op (must satisfy op(identity, x) == x and op(x, identity) == x for all x)
  • op - A binary associative operation (must satisfy op(op(a, b), c) == op(a, op(b, c)))
  • mode - Whether to compute an inclusive or exclusive scan

§Returns

A Vec<T> containing the scan result, or an error if parallel execution fails.

§Examples

use numrs2::parallel::parallel_algorithms::{parallel_scan, ScanMode};

// Inclusive prefix sum: [1, 3, 6, 10, 15]
let result = parallel_scan(&[1, 2, 3, 4, 5], 0, |a, b| a + b, ScanMode::Inclusive)
    .expect("scan should succeed");
assert_eq!(result, vec![1, 3, 6, 10, 15]);

// Exclusive prefix sum: [0, 1, 3, 6, 10]
let result = parallel_scan(&[1, 2, 3, 4, 5], 0, |a, b| a + b, ScanMode::Exclusive)
    .expect("scan should succeed");
assert_eq!(result, vec![0, 1, 3, 6, 10]);

// Inclusive prefix product: [1, 2, 6, 24, 120]
let result = parallel_scan(&[1, 2, 3, 4, 5], 1, |a, b| a * b, ScanMode::Inclusive)
    .expect("scan should succeed");
assert_eq!(result, vec![1, 2, 6, 24, 120]);