pub fn parallel_scan<T, F>(
input: &[T],
identity: T,
op: F,
mode: ScanMode,
) -> Result<Vec<T>>Expand description
Performs a parallel prefix scan using the Blelloch algorithm.
The Blelloch parallel prefix scan (Blelloch, 1990) is a work-efficient algorithm that computes all prefix reductions of an array using a binary associative operation. It operates in two phases that mirror the up-sweep/down-sweep structure of the original algorithm, adapted for practical multi-core CPU execution:
§Algorithm
§Phase 1: Up-Sweep (Reduce)
The input array is divided into p chunks (where p is bounded by available
parallelism). Each chunk computes a local inclusive prefix scan independently
and in parallel. The last element of each scanned chunk becomes the chunk total,
representing the partial reduction for that segment. This corresponds to the
bottom-up reduce phase of the Blelloch binary tree, where leaf-level partial
sums propagate upward.
§Phase 2: Down-Sweep (Distribute)
An exclusive prefix scan of the chunk totals produces global prefixes for each chunk. Each global prefix represents the cumulative reduction of all preceding chunks. These prefixes are then distributed back to every element within each chunk in parallel, adjusting local scan values into globally correct prefix values. This corresponds to the top-down distribute phase of the Blelloch tree.
§Complexity
- Work: O(n) total operations across all threads — each element is touched exactly twice (once during local scan, once during prefix adjustment).
- Depth: O(n/p + p) for
pparallel threads, where n/p is the local scan depth and p is the sequential chunk-total scan depth. - Space: O(n) for the output array + O(p) for chunk metadata.
§Sequential Fallback
For arrays smaller than 8192 elements, a sequential scan is used automatically because the thread synchronization overhead would exceed the computation savings.
§Type Parameters
T- A numeric type satisfyingnum_traits::Num + Copy + Send + SyncF- A binary associative operation closure
§Arguments
input- The input slice to scanidentity- The identity element forop(must satisfyop(identity, x) == xandop(x, identity) == xfor allx)op- A binary associative operation (must satisfyop(op(a, b), c) == op(a, op(b, c)))mode- Whether to compute an inclusive or exclusive scan
§Returns
A Vec<T> containing the scan result, or an error if parallel execution fails.
§Examples
use numrs2::parallel::parallel_algorithms::{parallel_scan, ScanMode};
// Inclusive prefix sum: [1, 3, 6, 10, 15]
let result = parallel_scan(&[1, 2, 3, 4, 5], 0, |a, b| a + b, ScanMode::Inclusive)
.expect("scan should succeed");
assert_eq!(result, vec![1, 3, 6, 10, 15]);
// Exclusive prefix sum: [0, 1, 3, 6, 10]
let result = parallel_scan(&[1, 2, 3, 4, 5], 0, |a, b| a + b, ScanMode::Exclusive)
.expect("scan should succeed");
assert_eq!(result, vec![0, 1, 3, 6, 10]);
// Inclusive prefix product: [1, 2, 6, 24, 120]
let result = parallel_scan(&[1, 2, 3, 4, 5], 1, |a, b| a * b, ScanMode::Inclusive)
.expect("scan should succeed");
assert_eq!(result, vec![1, 2, 6, 24, 120]);