Crate aph_disjoint_set

Source
Expand description

This is an implementation of Disjoint Set structure, also known as Union-Find.

It is used to make effecient union of subsets in larger set. If we enumerate elements of larger set from 0 to N, we can unite 2 of them using operation union. We can get identifier of subset for any element using get_root. It is called get_root because this structure internally creates a tree with elements as nodes. Also, it is possible to check if 2 elements in same subset using is_united. This data structure uses union-by-rank and path halving to ensure asymptotically fast operations.

There are a lot of implementations of this data structure. Reasons to use this one:

  • It uses single memory allocation to store all tree and all ranks.
  • Unique feature: unions operations in parallel threads.
  • Unique feature: After initial union operations, it is possible to query roots in parallel.
  • Optimization from knowledge that all tags are valid indexes for underlying buffer. Other implementations often generate bounds checks which is slow.
  • Has 3 flavors suitable for your specific needs:
  • DisjointSet and DisjointSetDynamic use smaller node tag type to reduce memory usage. It is significant, for example, DisjointSet with u16::MAX nodes use 256 Kbyte memory, which fits to L2 caches pretty well, while u16::MAX+1 would use approximately 512 Kbytes, which is much harder to fit. This optimization is opaque to user so there is no need to think about it.
  • Good documentation.
  • Good test coverage.

§Examples

§Kruskal’s algorithm

This algorithms finds minimal spanning tree/forest by adding cheapest edges until all nodes connected. Description

use aph_disjoint_set::{DisjointSet, UnionResult};
// (node0, node1, cost)
let mut edges = [
     (0, 1, 4),
     (0, 2, 0),
     (1, 3, 1),
     (1, 4, 1),
     (2, 3, 5),
     (2, 4, 1),
     (2, 5, 0),
     (3, 5, 0),
     (4, 6, 0),
     (5, 6, 2),
];
edges.sort_by_key(|(_, _, cost)|*cost);
let mut min_spanning_tree = Vec::new();
let mut djs = DisjointSet::new(7);
for (node0, node1, _) in edges {
    if matches!(djs.union(node0, node1), UnionResult::Success) {
        min_spanning_tree.push((node0, node1));
    }
}
assert_eq!(
    &min_spanning_tree,
    &[(0, 2), (2, 5), (3, 5), (4, 6), (1, 3), (1, 4)]
);

§Running union operations in parallel

This code solves [Number of Islands] problem from Leetcode. We need to count number of islands formed by 1s in sea formed by 0s.

Note that we need to connect nodes to their left and upper neighbors because DisjointSet manages subsets.

use aph_disjoint_set::DisjointSet;
use std::collections::HashSet;

let sea = vec![
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 1, 0],
];

let rows = sea.len();
let cols = sea[0].len();
let total_cells = rows * cols;
let mut djs = DisjointSet::new(total_cells);
let split_row = rows / 2;

let idx = |r, c| r * cols + c;

std::thread::scope(|scope| {
    let sea = &sea[..];
    // Split into 2 separate parts.
    // Can split each parts as much as you like.
    let (mut first_half, mut second_half) = djs.split_at(total_cells / 2);
    scope.spawn(move || {
        // We cannot connect to upper cells in first row
        for col in 1..cols {
            // Union 2 connected land cells.
            if sea[0][col - 1] == 1 && sea[0][col] == 1 {
                first_half.union(idx(0, col - 1), idx(0, col));
            }
        }
        for row in 1..split_row {
            for col in 0..cols {
                // Connect to upper cell.
                if sea[row - 1][col] == 1 && sea[row][col] == 1 {
                    first_half.union(idx(row - 1, col), idx(row, col));
                }
                // Connect to left cell.
                if col > 0 && sea[row][col - 1] == 1 && sea[row][col] == 1 {
                    first_half.union(idx(row, col - 1), idx(row, col));
                }
            }
        }
        // Optional step. Can be useful
        // because it would speed-up later accesses.
        first_half.compress_paths()
    });
    scope.spawn(move || {
        // We cannot connect to upper cells because they are being updated in another thread.
        // If we try, code will panic.
        for col in 1..cols {
            // Union 2 connected land cells.
            if sea[split_row][col - 1] == 1 && sea[split_row][col] == 1 {
                second_half.union(idx(split_row, col - 1), idx(split_row, col));
            }
        }
        for row in split_row + 1..rows {
            for col in 0..cols {
                // Connect to upper cell.
                if sea[row - 1][col] == 1 && sea[row][col] == 1 {
                    second_half.union(idx(row - 1, col), idx(row, col));
                }
                // Connect to left cell.
                if col > 0 && sea[row][col - 1] == 1 && sea[row][col] == 1 {
                    second_half.union(idx(row, col - 1), idx(row, col));
                }
            }
        }
    });
});
// After running Union-Find in separate parts of water,
// we need only to add connections at boundary.
for col in 0..cols {
    if sea[split_row - 1][col] == 1 && sea[split_row][col] == 1 {
        djs.union(idx(split_row - 1, col), idx(split_row, col));
    }
}

let unique_land_roots: HashSet<_> = sea
    .iter()
    .flat_map(|row|row.iter())
    .copied()
    .enumerate()
    .filter(|(_, val)| 1 == *val) // Filter away water cells.
    .map(|(i, _)| djs.get_root(i))
    .collect();
let total_islands = unique_land_roots.len();
assert_eq!(total_islands, 4);

Structs§

ChunkMut
Mutable view to part of bigger disjoint set. Useful for running union operations using multiple threads. Contains nodes in range lower_bound()..upper_bound().
DisjointSet
Disjoint Set with fixed size and integer tags. Guarantees to use single allocation.
DisjointSetArrayU8
DisjointSetArrayU16
DisjointSetArrayU32
DisjointSetArrayU64
DisjointSetDynamic
Disjoint Set which allows dynamic addition of new items. If you can estimate upper limit of number of elements in disjoint set, it is probably better to use DisjointSet because it doesn’t have complexity for dynamic size changing.
DisjointSetRo
DisjointSetRoView
Readonly view to the content of DisjointSet. Useful for quering same disjoint set from many threads using rayon or std::thread::scope. See DisjointSet::make_ro_view for examples
RoDisjointSetArrayU8
RoDisjointSetArrayU16
RoDisjointSetArrayU32
RoDisjointSetArrayU64
Root

Enums§

UnionResult