Expand description
This is an implementation of Disjoint Set structure, also known as Union-Find.
It is used to make effecient union of subsets in larger set. If we enumerate elements of larger set
from 0 to N, we can unite 2 of them using operation union
.
We can get identifier of subset for any element using get_root
.
It is called get_root
because this structure internally creates a tree with elements as nodes.
Also, it is possible to check if 2 elements in same subset using is_united
.
This data structure uses union-by-rank and path halving to ensure asymptotically fast operations.
There are a lot of implementations of this data structure. Reasons to use this one:
- It uses single memory allocation to store all tree and all ranks.
- Unique feature: unions operations in parallel threads.
- Unique feature: After initial union operations, it is possible to query roots in parallel.
- Optimization from knowledge that all tags are valid indexes for underlying buffer. Other implementations often generate bounds checks which is slow.
- Has 3 flavors suitable for your specific needs:
- recommended default
DisjointSet
is suitable for cases when you know amount of nodes ahead of time, DisjointSetArrayU8
/DisjointSetArrayU16
/DisjointSetArrayU32
/DisjointSetArrayU64
, for cases when it is preferable to store all datastructure inline without extra heap allocation.DisjointSetDynamic
if exact number of nodes isn’t known beforehand.
- recommended default
DisjointSet
andDisjointSetDynamic
use smaller node tag type to reduce memory usage. It is significant, for example,DisjointSet
withu16::MAX
nodes use 256 Kbyte memory, which fits to L2 caches pretty well, whileu16::MAX+1
would use approximately 512 Kbytes, which is much harder to fit. This optimization is opaque to user so there is no need to think about it.- Good documentation.
- Good test coverage.
§Examples
§Kruskal’s algorithm
This algorithms finds minimal spanning tree/forest by adding cheapest edges until all nodes connected. Description
use aph_disjoint_set::{DisjointSet, UnionResult};
// (node0, node1, cost)
let mut edges = [
(0, 1, 4),
(0, 2, 0),
(1, 3, 1),
(1, 4, 1),
(2, 3, 5),
(2, 4, 1),
(2, 5, 0),
(3, 5, 0),
(4, 6, 0),
(5, 6, 2),
];
edges.sort_by_key(|(_, _, cost)|*cost);
let mut min_spanning_tree = Vec::new();
let mut djs = DisjointSet::new(7);
for (node0, node1, _) in edges {
if matches!(djs.union(node0, node1), UnionResult::Success) {
min_spanning_tree.push((node0, node1));
}
}
assert_eq!(
&min_spanning_tree,
&[(0, 2), (2, 5), (3, 5), (4, 6), (1, 3), (1, 4)]
);
§Running union operations in parallel
This code solves [Number of Islands] problem from Leetcode.
We need to count number of islands formed by 1
s in sea formed by 0
s.
Note that we need to connect nodes to their left and upper neighbors
because DisjointSet
manages subsets.
use aph_disjoint_set::DisjointSet;
use std::collections::HashSet;
let sea = vec![
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 1, 1, 0],
[0, 1, 1, 1, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 1, 0],
];
let rows = sea.len();
let cols = sea[0].len();
let total_cells = rows * cols;
let mut djs = DisjointSet::new(total_cells);
let split_row = rows / 2;
let idx = |r, c| r * cols + c;
std::thread::scope(|scope| {
let sea = &sea[..];
// Split into 2 separate parts.
// Can split each parts as much as you like.
let (mut first_half, mut second_half) = djs.split_at(total_cells / 2);
scope.spawn(move || {
// We cannot connect to upper cells in first row
for col in 1..cols {
// Union 2 connected land cells.
if sea[0][col - 1] == 1 && sea[0][col] == 1 {
first_half.union(idx(0, col - 1), idx(0, col));
}
}
for row in 1..split_row {
for col in 0..cols {
// Connect to upper cell.
if sea[row - 1][col] == 1 && sea[row][col] == 1 {
first_half.union(idx(row - 1, col), idx(row, col));
}
// Connect to left cell.
if col > 0 && sea[row][col - 1] == 1 && sea[row][col] == 1 {
first_half.union(idx(row, col - 1), idx(row, col));
}
}
}
// Optional step. Can be useful
// because it would speed-up later accesses.
first_half.compress_paths()
});
scope.spawn(move || {
// We cannot connect to upper cells because they are being updated in another thread.
// If we try, code will panic.
for col in 1..cols {
// Union 2 connected land cells.
if sea[split_row][col - 1] == 1 && sea[split_row][col] == 1 {
second_half.union(idx(split_row, col - 1), idx(split_row, col));
}
}
for row in split_row + 1..rows {
for col in 0..cols {
// Connect to upper cell.
if sea[row - 1][col] == 1 && sea[row][col] == 1 {
second_half.union(idx(row - 1, col), idx(row, col));
}
// Connect to left cell.
if col > 0 && sea[row][col - 1] == 1 && sea[row][col] == 1 {
second_half.union(idx(row, col - 1), idx(row, col));
}
}
}
});
});
// After running Union-Find in separate parts of water,
// we need only to add connections at boundary.
for col in 0..cols {
if sea[split_row - 1][col] == 1 && sea[split_row][col] == 1 {
djs.union(idx(split_row - 1, col), idx(split_row, col));
}
}
let unique_land_roots: HashSet<_> = sea
.iter()
.flat_map(|row|row.iter())
.copied()
.enumerate()
.filter(|(_, val)| 1 == *val) // Filter away water cells.
.map(|(i, _)| djs.get_root(i))
.collect();
let total_islands = unique_land_roots.len();
assert_eq!(total_islands, 4);
Structs§
- Chunk
Mut - Mutable view to part of bigger disjoint set.
Useful for running union operations using multiple threads.
Contains nodes in range
lower_bound()..upper_bound()
. - Disjoint
Set - Disjoint Set with fixed size and integer tags. Guarantees to use single allocation.
- Disjoint
SetArray U8 - Disjoint
SetArray U16 - Disjoint
SetArray U32 - Disjoint
SetArray U64 - Disjoint
SetDynamic - Disjoint Set which allows dynamic addition of new items.
If you can estimate upper limit of number of elements in disjoint set,
it is probably better to use
DisjointSet
because it doesn’t have complexity for dynamic size changing. - Disjoint
SetRo - Disjoint
SetRo View - Readonly view to the content of
DisjointSet
. Useful for quering same disjoint set from many threads using rayon orstd::thread::scope
. SeeDisjointSet::make_ro_view
for examples - RoDisjoint
SetArray U8 - RoDisjoint
SetArray U16 - RoDisjoint
SetArray U32 - RoDisjoint
SetArray U64 - Root