Expand description
§incremental-tree-set-union
A no_std Rust implementation of the Gabow-Tarjan linear-time algorithm for incremental tree set union (JCSS 1985).
Do not use this crate for performance. This is a faithful implementation of a theoretically important algorithm that is 6-10x slower than standard union-find (such as
disjoint-setsorunion-find) for all practical input sizes. The algorithm eliminates an inverse-Ackermann factor that is at most 4 for any input that could fit in memory. The overhead of the microset/answer-table machinery far exceeds this saving. Use standard union-find unless you need the strict O(m + n) bound for a complexity proof.
§Overview
This crate implements the algorithm from Gabow and Tarjan’s 1985 paper “A Linear-Time Algorithm for a Special Case of Disjoint Set Union”. It solves a specialized version of the disjoint set union problem where the union structure follows a rooted tree. The static variant achieves O(m + n) total time for m intermixed operations on n nodes, strictly linear, with no inverse-Ackermann factor. The incremental variant achieves O(m + n log log n) due to an encoding tradeoff (see Complexity deviation).
The tree is partitioned into small subtrees called microsets (each fewer than b nodes, where b = Theta(log log n)). A precomputed answer table resolves queries within each microset in O(1). Standard union-find macrosets handle queries that cross microset boundaries.
Two variants are provided:
StaticTreeSetUnion: the tree is known in advance (Section 2 of the paper). Achieves O(m + n) total time.IncrementalTreeSetUnion: the tree grows dynamically viagrowoperations (Section 3 of the paper). See Complexity deviation for a discussion of the time bound.
§Semantics
- Initially, every node is its own set name (marked).
link(v)removes v as a set name, merging v into its parent’s set. After this,findwill skip past v. Equivalent tounite(parent(v), v)in standard union-find.find(v)returns the nearest ancestor of v that is still a set name (has not been linked). A node is its own ancestor, sofind(v) = vif v has not been linked.grow(v, w)(incremental variant only) adds node w as a child of existing node v.link(root)panics: the root has no parent and always remains a set name.
§API usage
§Static variant (tree known in advance)
use incremental_tree_set_union::StaticTreeSetUnion;
// Tree: 0 -> {1, 2}, 1 -> {3, 4}, 2 -> 5
// parents[i] = parent of node i; parents[root] = root
let parents = vec![0, 0, 0, 1, 1, 2];
let mut tsu = StaticTreeSetUnion::from_parents(&parents);
// Initially, find(v) = v for all v (all nodes are set names)
assert_eq!(tsu.find(4), 4);
assert_eq!(tsu.find(3), 3);
// Link node 3: it is no longer a set name
tsu.link(3);
assert_eq!(tsu.find(3), 1); // skips past 3, reaches parent 1
// Link node 1
tsu.link(1);
assert_eq!(tsu.find(3), 0); // skips past 3 and 1, reaches root 0
assert_eq!(tsu.find(4), 4); // 4 was never linked, still a set name§Incremental variant (tree grows dynamically)
use incremental_tree_set_union::IncrementalTreeSetUnion;
let mut itsu = IncrementalTreeSetUnion::new(100); // capacity hint
// Build tree dynamically
itsu.grow(0, 1); // add node 1 as child of root 0
itsu.grow(0, 2); // add node 2 as child of root 0
itsu.grow(1, 3); // add node 3 as child of node 1
// Operations work the same as the static variant
itsu.link(3);
assert_eq!(itsu.find(3), 1);
itsu.link(1);
assert_eq!(itsu.find(3), 0);§How it works
The algorithm has three layers:
§Layer 1: Microsets
The tree T is partitioned into small subtrees called microsets, each containing fewer than b - 1 nodes. Each microset has an external root (a node outside the microset that is the parent of the microset’s topmost nodes in T). The tree structure within each microset is encoded as a forest bitmask using unary child counts: for each node in preorder, emit one 1 bit followed by num_children 0 bits.
§Layer 2: Answer table
A precomputed lookup table answers “nearest marked ancestor” queries within a single microset in O(1). The table is indexed by (forest_encoding << (b-1)) | mark_bitmask and contains packed answers for all node positions, each ceil(log2(b)) bits wide. The table has 2^(3b-4) entries and is constructed in O(n) time.
§Layer 3: Macrosets
Standard union-find (path compression + union by size) on the microset roots handles queries that cross microset boundaries. When microfind returns the external root of a microset (no marked ancestor found within), the algorithm transitions to the macroset layer to find the next microset up the tree.
§The find algorithm
microfind(v):
look up answer_table[(forest, mark)] for node v's position
if answer = 0: return microset root (cross to macroset)
else: return the ancestor node within the microset
find(v):
y = microfind(v)
if y is in a different microset than v:
x = macro_top(macrofind(y)) // find highest node in macroset
loop:
y = microfind(x)
if y is in the same microset as x: break
macrounite(y, x) // merge macrosets
x = macro_top(y) // continue from highest
return y§The parameter b
The microset size parameter b is chosen adaptively based on n to balance the answer table size against the number of macroset operations:
| n | b | max nodes/microset | answer table entries |
|---|---|---|---|
| < 64 | 2 | 1 | 4 |
| 64 - 767 | 3 | 2 | 32 |
| 768 - 16,383 | 4 | 3 | 256 |
| 16,384 - 524,287 | 5 | 4 | 2,048 |
| 524,288 - ~25M | 6 | 5 | 16,384 |
§Performance
§Summary
The static variant is 6-10x slower than standard union-find. The incremental variant is 50-120x slower. The algorithm’s value is theoretical: it proves that tree set union can be solved in strictly O(m + n) time. The constant factor makes it impractical as a faster alternative to standard union-find.
The root cause is fundamental to the algorithm: each microfind requires 3-4 dependent memory loads (NodeInfo, Microset, answer table, node array), versus 1 load for a path-compressed standard union-find find.
The factor this algorithm eliminates, the inverse Ackermann function alpha(m, n), grows so slowly that it is effectively constant. The Ackermann function grows as A(1, n) = 2n, A(2, n) ~ 2^n, A(3, n) ~ 2^2^…^2 (n times), A(4, n) ~ a tower of 2s of height 2^2^…^2. Its inverse alpha(m, n) is at most k when n <= A(k, floor(m/n)). Since A(4, 1) = 65533, we need n > 65533 for alpha to even potentially reach 5, and that only happens when m/n is very small. For all practical purposes, alpha(m, n) <= 4. For n <= 65536, alpha <= 3. In practice, the “improvement” from O(m * alpha) to O(m) saves a constant factor of at most 3-4 on the query portion of the work, while the microset/answer-table machinery adds a 5-8x overhead per query.
§Static variant vs standard union-find
Benchmarked against disjoint-sets and union-find crates on random trees (construct + n/2 links + n finds). Numbers are approximate and vary across runs and hardware:
| n | StaticTreeSetUnion | disjoint-sets | union-find | slowdown |
|---|---|---|---|---|
| 1,000 | 32 us | 3.5 us | 3.6 us | 9x |
| 10,000 | 568 us | 59 us | 66 us | 10x |
| 100,000 | 7.8 ms | 1.25 ms | 1.44 ms | 6x |
| 1,000,000 | 225 ms | 25 ms | 32 ms | 9x |
§Static variant, query-only (construction excluded)
| n | StaticTreeSetUnion | disjoint-sets | union-find | slowdown |
|---|---|---|---|---|
| 1,000 | 5.0 us | 0.79 us | 0.72 us | 7x |
| 10,000 | 79 us | 9.2 us | 9.2 us | 8.6x |
| 100,000 | 2.0 ms | 414 us | 417 us | 4.8x |
| 1,000,000 | 64 ms | 6.6 ms | 6.3 ms | 10x |
Even with construction excluded, the per-find overhead from dependent cache misses keeps the algorithm 5-10x slower. The gap narrows at 100K (where data structures partially fit in L3 cache) but widens again at 1M when they spill to DRAM and the 3:1 load ratio per find fully manifests.
§Incremental variant vs standard union-find
The incremental variant is substantially slower because every grow call does an O(b) preorder rebuild of the microset (see Complexity deviation):
| n | IncrementalTreeSetUnion | disjoint-sets | union-find | slowdown |
|---|---|---|---|---|
| 1,000 | 376 us | 3.1 us | 3.9 us | 120x |
| 10,000 | 4.7 ms | 48 us | 74 us | 98x |
| 100,000 | 58 ms | 1.19 ms | 1.45 ms | 49x |
§Complexity deviation in the incremental variant
The paper’s Section 3 states that grow is O(1) amortized (without a split) and O(b) when a split occurs, giving O(m + n) total time. Our implementation does O(b) for every grow because the forest encoding requires a full preorder rebuild when a child is added to an interior node.
The paper uses a parent-table encoding (parent(i, j) = k) that can be extended in O(1) by setting one entry. We use the forest encoding (unary child-count bitmask, following the janezb C++ implementation) instead. Adding a child changes a node’s child count, which shifts all subsequent bits in the bitmask and requires a complete rebuild.
We chose the forest encoding because the parent table makes the answer table dramatically larger. The parent table requires (b-1) * ceil(log2(b)) bits per microset structure, versus 2b - 3 bits for the forest encoding. This means the answer table indexed by (structure, mark) grows from 2^(3b-4) entries to 2^((b-1)(ceil(log2(b))+1)) entries:
| Encoding | Grow cost | Table size (b=5) | Table size (b=7) |
|---|---|---|---|
| Parent table (paper) | O(1) amortized | 65,536 entries | 16,777,216 entries |
| Forest bitmask (this impl) | O(b) per grow | 2,048 entries | 131,072 entries |
| Ratio | 32x larger | 128x larger |
At b=7 (n around 25M nodes), the parent-table answer table would consume 128 MB of memory (16M entries * 8 bytes). The forest encoding needs only 1 MB. Both are still O(n) asymptotically, but the constant factor makes the parent table impractical for large inputs. The paper itself acknowledges this tradeoff on page 216: it presents the forest encoding as a way to “choose a larger value of b” with a more compact representation.
Our total time for the incremental variant is O(m + n log log n) instead of the paper’s O(m + n), where the extra log log n factor comes from b = Theta(log log n). The static variant is unaffected and achieves the paper’s O(m + n) bound.
§Paper bugs discovered
Two bugs in the published pseudocode of GT85 were independently confirmed by both existing reference implementations:
§Bug 1: MacroFind vs MicroFind in find (p.217, line 7)
The paper’s line 7 says macrounite(macrofind(x), x). The correct version (from the earlier GT83 technical report, p.248) is macrounite(microfind(x), x). Using MacroFind can cause an infinite loop.
- janezb (C++): “Note: in line 7 of ‘find’, GT85 (p. 217) has MacroFind, while GT83 (p. 248) has MicroFind; the latter is correct, while the former can cause an endless loop.”
- bzliu94 (Python): “there is a bug from pseudocode here for first argument”
§Bug 2: macro_top continuation (unstated requirement)
After MacroUnite, the algorithm must continue from the node closest to the tree root in the macroset component, not from an arbitrary union-find representative. Neither GT85 nor GT83 states this requirement.
- janezb (C++): “An important detail mentioned in neither GT85 nor GT83: we need to continue from the highest (i.e. closest to root) member of the macroset.”
- bzliu94 (Python): achieves the same effect via
NamedUnionFindwhere the canonical representative is always the highest element by design.
§Implementation notes
no_stdcompatible withalloc(edition 2024, rust-version 1.85)- Forest encoding (janezb style) instead of the paper’s parent table: produces a 32-128x smaller answer table but prevents O(1) incremental grow
- Mark bit convention: bit = 0 means node is a set name (marked); bit = 1 means linked (unmarked)
- Answer table: 2^(3b-4) entries of
u64, constructed in O(n) time - Macroset: standard union-find with path compression, union by size, and explicit
macro_toptracking
§Testing
- 51 tests across 6 files (4 integration test files + 2 unit test modules + 2 doc-tests from this README): correctness properties, edge cases (single node, path, star, binary tree, adversarial grow, n=100K), golden vectors verified against the bzliu94 Python reference implementation, and randomized tests with dual oracle (standard union-find + naive walk-up)
- 3 cargo-fuzz targets (
fuzz_static,fuzz_incremental,fuzz_both) checking against a union-find oracle and naive walk-up oracle simultaneously, with structured tree generation (random, path, star, caterpillar) and phased operation modes (link-all-then-find-all) - 100% line coverage (tarpaulin LLVM)
cargo test # run all tests
./fuzz.sh # start all 3 fuzzers in tmux panels
cargo bench # run benchmarks§Applications
The paper (Section 4) lists ten applications where tree set union removes an inverse-Ackermann or iterated-logarithm factor from the previously best-known algorithm. In all cases the improvement is purely theoretical: the removed factor is at most 4 (alpha) or 5 (log*) for any practical input, and the constant-factor overhead of this implementation makes it slower than standard union-find in practice.
Applications (1)-(5) use the static variant on path-shaped trees. Applications (6)-(8) use the static variant on general trees. Application (9) uses the incremental variant. Application (10) uses an extended static variant.
- Two-processor scheduling
- Multiprocessor scheduling (priority list and interval dag)
- The off-line min problem
- Matching on convex graphs and scheduling with release times and deadlines
- VLSI channel routing
- Nearest common ancestors (off-line)
- Flow graph reducibility
- Two directed spanning trees
- Maximum cardinality matching on nonbipartite graphs
- The set-splitting problem
Application (9) is the motivation for this crate: Blum’s O(sqrt(n)(m+n)) maximum matching algorithm uses incremental tree set union to state its time bound without an alpha factor. Gabow and Tarjan also observe (p.219) that substituting tree set union into the Micali-Vazirani matching algorithm would give the O(sqrt(n)m) bound directly, avoiding a separate 50-page argument about blossom structure. Micali-Vazirani themselves did not use tree set union; they used standard union-find and proved the blossom property independently.
§References
- H. N. Gabow and R. E. Tarjan, “A linear-time algorithm for a special case of disjoint set union,” J. Comput. Syst. Sci., vol. 30, pp. 209-221, 1985.
- H. N. Gabow and R. E. Tarjan, “A linear-time algorithm for a special case of disjoint set union,” Proc. 15th Annual ACM Symposium on Theory of Computing, 1983, pp. 246-251. (Earlier version with corrected pseudocode.)
- N. Blum, “Maximum Matching in General Graphs Without Explicit Consideration of Blossoms Revisited,” full version, Uni Bonn, July 2016.
§License
MIT
Re-exports§
pub use incremental::IncrementalTreeSetUnion;pub use static_tree::StaticTreeSetUnion;