Crate dary_heap

source ·
Expand description

A priority queue implemented with a d-ary heap.

Insertion and popping the largest element have O(log(n)) time complexity. Checking the largest element is O(1). Converting a vector to a d-ary heap can be done in-place, and has O(n) complexity. A d-ary heap can also be converted to a sorted vector in-place, allowing it to be used for an O(n * log(n)) in-place heapsort.

Comparison to standard library

The standard library contains a 2-ary heap (std::collections::BinaryHeap). The BinaryHeap of this crate aims to be a drop-in replacement, both in API and in performance. Cargo features are used in place of unstable Rust features. The advantage of this crate over the standard library lies in the possibility of easily changing the arity of the heap, which can increase performance.

The standard library binary heap can contain up to isize::MAX elements; this is the same for the binary heap of this crate, but other heaps in this crate can hold less elements. In the general case, the maximum number of elements is (usize::MAX - 1) / d for an arity of d. On 64-bit systems this should generally not be a concern when using reasonable arities. On 32-bit systems this may be a concern when using very large heaps with a relatively high arity.

Comparison of different arities d

The arity d is defined as the maximum number of children each node can have. A higher number means the heap has less layers, but may require more work per layer because there are more children present. This generally makes methods adding elements to the heap such as push faster, and methods removing them such as pop slower. However, due to higher cache locality for higher d, the drop in pop performance is often diminished. If you’re unsure what value of d to choose, the QuaternaryHeap with d = 4 is usually a good start, but benchmarking is necessary to determine the best value of d.

Usage

Rust type interference cannot infer the desired heap arity (value of d) automatically when using DaryHeap directly. It is therefore more ergonomic to use one of the type aliases to select the desired arity:

The difference in ergonomics illustrated in the following:

use dary_heap::{DaryHeap, TernaryHeap};

// Type parameter T can be inferred, but arity cannot
let mut heap1 = DaryHeap::<_, 3>::new();
heap1.push(42);

// Type alias removes need for explicit type
let mut heap2 = TernaryHeap::new();
heap2.push(42);

If a different arity is desired, you can use the former or a define a type alias yourself. It should be noted that d > 8 is rarely beneficial.

Validity of arities in d-ary heaps

Only arities of two or greater are useful in d-ary heap, and are therefore the only ones implemented by default. Lower arities are only possible if you put in the effort to implement them yourself. An arity of one is possible, but yields a heap where every element has one child. This essentially makes it a sorted vector with poor performance. Regarding an arity of zero: this is not statically prevented, but constructing a DaryHeap with it and using it may (and probably will) result in a runtime panic.

Examples

This is a larger example that implements Dijkstra’s algorithm to solve the shortest path problem on a directed graph. It shows how to use DaryHeap with custom types.

use std::cmp::Ordering;
use dary_heap::TernaryHeap;

#[derive(Copy, Clone, Eq, PartialEq)]
struct State {
    cost: usize,
    position: usize,
}

// The priority queue depends on `Ord`.
// Explicitly implement the trait so the queue becomes a min-heap
// instead of a max-heap.
impl Ord for State {
    fn cmp(&self, other: &Self) -> Ordering {
        // Notice that the we flip the ordering on costs.
        // In case of a tie we compare positions - this step is necessary
        // to make implementations of `PartialEq` and `Ord` consistent.
        other.cost.cmp(&self.cost)
            .then_with(|| self.position.cmp(&other.position))
    }
}

// `PartialOrd` needs to be implemented as well.
impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Each node is represented as a `usize`, for a shorter implementation.
struct Edge {
    node: usize,
    cost: usize,
}

// Dijkstra's shortest path algorithm.

// Start at `start` and use `dist` to track the current shortest distance
// to each node. This implementation isn't memory-efficient as it may leave duplicate
// nodes in the queue. It also uses `usize::MAX` as a sentinel value,
// for a simpler implementation.
fn shortest_path(adj_list: &Vec<Vec<Edge>>, start: usize, goal: usize) -> Option<usize> {
    // dist[node] = current shortest distance from `start` to `node`
    let mut dist: Vec<_> = (0..adj_list.len()).map(|_| usize::MAX).collect();

    let mut heap = TernaryHeap::new();

    // We're at `start`, with a zero cost
    dist[start] = 0;
    heap.push(State { cost: 0, position: start });

    // Examine the frontier with lower cost nodes first (min-heap)
    while let Some(State { cost, position }) = heap.pop() {
        // Alternatively we could have continued to find all shortest paths
        if position == goal { return Some(cost); }

        // Important as we may have already found a better way
        if cost > dist[position] { continue; }

        // For each node we can reach, see if we can find a way with
        // a lower cost going through this node
        for edge in &adj_list[position] {
            let next = State { cost: cost + edge.cost, position: edge.node };

            // If so, add it to the frontier and continue
            if next.cost < dist[next.position] {
                heap.push(next);
                // Relaxation, we have now found a better way
                dist[next.position] = next.cost;
            }
        }
    }

    // Goal not reachable
    None
}

fn main() {
    // This is the directed graph we're going to use.
    // The node numbers correspond to the different states,
    // and the edge weights symbolize the cost of moving
    // from one node to another.
    // Note that the edges are one-way.
    //
    //                  7
    //          +-----------------+
    //          |                 |
    //          v   1        2    |  2
    //          0 -----> 1 -----> 3 ---> 4
    //          |        ^        ^      ^
    //          |        | 1      |      |
    //          |        |        | 3    | 1
    //          +------> 2 -------+      |
    //           10      |               |
    //                   +---------------+
    //
    // The graph is represented as an adjacency list where each index,
    // corresponding to a node value, has a list of outgoing edges.
    // Chosen for its efficiency.
    let graph = vec![
        // Node 0
        vec![Edge { node: 2, cost: 10 },
             Edge { node: 1, cost: 1 }],
        // Node 1
        vec![Edge { node: 3, cost: 2 }],
        // Node 2
        vec![Edge { node: 1, cost: 1 },
             Edge { node: 3, cost: 3 },
             Edge { node: 4, cost: 1 }],
        // Node 3
        vec![Edge { node: 0, cost: 7 },
             Edge { node: 4, cost: 2 }],
        // Node 4
        vec![]];

    assert_eq!(shortest_path(&graph, 0, 1), Some(1));
    assert_eq!(shortest_path(&graph, 0, 3), Some(3));
    assert_eq!(shortest_path(&graph, 3, 0), Some(7));
    assert_eq!(shortest_path(&graph, 0, 4), Some(5));
    assert_eq!(shortest_path(&graph, 4, 0), None);
}

Structs

  • A priority queue implemented with a d-ary heap.
  • A draining iterator over the elements of a DaryHeap.
  • A draining iterator over the elements of a DaryHeap.
  • An owning iterator over the elements of a DaryHeap.
  • An iterator over the elements of a DaryHeap.
  • Structure wrapping a mutable reference to the greatest item on a DaryHeap.

Type Definitions