weighted_path 0.6.0

A Rust library for finding shortest paths in weighted graphs using Dijkstra's algorithm with multiple heap implementations
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
# Weighted Path

A Rust library for finding shortest paths in weighted graphs using Dijkstra's algorithm with multiple heap implementations. Includes a command-line tool for convenience.

## Description

This library provides multiple implementations of Dijkstra's shortest path algorithm, allowing you to choose the optimal heap data structure for your use case. The library supports both directed and undirected graphs with positive edge weights.

## Library Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
weighted_path = "0.5"
```

### Basic Example

```rust
use weighted_path::dijkstra;

// Parse graph from lines
let lines = vec![
    "4",
    "A", "B", "C", "D",
    "A|B|2",
    "C|B|11",
    "C|D|3",
    "B|D|2",
];

// Find shortest path using binary heap (default)
let (path, distance) = dijkstra::find_shortest_path(lines.clone())
    .expect("Failed to parse graph");

println!("Path: {}, Distance: {:?}", path, distance);
// Output: Path: A-B-D, Distance: Some(4)
```

### Using Different Heap Implementations

```rust
use weighted_path::dijkstra;

let lines = vec![/* ... */];

// Binary heap (default, good general-purpose choice)
let (path, distance) = dijkstra::find_shortest_path(lines.clone())?;

// Fibonacci heap (faster for dense graphs)
let (path, distance) = dijkstra::find_shortest_path_fibonacci(lines.clone())?;

// Unsafe Fibonacci heap (fastest, uses unsafe code)
let (path, distance) = dijkstra::find_shortest_path_fibonacci_unsafe(lines.clone())?;

// Pairing heap (often faster than Fibonacci in practice)
let (path, distance) = dijkstra::find_shortest_path_pairing(lines.clone())?;

// Radix heap (excellent for integer weights)
let (path, distance) = dijkstra::find_shortest_path_radix(lines.clone())?;

// Dial's algorithm (optimal for small integer weights)
let (path, distance) = dijkstra::find_shortest_path_dial(lines.clone())?;
```

### Parsing and Using the Low-Level API

For more control, you can parse the graph yourself and then call the underlying Dijkstra functions:

```rust
use weighted_path::dijkstra;

// Start with graph in string format
let graph_str = "4
A
B
C
D
A|B|2
C|B|11
C|D|3
B|D|2";

let lines: Vec<&str> = graph_str.lines().collect();

// Parse the graph (bidirectional/undirected)
let parsed = dijkstra::parse_graph(&lines, true)
    .expect("Failed to parse graph");

// Now you have access to:
// - parsed.graph: adjacency list Vec<Vec<(usize, u32)>>
// - parsed.nodes_reverse: HashMap<usize, &str> mapping indices to node names

// Find shortest path from first node (index 0) to last node (index 3)
let (path_indices, distance) = dijkstra::dijkstra_binary(0, 3, &parsed.graph);

// Convert path indices back to node names
let path_names: Vec<&str> = path_indices
    .iter()
    .map(|&idx| *parsed.nodes_reverse.get(&idx).unwrap())
    .collect();

println!("Path: {:?}, Distance: {:?}", path_names, distance);
// Output: Path: ["A", "B", "D"], Distance: Some(4)
```

### Low-Level API with Pre-built Graph

You can also use the heap-specific Dijkstra functions directly with a pre-built adjacency list, e.g. using the `dijkstra_binary` function:

```rust
use weighted_path::dijkstra;

// Build adjacency list: graph[u] contains (v, weight) edges
let graph = vec![
    vec![(1, 2), (2, 5)],  // Node 0 -> Node 1 (weight 2), Node 0 -> Node 2 (weight 5)
    vec![(2, 1), (3, 3)],  // Node 1 -> Node 2 (weight 1), Node 1 -> Node 3 (weight 3)
    vec![(3, 2)],          // Node 2 -> Node 3 (weight 2)
    vec![],                 // Node 3 (no outgoing edges)
];

// Find shortest path from node 0 to node 3
let (path, distance) = dijkstra::dijkstra_binary(0, 3, &graph);
// path: [0, 1, 3]
// distance: Some(5)
```

### Directed Graphs

```rust
use weighted_path::dijkstra;

let lines = vec![/* ... */];

// Process as directed graph (edges are one-way only)
let (path, distance) = dijkstra::find_shortest_path_directed(lines, false)?;

// Process as undirected graph (edges are bidirectional)
let (path, distance) = dijkstra::find_shortest_path_directed(lines, true)?;
```

### Generic Dijkstra Function

For maximum flexibility, use the generic `dijkstra` function with any heap implementing the `PriorityQueue` trait:

```rust
use weighted_path::dijkstra;
use weighted_path::radix::RadixHeap;

// Build adjacency list: graph[u] contains (v, weight) edges
let graph = vec![
    vec![(1, 2), (2, 5)],  // Node 0 -> Node 1 (weight 2), Node 0 -> Node 2 (weight 5)
    vec![(2, 1), (3, 3)],  // Node 1 -> Node 2 (weight 1), Node 1 -> Node 3 (weight 3)
    vec![(3, 2)],          // Node 2 -> Node 3 (weight 2)
    vec![],                 // Node 3 (no outgoing edges)
];

// Use generic dijkstra with a RadixHeap
let mut heap = RadixHeap::new();
let (path, distance) = dijkstra::dijkstra(0, 3, &graph, heap);

assert_eq!(path, vec![0, 1, 3]);
assert_eq!(distance, Some(5));
```

## Command-Line Tool

The library includes a command-line binary for convenience and experimentation:

### Building the Binary

```bash
cargo build --release
```

### Running the Binary

```bash
cargo run --bin weighted_path <input_file>
```

Or using the compiled binary:

```bash
./target/release/weighted_path <input_file>
```

By default, the binary uses the **binary heap** implementation of Dijkstra.

**Note:** The command-line tool is primarily intended for testing, benchmarking, and experimentation. For production use, prefer the library API shown above.

### Selecting the Dijkstra / heap implementation (lab mode)

The binary exposes a "lab mode" flag that lets you choose which
underlying Dijkstra / heap implementation to use. This is mainly intended for
experimentation, benchmarking, and regression checks; for typical one-off runs
you will not notice a visible difference, because **file I/O and graph parsing
dominate the total runtime**.

Usage:

```bash
weighted_path [--heap <binary|bin|fib|fib-unsafe|pairing|pair|radix|dial>] <input_file>
```

- `binary` / `bin`: Standard binary-heap Dijkstra (default).
- `fib`: Fibonacci heap (`dijkstra_fibonacci`, `Rc<RefCell>`-based).
- `fib-unsafe`: Unsafe Fibonacci heap (`dijkstra_fibonacci_unsafe`, raw pointers).
- `pairing` / `pair`: Pairing heap (`dijkstra_pairing`).
- `radix`: Radix heap (`dijkstra_radix`).
- `dial`: Dial's algorithm (`dijkstra_dial`, bucket-based).

Examples:

```bash
# Default (binary heap)
weighted_path graph.txt

# Explicit binary heap
weighted_path --heap binary graph.txt

# Fibonacci heap
weighted_path --heap fib graph.txt

# Unsafe Fibonacci heap
weighted_path --heap fib-unsafe graph.txt

# Pairing heap
weighted_path --heap pairing graph.txt

# Radix heap
weighted_path --heap radix graph.txt

# Dial's algorithm
weighted_path --heap dial graph.txt
```

## Input Format

The input file should follow this format:

1. **First line**: Number of nodes (N) as a positive integer.
2. **Next N lines**: Node names (one per line).
3. **Remaining lines**: Edges in the format `node1|node2|weight`.
   - `node1` and `node2` are node names (must match nodes defined above).
   - `weight` is a positive integer representing the edge weight.
   - Edges are bidirectional.

### Example Input

```text
4
A
B
C
D
A|B|2
C|B|11
C|D|3
B|D|2
```

## Output Format

The library functions return a tuple `(path_string, distance)` where:

- `path_string` is a dash-separated string of node names (e.g., `"A-B-D"`), or `"-1"` if no path exists.
- `distance` is an `Option<u32>` containing the total shortest-path distance, or `None` if no path exists.

The command-line tool outputs the path string followed by the total distance in parentheses, or `-1` if no path exists.

### Example Output

```text
A-B-D (weight: 4)
```

This means the shortest path from node `A` to node `D` goes through node `B` and has total
weight `4`.

## Running Tests

```bash
cargo test --lib
```

This runs unit tests, including file-based test cases in `testdata/` (when present).

## Algorithm

The library implements Dijkstra's algorithm to find shortest paths:

1. Parse the graph and build an adjacency list.
2. Use Dijkstra's algorithm to find the shortest path from the source node to the target node and
   compute its total distance.
3. Return the path as a vector of node indices together with the total distance, or `None` if no
   path exists. The high-level helper functions format this as a dash-separated string.

## Edge Cases

- **Single node**: Returns the node name itself with distance 0 (e.g. `A (weight: 0)`).
- **No path exists**: Returns `-1`.
- **Empty input**: Returns `-1`.
- **Zero nodes**: Returns `-1`.

## Test Data

Test cases are located in the `testdata/` directory:

- `input0.txt` through `input18.txt`: Test input files.
- `output0.txt` through `output18.txt`: Expected output files.

## Error Handling

The library validates input and returns `Result` types with clear error messages for:

- Invalid number format.
- Missing nodes in edge definitions.
- Malformed edge lines.
- Duplicate node names.
- Empty or invalid graph structure.

The command-line tool additionally handles:

- Missing command-line arguments.
- File not found or unreadable.

## Benchmarking

The project includes benchmarking support using Criterion.rs to measure performance across different graph sizes, edge densities, and heap impls.

### Generating Test Graphs

A graph generator utility is included to create large graphs for benchmarking.

**Graph Generator Usage:**

```bash
cargo run --bin generate_graph <num_nodes> [edge_density] [output_file] [--directed]
```

- `num_nodes`: Number of nodes in the graph (required).
- `edge_density`: Probability of edge between nodes (0.0-1.0, default: 0.1).
- `output_file`: Output file path (default: stdout).
- `--directed`: Generate a directed graph (default: undirected/bidirectional).

**Examples:**

```bash
# Generate undirected graph with 1000 nodes
cargo run --bin generate_graph 1000 0.1 large_graph.txt

# Generate directed graph with 500 nodes
cargo run --bin generate_graph 500 0.1 directed_graph.txt --directed
```

**Note on Directed vs Undirected:**

- **Default behaviour**: By default, edges are treated as bidirectional (undirected). When an edge `A|B|w` is specified, both `A→B` and `B→A` are created with weight `w`.
- **Directed graphs**: The `find_shortest_path_directed` function accepts a `bidirectional` boolean parameter. When `false`, edges are one-way only (A→B exists, but B→A does not unless explicitly specified).
- **Input format**: The same input can be processed as either directed or undirected by using `find_shortest_path_directed(lines, bidirectional)`. If a directed graph is processed with `bidirectional=true`, reverse edges will be created.
- **Benchmarking**: This allows testing the same graph structure in both modes for fair performance comparison. The benchmark suite includes a `directed_vs_undirected` benchmark that tests the same graph with both settings.
- **Performance**: Directed graphs typically have fewer edges (only forward direction), while undirected graphs have symmetric edges. The same input processed as undirected will have more edges and may be slightly slower.

### Running Benchmarks

```bash
# Run all benchmarks
cargo bench --bench dijkstra_bench

# Run reference benchmark (quick performance check)
cargo bench --bench dijkstra_bench -- reference

# Run specific benchmark group
cargo bench --bench dijkstra_bench -- dijkstra_algorithm
```

The benchmarks test:

- **Graph parsing performance** across different graph sizes (10, 50, 100, 500, 1000 nodes).
- **Dijkstra algorithm performance** across different graph sizes (10, 50, 100, 500, 1000, 2000 nodes).
- **Edge density impact** on performance (0.01, 0.05, 0.1, 0.2, 0.5 density with 500 nodes).
- **Directed vs Undirected graphs** comparison: Same graph structure tested with `bidirectional=true` (undirected) and `bidirectional=false` (directed) for fair comparison.
- **Directed graph performance** across different sizes (100, 500, 1000 nodes).
- **Real-world graph performance** using actual test files.

Benchmark results are saved in `target/criterion/` and include HTML reports with detailed statistics and graphs.

### Performance Characteristics

The current implementation uses:

- **Adjacency list**: O(V + E) space complexity where V is vertices and E is edges (much better for sparse graphs).
- **Dijkstra's algorithm**: O((V + E) log V) time complexity with binary heap.
- **Binary heap priority queue**: Used for efficient minimum distance extraction.
- **Efficient priority queue**: Uses `Reverse` wrapper to convert BinaryHeap (max-heap) into a min-heap (zero-cost in release builds).

### Advanced Heap Implementations

**Multiple heap implementations are available:**

- **Binary heap** (`dijkstra_binary`): Standard implementation, good general-purpose choice.
- **Fibonacci heap** implementations:
  - `dijkstra_fibonacci`: `Rc<RefCell>`-based heap (memory-safe but slower).
  - `dijkstra_fibonacci_unsafe`: raw-pointer heap (fastest, but uses `unsafe`).
- **Pairing heap** (`dijkstra_pairing`): Simpler than Fibonacci, often faster in practice.
- **Radix heap** (`dijkstra_radix`): Specialized for non-decreasing integer keys, excellent for Dijkstra's algorithm.
- **Dial's algorithm** (`dijkstra_dial`): Bucket-based algorithm optimised for small integer edge weights (1..=100). Very efficient for dense graphs.

**Complexity:**

- Binary heap: O((V + E) log V).
- Fibonacci heap: O(E + V log V) amortized.
- Pairing heap: O(E + V log V) amortized.
- Radix heap: O(E + V log C) amortized, where C is the key range (for integer keys).
- Dial's algorithm: O(V + E + C), where C is the maximum distance. Optimal when C is small compared to V log V.

**Performance characteristics:**

- Fibonacci and Pairing heaps provide significant speedups on dense graphs.
- Pairing heap often outperforms Fibonacci heap in practice due to lower constant factors.
- The unsafe Fibonacci variant is fastest but requires careful memory management.
- Radix heap is particularly effective when edge weights are bounded integers (as in this implementation, where weights are `u32` in range 1..=100).
- Dial's algorithm excels with small integer edge weights and dense graphs. Its bucket-based approach avoids priority queue overhead, making it competitive with or faster than advanced heap structures for this use case.

For very large graphs (10,000+ nodes), the advanced heap implementations are recommended for best performance.

**Summary of benchmark results (indicative):**

These numbers come from the Criterion benchmark suite in this repository and are
intended as an order-of-magnitude guide rather than exact guarantees:

| **Graph type**        | **Nodes** | **Density** | **Binary heap (baseline)** | **Fib heap (`fib`)** | **Unsafe Fib heap (`fib-unsafe`)** | **Pairing heap (`pairing`)** | **Radix heap (`radix`)** | **Dial's algorithm (`dial`)** |
|-----------------------|-----------|-------------|----------------------------|----------------------|------------------------------------|------------------------------|--------------------------|-------------------------------|
| Sparse graph          | 500       | 0.1         || ~5–10× faster        | ~20× faster                        | ~10-15× faster               | ~50-60× faster           | ~50-60× faster                |
| Dense graph           | 1000      | 0.3         || >30× faster          | >100× faster                       | ~30-35× faster               | ~50× faster              | >150× faster                  |

Exact timings may vary by machine and compiler version; for precise numbers,
run the benchmarks locally:

```bash
cargo bench --bench dijkstra_bench -- reference
```

Criterion will generate detailed HTML reports (including plots) under
`target/criterion/`, which you can open in a browser for full performance
breakdowns.

## Module Layout

- `src/dijkstra/`: Core Dijkstra implementation and heap-specific wrappers.
  - `mod.rs`: Generic `dijkstra<Q: PriorityQueue>` function (works with any heap).
  - `heap_trait.rs`: `PriorityQueue` trait for abstracting over heap implementations.
  - `binary.rs`: Binary heap wrapper (`dijkstra_binary`).
  - `fib.rs`: Fibonacci heap wrapper (`dijkstra_fibonacci`).
  - `fib_unsafe.rs`: Unsafe Fibonacci heap wrapper (`dijkstra_fibonacci_unsafe`).
  - `pairing.rs`: Pairing heap wrapper (`dijkstra_pairing`).
  - `radix.rs`: Radix heap wrapper (`dijkstra_radix`).
  - `dial.rs`: Dial's algorithm wrapper (`dijkstra_dial`).
  - Also contains graph parsing, validation, and tests.
- `src/fibonacci/`: Fibonacci heap implementations.
  - `heap.rs`: `Rc<RefCell>`-based heap (`Node`, `FibonacciHeap`).
  - `heap_unsafe.rs`: unsafe heap (`UnsafeNode`, `UnsafeFibonacciHeap`).
- `src/pairing/`: Pairing heap implementation.
  - `heap.rs`: pairing heap (`Node`, `PairingHeap`).
- `src/radix/`: Radix heap implementation.
  - `heap.rs`: radix heap (`RadixHeap`, `RadixHandle`).
- `src/dial/`: Dial's algorithm implementation.
  - `heap.rs`: bucket-based heap (`DialHeap`, `DialHandle`).

**Architecture:**

The project uses a trait-based design where all heap types implement the `PriorityQueue` trait, allowing a single generic Dijkstra implementation to work with any heap type.

## Correctness, bugs, and security

This project aims to be **correct and well-tested**, but it is still an experimental “lab”
for comparing heap implementations.

- **Correctness**: All heaps are exercised by unit tests and cross-checked in benchmarks so
  that they produce the same shortest-path **distance** on the same graphs. The unsafe
  Fibonacci heap is kept in sync with the safe variant via these tests.
- **Bugs**: If you find a bug (wrong path, wrong distance, panic, etc.), please open an
  issue with:
  - the exact input file (or a minimal reproducer),
  - which heap you were using (`--heap` value),
  - the observed output and the expected one.
  This makes it much easier to reproduce and fix the problem.
- **Security**: The code is intended for offline graph experiments, not for processing
  untrusted input in production services. There is an `unsafe` Fibonacci heap implementation;
  it is tested and benchmarked, but should be treated with the usual care that any `unsafe`
  code deserves.

If you believe you have found a **security-relevant** issue, please avoid posting full
exploits publicly; instead, report the details privately (for example via the repository’s
security contact or a private issue if available).