parallel-disk-usage 0.23.0

Highly parallelized, blazing fast directory tree analyzer
# Parallel Disk Usage (pdu)

[![Test](https://github.com/KSXGitHub/parallel-disk-usage/workflows/Test/badge.svg)](https://github.com/KSXGitHub/parallel-disk-usage/actions?query=workflow%3ATest)
[![Benchmark](https://github.com/KSXGitHub/parallel-disk-usage/actions/workflows/benchmark.yaml/badge.svg)](https://github.com/KSXGitHub/parallel-disk-usage/actions/workflows/benchmark.yaml)
[![Clippy](https://github.com/KSXGitHub/parallel-disk-usage/actions/workflows/clippy.yaml/badge.svg)](https://github.com/KSXGitHub/parallel-disk-usage/actions/workflows/clippy.yaml)
[![Code formatting](https://github.com/KSXGitHub/parallel-disk-usage/actions/workflows/fmt.yaml/badge.svg)](https://github.com/KSXGitHub/parallel-disk-usage/actions/workflows/fmt.yaml)
[![Crates.io Version](https://img.shields.io/crates/v/parallel-disk-usage?logo=rust)](https://crates.io/crates/parallel-disk-usage)

Highly parallelized, blazing fast directory tree analyzer.

## Description

`pdu` is a CLI program that renders a graphical chart of the disk usage of files and directories. It is an alternative to [`dust`](https://github.com/bootandy/dust) and [`dutree`](https://github.com/nachoparker/dutree).

## Benchmark

The benchmark was generated by [a GitHub Workflow](https://github.com/KSXGitHub/parallel-disk-usage/blob/0.20.0/.github/workflows/deploy.yaml#L431-L601) and uploaded to the release page.

<details><summary>Programs</summary>

* `pdu` v0.20.0
* [`dust`](https://github.com/bootandy/dust) v1.2.1
* [`dua`](https://github.com/Byron/dua-cli) v2.30.1
* [`ncdu`](https://dev.yorhel.nl/ncdu)
* [`gdu`](https://github.com/dundee/gdu) v5.31.0
* `du`

</details>

<figure>
  <img src="https://ksxgithub.github.io/parallel-disk-usage-0.20.0-benchmarks/tmp.benchmark-report.competing.block-size.svg">
  <img src="https://ksxgithub.github.io/parallel-disk-usage-0.20.0-benchmarks/tmp.benchmark-report.competing.deduplicate-hardlinks.svg">
  <figcaption align="center">
    benchmark results
    <em>(lower is better)</em>
  </figcaption>
</figure>

[_(See more)_](https://github.com/KSXGitHub/parallel-disk-usage-0.20.0-benchmarks/blob/master/tmp.benchmark-report.CHARTS.md)

## Demo

![screenshot](https://user-images.githubusercontent.com/11488886/127254941-d1fb30d8-18e0-40ac-a212-bbd6463aa624.png)

[![asciicast of pdu command](https://asciinema.org/a/416663.svg)](https://asciinema.org/a/416663)

[![asciicast of pdu command on /usr](https://asciinema.org/a/416664.svg)](https://asciinema.org/a/416664)

## Features

* Very fast.
* Relative comparison of separate files.
* Extensible via the library crate or JSON interface.
* Unbiased regarding hardlinks: All hardlinks are treated as equally real.
* Optional hardlink detection and deduplication (would make `pdu` proportionally slower).
* Optional progress report (would make `pdu` slightly slower).
* Customize tree depth.
* Customize chart size.

## Limitations

* Ignorant of reflinks (from COW filesystems such as BTRFS and ZFS).
* Does not follow symbolic links.
* The runtime is optimized at the expense of binary size.

## Usage

See [USAGE.md](./USAGE.md) for the full help text.

## Development

### Prerequisites

* [`cargo`](https://github.com/rust-lang/cargo)

### Test

```sh
./test.sh && ./test.sh --release
```

<details><summary>
Environment Variables
</summary>

| name          | type              | default value | description                                     |
|---------------|-------------------|---------------|-------------------------------------------------|
| `FMT`         | `true` or `false` | `true`        | Whether to run `cargo fmt`                      |
| `LINT`        | `true` or `false` | `true`        | Whether to run `cargo clippy`                   |
| `DOC`         | `true` or `false` | `false`       | Whether to run `cargo doc`                      |
| `BUILD`       | `true` or `false` | `true`        | Whether to run `cargo build`                    |
| `TEST`        | `true` or `false` | `true`        | Whether to run `cargo test`                     |
| `BUILD_FLAGS` | string            | _(empty)_     | Space-separated list of flags for `cargo build` |
| `TEST_FLAGS`  | string            | _(empty)_     | Space-separated list of flags for `cargo test`  |
| `TEST_SKIP`   | string            | _(empty)_     | Space-separated list of test names to skip      |

</details>
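
As an illustration of the table above, the variables can be set inline for a single invocation. This is a minimal sketch, assuming `./test.sh` reads them from the environment as described. The function names and the test name `some_slow_test` are hypothetical.

```sh
# Run the build and tests only, skipping `cargo fmt` and `cargo clippy`.
# Extra arguments (such as `--release`) are forwarded to ./test.sh.
quick_check() {
  FMT=false LINT=false ./test.sh "$@"
}

# Run every step, but skip one test by name.
# `some_slow_test` is a hypothetical test name used for illustration.
check_without_slow_test() {
  TEST_SKIP='some_slow_test' ./test.sh "$@"
}
```

For example, `quick_check --release` runs the release build and tests without the formatting and lint steps.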

### Run

```sh
./run pdu "${arguments[@]}"
```

* `"${arguments[@]}"`: List of arguments to pass to `pdu`.

### Build

#### Debug build

```sh
cargo build --bin pdu
```

The resulting executable is located at `target/debug/pdu`.

#### Release build

```sh
cargo build --bin pdu --release
```

The resulting executable is located at `target/release/pdu`.

### Update shell completion files

```sh
./generate-completions.sh
```

## Extending `parallel-disk-usage`

The [parallel-disk-usage crate](https://crates.io/crates/parallel-disk-usage) is both a binary crate and a library crate. If you need features that `pdu` itself lacks (that is, you have asked the maintainer(s) of `pdu` for them and they declined), you may use the library crate to build a tool of your own. The documentation for the library crate can be found on [docs.rs](https://docs.rs/parallel-disk-usage).

Alternatively, the `pdu` command provides the `--json-input` and `--json-output` flags. The `--json-output` flag converts disk usage data into JSON, and the `--json-input` flag turns that JSON into a visualization. These two flags allow integration with other CLI tools (via pipes, as per the UNIX philosophy).

Beware that the structure of the JSON tree depends on the number of file/directory names provided as CLI arguments:
* If 0 or 1 file/directory names are provided, the name of the tree root is a real path (either `.` or the provided name).
* If 2 or more file/directory names are provided, the name of the tree root is `(total)` (which is not a real path), and the provided names correspond to the children of the tree root.
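
The round trip and the root-name caveat above can be sketched as a few shell helpers. This is a minimal sketch, assuming `pdu` is on your `PATH`. The function names are hypothetical, and extracting the root name from the JSON is left to whichever JSON tool you prefer.

```sh
# Capture disk usage data as JSON for the given paths
# (with no paths, `pdu` measures the current directory).
pdu_snapshot() {
  pdu --json-output "$@"
}

# Render a chart from a file previously written by `pdu --json-output`.
pdu_render() {
  pdu --json-input < "$1"
}

# Succeed if the given root name is the `(total)` placeholder that appears
# when 2 or more paths were provided, i.e. when the root is not a real path.
is_total_root() {
  [ "$1" = '(total)' ]
}
```

For example, `pdu_snapshot src docs > usage.json` followed by `pdu_render usage.json` draws the chart without rescanning the disk. A consumer that maps tree nodes back to real paths should check the root name with `is_total_root` first.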

## Installation

### Any Desktop OS

#### From GitHub

Go to the [GitHub Release Page](https://github.com/KSXGitHub/parallel-disk-usage/releases) and download a binary.

> [!NOTE]
> Starting with version 0.23.0, every executable published to GitHub Releases ships with a build provenance attestation, so you can cryptographically verify that the binary was produced by this repository's deployment workflow rather than uploaded by hand. See [_How can I trust the release binaries?_](#how-can-i-trust-the-release-binaries) for the verification command.

#### From [crates.io](https://crates.io)

**Prerequisites:**
  * [`cargo`](https://github.com/rust-lang/cargo)

```sh
cargo install parallel-disk-usage --bin pdu
```

### Arch Linux

#### From the [Official Repository](https://archlinux.org/packages/extra/x86_64/parallel-disk-usage/)

```sh
sudo pacman -S parallel-disk-usage
```

<!-- #### From [Khải's Pacman Repository](https://github.com/KSXGitHub/pacman-repo)

Follow the [installation instruction](https://github.com/KSXGitHub/pacman-repo#installation) then run the following command:

```sh
sudo pacman -S parallel-disk-usage
``` -->

## Distributions

[![Packaging Status](https://repology.org/badge/vertical-allrepos/parallel-disk-usage.svg)](https://repology.org/project/parallel-disk-usage/versions)

## Frequently Asked Questions

### Is this project vibe-coded?

No. "Vibe coding" means letting AI do everything without human involvement. This project uses AI-assisted workflows with active human direction and reviews.

Using AI also does not mean poor quality. On the contrary, AI reviews have helped detect previously undetected bugs.

### How can I trust the release binaries?

Starting with version 0.23.0, every executable published to [GitHub Releases](https://github.com/KSXGitHub/parallel-disk-usage/releases) is accompanied by a [build provenance attestation](https://docs.github.com/en/actions/how-tos/secure-your-work/use-artifact-attestations/use-artifact-attestations). The attestation is cryptographically signed by [Sigstore](https://www.sigstore.dev/) — a public-good signing service operated by the Linux Foundation — and records that the binary was built by this repository's GitHub Actions deployment workflow from a specific commit. Because the signing happens inside GitHub's infrastructure via OIDC and the signatures are logged to Sigstore's public transparency log, the guarantee does not depend on trusting the maintainer's personal word: any tampered or manually uploaded binary would fail verification.

To verify a downloaded binary, install the [GitHub CLI](https://cli.github.com/) and run:

```sh
gh attestation verify downloaded-pdu --repo KSXGitHub/parallel-disk-usage
# note: replace `downloaded-pdu` with the filename you downloaded.
```

A successful run prints the signer workflow and confirms that the file's SHA-256 matches the attested digest. All attestations for this repository can also be browsed at the [Attestations page](https://github.com/KSXGitHub/parallel-disk-usage/attestations).

Binaries from releases older than 0.23.0 are not attested.
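
If you script your installation, the verification step can act as a gate. The following is a minimal sketch, assuming `gh` is installed and authenticated. The function name `verify_and_enable` is hypothetical.

```sh
# Verify a downloaded release binary before making it executable.
# `gh attestation verify` exits with a non-zero status when verification
# fails, so `chmod` only runs for binaries that pass.
verify_and_enable() {
  gh attestation verify "$1" --repo KSXGitHub/parallel-disk-usage &&
    chmod +x "$1"
}
```

For example, `verify_and_enable ./downloaded-pdu` leaves the file non-executable if the attestation check fails.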

## Similar programs

* **CLI:**
  * `du`
  * [`dust`](https://github.com/bootandy/dust)
  * [`dutree`](https://github.com/nachoparker/dutree)
  * [`dua`](https://github.com/byron/dua-cli)
* **TUI:**
  * [`ncdu`](https://dev.yorhel.nl/ncdu)
  * [`gdu`](https://github.com/dundee/gdu)
  * [`godu`](https://github.com/viktomas/godu)
* **GUI:**
  * [GNOME's Disk Usage Analyzer, a.k.a. `baobab`](https://apps.gnome.org/Baobab/)
  * [Filelight](https://apps.kde.org/filelight/)

## License

[Apache 2.0](https://git.io/JGIAt) © [Hoàng Văn Khải](https://ksxgithub.github.io/).