acme-disk-use
Disclaimer: This is alpha software. Interfaces and cache formats may change without notice.
A replacement for du that:
- Caches results of prior runs and invalidates the cache by comparing each directory's mtime
- Performs parallel scanning using rayon
- Is optimised for trees that grow by adding new directories of write-once files, e.g. a directory of model outputs, each writing its output to a new daily data directory
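Parallel scanning can be illustrated with a minimal sketch. It substitutes std::thread::scope for rayon so that it needs no dependencies. dir_size and PARALLEL_THRESHOLD are hypothetical names, and the threshold value is arbitrary. Because every directory level may spawn its own threads, a real implementation would bound concurrency, which is what rayon's work-stealing thread pool provides.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical cutoff: directories with fewer subdirectories are scanned sequentially.
const PARALLEL_THRESHOLD: usize = 4;

/// Recursively sums the logical size of regular files under `dir`,
/// fanning out over subdirectories with scoped threads when there are enough of them.
pub fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    let mut subdirs: Vec<PathBuf> = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let kind = entry.file_type()?; // does not follow symlinks
        if kind.is_dir() {
            subdirs.push(entry.path());
        } else if kind.is_file() {
            total += entry.metadata()?.len();
        }
    }
    if subdirs.len() < PARALLEL_THRESHOLD {
        // Small fan-out: thread overhead would outweigh the benefit.
        for d in &subdirs {
            total += dir_size(d)?;
        }
    } else {
        // One scoped thread per subdirectory; all threads are joined before
        // `thread::scope` returns, so borrowing `subdirs` is safe.
        let results: Vec<io::Result<u64>> = thread::scope(|s| {
            let handles: Vec<_> = subdirs
                .iter()
                .map(|d| s.spawn(move || dir_size(d)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("scan thread panicked"))
                .collect()
        });
        for r in results {
            total += r?;
        }
    }
    Ok(total)
}
```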
Features
- Caching: Aggregates disk usage at the directory level and caches the results so they can be reused on the next invocation if the underlying data has not changed
- Cache Invalidation: Rescans only directories whose mtime has changed since the last scan, or under which a subdirectory was modified (no matter how deeply nested)
- Smart Deletion Detection: Prunes deleted directories from the cache without a full rescan
- Human-Readable Output: Automatically formats sizes in B, KB, MB, GB, or TB
- Flexible Cache Location: Configurable via the ACME_DISK_USE_CACHE environment variable; defaults to ~/.cache/acme-disk-use/
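Human-readable formatting can be sketched as follows. The unit list comes from the feature above; the 1024-based steps, the two decimal places and the name format_size are assumptions, so the real tool's output may differ.

```rust
/// Formats a byte count using B, KB, MB, GB or TB.
/// Assumptions: 1024-based units and two decimal places above bytes.
pub fn format_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut size = bytes as f64;
    let mut unit = 0;
    // Step up one unit at a time until the value is below 1024 or we reach TB.
    while size >= 1024.0 && unit < UNITS.len() - 1 {
        size /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes) // whole bytes need no decimals
    } else {
        format!("{:.2} {}", size, UNITS[unit])
    }
}
```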
Design Principle
acme-disk-use exploits a write pattern in which applications write immutable files into incrementally created nested directories, and uses it to outperform du dramatically on repeated scans.
How It Works
Traditional tools like du traverse the entire directory tree on every invocation, stat-ing and summing every file regardless of whether anything changed. For large trees with hundreds of thousands of files, this becomes prohibitively expensive.
acme-disk-use takes a different approach:
- Per-Directory Caching: Computes and caches the total disk usage for each directory separately, storing these aggregates in a compact binary cache
- Smart Invalidation: On subsequent runs, checks each directory's modification time (mtime) and presence of new subdirectories to identify what has changed
- Selective Re-scanning: Only re-traverses directories that have been modified or contain new content, reusing cached totals for everything else
- Delta Merging: Combines the freshly computed sizes from changed directories with cached values from stable directories to produce the final total
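The four steps above can be sketched as a single recursive function. This is an illustration under stated assumptions, not the crate's implementation: CachedDir, Cache and scan are hypothetical names, the cache is an in-memory HashMap rather than the real binary format, and files are assumed immutable (an in-place write to an existing file does not update its parent directory's mtime). Since a directory's mtime changes only when entries are added, removed or renamed directly inside it, the sketch still stats every cached subdirectory. Building the new cache only from directories actually visited is one plausible way to get the deletion pruning described under Features.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical per-directory cache record (not the crate's real format).
#[derive(Clone, Debug)]
pub struct CachedDir {
    mtime: SystemTime,
    file_bytes: u64,        // logical size of regular files directly inside this dir
    children: Vec<PathBuf>, // immediate subdirectories
}

pub type Cache = HashMap<PathBuf, CachedDir>;

/// Returns the total logical size under `dir`. Reuses records from `old` whose
/// mtime is unchanged and records every visited directory in `new`, so
/// directories that no longer exist are not carried over.
pub fn scan(dir: &Path, old: &Cache, new: &mut Cache) -> io::Result<u64> {
    let mtime = fs::metadata(dir)?.modified()?;
    let record = match old.get(dir) {
        // Same mtime: nothing was added, removed or renamed directly here,
        // so skip read_dir and reuse the cached file total and child list.
        Some(cached) if cached.mtime == mtime => cached.clone(),
        // New or modified directory: list it and stat its files again.
        _ => {
            let mut file_bytes = 0;
            let mut children = Vec::new();
            for entry in fs::read_dir(dir)? {
                let entry = entry?;
                let kind = entry.file_type()?; // does not follow symlinks
                if kind.is_dir() {
                    children.push(entry.path());
                } else if kind.is_file() {
                    file_bytes += entry.metadata()?.len();
                }
            }
            CachedDir { mtime, file_bytes, children }
        }
    };
    // Merge this directory's (fresh or cached) file total with each child's
    // total. Every child is visited because a change deep in the tree only
    // bumps the mtime of the directory it happened in.
    let mut total = record.file_bytes;
    for child in &record.children {
        total += scan(child, old, new)?;
    }
    new.insert(dir.to_path_buf(), record);
    Ok(total)
}
```

On a warm run, old would be the cache loaded from disk and new the cache written back afterwards; the sketch omits that persistence step.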
Performance Impact
Because immutable-file workloads rarely modify old directories, the vast majority of the tree remains unchanged between scans. This means:
- Warm-cache runs skip full I/O and become dominated by fast metadata checks
- Only changed paths trigger actual file traversal
- Cached totals eliminate redundant work for stable subtrees
The result: with a warm cache, acme-disk-use is ~135x faster than du in the benchmark below, because it avoids re-stat-ing files in directories that haven't changed.
Installation
From crates.io (Recommended)
Install the latest stable version from crates.io:
From GitHub Release
Download pre-built binaries for your platform from the Releases page:
Linux (x86_64):
macOS (Intel):
macOS (Apple Silicon):
Windows:
Download acme-disk-use-windows-x86_64.exe from the releases page and add it to your PATH.
From Source
Clone the repository and build from source:
# Binary will be at target/release/acme-disk-use
Verify Installation
TODO
- Memory-mapped cache loading for instant startup
- Configurable parallel scanning threshold
- Let users choose between logical file size and allocated block size (as du does)
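The distinction behind the last TODO item can be shown with the standard library. This is an illustrative sketch, not planned code: logical_size and allocated_size are hypothetical names, and allocated_size is Unix-only because it relies on st_blocks, which Rust reports in 512-byte units. Using symlink_metadata means a symlink is measured itself rather than its target.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Logical (apparent) size in bytes: the file length.
pub fn logical_size(path: &Path) -> io::Result<u64> {
    Ok(fs::symlink_metadata(path)?.len())
}

/// Allocated size in bytes, i.e. what the file actually occupies on disk.
/// `blocks()` counts 512-byte units regardless of the filesystem block size.
#[cfg(unix)]
pub fn allocated_size(path: &Path) -> io::Result<u64> {
    use std::os::unix::fs::MetadataExt;
    Ok(fs::symlink_metadata(path)?.blocks() * 512)
}
```

For a tiny file the two can differ widely (a 10-byte file typically occupies a whole filesystem block), while sparse files can allocate less than their logical size.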
Usage
Basic Usage
Scan current directory:
Scan a specific directory:
Options
Show raw bytes instead of human-readable sizes:
Ignore cache and scan fresh:
Clean the cache:
Show help:
Configuration
Custom cache location:
Set the ACME_DISK_USE_CACHE environment variable:
Or use it inline:
ACME_DISK_USE_CACHE=/tmp/path/to/cache/
Default cache location:
- If
ACME_DISK_USE_CACHEis not set, defaults to~/.cache/acme-disk-useon Unix systems - Falls back to
./cache.binif home directory is not available
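The lookup order above can be sketched as a small pure function plus a wrapper that reads the environment. resolve_cache_location and cache_location are hypothetical names; reading $HOME directly is an assumption (the crate may use a home-directory helper instead), and the sketch does not special-case non-Unix platforms.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Pure resolution logic, kept separate from the environment so it is testable.
pub fn resolve_cache_location(override_path: Option<OsString>, home: Option<PathBuf>) -> PathBuf {
    match (override_path, home) {
        // 1. An explicit ACME_DISK_USE_CACHE always wins.
        (Some(p), _) => PathBuf::from(p),
        // 2. Otherwise use ~/.cache/acme-disk-use when a home directory is known.
        (None, Some(h)) => h.join(".cache").join("acme-disk-use"),
        // 3. Last resort: a cache file in the current working directory.
        (None, None) => PathBuf::from("./cache.bin"),
    }
}

/// Reads the real environment (assumption: home is taken from $HOME).
pub fn cache_location() -> PathBuf {
    resolve_cache_location(
        env::var_os("ACME_DISK_USE_CACHE"),
        env::var_os("HOME").map(PathBuf::from),
    )
}
```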
Examples
# Scan data directory with human-readable output
# Show exact byte count
# Force fresh scan without using cache
# Clear all cached data
Benchmark Results
Performance comparison scanning ~220,000 files (nested directory structure):
| Method | Avg Time (ms) | Notes |
|---|---|---|
| Rust (Warm Cache) | 36.06 | Instant result from cache |
| Rust (Cold Cache) | 4459.78 | Initial scan + cache write |
| du | 4861.26 | Standard traversal |
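Note: with a warm cache, the Rust implementation is ~135x faster than du in this scenario (4861.26 ms / 36.06 ms ≈ 135), while a cold-cache run takes only about 8% less time than du.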
Development
Cargo commands
Check for compile errors:
cargo check
Format files
cargo fmt
Build binaries
cargo build
Run binary
RUST_LOG=debug cargo run
Build documentation
cargo doc --open
Run tests
cargo test
Run benchmarks
Relies on the criterion crate
cargo bench
Profile application
Install samply: https://github.com/mstange/samply
cargo build --profile profiling
samply record target/profiling/acme-disk-use
Linting
Install clippy: rustup component add clippy
cargo clippy --all-targets --all-features -- -D warnings
Contributing
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.
Quick Start:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes
- Run tests: cargo test
- Format code: cargo fmt
- Check lints: cargo clippy --all-targets --all-features -- -D warnings
- Commit and push
- Open a pull request against the develop branch
CI/CD
This project uses GitHub Actions for continuous integration and deployment:
- CI Pipeline (ci.yml): Runs on every push to develop and on pull requests
  - ✓ Code formatting check (cargo fmt)
  - ✓ Linting with clippy (cargo clippy)
  - ✓ Test suite on Linux, macOS, and Windows
  - ✓ Code coverage reporting
- Release Pipeline (release.yml): Triggered by version tags (e.g., v0.1.0) on the main branch
  - ✓ Validates that the version matches Cargo.toml
  - ✓ Runs full CI checks
  - ✓ Publishes to crates.io
  - ✓ Builds binaries for multiple platforms
  - ✓ Creates a GitHub Release with binaries
Creating a Release:
# Update version in Cargo.toml and CHANGELOG.md
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.