# acme-disk-use

**Disclaimer:** This is alpha software. Interfaces and cache formats may change without notice.

A replacement for `du` that:

- Caches results of prior runs and invalidates the cache by comparing each directory's `mtime`
- Performs parallel scanning using `rayon`

It targets workloads that write immutable files into newly created directories, e.g. a directory of model outputs where each run writes its output to a new daily data directory.
## Features

- **Caching**: Aggregates disk usage stats at the directory level and caches the results so they can be reused on the next invocation if the underlying data is unchanged
- **Cache Invalidation**: Re-scans directories that have changed since the last scan, based on the directory's `mtime` or a modification to any sub-directory beneath it (no matter how nested)
- **Smart Deletion Detection**: Prunes deleted directories from the cache without full rescans
- **Human-Readable Output**: Automatically formats sizes in B, KB, MB, GB, or TB
- **Flexible Cache Location**: Configurable via environment variable, or defaults to `~/.cache/acme-disk-use/`
## Design Principle

acme-disk-use exploits a common write pattern, in which applications write immutable files into incrementally created nested directories, to dramatically outperform `du` on repeated scans.
## How It Works

Traditional tools like `du` traverse the entire directory tree on every invocation, stat-ing and summing every file regardless of whether anything changed. For large trees with hundreds of thousands of files, this becomes prohibitively expensive.

acme-disk-use takes a different approach:

- **Per-Directory Caching**: Computes and caches the total disk usage for each directory separately, storing these aggregates in a compact binary cache
- **Smart Invalidation**: On subsequent runs, checks each directory's modification time (`mtime`) and the presence of new subdirectories to identify what has changed
- **Selective Re-scanning**: Only re-traverses directories that have been modified or contain new content, reusing cached totals for everything else
- **Delta Merging**: Combines the freshly computed sizes from changed directories with cached values from stable directories to produce the final total
## Performance Impact

Because immutable-file workloads rarely modify old directories, the vast majority of the tree remains unchanged between scans. This means:

- Warm-cache runs skip full I/O and become dominated by fast metadata checks
- Only changed paths trigger actual file traversal
- Cached totals eliminate redundant work for stable subtrees

The result: acme-disk-use with a warm cache is ~10x faster than `du` on typical workloads (see benchmark results below), since it avoids re-scanning files that haven't changed.
## Installation

### From crates.io (Recommended)

Install the latest stable version from crates.io:
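Assuming the crate is published under the same name as the binary:

```shell
cargo install acme-disk-use
```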
### From GitHub Release

Download pre-built binaries for your platform from the Releases page:

- **Linux (x86_64)**
- **macOS (Intel)**
- **macOS (Apple Silicon)**
- **Windows**: Download `acme-disk-use-windows-x86_64.exe` from the releases page and add it to your `PATH`.
### From Source

Clone the repository and build from source:

```shell
git clone <repository-url>   # this repository's clone URL
cd acme-disk-use
cargo build --release
# Binary will be at target/release/acme-disk-use
```
### Verify Installation
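A quick way to confirm the install, assuming the binary follows the usual convention of supporting a `--version` flag:

```shell
acme-disk-use --version
```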
## TODO

- Memory-mapped cache loading for instant startup
- Configurable parallel scanning threshold
- Let the user choose between logical file size and block size (as `du` does)
## Usage

### Basic Usage

Scan the current directory (output in 1K blocks, like `du`):

Scan a specific directory:
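Assuming the installed binary is invoked as `acme-disk-use`, these two invocations would look like (the `/data/run` path is illustrative):

```shell
acme-disk-use            # scan the current directory
acme-disk-use /data/run  # scan a specific directory
```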
### Options (du-compatible)

- Human-readable output (`-h`)
- Show raw bytes (`-b`)
- Summarize (`-s`)
- Ignore the cache and scan fresh
- Show timing statistics and file count
- Clean the cache
- Show help
### Cache Commands
Display an interactive TUI showing cached directory sizes (similar to ncdu):
Show a specific cached path:
## Configuration
**Custom cache location:**

Set the `ACME_DISK_USE_CACHE` environment variable:

```shell
export ACME_DISK_USE_CACHE=/tmp/path/to/cache/
```

Or set it inline for a single invocation:

```shell
ACME_DISK_USE_CACHE=/tmp/path/to/cache/ acme-disk-use
```
**Default cache location:**

- If `ACME_DISK_USE_CACHE` is not set, defaults to `~/.cache/acme-disk-use` on Unix systems
- Falls back to `./cache.bin` if the home directory is not available
## Examples

- Scan data directory (default: 1K blocks like `du`)
- Human-readable output (like `du -h`)
- Show exact byte count (like `du -b`)
- Force fresh scan without using cache
- Clear all cached data
- View cached directory sizes in an interactive TUI
## Benchmark Results

Performance comparison scanning ~220,000 files (nested directory structure):
| Method | Avg Time (ms) | Notes |
|---|---|---|
| Rust (Warm Cache) | 36.06 | Instant result from cache |
| Rust (Cold Cache) | 4459.78 | Initial scan + cache write |
| du | 4861.26 | Standard traversal |
Note: Rust (warm cache) is ~135x faster than `du` in this scenario.
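Comparisons like this can be reproduced with a benchmarking tool such as [hyperfine](https://github.com/sharkdp/hyperfine); the data path below is illustrative:

```shell
hyperfine --warmup 1 'du -s data/' 'acme-disk-use data/'
```

Because hyperfine runs each command several times (and `--warmup` discards initial runs), the acme-disk-use figure largely reflects warm-cache performance.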
## Development

### Cargo commands

Check for compile errors:

```shell
cargo check
```

Format files:

```shell
cargo fmt
```

Build binaries:

```shell
cargo build
```

Run the binary:

```shell
RUST_LOG=debug cargo run
```

Build documentation:

```shell
cargo doc --open
```

Run tests:

```shell
cargo test
```

Run benchmarks (relies on the `criterion` library):

```shell
cargo bench
```

### Profile application

Install [samply](https://github.com/mstange/samply), then:

```shell
cargo build --profile profiling
samply record target/profiling/acme-disk-use
```

### Linting

Install clippy (`rustup component add clippy`), then:

```shell
cargo clippy --all-targets --all-features -- -D warnings
```
## Contributing

We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.

Quick Start:

- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Make your changes
- Run tests: `cargo test`
- Format code: `cargo fmt`
- Check lints: `cargo clippy --all-targets --all-features -- -D warnings`
- Commit and push
- Open a pull request against the `main` branch
## CI/CD

This project uses GitHub Actions for continuous integration and deployment:

- Unified Pipeline (`pipeline.yml`): Handles both CI and Releases
  - CI: Runs on every push to `main` and on pull requests
    - ✓ Code formatting check (`cargo fmt`)
    - ✓ Linting with clippy (`cargo clippy`)
    - ✓ Test suite on Linux and macOS
  - Release: Triggered by version tags (e.g., `v0.1.0`)
    - ✓ Validates version matches Cargo.toml
    - ✓ Runs full CI checks
    - ✓ Publishes to crates.io
    - ✓ Builds binaries for multiple platforms
    - ✓ Creates GitHub Release with binaries
**Creating a Release:**

```shell
# Update version in Cargo.toml and CHANGELOG.md
git tag v0.1.0          # tag must match the version in Cargo.toml
git push origin v0.1.0
```
## License
Licensed under the Apache License, Version 2.0. See LICENSE for details.