Nefaxer
The Demon could sit in a box among air molecules that were moving at all different random speeds, and sort out the fast molecules from the slow ones.
— Koteks on the Nefastis Machine, The Crying of Lot 49
Nefaxer is a high-performance directory indexing and change detection tool written in Rust. It walks directory trees in parallel where the drive benefits from it, optionally computes Blake3 hashes of file contents (`--check-hash`), and stores per-file metadata (path, mtime, size, hash) in a WAL-mode SQLite index. Comparing the current state of a directory against a previous snapshot reports additions, deletions, and modifications.
Install
# library
# CLI (from crates.io)
# Source archive
# Download from: https://github.com/thicclatka/nefaxer/releases
Features
- Streaming pipeline: the walk thread sends paths over a bounded channel; metadata workers turn paths into entries; the main thread receives entries, optionally hashes large files (when `--check-hash`), and writes batches to file. No full-tree buffering: walk, metadata, and write run concurrently.
- Drive-adaptive walk: serial walk (`walkdir`) where the disk is the bottleneck, otherwise parallel (`jwalk`). Parallel metadata (worker threads); hashing when enabled is sequential in the receiver.
- Drive-type detection (SSD / HDD / network) for automatic thread and writer-pool tuning
- WAL SQLite with batch inserts, optional in-memory index for small dirs (<10K files), writer pool
- Exclude patterns (`-e`) for gitignore-like filtering
- Strict mode (`--strict`): fail on first permission/access error instead of skipping
- Paranoid mode (`--paranoid`, with `-c`): re-hash when hash matches but mtime/size differ (collision check)
- FD limit capping (Unix): cap worker threads by `ulimit -n` to avoid EMFILE
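The streaming pipeline described in the first bullet can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Nefaxer's implementation: `Entry`, `walk`, `index_streaming`, the channel capacity of 64, and the in-memory batches standing in for database writes are all hypothetical.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical entry type: a path plus the metadata a worker collected.
#[derive(Debug)]
pub struct Entry {
    pub path: PathBuf,
    pub size: u64,
}

/// Walk stage: send every non-directory path under `dir` into the bounded channel.
fn walk(dir: &Path, tx: &SyncSender<PathBuf>) {
    let Ok(read_dir) = fs::read_dir(dir) else { return };
    for item in read_dir.flatten() {
        // file_type() does not follow symlinks, so link cycles are not descended into.
        let is_dir = item.file_type().map(|t| t.is_dir()).unwrap_or(false);
        if is_dir {
            walk(&item.path(), tx);
        } else if tx.send(item.path()).is_err() {
            return; // receivers are gone; stop walking this directory
        }
    }
}

/// Walk -> metadata workers -> batching consumer, all running concurrently.
/// Returned batches stand in for batched writes to the index.
pub fn index_streaming(root: PathBuf, workers: usize, batch_size: usize) -> Vec<Vec<Entry>> {
    let batch_size = batch_size.max(1);
    // Bounded channels apply backpressure, so the tree is never fully buffered.
    let (path_tx, path_rx) = sync_channel::<PathBuf>(64);
    let (entry_tx, entry_rx) = sync_channel::<Entry>(64);
    let path_rx = Arc::new(Mutex::new(path_rx));

    // The sender is dropped when the walker finishes, which ends the workers.
    let walker = thread::spawn(move || walk(&root, &path_tx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let rx = Arc::clone(&path_rx);
            let tx = entry_tx.clone();
            thread::spawn(move || loop {
                let next = rx.lock().unwrap().recv();
                let Ok(path) = next else { break };
                // Unreadable paths are skipped (the non-strict behaviour).
                if let Ok(meta) = fs::metadata(&path) {
                    let entry = Entry { path, size: meta.len() };
                    if tx.send(entry).is_err() {
                        break;
                    }
                }
            })
        })
        .collect();
    drop(entry_tx); // only the workers hold senders now

    // Consumer stage on the calling thread: group entries into batches.
    let mut batches = Vec::new();
    let mut current = Vec::with_capacity(batch_size);
    for entry in entry_rx {
        current.push(entry);
        if current.len() == batch_size {
            batches.push(std::mem::replace(&mut current, Vec::with_capacity(batch_size)));
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    walker.join().unwrap();
    for handle in handles {
        handle.join().unwrap();
    }
    batches
}
```

Calling `index_streaming(root, 4, 1000)` would yield batches of at most 1000 entries. In a real indexer each full batch would be written out as soon as it fills rather than collected.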
Usage
# Index a directory (creates/updates .nefaxer in DIR). Default.
# Compare to index and report added/removed/modified; do not write to the index
Options
| Option | Short | Description |
|---|---|---|
| `--db <DB>` | `-d` | Path to index file. Default: `.nefaxer` in DIR |
| `--dry-run` | | Compare only; report diff, do not update index |
| `--list` | `-l` | List each changed path. If total changes > 100, write to `nefaxer.results` instead of stdout |
| `--verbose` | `-v` | Verbose output and progress bar |
| `--check-hash` | `-c` | Compute Blake3 hash for files (slower, more accurate diff) |
| `--follow-links` | `-f` | Follow symbolic links |
| `--mtime-window <SECS>` | `-m` | Mtime tolerance in seconds (default: 0) |
| `--exclude <PATTERN>` | `-e` | Exclude glob patterns (repeatable) |
| `--encrypt` | `-x` | Encrypt the index database with SQLCipher. Prompts for passphrase (or use `NEFAXER_DB_KEY` / `.env`) |
| `--strict` | | Fail on first permission/access error |
| `--paranoid` | | (with `-c`) Re-hash when hash matches but mtime/size differ |
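To illustrate how an mtime tolerance such as `--mtime-window` can enter a comparison, here is a minimal sketch of classifying paths as added, removed, or modified. The `Meta` and `Changes` types, the function names, and the exact rule (size differs, or mtime differs by more than the window) are assumptions for illustration; Nefaxer's actual rule, especially with `--check-hash`, may differ.

```rust
use std::collections::HashMap;

/// Hypothetical per-path metadata (hash omitted to keep the sketch small).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Meta {
    pub mtime_ns: i64,
    pub size: u64,
}

/// Hypothetical diff result: sorted lists of changed paths.
#[derive(Debug, Default, PartialEq)]
pub struct Changes {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

/// Assumed rule: a size change always counts; an mtime change counts only
/// when it exceeds the tolerance window.
pub fn is_modified(old: &Meta, new: &Meta, mtime_window_ns: u64) -> bool {
    old.size != new.size || old.mtime_ns.abs_diff(new.mtime_ns) > mtime_window_ns
}

pub fn diff_snapshots(
    old: &HashMap<String, Meta>,
    new: &HashMap<String, Meta>,
    mtime_window_ns: u64,
) -> Changes {
    let mut changes = Changes::default();
    for (path, new_meta) in new {
        match old.get(path) {
            None => changes.added.push(path.clone()),
            Some(old_meta) if is_modified(old_meta, new_meta, mtime_window_ns) => {
                changes.modified.push(path.clone())
            }
            Some(_) => {} // unchanged within tolerance
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            changes.removed.push(path.clone());
        }
    }
    // HashMap iteration order is unspecified; sort for stable output.
    changes.added.sort();
    changes.removed.sort();
    changes.modified.sort();
    changes
}
```

A window given in seconds on the command line would be multiplied by 1,000,000,000 before being passed as `mtime_window_ns`. A non-zero window is mainly useful on filesystems with coarse timestamp resolution.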
Configuration file (CLI only)
When running the binary, you can put a .nefaxer.toml in the directory you index. Options from the file are used as defaults; command-line options override them.
[]
= ".nefaxer"
= true
= false
= ["node_modules", ".git"]
= false
= false
= 0
= false
= false
= false
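The precedence rule (file values act as defaults, command-line values override them) can be sketched as a layered merge of optional settings. The `Partial` and `Resolved` types and their field names are hypothetical and cover only a few settings; treating a CLI exclude list as a replacement for the file's list, rather than an addition to it, is also an assumption.

```rust
/// Hypothetical subset of settings; `None` means "not specified at this layer".
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Partial {
    pub db: Option<String>,
    pub check_hash: Option<bool>,
    pub exclude: Option<Vec<String>>,
    pub mtime_window: Option<u64>,
}

/// Fully resolved settings after layering CLI over file over built-in defaults.
#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub db: String,
    pub check_hash: bool,
    pub exclude: Vec<String>,
    pub mtime_window: u64,
}

/// CLI value if given, else the file value, else the built-in default.
pub fn resolve(file: Partial, cli: Partial) -> Resolved {
    Resolved {
        db: cli.db.or(file.db).unwrap_or_else(|| ".nefaxer".to_string()),
        check_hash: cli.check_hash.or(file.check_hash).unwrap_or(false),
        // Assumption: a CLI exclude list replaces the file's list.
        exclude: cli.exclude.or(file.exclude).unwrap_or_default(),
        mtime_window: cli.mtime_window.or(file.mtime_window).unwrap_or(0),
    }
}
```

Keeping every layer as `Option` values makes it possible to tell "flag not given" apart from "flag explicitly set to its default", which a plain `bool` cannot express.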
Database schema
Index file (default .nefaxer, WAL mode):
(
path TEXT PRIMARY KEY,
mtime_ns INTEGER NOT NULL,
size INTEGER NOT NULL,
hash BLOB
);
(
root_path TEXT PRIMARY KEY,
data TEXT NOT NULL
);
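Since `mtime_ns` is stored as an SQLite INTEGER (a signed 64-bit value), a file's modification time has to be converted to nanoseconds relative to the Unix epoch. The following is a sketch of one way to do that with the standard library; the function names are hypothetical, and how Nefaxer itself treats pre-epoch or out-of-range timestamps is not stated in this document.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Convert a SystemTime to signed nanoseconds since the Unix epoch,
/// saturating at the i64 limits instead of wrapping.
pub fn to_mtime_ns(t: SystemTime) -> i64 {
    match t.duration_since(UNIX_EPOCH) {
        // At or after the epoch: positive offset.
        Ok(after) => i64::try_from(after.as_nanos()).unwrap_or(i64::MAX),
        // Before the epoch: the error carries the positive distance.
        Err(before) => i64::try_from(before.duration().as_nanos())
            .map(|n| -n)
            .unwrap_or(i64::MIN),
    }
}

/// Read a file's mtime and size in the shape of the columns above.
pub fn mtime_and_size(path: &Path) -> io::Result<(i64, u64)> {
    let meta = fs::metadata(path)?;
    Ok((to_mtime_ns(meta.modified()?), meta.len()))
}
```

An `i64` nanosecond count covers roughly 292 years on either side of 1970, so saturation should only matter for corrupt timestamps.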
Library
Use the crate for programmatic indexing and diffing. Single entry point; returns the current index and a diff. The API supports tuning (tuning_for_path) and streaming via callback.
Entry point
- `nefax_dir(root, opts, existing, on_entry)`: Index `root` with `opts`. Returns `Result<(Nefax, Diff)>`.
  - `existing`: `None` for a fresh run (diff = all added); `Some(&nefax)` to diff against a previous snapshot (e.g. a `Nefax` you built from your own DB/table).
  - `on_entry`: `None` for batch (non-streaming); `Some(|entry| { ... })` to get each entry as it’s ready (streaming, e.g. for progress or forwarding to another pipeline). The callback runs on the consumer thread; keep it fast or send to a channel.
- `tuning_for_path(path, available_threads)`: Returns `(num_threads, drive_type, use_parallel_walk)` so you can set `NefaxOpts` and skip drive detection.
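The advice to keep the callback fast or send to a channel can be sketched as follows. `index_with_callback` is a hypothetical stand-in for an indexer that accepts an optional per-entry callback; it is not `nefax_dir`, and the `Entry` type here is invented for the example. The callback only clones the entry into a channel, and a separate thread does the slower work.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical entry type for the sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub path: String,
    pub size: u64,
}

/// Stand-in for an indexer with an optional per-entry callback.
/// The callback runs on the thread that calls this function.
pub fn index_with_callback<F: FnMut(&Entry)>(
    entries: Vec<Entry>,
    mut on_entry: Option<F>,
) -> Vec<Entry> {
    for entry in &entries {
        if let Some(callback) = on_entry.as_mut() {
            callback(entry);
        }
    }
    entries
}

/// Forward each entry to a consumer thread; returns (entries indexed, bytes seen downstream).
pub fn run_forwarding(entries: Vec<Entry>) -> (usize, u64) {
    let (tx, rx) = mpsc::channel::<Entry>();
    // Slow downstream work lives here, off the indexing thread.
    let consumer = thread::spawn(move || rx.iter().map(|e| e.size).sum::<u64>());
    let indexed = index_with_callback(
        entries,
        Some(|e: &Entry| {
            // Cheap: clone and hand off; ignore the error if the consumer is gone.
            let _ = tx.send(e.clone());
        }),
    );
    drop(tx); // closing the channel lets the consumer finish
    let total = consumer.join().unwrap();
    (indexed.len(), total)
}
```

When no callback is wanted, a generic signature like this one needs a type hint for the empty case, for example `None::<fn(&Entry)>`.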
Types
pub type Nefax = HashMap<PathBuf, PathMeta>;
Same shape as the .nefaxer DB. When you pass existing: Some(&nefax) from your own table, nefax_dir validates it internally (paths relative and non-empty, mtime_ns/size in valid ranges) and returns an error if it is invalid. You can also call validate_nefax(&nefax) yourself to fail early (e.g. right after loading from your DB).
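As an illustration of the kind of checks described above, here is a sketch of validating a snapshot loaded from your own storage. It is not the crate's `validate_nefax`: the `PathMeta` fields, the function name `validate_snapshot`, and the concrete ranges (non-negative `mtime_ns`, `size` fitting a signed 64-bit SQLite INTEGER) are assumptions.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical metadata record; the real PathMeta fields may differ.
pub struct PathMeta {
    pub mtime_ns: i64,
    pub size: u64,
    pub hash: Option<[u8; 32]>,
}

/// Reject snapshots with empty or rooted paths or out-of-range numbers.
pub fn validate_snapshot(snapshot: &HashMap<PathBuf, PathMeta>) -> Result<(), String> {
    for (path, meta) in snapshot {
        if path.as_os_str().is_empty() {
            return Err("empty path".to_string());
        }
        // has_root() also catches Windows paths such as `\dir` that are not fully absolute.
        if path.is_absolute() || path.has_root() {
            return Err(format!("path is not relative: {}", path.display()));
        }
        // Assumed range: timestamps at or after the Unix epoch.
        if meta.mtime_ns < 0 {
            return Err(format!("negative mtime_ns for {}", path.display()));
        }
        // Assumed range: size must fit a signed 64-bit INTEGER column.
        if meta.size > i64::MAX as u64 {
            return Err(format!("size out of range for {}", path.display()));
        }
    }
    Ok(())
}
```

Running a check like this immediately after loading rows from a database surfaces bad data at the load site instead of partway through an indexing run.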
NefaxOpts
Use NefaxOpts::default() and override as needed:
- `num_threads`, `drive_type`, `use_parallel_walk`: set all three (e.g. from `tuning_for_path`) to skip drive detection
- `with_hash`: compute Blake3 for files
- `follow_links`: follow symlinks
- `exclude`: glob patterns to skip (e.g. `node_modules`, `*.log`)
- `mtime_window_ns`: mtime tolerance (nanoseconds)
- `strict`: fail on first permission/access error
- `paranoid`: re-hash when hash matches but mtime/size differ
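To give a feel for what the three tuning values represent, here is a purely illustrative sketch of deriving a thread count and walk mode from a drive type, with an optional file-descriptor cap. The `DriveKind` enum, the function name, and every number in it are made up for the example; they are not Nefaxer's heuristics.

```rust
/// Hypothetical drive classification.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DriveKind {
    Ssd,
    Hdd,
    Network,
}

/// Returns (num_threads, use_parallel_walk). All thresholds are illustrative only.
pub fn pick_tuning(
    kind: DriveKind,
    available_threads: usize,
    fd_limit: Option<usize>,
) -> (usize, bool) {
    let available = available_threads.max(1);
    let (mut threads, parallel_walk) = match kind {
        // Random reads are cheap on SSDs, so use every thread and walk in parallel.
        DriveKind::Ssd => (available, true),
        // Spinning disks are seek-bound: few threads and a serial walk.
        DriveKind::Hdd => (available.min(2), false),
        // Network mounts are latency-bound: moderate parallelism.
        DriveKind::Network => (available.min(4), true),
    };
    if let Some(limit) = fd_limit {
        // Leave half of the descriptors for the rest of the process.
        threads = threads.min((limit / 2).max(1));
    }
    (threads, parallel_walk)
}
```

The descriptor cap mirrors the idea behind limiting workers by `ulimit -n`: each worker may hold files open, so the thread count should stay below what the process is allowed to open.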
Examples
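// Import paths assume the items are exported from the crate root.
use nefaxer::{nefax_dir, tuning_for_path, Nefax, NefaxOpts};
use std::path::Path;
// Fresh index, batch: get (nefax, diff) with diff = all added
let (nefax, diff) = nefax_dir(Path::new("/some/dir"), &NefaxOpts::default(), None, None)?;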
// Diff against a previous snapshot: build Nefax from your table (path → PathMeta), pass as existing.
// Validation runs inside nefax_dir; optional: validate_nefax(&prior)? to fail early after loading.
// let prior: Nefax = /* your table → HashMap<PathBuf, PathMeta> */;
// let (nefax, diff) = nefax_dir(Path::new("/some/dir"), &opts, Some(&prior), None)?;
// Streaming: process each entry as it’s ready
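let (nefax, diff) = nefax_dir(Path::new("/some/dir"), &NefaxOpts::default(), None, Some(|entry| { /* handle entry */ }))?;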
// Skip drive detection using tuning
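// `path` and `available_threads` are the arguments named in the tuning_for_path signature above.
let (num_threads, drive_type, use_parallel_walk) = tuning_for_path(path, available_threads);
// Field forms are assumed; wrap the values in Some(...) if NefaxOpts declares these fields as Option.
let opts = NefaxOpts { num_threads, drive_type, use_parallel_walk, ..NefaxOpts::default() };
let (nefax, diff) = nefax_dir(path, &opts, None, None)?;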
License
Dual-licensed under MIT or Apache-2.0.