minifind
About
minifind is a minimal Unix find reimplementation in Rust, designed to list
directory entries as fast as possible. Filename or path matching is supported
via --name (glob) or --regex (regular expression) options, with optional
case-insensitive matching controlled by --case-insensitive. Results can be
narrowed further using --file-type to filter by entry type: b for block
device, c for character device, d for directory, p for named FIFO, f
for regular file, l for symlink, s for socket, or e for empty
file/directory. Both --name and --regex accept multiple patterns, but the two options conflict and cannot be combined.
Results can be filtered further by metadata: size (--size),
modification/change/access time (--mtime/--ctime/--atime in days,
--mmin/--cmin/--amin in minutes, or relative to a reference file with
--newer/--anewer/--cnewer), permission bits (--perm, octal or symbolic,
with find's /MODE, -MODE, and exact-match semantics), owner (--uid/--gid/--user/--group,
or orphaned ids with --nouser/--nogroup), hard-link count (--links), inode
(--inum), and access checks (--readable/--writable/--executable). Paths
can also be matched as a whole-path glob (--path/--wholename) or by a
symlink's target (--lname). Traversal can be bounded by depth
(--min-depth/--max-depth), whole subtrees pruned by name (--exclude), and
the walk stopped after the first match (--quit) or N matches
(--max-results). Output can be NUL-terminated with --null (-print0) for
safe piping into xargs -0. Most flags also accept their find-style spellings
(-name, -type, -size, -perm, -newer, …).
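The three --perm forms are easy to confuse, so here is a minimal sketch of how they could be evaluated for octal modes. It is not minifind's actual code: the names PermTest, parse_perm, and perm_matches are hypothetical, symbolic modes are left out, and treating /000 as "match everything" follows GNU find's behavior, which is assumed rather than confirmed for minifind.

```rust
/// How a `--perm` argument is compared against a file's permission bits.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum PermTest {
    Exact(u32), // MODE:  bits must equal MODE
    All(u32),   // -MODE: every bit in MODE must be set
    Any(u32),   // /MODE: at least one bit in MODE must be set
}

/// Parse a plain octal string such as "644"; reject signs and non-octal digits.
fn parse_octal(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| (b'0'..=b'7').contains(&b)) {
        return None;
    }
    u32::from_str_radix(s, 8).ok().filter(|m| *m <= 0o7777)
}

/// Parse an octal `--perm` argument with an optional `-` or `/` prefix.
fn parse_perm(arg: &str) -> Option<PermTest> {
    if let Some(rest) = arg.strip_prefix('-') {
        parse_octal(rest).map(PermTest::All)
    } else if let Some(rest) = arg.strip_prefix('/') {
        parse_octal(rest).map(PermTest::Any)
    } else {
        parse_octal(arg).map(PermTest::Exact)
    }
}

/// Apply the test to a raw st_mode value; file-type bits are masked off first.
fn perm_matches(test: &PermTest, file_mode: u32) -> bool {
    let bits = file_mode & 0o7777;
    match *test {
        PermTest::Exact(m) => bits == m,
        PermTest::All(m) => (bits & m) == m,
        // Assumption: /000 matches every entry, as in GNU find.
        PermTest::Any(m) => m == 0 || (bits & m) != 0,
    }
}
```

With these definitions, a file with mode 755 passes -644 (all of rw-r--r-- is set) but fails the exact test 644, and /022 matches only files that are group- or world-writable.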
By default, symlinks are not followed and filesystem boundaries are not
crossed. The thread count defaults to the number of available CPU cores. Only
the metadata predicates require a stat, and that cost is paid lazily: only when
such a predicate is set, and only for entries that pass the cheaper name/type
filters first.
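The lazy-stat ordering described above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: the Filters struct and its fields are hypothetical, a suffix check stands in for the glob/regex matcher, a minimum size stands in for any metadata predicate, and the stat_calls counter exists only to make the laziness observable.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical filter set; the field names are illustrative only.
struct Filters {
    name_suffix: Option<String>, // stand-in for the name matcher
    min_size: Option<u64>,       // stand-in for any metadata predicate
}

/// Returns Ok(true) if `path` passes all filters.
/// `stat_calls` is incremented each time metadata is actually fetched.
fn matches(path: &Path, f: &Filters, stat_calls: &mut usize) -> io::Result<bool> {
    // Cheap check first: it needs only the file name.
    if let Some(suffix) = &f.name_suffix {
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        if !name.ends_with(suffix.as_str()) {
            return Ok(false); // rejected without touching the filesystem
        }
    }
    // Pay for a stat only when a metadata predicate is set.
    if let Some(min) = f.min_size {
        *stat_calls += 1;
        let meta = fs::symlink_metadata(path)?; // does not follow symlinks
        return Ok(meta.len() >= min);
    }
    Ok(true)
}
```

An entry whose name fails the suffix check returns early with stat_calls unchanged, and a Filters value with min_size set to None never calls symlink_metadata at all.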
On Linux, --idle runs the walk unobtrusively: the worker pool is placed in
the SCHED_IDLE CPU class and the IOPRIO_CLASS_IDLE I/O class, the whole
process is lowered to nice +19, and the thread count defaults to 2 (an explicit
--threads still wins). The main and output threads stay normally scheduled, so
results keep draining even while the walkers only run when the CPU and disk are
otherwise idle — handy for background scans on a busy machine.
Related projects
Other notable projects in this space:
- sharkdp/fd: a much more fully-featured find alternative with excellent performance
- LyonSyonII/hunt-rs: a similar high-performance-oriented tool
- BurntSushi/ripgrep: also home to the globset and ignore crates used by this project
- uutils/findutils: a Rust reimplementation of findutils intended as a drop-in replacement
Usage
minimal find reimplementation
Usage: minifind [OPTIONS] <PATH>...
Arguments:
<PATH>... Paths to traverse (must be existing directories)
Options:
-f, --follow-symlinks Follow symlinks [aliases: -L, -follow]
-o, --one-filesystem Do not cross mount points (default) [aliases: --xdev, -xdev, -mount]
--no-one-filesystem Cross mount points [alias: --cross-filesystem]
-x, --threads <N> Number of worker threads [default: logical CPU count]
-d, --max-depth <N> Maximum depth to traverse [alias: -maxdepth]
--min-depth <N> Minimum depth to emit (shallower entries are skipped) [alias: -mindepth]
-s, --max-scan-rate <N> Max directories scanned per second (0 = unlimited)
--max-results <N> Stop after the first N results (0 = unlimited)
-n, --name <GLOB> File-name globbing pattern (repeatable; conflicts with --regex) [aliases: -name; -iname adds -i]
-r, --regex <RE> Full-path regular expression (repeatable; conflicts with --name) [aliases: -regex; -iregex adds -i]
-i, --case-insensitive Case-insensitive glob/regex matching
-E, --exclude <GLOB> Exclude entries whose name matches GLOB; matched directories are pruned (repeatable)
-0, --null Terminate each path with NUL instead of newline [aliases: -print0, --print0]
-t, --file-type <TYPE> Filter matches by type (repeatable) [default: directory file symlink] [alias: -type]
values: empty, block-device, char-device, directory, pipe, file, socket, symlink
aliases: e, b, c, d, p, f, s, l
--empty Match empty files and directories (= --file-type empty) [alias: -empty]
--size <[+-]N(c|k|M|G|T)> Filter by size; unit required (c=bytes, k/M/G/T = 1024-based); +N greater, -N less [alias: -size]
--mtime, --ctime, --atime <[+-]N> Filter by modify/change/access time, in days [aliases: -mtime/-ctime/-atime]
--mmin, --cmin, --amin <[+-]N> Filter by modify/change/access time, in minutes [aliases: -mmin/-cmin/-amin]
--perm <[/-]MODE> Filter by permission bits, octal or symbolic; -MODE all set, /MODE any set, MODE exact [alias: -perm]
--uid, --gid <[+-]N> Filter by numeric owner/group id [aliases: -uid/-gid]
--user, --group <NAME> Filter by owner/group name (or numeric id) [aliases: -user/-group]
--links <[+-]N> Filter by hard-link count [alias: -links]
--inum <[+-]N> Filter by inode number [alias: -inum]
--newer, --anewer, --cnewer <FILE> Entry's m/a/c-time is newer than FILE's mtime [aliases: -newer/-anewer/-cnewer]
--nouser, --nogroup Owner uid/gid resolves to no passwd/group entry [aliases: -nouser/-nogroup]
--path, --wholename <GLOB> Glob over the full path (* crosses /) [aliases: -path/-wholename; -ipath/-iwholename add -i]
--lname <GLOB> Glob over a symlink's target [alias: -lname; -ilname adds -i]
--readable, --writable, --executable Filter by access (real uid/gid) [aliases: -readable/-writable/-executable]
--quit Stop after the first match (= --max-results 1) [alias: -quit]
--idle Run unobtrusively: idle CPU + I/O scheduling, nice +19, 2 threads (Linux)
-h, --help Print help
-V, --version Print version
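As an illustration of the --size grammar in the help text ([+-]N followed by a mandatory unit), the following sketch parses an argument and compares it against a file length in bytes. The names SizeTest, parse_size, and size_matches are hypothetical. The comparison is done on exact byte counts, which is an assumption: GNU find rounds sizes up to whole units before comparing, and the help text does not say which behavior minifind uses.

```rust
use std::cmp::Ordering;

/// Parsed `--size` argument. (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct SizeTest {
    cmp: Ordering, // Greater for +N, Less for -N, Equal for a bare N
    bytes: u64,    // N multiplied by the unit
}

/// Parse `[+-]N(c|k|M|G|T)`; returns None if the unit is missing or invalid.
fn parse_size(arg: &str) -> Option<SizeTest> {
    let (cmp, rest) = match *arg.as_bytes().first()? {
        b'+' => (Ordering::Greater, &arg[1..]),
        b'-' => (Ordering::Less, &arg[1..]),
        _ => (Ordering::Equal, arg),
    };
    // The unit is the final character and is required.
    let multiplier: u64 = match rest.chars().last()? {
        'c' => 1,
        'k' => 1 << 10,
        'M' => 1 << 20,
        'G' => 1 << 30,
        'T' => 1 << 40,
        _ => return None,
    };
    // All valid units are one byte long, so this slice is on a char boundary.
    let digits = &rest[..rest.len() - 1];
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: u64 = digits.parse().ok()?;
    Some(SizeTest { cmp, bytes: n.checked_mul(multiplier)? })
}

/// Byte-exact comparison (assumption: no rounding up to whole units).
fn size_matches(test: &SizeTest, file_len: u64) -> bool {
    file_len.cmp(&test.bytes) == test.cmp
}
```

Under these assumptions, +10k parses to "greater than 10240 bytes", so a 10241-byte file matches and a 10240-byte file does not, while a unitless argument such as 10 is rejected.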
Regular expressions
The --regex option uses Rust regex syntax,
which is similar to other engines but does not support look-around or
backreferences.
Glob expressions
The --name option uses Unix-style glob syntax.
minifind vs GNU find
Hardware: 4-core / 8-thread Intel Xeon E5-1630 v3 @ 3.70 GHz, 48 GB RAM.
Measured with the Criterion benchmark in benches/walk.rs
over a shallow clone of the mainline Linux kernel tree (100,871 entries across
6,201 directories, ~2.1 GB) with a warm page cache. Both minifind (defaults)
and GNU find run as subprocesses, so each pays process-startup cost; output
is discarded for both. 100 samples each:
walk_linux_kernel/minifind time: [26.634 ms 26.702 ms 26.779 ms]
walk_linux_kernel/find time: [92.463 ms 92.998 ms 93.565 ms]
So minifind walks the tree in ~26.7 ms vs ~93.0 ms — about 3.5× faster
(≈3.8M vs ≈1.1M entries/second). Reproduce with cargo bench --bench walk
(set BENCH_WALK_DIR=/path/to/tree to benchmark an existing checkout).
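The derived figures follow directly from the Criterion medians and the entry count quoted above. The helper names in this small sketch are hypothetical and are not part of benches/walk.rs.

```rust
/// Entries walked per second, given an entry count and a wall time in milliseconds.
fn entries_per_second(entries: u64, millis: f64) -> f64 {
    entries as f64 / (millis / 1000.0)
}

/// Speedup of the faster run over the slower one (both in the same unit).
fn speedup(slow: f64, fast: f64) -> f64 {
    slow / fast
}

// Figures taken from the benchmark output above.
const ENTRIES: u64 = 100_871;
const MINIFIND_MS: f64 = 26.702; // median, walk_linux_kernel/minifind
const FIND_MS: f64 = 92.998;     // median, walk_linux_kernel/find
```

Calling speedup(FIND_MS, MINIFIND_MS) gives roughly 3.48, and entries_per_second yields about 3.78 million entries per second for minifind and about 1.08 million for GNU find, which round to the 3.5×, 3.8M, and 1.1M quoted in the text.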
Why it is faster
- Parallel traversal. GNU find walks on a single thread; minifind fans out across all cores with its own work-stealing walker (one worker per core, minus one thread reserved for output), overlapping directory reads. On this 8-thread machine that accounts for most of the gap; the advantage scales with core count and shrinks toward parity on a 1–2 core host.
- Purpose-built walker. minifind uses its own walker (raw getdents64 via rustix on Unix, std::fs elsewhere) rather than a general-purpose crate, so it carries no gitignore/hidden-file bookkeeping it does not need.
- No extra stat(2). File-type filtering uses the d_type already returned by getdents(2), avoiding a per-entry stat for -type-style matching.
- Batched, lock-light output. Matched entries are streamed to a dedicated output thread in batches (amortizing channel synchronization), then written straight into a 256 KB buffered writer with one copy per entry.
- Fast allocator. mimalloc keeps the unavoidable per-entry path allocations cheap.
The warm-cache setup isolates CPU and syscall efficiency rather than disk latency; on a cold cache both tools are bound by I/O and the gap narrows.
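The batched output path from the list above can be approximated with std::sync::mpsc and BufWriter. This is a simplified sketch under stated assumptions, not minifind's implementation: the function name write_batched and the batch size of 64 are hypothetical, there is a single producer instead of a pool of walkers, and only the 256 KB buffer size comes from the text.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::mpsc;
use std::thread;

const BATCH: usize = 64;                // assumed batch size, not from the source
const BUF_CAPACITY: usize = 256 * 1024; // 256 KB buffered writer, as in the text

/// Stream `paths` to a dedicated output thread in batches and return the sink.
/// `terminator` is b'\n' normally, or 0 for NUL-terminated output.
fn write_batched<W, I>(paths: I, sink: W, terminator: u8) -> io::Result<W>
where
    W: Write + Send + 'static,
    I: IntoIterator<Item = String>,
{
    let (tx, rx) = mpsc::channel::<Vec<String>>();

    // Output thread: drains whole batches into one large buffered writer.
    let writer = thread::spawn(move || -> io::Result<W> {
        let mut out = BufWriter::with_capacity(BUF_CAPACITY, sink);
        for batch in rx {
            for path in batch {
                out.write_all(path.as_bytes())?;
                out.write_all(&[terminator])?;
            }
        }
        // Flush and hand the underlying sink back to the caller.
        out.into_inner().map_err(io::Error::from)
    });

    // Producer side: one channel send per BATCH entries, not one per entry.
    let mut batch = Vec::with_capacity(BATCH);
    for p in paths {
        batch.push(p);
        if batch.len() == BATCH {
            let full = std::mem::replace(&mut batch, Vec::with_capacity(BATCH));
            if tx.send(full).is_err() {
                break; // the output thread exited early, e.g. on a write error
            }
        }
    }
    if !batch.is_empty() {
        let _ = tx.send(batch);
    }
    drop(tx); // closing the channel lets the output thread finish

    writer.join().expect("output thread panicked")
}
```

Sending a Vec per batch means the channel is synchronized once per 64 entries rather than once per entry. In a multi-walker setup each worker would hold its own clone of the sender and its own local batch, so workers never contend on a shared buffer.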