scankit
Walk + watch + filter directory trees. The shared scanner Tauri / Iced / native desktop apps reach for when they need to enumerate user files.
Status: v0.3, API stability candidate for 1.0. Feature coverage closed in v0.2 (one-shot walking via the `walk` feature, continuous filesystem-event monitoring via the `watch` feature, both with shared exclude-glob + size-cap filters). v0.3 freezes the public surface; see the stability section below for what's locked in. v0.3.x will iterate on examples + cookbook docs. 1.0 ships once the API is exercised by at least one downstream production user.
Why this exists
Every "index files on the user's machine" project — RAG tools,
search apps, backup utilities, file watchers, document assistants —
rebuilds the same five hundred lines of walkdir-with-excludes-and-
size-cap-and-symlink-handling glue. Every project gets it slightly
wrong:
- Missed `**/.git/**` in the exclude set, scanned 200K objects in a `.pack` file.
- Forgot to cap file sizes, OOM'd on a 50 GB sqlite database the user accidentally dropped in their Documents folder.
- Followed a symlink loop and hung the indexer.
- Rebuilt the `GlobSet` per-iteration, ate 30 % of CPU on glob compilation alone.
scankit ships these bits once, with the edge cases handled
in one place. It's deliberately lower-level than a full
indexer — it does not parse files, generate embeddings, or
persist anything. It hands you ScanEntrys and gets out of the
way. Pair it with mdkit for
documents → markdown, with calamine / csv for tabular files,
with whatever you like for the rest.
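To give a sense of the glue described above, here is a minimal std-only sketch of a filtered walk. It is not scankit's implementation: `WalkRules` and `walk_filtered` are hypothetical names, and plain name matching stands in for real glob patterns. The rules are built once and passed by reference, symlinks are never followed, and oversized files are skipped before anything reads them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical settings for this sketch (not scankit's `ScanConfig`).
pub struct WalkRules {
    /// Directory or file names to skip entirely, e.g. ".git".
    pub excluded_names: Vec<String>,
    /// Files larger than this many bytes are skipped.
    pub max_file_size_bytes: u64,
}

/// Collect regular files under `root` that pass `rules`.
/// Symlinks are never followed, so a symlink loop cannot hang the walk.
pub fn walk_filtered(root: &Path, rules: &WalkRules) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let name = entry.file_name().to_string_lossy().into_owned();
            if rules.excluded_names.iter().any(|ex| *ex == name) {
                continue; // excluded: neither descend into it nor emit it
            }
            // `DirEntry::file_type` does not traverse symlinks.
            let file_type = entry.file_type()?;
            if file_type.is_symlink() {
                continue;
            }
            if file_type.is_dir() {
                pending.push(entry.path());
            } else if file_type.is_file() {
                // `DirEntry::metadata` does not traverse symlinks either.
                if entry.metadata()?.len() <= rules.max_file_size_bytes {
                    found.push(entry.path());
                }
            }
        }
    }
    Ok(found)
}
```

Calling `walk_filtered(Path::new("some/dir"), &rules)` returns the surviving paths in one `Vec`. This sketch is eager and stops at the first I/O error, whereas scankit hands back a lazy iterator with per-entry errors.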
Quick start
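    use scankit::{ScanConfig, Scanner};
    use std::path::Path;

    // The constructor and walk arguments below are assumptions.
    let scanner = Scanner::new(ScanConfig::default())?;
    for result in scanner.walk(Path::new("/path/to/scan")) {
        let entry = result?; // each item is a Result<ScanEntry>
        // ... hand `entry` to your indexer ...
    }
    # Ok::<(), scankit::Error>(())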
Design principles
- Do one thing well. Walk + filter + emit `ScanEntry`. Anything richer (parse, embed, persist) is the consuming application's job.
- `Send + Sync` everywhere. A single `Scanner` shared across threads, a single `GlobSet` built once.
- No surprises in the iterator. Filtered-out entries are silently dropped. Errors come through as `Err` items in the stream, so callers can log-and-continue or short-circuit.
- Forward-compat defaults. `ScanConfig` and `ScanEntry` are `#[non_exhaustive]` so we can add fields (content hash, inode, per-entry metadata) without breaking downstream callers.
- Honest dep budget. `walkdir` + `globset` + `thiserror` are the only required deps. `notify` is gated behind the `watch` feature.
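The two ways of consuming a stream of `Result` items mentioned above can be sketched with standard-library types only. `FakeEntry` and `FakeError` are stand-ins invented for this example; scankit's real `ScanEntry` and `Error` have their own fields and variants.

```rust
use std::fmt;

/// Stand-in for a scanned entry (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct FakeEntry {
    pub path: String,
}

/// Stand-in for a scan error (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct FakeError(pub String);

impl fmt::Display for FakeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Log-and-continue: keep every good entry, count failures, never stop early.
pub fn collect_lenient<I>(stream: I) -> (Vec<FakeEntry>, usize)
where
    I: Iterator<Item = Result<FakeEntry, FakeError>>,
{
    let mut entries = Vec::new();
    let mut failures = 0;
    for item in stream {
        match item {
            Ok(entry) => entries.push(entry),
            Err(err) => {
                eprintln!("skipping entry: {}", err);
                failures += 1;
            }
        }
    }
    (entries, failures)
}

/// Short-circuit: stop at the first error and return it.
pub fn collect_strict<I>(stream: I) -> Result<Vec<FakeEntry>, FakeError>
where
    I: Iterator<Item = Result<FakeEntry, FakeError>>,
{
    // Collecting into `Result<Vec<_>, _>` stops consuming at the first `Err`.
    stream.collect()
}
```

The lenient form suits background indexers, where one unreadable file should not abort the run. The strict form suits backup-style tools, where a partial result would be misleading.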
Feature flags
| Feature | Adds | Approx. cost |
|---|---|---|
| `walk` (default) | One-shot directory walking | ~250 KB compiled |
| `watch` | Continuous filesystem-event monitoring on top of an initial walk | ~500 KB compiled |
| `default` | `walk` | ~250 KB compiled |
Examples
Runnable example programs live in examples/:
- `walk.rs`: walk a directory tree with conventional excludes (`.git`, `node_modules`, `.DS_Store`, build outputs) and a 50 MB size cap. Run with:
- `watch.rs`: continuous scan (initial walk + live filesystem events). Requires the `watch` feature. Run with:
Stability (v0.3+) {#stability-v03}
v0.3 is the API stability candidate for 1.0. The following surface is committed to and will only change with a major version bump:
- `Scanner` construction + dispatch: `new`, `walk`, `scan` (under the `watch` feature), `config`. Future trait methods land with default impls so existing callers don't break.
- `ScanConfig` field set + the builder methods (`max_file_size_bytes`, `follow_symlinks`, `add_exclude`).
- `ScanEntry`, `ScanEvent`, `Error` field/variant sets. All `#[non_exhaustive]` so we can grow them without major bumps. Pattern-matchers must include a wildcard arm.
- The lazy `Iterator<Item = Result<ScanEntry>>` shape from `Scanner::walk`.
- The `Iterator<Item = ScanEvent>` lifecycle from `Scanner::scan` (`Initial` → `InitialComplete` → live `Created` / `Modified` / `Deleted`).
- Feature flag names: `walk`, `watch`.
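As an illustration of the event lifecycle and the wildcard-arm requirement above, here is a small self-contained sketch. `DemoEvent` is a stand-in defined for this example: the variant names follow the list above, but the `String` payloads are an assumption and the real `ScanEvent` may carry different data.

```rust
/// Stand-in for scankit's `ScanEvent`; payload types are assumed.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub enum DemoEvent {
    Initial(String),
    InitialComplete,
    Created(String),
    Modified(String),
    Deleted(String),
}

/// Counts gathered while draining an event stream.
#[derive(Debug, Default, PartialEq)]
pub struct Tally {
    pub initial: usize,
    pub live: usize,
    pub initial_done: bool,
    pub unknown: usize,
}

/// Drain `events`, separating the initial enumeration from live changes.
// Inside the crate that defines the enum the wildcard arm is unreachable,
// so the lint is silenced here; downstream crates are required to write it.
#[allow(unreachable_patterns)]
pub fn tally_events<I: Iterator<Item = DemoEvent>>(events: I) -> Tally {
    let mut tally = Tally::default();
    for event in events {
        match event {
            DemoEvent::Initial(_) => tally.initial += 1,
            DemoEvent::InitialComplete => tally.initial_done = true,
            DemoEvent::Created(_) | DemoEvent::Modified(_) | DemoEvent::Deleted(_) => {
                tally.live += 1
            }
            // Future variants land here instead of breaking the build.
            _ => tally.unknown += 1,
        }
    }
    tally
}
```

A consumer would typically treat `InitialComplete` as the point where its index is consistent with the disk, and handle everything after it as an incremental update.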
The following are implementation details and may change in minor versions:
- Internal layout of `Scanner` / `ScanWalkIter` / `ScanStream` (private fields, helper methods).
- Threading model of `Scanner::scan` (currently one short-lived initial-walk thread + the `notify` watcher's own threads).
- Platform-specific event-translation rules (notify itself is platform-specific; we follow upstream).
1.0 will be cut once the API is exercised by at least one downstream production user.
License
Dual-licensed under MIT OR Apache 2.0
at your option. SPDX: MIT OR Apache-2.0.
Status & roadmap
- v0.1: one-shot walking. `Scanner` + `ScanConfig` + `ScanEntry`, exclude-glob and size-cap filters, symlink handling, lazy iterator.
- v0.2: `watch` feature. `Scanner::scan` → `ScanStream` (an `Iterator<Item = ScanEvent>`). Initial walk + continuous filesystem-event monitoring via `notify`, same exclude + size-cap filters apply to both. `InitialComplete` sentinel marks the boundary between the initial enumeration and live events.
- v0.3: API stability candidate. Stability commitments doc in `lib.rs` + README. `#[non_exhaustive]` already on every public struct + enum (added incrementally v0.1 → v0.2); `#[must_use]` already on every constructor + builder + accessor. Documentation-only release, no API-shape changes.
- v0.4: `Renamed` event variant (consolidate `Deleted` + `Created` pairs from notify's platform-specific rename shapes); extension-based dispatch helper.
- v0.4: audit pass + first stable trait release (1.0 candidate).
Issues, PRs, and design discussion welcome at https://github.com/seryai/scankit/issues.
Used by
scankit was extracted from the folder-scanner of Sery
Link, a privacy-respecting data network for the files on
your machines. If you use scankit in your project, please open
a PR to add yourself here.
Acknowledgements
- `walkdir`: BurntSushi's battle-tested directory walker. Loop detection, permission handling, and Send-iterator semantics all come from there.
- `globset`: also BurntSushi's. The compiled multi-pattern glob matcher that makes our exclude set efficient even with hundreds of patterns.
- `notify`: the cross-platform filesystem-event crate that v0.2's watch loop is built on.
- `mdkit`: sibling crate; scankit does "files → entries", mdkit does "documents → markdown".