§scankit — walk + watch + filter directory trees.
scankit is the shared scanner that Tauri / Iced / native
desktop apps reach for when they need to enumerate user files.
Its job is small but easy to get wrong:
- Walk a directory tree (walkdir under the hood).
- Skip what the user said to skip: .DS_Store, node_modules, .git, *.log, anything matching the configured glob set.
- Drop oversized files before you ever read them. A rogue 50 GB SQLite database shouldn't take your indexer offline.
- (Future, behind the watch feature) Keep watching the tree and emit change events as files are added, modified, or removed.
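The per-file decision in the list above can be sketched with the standard library alone. This is a hypothetical simplification, not scankit's code: FilterRules and should_emit are illustrative names, and matching whole path components, exact file names, and extensions stands in for the real glob set. What it does show is the ordering that matters: the size comes from metadata, so an oversized file is rejected without its contents ever being opened.

```rust
use std::path::{Component, Path};

/// Hypothetical, simplified stand-in for scankit's exclude + size-cap rules.
/// scankit matches a configured glob set; this sketch only matches whole
/// directory components, exact file names, and extensions.
pub struct FilterRules {
    /// Directory names to skip anywhere in the tree (e.g. ".git").
    pub skip_dirs: Vec<&'static str>,
    /// Exact file names to skip (e.g. ".DS_Store").
    pub skip_names: Vec<&'static str>,
    /// Extensions to skip, without the dot (e.g. "log").
    pub skip_exts: Vec<&'static str>,
    /// Files strictly larger than this are dropped.
    pub max_file_size_bytes: u64,
}

impl FilterRules {
    /// Returns true if a file at `path` should be emitted. `size` is the
    /// length reported by filesystem metadata, so the file is never read.
    pub fn should_emit(&self, path: &Path, size: u64) -> bool {
        // Cheapest check first: the size cap.
        if size > self.max_file_size_bytes {
            return false;
        }
        // Skip if any parent directory component is on the skip list.
        let in_skipped_dir = path.parent().map_or(false, |dir| {
            dir.components().any(|c| match c {
                Component::Normal(name) => {
                    self.skip_dirs.iter().any(|d| name.to_str() == Some(*d))
                }
                _ => false,
            })
        });
        if in_skipped_dir {
            return false;
        }
        let name_hit = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| self.skip_names.iter().any(|s| *s == n));
        let ext_hit = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| self.skip_exts.iter().any(|s| *s == e));
        !(name_hit || ext_hit)
    }
}
```

A real walker would also prune excluded directories before descending into them rather than filtering their files afterwards; this sketch only shows the per-file predicate.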
What scankit deliberately does NOT do:
- Parse files. Use mdkit or bring your own. scankit hands you ScanEntry values and gets out of the way.
- Schema extraction, search indexing, embedding generation. Those are the layers that consume scankit's output.
- PII redaction, secrets scanning. Privacy policy is the embedding application's concern.
§Quick start
use scankit::{ScanConfig, Scanner};
use std::path::Path;

// `?` needs a function that returns a Result; the crate's own alias fits.
fn main() -> scankit::Result<()> {
    let scanner = Scanner::new(
        ScanConfig::default()
            .max_file_size_bytes(50 * 1024 * 1024) // 50 MB cap
            .add_exclude("**/.git/**")?
            .add_exclude("**/node_modules/**")?
            .add_exclude("**/.DS_Store")?,
    )?;

    // Each item is a Result: errors are logged and the loop moves on.
    for result in scanner.walk(Path::new("/Users/me/Documents")) {
        match result {
            Ok(entry) => println!("{}: {} bytes", entry.path.display(), entry.size_bytes),
            Err(e) => eprintln!("scan error: {e}"),
        }
    }
    Ok(())
}
§Why a separate crate
Every “index files on the user’s machine” project rebuilds the
same five hundred lines of walkdir-with-excludes-and-size-cap
glue, and every project gets it slightly wrong. scankit ships
it once, with the edge cases (symlink loops, permission denials,
mid-walk concurrent deletes) handled in one place.
§Stability commitment (v0.3+)
v0.3 is the API stability candidate for 1.0. The following surface is committed and will only change with a major version bump:
- Scanner construction and dispatch: new, walk, scan (under the watch feature), config. Future trait methods land with default impls so existing callers don't break.
- The ScanConfig field set and its builder methods (max_file_size_bytes, follow_symlinks, add_exclude). It is marked #[non_exhaustive] so fields can be added without a major bump.
- The ScanEntry, ScanEvent, and Error structs and enums. All are #[non_exhaustive] for forward compatibility: matches on the enums must include a wildcard arm, and struct patterns must use "..".
- The lazy Iterator<Item = Result<ScanEntry>> shape returned by Scanner::walk.
- The Iterator<Item = ScanEvent> shape returned by Scanner::scan under the watch feature, including the Initial → InitialComplete → live-events lifecycle.
- Feature flag names: walk, watch.
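The wildcard-arm rule and the Initial → InitialComplete → live-events lifecycle can be illustrated with a stand-in enum. Event is hypothetical: Initial and InitialComplete follow the names above, but Added, Modified, and Removed are assumed names for the live events, not scankit's actual ScanEvent variants.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical mirror of a scan-event enum, for illustration only.
#[non_exhaustive]
#[derive(Debug)]
pub enum Event {
    Initial(PathBuf),
    InitialComplete,
    Added(PathBuf),
    Modified(PathBuf),
    Removed(PathBuf),
}

/// Builds a set of known paths from a snapshot followed by live updates.
/// Returns the final set and whether the initial snapshot completed.
// Inside the defining crate the match below is already exhaustive, so the
// wildcard arm is flagged unreachable; a downstream crate matching on a
// #[non_exhaustive] enum is required to keep it.
#[allow(unreachable_patterns)]
pub fn apply_events<I: IntoIterator<Item = Event>>(events: I) -> (BTreeSet<PathBuf>, bool) {
    let mut known = BTreeSet::new();
    let mut snapshot_done = false;
    for event in events {
        match event {
            Event::Initial(p) | Event::Added(p) => {
                known.insert(p);
            }
            Event::InitialComplete => snapshot_done = true,
            Event::Modified(_) => {} // a real consumer would re-index here
            Event::Removed(p) => {
                known.remove(&p);
            }
            _ => {} // forward-compat arm for variants added later
        }
    }
    (known, snapshot_done)
}
```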
The following are implementation details and may change in minor versions:
- The internal layout of Scanner, ScanWalkIter, and ScanStream (private fields, helper methods).
- The exact threading model of Scanner::scan (currently one short-lived initial-walk thread plus the notify watcher's own threads; this could change as notify evolves).
- The exact set of filesystem-event types translated to ScanEvent variants (notify itself is platform-specific, and we follow upstream).
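The threading model described above (a short-lived initial-walk thread plus separate watcher threads) maps naturally onto a channel whose receiver is wrapped as an Iterator. This is a hypothetical std::sync::mpsc sketch, not scankit's ScanStream: the watcher is simulated with a plain thread and a list of paths instead of notify, and Event, EventStream, and start are illustrative names.

```rust
use std::path::PathBuf;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical event type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Event {
    Initial(PathBuf),
    InitialComplete,
    Changed(PathBuf),
}

/// An `Iterator<Item = Event>` backed by a channel receiver.
pub struct EventStream {
    rx: Receiver<Event>,
}

impl Iterator for EventStream {
    type Item = Event;
    // Blocks until the next event; returns None once every sender is dropped.
    fn next(&mut self) -> Option<Event> {
        self.rx.recv().ok()
    }
}

/// `snapshot` stands in for the initial walk's results and `live` for what
/// a filesystem watcher would report.
pub fn start(snapshot: Vec<PathBuf>, live: Vec<PathBuf>) -> EventStream {
    let (tx, rx) = mpsc::channel();
    let live_tx = tx.clone();
    // Short-lived initial-walk thread: emit the snapshot, then the marker.
    let walker = thread::spawn(move || {
        for path in snapshot {
            let _ = tx.send(Event::Initial(path));
        }
        let _ = tx.send(Event::InitialComplete);
    });
    // Simulated watcher thread. Joining the walker first keeps this sketch's
    // ordering deterministic; a real watcher runs concurrently, so live events
    // could arrive before InitialComplete unless they are buffered.
    thread::spawn(move || {
        let _ = walker.join();
        for path in live {
            let _ = live_tx.send(Event::Changed(path));
        }
    });
    EventStream { rx }
}
```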
1.0 will be cut once the API is exercised by at least one downstream production user. Sery Link is the canonical integration target.
Structs§
- ScanConfig: Configuration for a Scanner. Construct via ScanConfig::default, then layer on options with the with_* / add_* builder methods, or build from a struct literal within the same crate.
- ScanEntry: One file produced by a successful walk. Directories are not surfaced; Scanner recurses into them silently. Symlinks are dereferenced when ScanConfig::follow_symlinks is true and emitted as the target file; otherwise they're skipped.
- ScanWalkIter (requires the walk feature): Iterator returned by Scanner::walk. Yields one Result per emitted file; lazy under the hood.
- Scanner (requires the walk feature): A configured scanner. Cheap to construct: building the GlobSet is the only non-trivial work, on the order of microseconds for typical exclude lists. It is Send + Sync, so a single Scanner can be shared across threads.
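A hypothetical sketch of two properties described in this list: a consuming builder whose fallible step (add_exclude) returns a Result so calls chain with ? or unwrap, and a scanner holding only immutable state, which makes it Send + Sync and safe to share by reference across threads. Config, BadPattern, MiniScanner, and count_excluded are illustrative names, and the empty-pattern check stands in for real glob compilation.

```rust
use std::thread;

/// Hypothetical error for a rejected exclude pattern.
#[derive(Debug)]
pub struct BadPattern(pub String);

/// Hypothetical config in the same builder style.
#[derive(Debug, Clone, Default)]
#[non_exhaustive]
pub struct Config {
    pub max_file_size_bytes: Option<u64>,
    pub follow_symlinks: bool,
    pub excludes: Vec<String>,
}

impl Config {
    // Infallible options consume and return `self` so calls chain.
    pub fn max_file_size_bytes(mut self, bytes: u64) -> Self {
        self.max_file_size_bytes = Some(bytes);
        self
    }
    pub fn follow_symlinks(mut self, follow: bool) -> Self {
        self.follow_symlinks = follow;
        self
    }
    // Validation can fail, so this step returns Result<Self, _>.
    pub fn add_exclude(mut self, pattern: &str) -> Result<Self, BadPattern> {
        if pattern.is_empty() {
            return Err(BadPattern(pattern.to_string()));
        }
        self.excludes.push(pattern.to_string());
        Ok(self)
    }
}

/// Hypothetical scanner: immutable after construction.
pub struct MiniScanner {
    config: Config,
}

impl MiniScanner {
    pub fn new(config: Config) -> Self {
        MiniScanner { config }
    }
    pub fn is_excluded(&self, name: &str) -> bool {
        self.config.excludes.iter().any(|e| e == name)
    }
}

/// Compile-time check: fails to build if MiniScanner stops being Send + Sync.
#[allow(dead_code)]
fn assert_send_sync() {
    fn check<T: Send + Sync>() {}
    check::<MiniScanner>();
}

/// Shares one scanner by reference across scoped threads (needs Sync).
pub fn count_excluded(scanner: &MiniScanner, names: &[&str]) -> usize {
    thread::scope(|scope| {
        let handles: Vec<_> = names
            .iter()
            .map(|name| scope.spawn(move || scanner.is_excluded(name)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .filter(|&hit| hit)
            .count()
    })
}
```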
Enums§
- Error: Errors that can arise during scanning.
Type Aliases§
- Result: Result alias used across the crate.
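A crate-wide Result alias is conventionally paired with an error enum and From conversions so that ? works throughout. This sketch assumes illustrative variant names (Io, InvalidPattern) that are not taken from scankit's Error, and file_size is a hypothetical helper.

```rust
use std::fmt;
use std::io;

/// Hypothetical error enum; variant names are illustrative only.
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    Io(io::Error),
    InvalidPattern(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {e}"),
            Error::InvalidPattern(p) => write!(f, "invalid exclude pattern: {p:?}"),
        }
    }
}

impl std::error::Error for Error {}

// Lets `?` convert io::Error into Error automatically.
impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}

/// The alias: one type parameter, with the error type fixed to `Error`.
pub type Result<T> = std::result::Result<T, Error>;

/// Hypothetical fallible helper using the alias and `?`.
pub fn file_size(path: &std::path::Path) -> Result<u64> {
    Ok(std::fs::metadata(path)?.len())
}
```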