Crate scankit

§scankit — walk + watch + filter directory trees.

scankit is the shared scanner that Tauri / Iced / native desktop apps reach for when they need to enumerate user files. Its job is small but easy to get wrong:

  1. Walk a directory tree (walkdir under the hood).
  2. Skip what the user said to skip — .DS_Store, node_modules, .git, *.log, anything matching the configured glob set.
  3. Drop oversized files before you ever read them: a rogue 50 GB SQLite database shouldn’t take your indexer offline.
  4. (Future, behind the watch feature) Keep watching the tree and emit change events as files are added / modified / removed.
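
Steps 1 to 3 can be sketched with the standard library alone. This is an illustration of the walk / exclude / size-cap pipeline, not scankit's implementation (which, per the list above, builds on walkdir and a glob set). `Filter`, `walk_filtered`, and the exact-name and extension matching are hypothetical simplifications standing in for real glob patterns.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical filter settings for this sketch; not scankit's ScanConfig.
pub struct Filter {
    pub max_file_size_bytes: u64,
    pub skip_names: Vec<String>,      // exact file or directory names, e.g. ".git"
    pub skip_extensions: Vec<String>, // extensions without the dot, e.g. "log"
}

impl Filter {
    /// True when the final path component matches an excluded name or extension.
    fn name_excluded(&self, path: &Path) -> bool {
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
        self.skip_names.iter().any(|s| s == name)
            || self.skip_extensions.iter().any(|s| s == ext)
    }
}

/// Walk `root` depth-first and return (path, size) for every file that
/// survives the exclude list and the size cap. Results are sorted by path.
pub fn walk_filtered(root: &Path, filter: &Filter) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            // Step 2: excluded directories are never descended into.
            if filter.name_excluded(&path) {
                continue;
            }
            // symlink_metadata does not follow symlinks, so links are skipped here.
            let meta = fs::symlink_metadata(&path)?;
            if meta.is_dir() {
                stack.push(path);
            } else if meta.is_file() && meta.len() <= filter.max_file_size_bytes {
                // Step 3: the size check uses metadata only; the file is never opened.
                out.push((path, meta.len()));
            }
        }
    }
    out.sort();
    Ok(out)
}
```

The size check reads only filesystem metadata, which is why an oversized file costs a single stat call rather than a read. Unlike the Quick start loop, this sketch aborts on the first I/O error.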

What scankit deliberately does NOT do:

  • Parse files. Use mdkit or bring your own. scankit hands you ScanEntry values and gets out of the way.
  • Schema extraction, search indexing, embedding generation. Those are the layers that consume scankit’s output.
  • PII redaction, secrets scanning. Privacy policy is the embedding application’s concern.

§Quick start

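use scankit::{Scanner, ScanConfig};
use std::path::Path;

// The `?` operators need an enclosing function that returns a Result.
// Returning the crate's Result alias from main assumes scankit::Error implements Debug.
fn main() -> scankit::Result<()> {
    let scanner = Scanner::new(
        ScanConfig::default()
            .max_file_size_bytes(50 * 1024 * 1024) // 50 MiB cap
            .add_exclude("**/.git/**")?
            .add_exclude("**/node_modules/**")?
            .add_exclude("**/.DS_Store")?,
    )?;

    // Each item is a Result, so one unreadable entry does not end the scan.
    for result in scanner.walk(Path::new("/Users/me/Documents")) {
        match result {
            Ok(entry) => println!("{}: {} bytes", entry.path.display(), entry.size_bytes),
            Err(e)    => eprintln!("scan error: {e}"),
        }
    }
    Ok(())
}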

§Why a separate crate

Every “index files on the user’s machine” project rebuilds the same five hundred lines of walkdir-with-excludes-and-size-cap glue, and every project gets it slightly wrong. scankit ships it once, with the edge cases (symlink loops, permission denials, mid-walk concurrent deletes) handled in one place.
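
To make those edge cases concrete, here is a minimal std-only sketch of a lazy walker that tolerates them. It is not scankit's code: `TolerantWalk` and its policy (silently skip entries that vanish, report everything else and keep going, refuse to enter a directory twice) are assumptions chosen for illustration.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical lazy walker: yields one `io::Result<PathBuf>` per file
/// and keeps walking after an error instead of aborting.
pub struct TolerantWalk {
    stack: Vec<PathBuf>,               // directories still to read
    pending: Vec<io::Result<PathBuf>>, // results buffered from the last directory read
    visited: HashSet<PathBuf>,         // canonical paths of directories already entered
}

impl TolerantWalk {
    pub fn new(root: &Path) -> Self {
        TolerantWalk {
            stack: vec![root.to_path_buf()],
            pending: Vec::new(),
            visited: HashSet::new(),
        }
    }

    fn read_one_dir(&mut self, dir: PathBuf) {
        // canonicalize resolves symlinks, so a symlink loop maps back to a
        // directory that is already in `visited` and is not entered again.
        let canonical = match fs::canonicalize(&dir) {
            Ok(c) => c,
            Err(e) => return self.push_error(e),
        };
        if !self.visited.insert(canonical) {
            return;
        }
        let entries = match fs::read_dir(&dir) {
            Ok(it) => it,
            Err(e) => return self.push_error(e), // e.g. permission denied
        };
        for entry in entries {
            let result = entry.and_then(|e| {
                let path = e.path();
                fs::metadata(&path).map(|m| (path, m)) // follows symlinks
            });
            match result {
                Ok((path, meta)) if meta.is_dir() => self.stack.push(path),
                Ok((path, _)) => self.pending.push(Ok(path)),
                Err(e) => self.push_error(e),
            }
        }
    }

    fn push_error(&mut self, e: io::Error) {
        // NotFound means the entry was deleted mid-walk (or is a dangling
        // symlink); this sketch treats that as "nothing to report".
        if e.kind() != io::ErrorKind::NotFound {
            self.pending.push(Err(e));
        }
    }
}

impl Iterator for TolerantWalk {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(item) = self.pending.pop() {
                return Some(item);
            }
            // No buffered results: read the next directory, or finish.
            let dir = self.stack.pop()?;
            self.read_one_dir(dir);
        }
    }
}
```

Directories are read one at a time, only when the caller asks for the next item, so a consumer that stops early never touches the rest of the tree.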

§Stability commitment (v0.3+)

v0.3 marks the API stability candidate for 1.0. The following surface is committed to and will only change with a major version bump:

  • Scanner construction + dispatch — new, walk, scan (under the watch feature), config. Future trait methods land with default impls so existing callers don’t break.
  • ScanConfig field set + the builder methods (max_file_size_bytes, follow_symlinks, add_exclude). Marked #[non_exhaustive] so we can add fields without major bumps.
  • ScanEntry, ScanEvent, and Error (structs and enums). All are #[non_exhaustive] for forward compatibility, so any match on them must include a wildcard arm.
  • The lazy Iterator<Item = Result<ScanEntry>> shape returned by Scanner::walk.
  • The Iterator<Item = ScanEvent> shape returned by Scanner::scan under the watch feature, including the Initial → InitialComplete → live-events lifecycle.
  • Feature flag names: walk, watch.
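
The wildcard-arm requirement for #[non_exhaustive] types can be sketched as follows. `ScanError` and its variants are hypothetical stand-ins, not scankit's actual Error enum; the point is only the shape of the match a downstream caller would write.

```rust
/// Stand-in for a library error enum. The variant names are hypothetical.
#[non_exhaustive]
#[derive(Debug)]
pub enum ScanError {
    Io(std::io::Error),
    InvalidGlob(String),
}

/// Map an error to a short label, handling only the variant this caller
/// cares about. The `_` arm covers every other variant, including ones a
/// later minor release might add, so the match keeps compiling.
pub fn describe(err: &ScanError) -> &'static str {
    match err {
        ScanError::Io(_) => "io",
        _ => "other",
    }
}
```

From a different crate, the compiler rejects a match on a #[non_exhaustive] enum that lacks the `_` arm, even when every current variant is listed. Inside the defining crate the attribute has no effect, which is why this single-file sketch compiles either way.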

The following are implementation details and may change in minor versions:

  • The internal layout of Scanner / ScanWalkIter / ScanStream (private fields, helper methods).
  • The exact threading model of Scanner::scan (currently one short-lived initial-walk thread + the notify watcher’s own threads; could change as notify evolves).
  • The exact set of filesystem-event types translated to ScanEvent variants (notify itself is platform-specific and we follow upstream).
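
As a rough picture of the threading model described above, the sketch below uses a std channel: a short-lived thread sends the initial listing followed by a completion marker, and a second sender stands in for the watcher's live events. `Event`, its variants, and `start` are hypothetical names for this sketch; scankit's ScanEvent and the notify integration may look quite different.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum Event {
    Initial(PathBuf),
    InitialComplete,
    Changed(PathBuf),
}

/// Spawn a short-lived thread that emits the initial listing, then a
/// completion marker, then exits. The returned Sender plays the role of
/// a filesystem watcher that keeps producing live events afterwards.
pub fn start(initial: Vec<PathBuf>) -> (mpsc::Receiver<Event>, mpsc::Sender<Event>) {
    let (tx, rx) = mpsc::channel();
    let walk_tx = tx.clone();
    thread::spawn(move || {
        for path in initial {
            // A send error means the consumer hung up; stop walking.
            if walk_tx.send(Event::Initial(path)).is_err() {
                return;
            }
        }
        let _ = walk_tx.send(Event::InitialComplete);
    });
    (rx, tx)
}
```

A consumer can iterate the receiver like any other iterator. The stream ends once every sender has been dropped, which here means the walk thread has finished and the watcher handle is gone.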

1.0 will be cut once the API is exercised by at least one downstream production user. Sery Link is the canonical integration target.

Structs§

ScanConfig
Configuration for a Scanner. Construct via ScanConfig::default, then layer on options with the with_* / add_* builder methods, or build from a struct literal within the same crate.
ScanEntry
One file produced by a successful walk. Directories are not surfaced — Scanner recurses into them silently. Symlinks are dereferenced when ScanConfig::follow_symlinks is true and emitted as the target file; otherwise they’re skipped.
ScanWalkIter (walk feature)
Iterator returned by Scanner::walk. Yields one Result per file emitted; lazy under the hood.
Scanner (walk feature)
A configured scanner. Cheap to construct (the GlobSet build is the only non-trivial work, ~µs for typical exclude lists). Send + Sync — share a single Scanner across threads.
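
The Send + Sync note on Scanner suggests a usage pattern like the one below: build once, wrap in an Arc, and hand clones to worker threads. `FakeScanner` is a hypothetical stand-in (immutable after construction, read-only methods) so that the sketch compiles without scankit; the real Scanner's methods differ.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a configured scanner: immutable after construction.
pub struct FakeScanner {
    max_file_size_bytes: u64,
}

impl FakeScanner {
    pub fn new(max_file_size_bytes: u64) -> Self {
        FakeScanner { max_file_size_bytes }
    }

    /// Read-only check, safe to call from many threads at once.
    pub fn accepts(&self, size_bytes: u64) -> bool {
        size_bytes <= self.max_file_size_bytes
    }
}

/// Compile-time assertion: this only type-checks if T is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

/// Count accepted sizes across batches, one worker thread per batch,
/// all sharing the same scanner through an Arc.
pub fn count_accepted_in_parallel(scanner: Arc<FakeScanner>, batches: Vec<Vec<u64>>) -> usize {
    assert_send_sync::<FakeScanner>();
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let scanner = Arc::clone(&scanner);
            thread::spawn(move || batch.iter().filter(|&&s| scanner.accepts(s)).count())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

Cloning the Arc copies a pointer, not the scanner, so the construction work is paid once no matter how many threads use it.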

Enums§

Error
Errors that can arise during scanning.

Type Aliases§

Result
Result alias used across the crate.
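
A crate-wide Result alias is conventionally a one-line type definition that fixes the error type. The sketch below shows that pattern with a hypothetical `Error` variant and `check_size` helper; it is an assumption about the general shape, not a copy of scankit's definitions.

```rust
pub mod sketch {
    use std::fmt;

    /// Hypothetical error type with a single illustrative variant.
    #[derive(Debug)]
    #[non_exhaustive]
    pub enum Error {
        TooLarge { size_bytes: u64, limit_bytes: u64 },
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::TooLarge { size_bytes, limit_bytes } => {
                    write!(f, "file is {size_bytes} bytes, limit is {limit_bytes}")
                }
            }
        }
    }

    impl std::error::Error for Error {}

    /// The alias: callers write `Result<T>` instead of repeating the error type.
    pub type Result<T> = std::result::Result<T, Error>;

    /// Example function using the alias in its signature.
    pub fn check_size(size_bytes: u64, limit_bytes: u64) -> Result<u64> {
        if size_bytes > limit_bytes {
            Err(Error::TooLarge { size_bytes, limit_bytes })
        } else {
            Ok(size_bytes)
        }
    }
}
```

With such an alias in place, an iterator item type can be written as Result<ScanEntry> rather than spelling out the error type each time.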