Module walker

Parallel file walker for Layer 0 static analysis.

Uses ignore::WalkParallel (same engine as ripgrep) for parallel, gitignore-aware directory traversal. Results stream to the caller via an mpsc channel so downstream parsing can start before the walk completes.

§Architecture

Walker::walk_channel()
    │
    ├── spawns std::thread (WalkParallel::visit is blocking/sync)
    │       │
    │       ├── VisitorBuilder::build() — one FileVisitor per worker thread
    │       │
    │       └── FileVisitor::visit() — per-entry filtering + local buffering
    │               │
    │               └── flush every FLUSH_THRESHOLD entries → mpsc::Sender
    │                   a final flush in Drop sends the remaining tail entries
    │
    └── returns mpsc::Receiver<WalkedFile>  (parser consumes while walk runs)

Walker::walk() — thin wrapper: collect channel → sort → Vec<WalkedFile>
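
The buffering and flush-on-Drop pattern in the diagram can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: `BufferedVisitor`, `walk_channel_sketch`, `walk_sketch`, the `WalkedFile` field, and the `FLUSH_THRESHOLD` value of 32 are all assumptions, and plain worker threads stand in for `ignore::WalkParallel`. The sketch sends whole batches over the channel; the real module exposes a `Receiver<WalkedFile>`, so it may flatten batches differently.

```rust
use std::path::PathBuf;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Assumed batch size; the real FLUSH_THRESHOLD value is not shown in these docs.
const FLUSH_THRESHOLD: usize = 32;

/// Hypothetical stand-in for the module's WalkedFile.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct WalkedFile {
    pub path: PathBuf,
}

/// Per-worker visitor: buffers entries locally and sends them in batches.
struct BufferedVisitor {
    buf: Vec<WalkedFile>,
    tx: Sender<Vec<WalkedFile>>,
}

impl BufferedVisitor {
    fn new(tx: Sender<Vec<WalkedFile>>) -> Self {
        Self { buf: Vec::with_capacity(FLUSH_THRESHOLD), tx }
    }

    fn visit(&mut self, file: WalkedFile) {
        self.buf.push(file);
        if self.buf.len() >= FLUSH_THRESHOLD {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if !self.buf.is_empty() {
            let batch = std::mem::replace(&mut self.buf, Vec::with_capacity(FLUSH_THRESHOLD));
            // A send error only means the receiver was dropped; nothing to do then.
            let _ = self.tx.send(batch);
        }
    }
}

impl Drop for BufferedVisitor {
    /// Sends whatever is left in the buffer when the worker finishes.
    fn drop(&mut self) {
        self.flush();
    }
}

/// Spawns the "walk" on a background thread and returns the receiving end immediately.
pub fn walk_channel_sketch(workers: usize, per_worker: usize) -> Receiver<Vec<WalkedFile>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let tx = tx.clone();
                thread::spawn(move || {
                    let mut visitor = BufferedVisitor::new(tx);
                    for i in 0..per_worker {
                        let path = PathBuf::from(format!("w{w}/f{i}.rs"));
                        visitor.visit(WalkedFile { path });
                    }
                    // `visitor` is dropped here, which flushes the tail entries.
                })
            })
            .collect();
        for handle in handles {
            let _ = handle.join();
        }
        // The original `tx` is dropped when this closure ends, closing the channel.
    });
    rx
}

/// Blocking wrapper: drain the channel, flatten the batches, sort.
pub fn walk_sketch(workers: usize, per_worker: usize) -> Vec<WalkedFile> {
    let mut files: Vec<WalkedFile> = walk_channel_sketch(workers, per_worker)
        .into_iter()
        .flatten()
        .collect();
    files.sort();
    files
}
```

A caller that wants pipelining would iterate the receiver from `walk_channel_sketch` and start parsing each batch as it arrives. `walk_sketch` waits for the channel to close, which happens once every sender has been dropped.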

§Why thread-local buffering?

Every mpsc::Sender::send() call synchronizes on shared channel state, so senders on different threads contend with one another. With 8 threads and 80k files, 80k individual sends cost roughly 16 ms in contention overhead. Flushing every FLUSH_THRESHOLD entries reduces the number of sends to ~2,500, cutting that overhead to ~500 µs while still delivering batches to the receiver early enough for meaningful parse pipelining.
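
The figures above can be checked with a short calculation. The per-send cost of about 200 ns and the batch size of 32 are not stated in the docs. Both are inferred here from the quoted numbers (16 ms / 80k sends, and 80k / 2,500 sends), so treat them as assumptions. The function names are hypothetical.

```rust
/// Number of channel sends needed for `files` entries at a given batch size
/// (rounded up, since the tail batch still costs one send).
pub fn send_count(files: u64, batch_size: u64) -> u64 {
    (files + batch_size - 1) / batch_size
}

/// Estimated contention overhead in nanoseconds, assuming a fixed cost per send.
pub fn overhead_ns(files: u64, batch_size: u64, per_send_ns: u64) -> u64 {
    send_count(files, batch_size) * per_send_ns
}
```

With these assumptions, `overhead_ns(80_000, 1, 200)` gives 16,000,000 ns (16 ms) and `overhead_ns(80_000, 32, 200)` gives 500,000 ns (500 µs), matching the numbers quoted above.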

Structs§

WalkedFile
A single file discovered by the walker.
Walker
Parallel, gitignore-aware file walker.

Enums§

Language
Programming language detected from file extension.

Constants§

DEFAULT_MAX_FILE_SIZE
Default maximum file size, in bytes, accepted by the walker. Larger files are silently skipped because they are almost always generated artefacts (minified JS, compiled output) not worth parsing.
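
A size check of this kind might look like the sketch below. The docs do not show the actual value of `DEFAULT_MAX_FILE_SIZE` or how the walker applies it, so the 1 MiB constant and the `within_size_limit` helper are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical limit for illustration only; not the module's real default.
pub const ASSUMED_MAX_FILE_SIZE: u64 = 1024 * 1024;

/// Returns Ok(true) when the file's size is at or below `max_bytes`.
/// A caller would skip the file when this returns Ok(false).
pub fn within_size_limit(path: &Path, max_bytes: u64) -> io::Result<bool> {
    Ok(fs::metadata(path)?.len() <= max_bytes)
}
```

Reading only the metadata keeps the check cheap: an oversized file is rejected without its contents ever being opened.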

Functions§

detect_language
Detect programming language from file extension.
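
Extension-based detection could be sketched as follows. The real `Language` variants and the signature of `detect_language` are not shown on this page, so the enum below, the extension table, and the name `detect_language_sketch` are illustrative assumptions.

```rust
use std::path::Path;

/// Hypothetical subset of languages; the module's real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Rust,
    Python,
    TypeScript,
    JavaScript,
}

/// Maps a file extension to a language, ignoring ASCII case.
/// Returns None for unknown extensions and for paths without one.
pub fn detect_language_sketch(path: &Path) -> Option<Language> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "rs" => Some(Language::Rust),
        "py" => Some(Language::Python),
        "ts" | "tsx" => Some(Language::TypeScript),
        "js" | "jsx" | "mjs" => Some(Language::JavaScript),
        _ => None,
    }
}
```

Returning `Option` lets the walker drop files in unsupported languages with a single filter step.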