codewalk 0.1.0

Fast, security-aware file tree walker — gitignore, binary detection, memmap2, parallel

codewalk — Security-aware file system walker


Why

Every code security tool starts by enumerating files, but naive crawling is slow, noisy, and often reads binary files or vendor artifacts. codewalk gives you a predictable walker that respects .gitignore, supports extension and size filters, and loads file contents lazily, using memory-mapped reads for large files.

It is designed for scans where you care about throughput and signal: scanning only the files that matter, with minimal memory churn.

Quick Start

use codewalk::{CodeWalker, WalkConfig};

fn main() -> std::io::Result<()> {
    // Skip files over 2 MiB, hidden paths, and binary files.
    let config = WalkConfig::default()
        .max_file_size(2 * 1024 * 1024)
        .skip_hidden(true)
        .skip_binary(true);

    let walker = CodeWalker::new(".", config);
    for entry in walker.walk() {
        println!("{} ({} bytes)", entry.path.display(), entry.size);
        // Content is loaded lazily, only when requested.
        if !entry.is_binary {
            let _text = entry.content_str()?;
        }
    }

    Ok(())
}

Features

  • Skip hidden files/directories and respect .gitignore by default.
  • Per-file size and extension allow/deny filtering.
  • Binary detection + binary skip controls.
  • Memory-mapped reads (via memmap2) for files above a size threshold.
  • Parallel walker mode for large monorepos.
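To illustrate how size, extension, and hidden-path filters typically combine into a single accept/reject decision, here is a std-only sketch. The `Filter` struct and its `accepts` method are hypothetical and are not codewalk's API; the rules that the deny-list wins over the allow-list and that an empty allow-list permits every extension are assumptions made for the example.

```rust
use std::collections::HashSet;
use std::path::{Component, Path};

/// Hypothetical filter settings for illustration; not codewalk's WalkConfig.
struct Filter {
    max_file_size: u64,
    /// Lowercase extensions to allow; an empty set allows everything (assumption).
    include_extensions: HashSet<String>,
    /// Lowercase extensions to reject; checked before the allow-list.
    exclude_extensions: HashSet<String>,
    skip_hidden: bool,
}

impl Filter {
    /// Returns true if a file with this path and size should be yielded.
    fn accepts(&self, path: &Path, size: u64) -> bool {
        if size > self.max_file_size {
            return false;
        }
        // A normal path component starting with '.' marks a hidden file or directory.
        // `.` and `..` are CurDir/ParentDir components, so they never match here.
        if self.skip_hidden
            && path.components().any(|c| {
                matches!(c, Component::Normal(s) if s.to_string_lossy().starts_with('.'))
            })
        {
            return false;
        }
        // Files without an extension map to "", which only passes an empty allow-list.
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .unwrap_or("")
            .to_ascii_lowercase();
        if self.exclude_extensions.contains(&ext) {
            return false;
        }
        self.include_extensions.is_empty() || self.include_extensions.contains(&ext)
    }
}

fn main() {
    let filter = Filter {
        max_file_size: 2 * 1024 * 1024,
        include_extensions: ["rs"].iter().map(|s| s.to_string()).collect(),
        exclude_extensions: ["png", "zip"].iter().map(|s| s.to_string()).collect(),
        skip_hidden: true,
    };
    for (path, size) in [("./src/main.rs", 1_000), ("src/.cache/x.rs", 10), ("logo.PNG", 10)] {
        println!("{path}: {}", filter.accepts(Path::new(path), size));
    }
}
```

Checking the cheap size test first and the extension lists last keeps the common rejection paths short; a real walker would also prune hidden directories before descending into them rather than testing every file path.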

TOML Configuration

codewalk does not read TOML config files; configure it in code through WalkConfig.

API Overview

  • WalkConfig: tune traversal behavior (max_file_size, include_extensions, exclude_dirs, ...).
  • CodeWalker: build with root + config, iterate with walk, walk_iter, or walk_parallel.
  • FileEntry: path, size, and binary-flag fields plus content and content_str methods.
  • FileSource: trait for custom file providers.
  • is_binary: helper for pre-checks outside walker logic.
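The exact rule behind is_binary is not documented here. A common heuristic, similar to the one git uses, is to treat a file as binary if a NUL byte appears in its first few kilobytes. The std-only sketch below implements that heuristic; `looks_binary`, `file_looks_binary`, and the 8 KiB window are assumptions for illustration, not codewalk's implementation.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Number of leading bytes to inspect; 8 KiB is an illustrative choice.
const SNIFF_LEN: usize = 8192;

/// Heuristic: a buffer is treated as binary if its first SNIFF_LEN bytes contain a NUL.
fn looks_binary(bytes: &[u8]) -> bool {
    let head = &bytes[..bytes.len().min(SNIFF_LEN)];
    head.contains(&0)
}

/// Reads at most SNIFF_LEN bytes from the file, so large files are never fully loaded.
fn file_looks_binary(path: &Path) -> io::Result<bool> {
    let mut head = Vec::with_capacity(SNIFF_LEN);
    File::open(path)?
        .take(SNIFF_LEN as u64)
        .read_to_end(&mut head)?;
    Ok(looks_binary(&head))
}

fn main() -> io::Result<()> {
    println!("{}", looks_binary(b"fn main() {}\n"));
    println!("{}", looks_binary(&[0x89, b'P', b'N', b'G', 0x00, 0x1A]));

    // Write a small ELF-like header to a temp file and sniff it from disk.
    let path = std::env::temp_dir().join("codewalk_sniff_demo.bin");
    std::fs::write(&path, [0x7Fu8, b'E', b'L', b'F', 0x02, 0x01, 0x00])?;
    println!("{}", file_looks_binary(&path)?);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Reading only a bounded prefix keeps the pre-check cheap even for multi-gigabyte artifacts. The trade-off is that a NUL byte appearing after the window goes undetected, and UTF-16 text (which contains many NUL bytes) is classified as binary.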

Examples

1) Find TODO markers in non-hidden Rust files

use codewalk::{CodeWalker, WalkConfig};

// Hidden files are skipped by default; also exclude some binary-heavy extensions.
let config = WalkConfig::default()
    .exclude_extensions(vec!["png".into(), "zip".into()].into_iter().collect());
let walker = CodeWalker::new(".", config);
for file in walker.walk() {
    // Only inspect Rust sources.
    if file.extension == "rs" {
        let text = file.content_str().unwrap_or_default();
        if text.contains("TODO") {
            println!("found todo: {}", file.path.display());
        }
    }
}

2) Stream entries from the parallel walker over a channel

use codewalk::{CodeWalker, WalkConfig};

let walker = CodeWalker::new(".", WalkConfig::default());
let rx = walker.walk_parallel(4);
while let Ok(entry) = rx.recv() {
    println!("{}", entry.path.display());
}
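The `while let Ok(entry) = rx.recv()` loop ends because a channel's receiver returns an error once every sender has been dropped. The std-only sketch below shows that producer/consumer pattern with `std::sync::mpsc`. `walk_parallel_sketch` is hypothetical: it splits a pre-collected list of paths across worker threads rather than walking a real directory, and it says nothing about how codewalk's walk_parallel is built internally.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a parallel walk: each worker handles one chunk of
/// paths and sends them over a shared channel. Names are illustrative only.
fn walk_parallel_sketch(paths: Vec<PathBuf>, workers: usize) -> mpsc::Receiver<PathBuf> {
    let (tx, rx) = mpsc::channel();
    let workers = workers.max(1);
    // Ceiling division so every path lands in some chunk; never pass 0 to chunks().
    let chunk = ((paths.len() + workers - 1) / workers).max(1);
    for part in paths.chunks(chunk) {
        let tx = tx.clone();
        let part = part.to_vec();
        thread::spawn(move || {
            for p in part {
                // A real walker would stat and filter here.
                // A send error means the receiver was dropped, so stop early.
                if tx.send(p).is_err() {
                    break;
                }
            }
        });
    }
    // Drop the original sender so recv() returns Err once every worker finishes.
    drop(tx);
    rx
}

fn main() {
    let paths: Vec<PathBuf> = (0..10).map(|i| PathBuf::from(format!("file{i}.rs"))).collect();
    let rx = walk_parallel_sketch(paths, 4);
    let mut received = 0;
    // Arrival order depends on thread scheduling.
    while let Ok(path) = rx.recv() {
        println!("{}", path.display());
        received += 1;
    }
    println!("received {received} paths");
}
```

Forgetting the `drop(tx)` is the classic bug with this pattern: the receiver would then wait forever for a sender that never sends.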

3) Implement FileSource for a custom source

use codewalk::{FileEntry, FileSource};

struct ApiSource {
    files: Vec<FileEntry>,
}

impl FileSource for ApiSource {
    fn walk(&self) -> Vec<FileEntry> {
        self.files.clone()
    }

    fn count(&self) -> usize {
        self.files.len()
    }
}

Traits

codewalk defines FileSource; implement it when you need to scan virtual trees such as API artifacts, generated code, or archive contents.

Related Crates

License

MIT, Corum Collective LLC

Docs: https://docs.rs/codewalk

Santh ecosystem: https://santh.io