codewalk 0.2.0

Fast, security-aware file tree walker — gitignore, binary detection, safe bounded reads, parallel
Documentation

codewalk

Walk a directory tree. Skip binaries, respect .gitignore, memory-map large files. Parallel mode for scanning big codebases.

use codewalk::{CodeWalker, WalkConfig};

let walker = CodeWalker::new("/path/to/repo", WalkConfig::default());
for entry in walker.walk() {
    println!("{} ({} bytes)", entry.path.display(), entry.size);
    let content = entry.content_str().unwrap();
    // scan the content
}

What the defaults do

Out of the box, codewalk skips:

  • Binary files (detected by magic bytes, not just extension)
  • Hidden files and directories
  • Common junk directories: node_modules, .git, target, pycache, vendor, .venv, Pods
  • Files over 10MB

It respects .gitignore rules automatically.

Why not walkdir or ignore?

walkdir gives you paths. ignore gives you paths respecting gitignore. Neither reads file content, detects binary files by magic bytes, or memory-maps large files. If you're building a security scanner or code analyzer, you need all three: walk, skip binaries, read content efficiently. codewalk does that in one call. Without it you're stacking walkdir + a binary detector + a gitignore parser + an mmap wrapper + size limits. codewalk is that stack, tested and ready.

Configuration

Override any default via struct fields or TOML:

max_file_size = 1048576
skip_binary = true
skip_hidden = false
respect_gitignore = true
follow_symlinks = false
include_extensions = ["rs", "py", "js"]
exclude_dirs = ["node_modules", ".git"]
let config = WalkConfig::from_toml(r#"
    include_extensions = ["rs", "py"]
    max_file_size = 5242880
"#).unwrap();

Parallel walking

For large codebases, walk on multiple threads:

let rx = walker.walk_parallel(4);
for entry in rx {
    // entries arrive as they're discovered
}

Memory-mapped reading

Files above 64KB (configurable) are memory-mapped instead of read into a Vec. Below that threshold, regular read is faster.

let content = entry.content().unwrap();
let bytes: &[u8] = content.as_bytes();

Binary detection

Checks file extension first (fast), then magic bytes if needed. Recognizes ELF, PE, Mach-O, WASM, ZIP, images, audio, databases, and more.

Contributing

Pull requests are welcome. There is no such thing as a perfect crate. If you find a bug, a better API, or just a rough edge, open a PR. We review quickly.

License

MIT. Copyright 2026 CORUM COLLECTIVE LLC.

crates.io docs.rs