# codewalk — Security-aware file system walker
[License: MIT](https://opensource.org/licenses/MIT) | [Tests: 26 passing](https://img.shields.io/badge/tests-26%20passing-brightgreen.svg) | [crates.io](https://crates.io/crates/codewalk)
## Why
Every code security tool starts by enumerating files, but naive crawling is slow and noisy, and it often reads binary files or vendored artifacts. `codewalk` gives you a predictable walker that respects `.gitignore`, supports extension and size filters, and loads file content lazily, using memory-mapped reads for large files.
It is built for scans where throughput and signal count: only the files that matter are read, with minimal memory churn.
## Quick Start
```rust
use codewalk::{CodeWalker, WalkConfig};

fn main() -> std::io::Result<()> {
    // Skip hidden and binary files, and anything larger than 2 MiB.
    let config = WalkConfig::default()
        .max_file_size(2 * 1024 * 1024)
        .skip_hidden(true)
        .skip_binary(true);

    let walker = CodeWalker::new(".", config);
    for entry in walker.walk() {
        println!("{} ({} bytes)", entry.path.display(), entry.size);
        // With skip_binary(true) this check should always pass; it shows how
        // to guard text decoding when binary files are not skipped.
        if !entry.is_binary {
            // Content is only loaded when requested.
            let _text = entry.content_str()?;
        }
    }
    Ok(())
}
```
## Features
- Skip hidden files/directories and respect `.gitignore` by default.
- Per-file size and extension allow/deny filtering.
- Binary detection, with an option to skip binary files entirely.
- Memory-mapped reads for files above a size threshold.
- Parallel walker mode for large monorepos.
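The hidden, size, and extension filters above combine into a single keep/skip decision per path. The sketch below illustrates that decision using only `std`; the `Filters` struct and `should_visit` function are hypothetical and are not `codewalk`'s API or its actual implementation. It assumes extensions are compared lowercase without the leading dot, that the deny list wins over the allow list, and that an empty allow list permits every extension.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical filter settings mirroring the features above (not codewalk's types).
struct Filters {
    skip_hidden: bool,
    max_file_size: u64,
    include_extensions: HashSet<String>, // empty = allow every extension
    exclude_extensions: HashSet<String>,
}

/// Returns true if a file at `path` with `size` bytes should be scanned.
fn should_visit(filters: &Filters, path: &Path, size: u64) -> bool {
    // Hidden: any path component starting with '.' (ignoring "." and "..").
    if filters.skip_hidden {
        let hidden = path.components().any(|c| {
            let s = c.as_os_str().to_string_lossy();
            s.starts_with('.') && s != "." && s != ".."
        });
        if hidden {
            return false;
        }
    }
    if size > filters.max_file_size {
        return false;
    }
    // Extension compared lowercase, without the leading dot ("" if none).
    let ext = path
        .extension()
        .map(|e| e.to_string_lossy().to_lowercase())
        .unwrap_or_default();
    if filters.exclude_extensions.contains(&ext) {
        return false;
    }
    filters.include_extensions.is_empty() || filters.include_extensions.contains(&ext)
}

fn main() {
    let filters = Filters {
        skip_hidden: true,
        max_file_size: 2 * 1024 * 1024,
        include_extensions: HashSet::new(),
        exclude_extensions: ["png", "zip"].iter().map(|s| s.to_string()).collect(),
    };
    for (path, size) in [("src/main.rs", 1_200u64), (".git/config", 300), ("logo.png", 4_096)] {
        println!("{path}: {}", should_visit(&filters, Path::new(path), size));
    }
}
```

Checking the cheapest conditions (path shape, size) first means no file content is touched for paths that are rejected anyway.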
## TOML Configuration
`codewalk` does not read TOML configuration files; all settings are applied in code through the `WalkConfig` builder methods.
## API Overview
- `WalkConfig`: tune traversal behavior (`max_file_size`, `include_extensions`, `exclude_dirs`, ...).
- `CodeWalker`: construct from a root path and a `WalkConfig`, then iterate with `walk`, `walk_iter`, or `walk_parallel`.
- `FileEntry`: path, size, and binary flag, plus lazy `content` and `content_str` accessors.
- `FileSource`: trait for custom file providers.
- `is_binary`: standalone helper for binary checks outside the walker.
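For context on what a binary pre-check can look like, here is one common heuristic: treat data as binary if its first few kilobytes contain a NUL byte. This is a hypothetical stand-in (`looks_binary`, `file_looks_binary`, and the 8 KiB `SNIFF_LEN` are all assumptions); this README does not state which heuristic `codewalk::is_binary` uses or what its signature is.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Number of leading bytes inspected (an arbitrary, assumed value).
const SNIFF_LEN: usize = 8192;

/// Heuristic: a NUL byte within the first SNIFF_LEN bytes marks the data as binary.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(SNIFF_LEN).any(|&b| b == 0)
}

/// Reads at most SNIFF_LEN bytes from the file, so large files stay cheap to check.
fn file_looks_binary(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::with_capacity(SNIFF_LEN);
    File::open(path)?.take(SNIFF_LEN as u64).read_to_end(&mut buf)?;
    Ok(looks_binary(&buf))
}

fn main() -> io::Result<()> {
    println!("text: {}", looks_binary(b"fn main() {}\n"));
    // PNG signature followed by the start of the IHDR length field (contains NUL).
    println!("png header: {}", looks_binary(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A, 0x00]));

    // Round-trip through a temporary file to exercise the file-based check.
    let tmp = std::env::temp_dir().join("looks_binary_demo.bin");
    std::fs::write(&tmp, [1u8, 2, 0, 3])?;
    println!("temp file: {}", file_looks_binary(&tmp)?);
    std::fs::remove_file(&tmp)?;
    Ok(())
}
```

Bounding the read keeps the check O(1) per file regardless of size; the trade-off is that a NUL byte appearing only after the prefix goes unnoticed.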
## Examples
### 1) Find TODO markers in non-hidden Rust files
```rust
use codewalk::{CodeWalker, WalkConfig};

fn main() {
    // Skip hidden paths and never consider image or archive files.
    let config = WalkConfig::default()
        .skip_hidden(true)
        .exclude_extensions(vec!["png".into(), "zip".into()].into_iter().collect());
    let walker = CodeWalker::new(".", config);
    for file in walker.walk() {
        // Only inspect Rust sources (assumes `extension` is stored without the dot).
        if file.extension != "rs" {
            continue;
        }
        // Unreadable or undecodable files fall back to an empty string.
        let text = file.content_str().unwrap_or_default();
        if text.contains("TODO") {
            println!("found TODO: {}", file.path.display());
        }
    }
}
```
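### 2) Stream entries from parallel workers over a channel
```rust
use codewalk::{CodeWalker, WalkConfig};

fn main() {
    let walker = CodeWalker::new(".", WalkConfig::default());
    // Start a parallel walk (the argument is assumed to be the worker count).
    let rx = walker.walk_parallel(4);
    // With standard channel semantics, `recv` returns Err once every worker
    // has finished and dropped its sender, which ends the loop.
    while let Ok(entry) = rx.recv() {
        println!("{}", entry.path.display());
    }
}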
```
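The `while let Ok(..) = rx.recv()` loop above terminates because of how channels close: `recv` blocks until a message arrives and returns `Err` once every sender has been dropped. The sketch below reproduces that fan-out/fan-in pattern with `std::thread` and `std::sync::mpsc`. It is a hypothetical illustration, assuming the receiver returned by `walk_parallel` behaves like `std::sync::mpsc::Receiver`; the `walk_parallel_sketch` name and its round-robin split of a pre-collected path list are inventions for this example, not how `codewalk` schedules its workers.

```rust
use std::path::PathBuf;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Splits `paths` across `workers` threads; each thread sends the paths it handles.
fn walk_parallel_sketch(paths: Vec<PathBuf>, workers: usize) -> Receiver<PathBuf> {
    let (tx, rx) = mpsc::channel();
    let workers = workers.max(1); // step_by(0) would panic
    for id in 0..workers {
        let tx = tx.clone();
        // Round-robin share of the work for this thread.
        let share: Vec<PathBuf> = paths.iter().skip(id).step_by(workers).cloned().collect();
        thread::spawn(move || {
            for path in share {
                // A real walker would stat and filter here before sending.
                if tx.send(path).is_err() {
                    break; // receiver hung up; stop early
                }
            }
            // This thread's `tx` is dropped here.
        });
    }
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);
    rx
}

fn main() {
    let paths: Vec<PathBuf> = (0..10).map(|i| PathBuf::from(format!("file{i}.rs"))).collect();
    let rx = walk_parallel_sketch(paths, 4);
    let mut count = 0;
    while let Ok(path) = rx.recv() {
        println!("{}", path.display());
        count += 1;
    }
    println!("received {count} entries");
}
```

Forgetting the final `drop(tx)` is the classic bug with this pattern: the receiver would wait forever because one sender is still alive.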
### 3) Implement `FileSource` for a custom source
```rust
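use codewalk::{FileEntry, FileSource};

/// A source backed by entries fetched from somewhere other than the local disk.
struct ApiSource {
    files: Vec<FileEntry>,
}

impl FileSource for ApiSource {
    // Return every entry this source knows about (requires `FileEntry: Clone`).
    fn walk(&self) -> Vec<FileEntry> {
        self.files.clone()
    }

    // Report how many entries `walk` would return.
    fn count(&self) -> usize {
        self.files.len()
    }
}

fn main() {
    let source = ApiSource { files: Vec::new() };
    assert_eq!(source.count(), 0);
}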
```
## Traits
`codewalk` defines `FileSource`. Implement it when you need to scan virtual trees such as API artifacts, generated code, or archive contents.
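As an illustration of a virtual tree, the sketch below serves files from an in-memory map, for example generated code that never touches disk. To stay self-contained it defines its own `VirtualSource` trait and `VirtualFile` struct, both hypothetical, mirroring only the `walk`/`count` shape from the `FileSource` example; a real implementation would return `codewalk::FileEntry` values instead.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical stand-in for codewalk's FileEntry (only a few fields).
struct VirtualFile {
    path: PathBuf,
    size: u64,
    content: Vec<u8>,
}

/// Hypothetical trait with the same walk/count shape as the FileSource example.
trait VirtualSource {
    fn walk(&self) -> Vec<VirtualFile>;
    fn count(&self) -> usize;
}

/// Files held in memory, keyed by virtual path; BTreeMap keeps walk order stable.
struct InMemorySource {
    files: BTreeMap<PathBuf, Vec<u8>>,
}

impl VirtualSource for InMemorySource {
    fn walk(&self) -> Vec<VirtualFile> {
        self.files
            .iter()
            .map(|(path, bytes)| VirtualFile {
                path: path.clone(),
                size: bytes.len() as u64,
                content: bytes.clone(),
            })
            .collect()
    }

    fn count(&self) -> usize {
        self.files.len()
    }
}

/// Scanner logic written against the trait works for disk or virtual trees alike.
fn find_marker<S: VirtualSource>(source: &S, marker: &str) -> Vec<PathBuf> {
    source
        .walk()
        .into_iter()
        .filter(|f| String::from_utf8_lossy(&f.content).contains(marker))
        .map(|f| f.path)
        .collect()
}

fn main() {
    let mut files = BTreeMap::new();
    files.insert(PathBuf::from("gen/api.rs"), b"// TODO: regenerate\n".to_vec());
    files.insert(PathBuf::from("gen/types.rs"), b"pub struct Id(u64);\n".to_vec());
    let source = InMemorySource { files };

    for f in source.walk() {
        println!("{} ({} bytes)", f.path.display(), f.size);
    }
    println!("{} files", source.count());
    for path in find_marker(&source, "TODO") {
        println!("marker in {}", path.display());
    }
}
```

Keeping the scanner generic over the trait is the design point: the same detection code runs unchanged whether entries come from the file system, an API, or an unpacked archive.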
## Related Crates
- [scanstate](https://docs.rs/scanstate)
- [secreport](https://docs.rs/secreport)
- [multimatch](https://docs.rs/multimatch)
## License
MIT, Corum Collective LLC
Docs: https://docs.rs/codewalk
Santh ecosystem: https://santh.io