codewalk 0.1.0

Fast, security-aware file tree walker — gitignore, binary detection, memmap2, parallel
# codewalk — Security-aware file system walker

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Tests](https://img.shields.io/badge/tests-26%20passing-brightgreen.svg)](https://img.shields.io/badge/tests-26%20passing-brightgreen.svg) [![Crates.io](https://img.shields.io/crates/v/codewalk.svg)](https://crates.io/crates/codewalk)

## Why
Every code security tool starts by enumerating files, but naive crawling is slow, noisy, and often reads binary files or vendored artifacts. `codewalk` gives you a predictable walker that respects `.gitignore`, supports extension and size filters, and loads file contents lazily, switching to memory-mapped reads for large files.

It is designed for scans where you care about throughput and signal: scanning only the files that matter, with minimal memory churn.

## Quick Start
```rust
use codewalk::{CodeWalker, WalkConfig};

fn main() -> std::io::Result<()> {
    let config = WalkConfig::default()
        .max_file_size(2 * 1024 * 1024) // 2 MiB per-file size limit
        .skip_hidden(true)
        .skip_binary(true);

    let walker = CodeWalker::new(".", config);
    for entry in walker.walk() {
        println!("{} ({} bytes)", entry.path.display(), entry.size);
        if !entry.is_binary {
            // File contents are only read when requested.
            let _text = entry.content_str()?;
        }
    }

    Ok(())
}
```

## Features
- Skip hidden files/directories and respect `.gitignore` by default.
- Per-file size and extension allow/deny filtering.
- Binary detection, with an option to skip binary files.
- Memory-mapped reads (via `memmap2`) for files above a size threshold.
- Parallel walker mode for large monorepos.
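The feature list does not say how binary detection decides what counts as binary. A common heuristic, used by Git among others, is to look for a NUL byte in the first 8000 bytes of a file. The sketch below implements that heuristic with the standard library only; whether `codewalk` uses the same rule or the same window size is an assumption, and `looks_binary` and `sniff_file` are hypothetical names, not part of the crate.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Number of leading bytes to inspect. 8000 mirrors Git's window;
/// codewalk's actual window (if any) is an assumption.
const SNIFF_LEN: usize = 8000;

/// Heuristic (hypothetical helper): a buffer is binary if it contains a NUL byte.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.contains(&0)
}

/// Reads at most SNIFF_LEN bytes from the start of the file, so large
/// files are never read in full just to classify them.
fn sniff_file(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::with_capacity(SNIFF_LEN);
    File::open(path)?.take(SNIFF_LEN as u64).read_to_end(&mut buf)?;
    Ok(looks_binary(&buf))
}

fn main() -> io::Result<()> {
    assert!(!looks_binary(b"fn main() {}\n"));
    assert!(looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"));

    let path = std::env::temp_dir().join("codewalk_sniff_demo.bin");
    std::fs::write(&path, [0u8, 1, 2, 3])?;
    println!("binary: {}", sniff_file(&path)?); // prints "binary: true"
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Capping the read with `Read::take` keeps classification cheap regardless of file size, which matters when the walker visits every file in a large tree.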

## TOML Configuration
`codewalk` does not read TOML config files; all options are set in code through `WalkConfig`.

## API Overview
- `WalkConfig`: tune traversal behavior (`max_file_size`, `include_extensions`, `exclude_dirs`, ...).
- `CodeWalker`: construct with a root path and a `WalkConfig`, then iterate with `walk`, `walk_iter`, or `walk_parallel`.
- `FileEntry`: path, size, and binary flag, plus `content` and `content_str` methods for reading file contents.
- `FileSource`: trait for custom file providers.
- `is_binary`: standalone helper for binary checks outside the walker.
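The size and extension options on `WalkConfig` combine into a per-file accept/reject decision. The following is a minimal std-only sketch of how such a predicate could work; the `Filter` struct, its field names, and the rule that the deny list takes precedence over the allow list are assumptions made for illustration, not `codewalk` internals.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical stand-in for the filtering part of a walk config.
struct Filter {
    max_file_size: u64,
    include_extensions: HashSet<String>, // empty = allow every extension
    exclude_extensions: HashSet<String>,
}

impl Filter {
    /// Returns true if a file with this path and size should be yielded.
    fn accepts(&self, path: &Path, size: u64) -> bool {
        if size > self.max_file_size {
            return false;
        }
        // Files without an extension compare as the empty string.
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
        if self.exclude_extensions.contains(ext) {
            return false; // in this sketch the deny list wins
        }
        self.include_extensions.is_empty() || self.include_extensions.contains(ext)
    }
}

fn set(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let filter = Filter {
        max_file_size: 2 * 1024 * 1024,
        include_extensions: set(&["rs", "toml"]),
        exclude_extensions: set(&["png"]),
    };
    for (path, size) in [("src/lib.rs", 4_096), ("logo.png", 1_024), ("Cargo.toml", 3_000_000)] {
        // src/lib.rs: true, logo.png: false, Cargo.toml: false (too large)
        println!("{path}: {}", filter.accepts(Path::new(path), size));
    }
}
```

Checking the size first is cheap because the walker already has file metadata, so oversized files are rejected before any extension lookups or content reads.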

## Examples
### 1) Search non-hidden Rust files for TODO markers
```rust
use codewalk::{CodeWalker, WalkConfig};
use std::collections::HashSet;

// Hidden files are skipped by default; also drop image and archive files.
let excluded: HashSet<String> = ["png", "zip"].iter().map(|ext| ext.to_string()).collect();
let config = WalkConfig::default().exclude_extensions(excluded);
let walker = CodeWalker::new(".", config);
for file in walker.walk() {
    if file.extension == "rs" {
        let text = file.content_str().unwrap_or_default();
        if text.contains("TODO") {
            println!("found todo: {}", file.path.display());
        }
    }
}
```

### 2) Stream entries from a parallel walk over a channel
```rust
use codewalk::{CodeWalker, WalkConfig};

let walker = CodeWalker::new(".", WalkConfig::default());
let rx = walker.walk_parallel(4);
// recv() returns Err once the walk is finished and all senders are dropped.
while let Ok(entry) = rx.recv() {
    println!("{}", entry.path.display());
}
```
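The receiving loop above ends cleanly because a channel reports disconnection once every sender is gone. The sketch below shows that fan-out pattern with only `std::sync::mpsc` and `std::thread`: paths are split into static batches, each worker sends results through a cloned sender, and the original sender is dropped so the receiver terminates. It is a generic illustration, not `codewalk`'s implementation; `fan_out` is a hypothetical name, and counting path components stands in for real per-file work.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Splits `paths` across up to `workers` threads and streams
/// (path, component count) pairs back over a single channel.
fn fan_out(paths: Vec<PathBuf>, workers: usize) -> mpsc::Receiver<(PathBuf, usize)> {
    let (tx, rx) = mpsc::channel();
    // Batch size; max(1) guards against chunks(0), which would panic.
    let chunk = paths.len().div_ceil(workers.max(1)).max(1);
    for batch in paths.chunks(chunk) {
        let tx = tx.clone();
        let batch = batch.to_vec();
        thread::spawn(move || {
            for path in batch {
                // Stand-in for real work such as reading and scanning the file.
                let depth = path.components().count();
                // Ignore send errors: the receiver may already be dropped.
                let _ = tx.send((path, depth));
            }
        });
    }
    // Drop the original sender so recv() returns Err after the last worker exits.
    drop(tx);
    rx
}

fn main() {
    let paths: Vec<PathBuf> = ["a.rs", "src/b.rs", "src/c/d.rs"]
        .into_iter()
        .map(PathBuf::from)
        .collect();
    let rx = fan_out(paths, 2);
    let mut received = 0;
    while let Ok((path, depth)) = rx.recv() {
        println!("{} -> depth {depth}", path.display());
        received += 1;
    }
    assert_eq!(received, 3);
}
```

Results arrive in completion order, not path order, so sort them afterwards if output must be deterministic. Static batching is the simplest split; a real walker may distribute work differently.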

### 3) Implement `FileSource` for a custom source
```rust
use codewalk::{FileEntry, FileSource};

struct ApiSource {
    files: Vec<FileEntry>,
}

impl FileSource for ApiSource {
    fn walk(&self) -> Vec<FileEntry> {
        self.files.clone()
    }

    fn count(&self) -> usize {
        self.files.len()
    }
}
```

## Traits
`codewalk` defines `FileSource`; implement it when you need scanning over virtual trees (API artifacts, generated code, archive contents).

## Related Crates
- [scanstate](https://docs.rs/scanstate)
- [secreport](https://docs.rs/secreport)
- [multimatch](https://docs.rs/multimatch)

## License
MIT, Corum Collective LLC

Docs: https://docs.rs/codewalk

Santh ecosystem: https://santh.io