# rusty-pdfgrep
[crates.io](https://crates.io/crates/rusty-pdfgrep) | [docs.rs](https://docs.rs/rusty-pdfgrep) | [License](#license)
Grep through PDF files. A Rust port of [Hans-Peter Deifel's `pdfgrep(1)`](https://pdfgrep.org) with `lopdf`-backed text extraction, pluggable regex engines (`regex` default + `fancy-regex` for `-P`), `--password` retry for encrypted PDFs, GNU-grep-compatible color output, and a typed library API.
Part of the [Rusty portfolio](https://jsh562.github.io/rusty-portfolio).
## Install
```sh
cargo install rusty-pdfgrep
# or, with prebuilt binaries:
cargo binstall rusty-pdfgrep
```
## Usage
```sh
# Search a single PDF
rusty-pdfgrep "experimental results" report.pdf
# Show page numbers
rusty-pdfgrep -n "force majeure" contract.pdf
# Recursive search across a directory tree
rusty-pdfgrep -r -i "compliance" ./contracts/
# Try one or more passwords against an encrypted PDF
rusty-pdfgrep --password "pwd1" --password "pwd2" "term" enc.pdf
# Fixed-string vs PCRE (lookahead, backref)
rusty-pdfgrep -F '$1.50' prices.pdf   # single quotes stop the shell from expanding $1
rusty-pdfgrep -P "foo(?=bar)" file.pdf
# Count matches per file
rusty-pdfgrep -c "TODO" *.pdf
# List only the names of files that contain a match
rusty-pdfgrep -r -l "secret" ./reports/
```
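`-F` treats the pattern as a literal string, so a pattern such as `$1.50` must not be read as "end of input, then `1`, any character, `50`". One common way to implement this is to escape regex metacharacters before compiling the pattern; the `regex` crate ships `regex::escape` for exactly that purpose. Whether rusty-pdfgrep does it this way is an assumption. The std-only sketch below uses a hypothetical `escape_fixed_string` helper whose character set mirrors the metacharacters of Rust's `regex` syntax:

```rust
/// Metacharacters in Rust `regex` syntax (approximately the set that
/// `regex::escape` escapes). Hypothetical helper data for this sketch.
const META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '|', '[', ']', '{', '}', '^', '$', '#', '&', '-', '~',
];

/// Escape `pattern` so that a regex engine matches it literally.
fn escape_fixed_string(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    for c in pattern.chars() {
        if META.contains(&c) {
            // Prefix each metacharacter with a backslash.
            out.push('\\');
        }
        out.push(c);
    }
    out
}

fn main() {
    // Unescaped, "$1.50" would anchor at end of input and treat '.' as "any char".
    let escaped = escape_fixed_string("$1.50");
    println!("{escaped}"); // prints \$1\.50
}
```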
## Cargo Features
| Feature | Default | Dependencies enabled |
|---|---|---|
| `cli` | yes | `clap` + `clap_complete` + `anyhow` + `termcolor` + `anstyle` + `walkdir` + `globset` |

Library consumers can depend on `rusty-pdfgrep = { version = "0.1", default-features = false }` to get PDF text extraction, the regex engines, and the library API without any CLI dependencies. Only the pure-Rust core (`lopdf` + `regex` + `fancy-regex` + `thiserror`) is pulled in.
## Library API
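```rust,no_run
use rusty_pdfgrep::PdfGrepBuilder;
use std::path::Path;

// Build a case-insensitive searcher for the phrase "force majeure".
let pdfgrep = PdfGrepBuilder::new()
    .pattern("force majeure")
    .case_insensitive(true)
    .build()
    .unwrap();

// Iterate over the matches in one PDF; each item is a `Result`.
for result in pdfgrep.search_file(Path::new("contract.pdf")) {
    let m = result.unwrap();
    // Print path, page number, and matched text.
    println!("{}:{}: {}", m.path.display(), m.page, m.text);
}
```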
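Each match yielded by `search_file` above carries a path, a page number, and the matching text. Conceptually, once text has been extracted per page (in rusty-pdfgrep, that is `lopdf`'s job), the search is a line-by-line scan of each page. The std-only sketch below assumes already-extracted page strings and 1-based page numbers, and uses a plain case-insensitive substring test in place of a compiled regex. `Hit` and `search_pages` are hypothetical names, not the crate's API:

```rust
/// One hit: 1-based page number plus the matching line.
/// Hypothetical type, loosely modelled on the `page`/`text` fields above.
#[derive(Debug, PartialEq)]
struct Hit {
    page: usize,
    text: String,
}

/// Scan already-extracted page texts line by line for a case-insensitive
/// substring. A real engine would run a compiled regex instead of `contains`.
fn search_pages(pages: &[&str], needle: &str) -> Vec<Hit> {
    let needle = needle.to_lowercase();
    let mut hits = Vec::new();
    for (i, page) in pages.iter().enumerate() {
        for line in page.lines() {
            if line.to_lowercase().contains(&needle) {
                // Pages are numbered from 1, as a reader would count them.
                hits.push(Hit { page: i + 1, text: line.to_string() });
            }
        }
    }
    hits
}

fn main() {
    let pages = ["Preamble\nTerms", "Force Majeure applies.\nPayment", "none"];
    for hit in search_pages(&pages, "force majeure") {
        // Print the page number and the matching line.
        println!("{}: {}", hit.page, hit.text);
    }
}
```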
## Compatibility
`rusty-pdfgrep` has two modes:
- **Default**: clap-based flag parser; rejects conflicting flag pairs at parse time; adds `--help`, `--version`, and a `completions` subcommand.
- **Strict** (`--strict`, env `RUSTY_PDFGREP_STRICT=1`, or `argv[0]` of `pdfgrep`/`pdfgrep-alias`): byte-identical stderr to upstream v2.2.0 for documented diagnostics; last-wins flag resolution; no subcommands.
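The three strict-mode triggers can be checked with the standard library alone. The sketch below is a hypothetical `is_strict` helper, not the crate's parser. It assumes that only the exact env value `1` enables strict mode, that `argv[0]` is compared by file stem (so `/usr/bin/pdfgrep`, and `pdfgrep.exe` on Windows, both count), and that any later argument equal to `--strict` counts:

```rust
use std::path::Path;

/// Decide whether strict (upstream-compatible) mode applies.
/// Hypothetical helper; assumptions are noted inline.
fn is_strict(args: &[&str], env_strict: Option<&str>) -> bool {
    // argv[0] may be a full path; compare its file stem only
    // (assumption: this also makes `pdfgrep.exe` count on Windows).
    let invoked_as = args
        .first()
        .and_then(|a| Path::new(a).file_stem())
        .and_then(|n| n.to_str())
        .unwrap_or("");
    let by_name = matches!(invoked_as, "pdfgrep" | "pdfgrep-alias");
    // Simplified: does not treat arguments after `--` specially.
    let by_flag = args.iter().skip(1).any(|a| *a == "--strict");
    // Assumption: only the exact value "1" enables strict mode.
    let by_env = env_strict == Some("1");
    by_name || by_flag || by_env
}

fn main() {
    let env = std::env::var("RUSTY_PDFGREP_STRICT").ok();
    let owned: Vec<String> = std::env::args().collect();
    let args: Vec<&str> = owned.iter().map(String::as_str).collect();
    println!("strict mode: {}", is_strict(&args, env.as_deref()));
}
```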
v0.1.0 excludes: `-w`/`--word-regexp` (not in upstream), `--password-list FILE` (upstream uses repeated `--password`), `-A`/`-B`/`-C` page-context, `--cache`, `--unac`, `-R` symlink-follow, `pdfium-render` backend.
The `-P`/`--perl-regexp` engine is `fancy-regex` rather than upstream's libpcre2, so it is pure Rust and needs no C toolchain. Edge-case PCRE features (recursive patterns, callouts, conditional patterns) may be unsupported or behave differently.
**BREAKING CHANGE vs upstream**: stdin is buffered into memory up to a configurable cap (default 512 MiB). Upstream buffers stdin without a limit, which risks running out of memory on huge inputs.
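A capped read like the one described can be built from `std::io::Read::take`: request one byte more than the cap, and if that many bytes arrive, the input is too large. The std-only sketch below uses a hypothetical `read_capped` helper; the flag that configures the cap and the tool's real diagnostic are not shown, and the error message here is illustrative only:

```rust
use std::io::{self, Read};

/// Read all of `reader` into memory, failing if more than `cap` bytes arrive.
/// Hypothetical helper illustrating a bounded stdin buffer.
fn read_capped<R: Read>(reader: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read at most cap + 1 bytes so "exactly cap" and "over cap" can be told apart.
    reader.take(cap.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("stdin exceeds the {cap}-byte buffer limit"),
        ));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    const DEFAULT_CAP: u64 = 512 * 1024 * 1024; // 512 MiB
    let pdf_bytes = read_capped(io::stdin().lock(), DEFAULT_CAP)?;
    eprintln!("read {} bytes from stdin", pdf_bytes.len());
    Ok(())
}
```

Reading one extra byte, rather than checking the length first, lets the same code work for pipes, whose total size is not known in advance.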
## MSRV
Rust 1.85 (edition 2024). The MSRV is re-verified at each release against a stable-minus-two policy.
## License
Dual-licensed under [MIT](LICENSE) or [Apache-2.0](LICENSE-APACHE).