rusty-pdfgrep
Grep through PDF files. A Rust port of Hans-Peter Deifel's pdfgrep(1) with lopdf-backed text extraction, pluggable regex engines (regex default + fancy-regex for -P), --password retry for encrypted PDFs, GNU-grep-compatible color output, and a typed library API.
Part of the Rusty portfolio.
Install
# or, with prebuilt binaries:
Usage
# Search a single PDF
# Show page numbers
# Recursive search across a directory tree
# Try one or more passwords against an encrypted PDF
# Fixed-string vs PCRE (lookahead, backref)
# Count matches per file
# Numeric percentages via -l + dialog --gauge style scripts
Cargo Features
| Feature | Default | What it gates |
|---|---|---|
| cli | yes | clap + clap_complete + anyhow + termcolor + anstyle + walkdir + globset |
Library consumers can use rusty-pdfgrep = { version = "0.1", default-features = false } to get the PDF text-extraction + regex engine + library API without any CLI dependencies. The pure-Rust foundation (lopdf + regex + fancy-regex + thiserror) remains.
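As a rough illustration of what a feature gate like this means inside a crate, the sketch below keeps a core function compiled unconditionally and puts a CLI entry point behind `#[cfg(feature = "cli")]`. The names `count_matches`, `run_cli`, and `cli_enabled` are hypothetical and are not taken from rusty-pdfgrep; this only shows the general Cargo pattern, assuming the crate is organized along these lines.

```rust
/// Library core: compiled regardless of which features are enabled.
/// Hypothetical stand-in for the real text-search API.
pub fn count_matches(haystack: &str, needle: &str) -> usize {
    if needle.is_empty() {
        return 0;
    }
    // Counts non-overlapping occurrences of `needle`.
    haystack.matches(needle).count()
}

/// CLI layer: only compiled when the `cli` feature is enabled, so its
/// dependencies never reach library-only consumers.
#[cfg(feature = "cli")]
pub fn run_cli(args: &[String]) -> i32 {
    // A real binary would hand `args` to an argument parser here.
    if args.len() < 3 {
        return 2; // usage error
    }
    if count_matches(&args[2], &args[1]) > 0 { 0 } else { 1 }
}

/// Reports whether this build includes the CLI layer.
pub fn cli_enabled() -> bool {
    cfg!(feature = "cli")
}
```

With `default-features = false`, a function gated like `run_cli` simply does not exist in the compiled crate, while the core function stays available.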
Library API
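use rusty_pdfgrep::PdfGrepBuilder;
use std::path::Path;

// The pattern and file path below are illustrative placeholders.
let pdfgrep = PdfGrepBuilder::new()
    .pattern("invoice")
    .case_insensitive(true)
    .build()
    .unwrap();
for result in pdfgrep.search_file(Path::new("report.pdf")) {
    // Handle each search result here.
}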
Compatibility
rusty-pdfgrep has two modes:
- Default: clap-styled flag parser; rejects conflicting flag pairs at parse time; adds --help, --version, and a completions subcommand.
- Strict (--strict, env RUSTY_PDFGREP_STRICT=1, or argv[0] = pdfgrep/pdfgrep-alias): byte-equal stderr against upstream v2.2.0 for documented diagnostics; last-wins flag resolution; no subcommands.
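The three strict-mode triggers listed above can be sketched as a small pure function. This is an assumption about how such detection might look, not the crate's actual code: `Mode` and `detect_mode` are hypothetical names, the argv[0] check only matches the exact file name `pdfgrep` (the alias form is not modelled), and the environment value is passed in by the caller instead of being read directly.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Mode {
    Default,
    Strict,
}

/// Picks a mode from the program name, the remaining arguments, and the
/// value of RUSTY_PDFGREP_STRICT (if set). Hypothetical helper.
pub fn detect_mode(argv0: &str, args: &[&str], env_strict: Option<&str>) -> Mode {
    // Trigger 1: an explicit --strict flag before any `--` terminator.
    let by_flag = args
        .iter()
        .take_while(|a| **a != "--")
        .any(|a| *a == "--strict");

    // Trigger 2: RUSTY_PDFGREP_STRICT=1 in the environment.
    let by_env = env_strict == Some("1");

    // Trigger 3: invoked under the upstream name (file name only;
    // assumption: directory components are ignored).
    let by_name =
        Path::new(argv0).file_name().and_then(|n| n.to_str()) == Some("pdfgrep");

    if by_flag || by_env || by_name {
        Mode::Strict
    } else {
        Mode::Default
    }
}
```

A caller would typically feed this from `std::env::args()` and `std::env::var("RUSTY_PDFGREP_STRICT")`; keeping those lookups outside the function makes each trigger easy to test in isolation.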
v0.1.0 excludes: -w/--word-regexp (not in upstream), --password-list FILE (upstream uses repeated --password), -A/-B/-C page-context, --cache, --unac, -R symlink-follow, pdfium-render backend.
The -P/--perl-regexp engine is fancy-regex instead of upstream's libpcre2 — pure-Rust, no C toolchain. Edge-case PCRE features (recursive patterns, callouts, conditional patterns) diverge; documented above.
BREAKING-CHANGE vs upstream: stdin is buffered into memory with a configurable cap (default 512 MiB). Upstream buffers stdin without a limit, which risks OOM on huge inputs.
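One common way to enforce a cap like this is `Read::take`, sketched below. The names `DEFAULT_CAP` and `read_capped` are hypothetical, and the README does not say what rusty-pdfgrep does when the cap is exceeded; this sketch assumes the input is rejected with an error.

```rust
use std::io::{self, Read};

/// 512 MiB, matching the default cap stated above.
pub const DEFAULT_CAP: u64 = 512 * 1024 * 1024;

/// Reads `reader` fully into memory, failing if it holds more than `cap` bytes.
pub fn read_capped<R: Read>(reader: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read at most cap + 1 bytes: the extra byte reveals an over-long input
    // without ever buffering the whole stream.
    reader.take(cap.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "input exceeds buffer cap",
        ));
    }
    Ok(buf)
}
```

For real use the reader would be `std::io::stdin().lock()`; memory stays bounded by `cap + 1` bytes no matter how large the piped input is.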
MSRV
Rust 1.85 (edition 2024). Re-verified against stable-minus-two policy at each release.
License
Dual-licensed under MIT or Apache-2.0.