rusty-pdfgrep 0.1.0

Grep through PDF files — a Rust port of Hans-Peter Deifel's `pdfgrep(1)` with lopdf-backed text extraction, regex + fancy-regex pluggable engines, --password retry for encrypted PDFs, GNU-grep-compatible color output, recursive walking with fnmatch include/exclude, and a typed library API.

rusty-pdfgrep

Grep through PDF files. A Rust port of Hans-Peter Deifel's pdfgrep(1) with lopdf-backed text extraction, pluggable regex engines (regex default + fancy-regex for -P), --password retry for encrypted PDFs, GNU-grep-compatible color output, and a typed library API.

Part of the Rusty portfolio.

Install

cargo install rusty-pdfgrep
# or, with prebuilt binaries:
cargo binstall rusty-pdfgrep

Usage

# Search a single PDF
rusty-pdfgrep "experimental results" report.pdf

# Show page numbers
rusty-pdfgrep -n "force majeure" contract.pdf

# Recursive search across a directory tree
rusty-pdfgrep -r -i "compliance" ./contracts/

# Try one or more passwords against an encrypted PDF
rusty-pdfgrep --password "pwd1" --password "pwd2" "term" enc.pdf

# Fixed-string vs PCRE (lookahead, backref)
rusty-pdfgrep -F '$1.50' prices.pdf   # single quotes stop the shell from expanding $1
rusty-pdfgrep -P "foo(?=bar)" file.pdf

# Count matches per file
rusty-pdfgrep -c "TODO" *.pdf

# List only the names of files that contain a match
rusty-pdfgrep -r -l "secret" ./reports/
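
The repeated --password example above implies a retry loop over candidate passwords. The following is a minimal sketch of that idea using only the standard library. It assumes the candidates are tried in the order given, which the text above does not confirm. `first_working_password` and the `try_open` callback are hypothetical names and are not part of the rusty-pdfgrep API.

```rust
/// Hypothetical helper: tries each candidate password in order and returns
/// the first one for which `try_open` succeeds, together with its result.
fn first_working_password<'a, T, E>(
    candidates: &[&'a str],
    mut try_open: impl FnMut(&str) -> Result<T, E>,
) -> Option<(&'a str, T)> {
    candidates
        .iter()
        // Stop at the first candidate that opens the document.
        .find_map(|pw| try_open(*pw).ok().map(|doc| (*pw, doc)))
}

fn main() {
    // Stand-in for a real decrypt call (assumption): only "pwd2" succeeds.
    let open = |pw: &str| {
        if pw == "pwd2" {
            Ok("decrypted document")
        } else {
            Err("wrong password")
        }
    };

    match first_working_password(&["pwd1", "pwd2"], open) {
        Some((pw, doc)) => println!("opened with {pw}: {doc}"),
        None => eprintln!("no supplied password worked"),
    }
}
```

Returning the password that worked alongside the result keeps the helper usable for diagnostics without forcing the caller to track attempts.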

Cargo Features

Feature  Default  What it gates
cli      yes      clap + clap_complete + anyhow + termcolor + anstyle + walkdir + globset

Library consumers can depend on rusty-pdfgrep = { version = "0.1", default-features = false } to get PDF text extraction, the regex engines, and the library API without any CLI dependencies. The pure-Rust foundation (lopdf + regex + fancy-regex + thiserror) remains.

Library API

use rusty_pdfgrep::PdfGrepBuilder;
use std::path::Path;

let pdfgrep = PdfGrepBuilder::new()
    .pattern("force majeure")
    .case_insensitive(true)
    .build()
    .unwrap();

for result in pdfgrep.search_file(Path::new("contract.pdf")) {
    let m = result.unwrap();
    println!("{}:{}: {}", m.path.display(), m.page, m.text);
}

Compatibility

rusty-pdfgrep has two modes:

  • Default — clap-styled flag parser; rejects conflicting flag pairs at parse time; adds --help, --version, completions subcommand.
  • Strict (--strict, env RUSTY_PDFGREP_STRICT=1, or argv[0] = pdfgrep/pdfgrep-alias) — byte-equal stderr against upstream v2.2.0 for documented diagnostics; last-wins flag resolution; no subcommands.
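
The two modes can be illustrated with a small standard-library sketch. It is not the crate's actual parser. `is_strict`, `resolve_engine`, and the `Engine` enum are hypothetical names, and treating -F and -P as a conflicting pair is an assumption made only for this example. The pdfgrep-alias case is left out because the text does not spell out how aliases are matched.

```rust
use std::path::Path;

/// Hypothetical helper: strict mode is on if any of the three triggers
/// listed above applies (flag, environment variable, or argv[0] name).
fn is_strict(argv0: &str, args: &[&str], env_value: Option<&str>) -> bool {
    let invoked_as_pdfgrep = Path::new(argv0)
        .file_stem()
        .and_then(|s| s.to_str())
        .map_or(false, |name| name == "pdfgrep");
    args.iter().any(|a| *a == "--strict") || env_value == Some("1") || invoked_as_pdfgrep
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Engine {
    Regex,
    Fixed,
    Perl,
}

/// Maps an engine-selecting flag to its engine (assumed pair: -F and -P).
fn engine_flag(arg: &str) -> Option<Engine> {
    match arg {
        "-F" => Some(Engine::Fixed),
        "-P" | "--perl-regexp" => Some(Engine::Perl),
        _ => None,
    }
}

/// Default mode rejects a conflicting pair; strict mode lets the last flag win.
fn resolve_engine(args: &[&str], strict: bool) -> Result<Engine, String> {
    let mut chosen: Option<Engine> = None;
    for flag in args.iter().filter_map(|a| engine_flag(a)) {
        match chosen {
            Some(prev) if prev != flag && !strict => {
                return Err(format!("conflicting engine flags: {prev:?} and {flag:?}"));
            }
            // First flag seen, a repeat of the same flag, or strict mode.
            _ => chosen = Some(flag),
        }
    }
    Ok(chosen.unwrap_or(Engine::Regex))
}

fn main() {
    let argv: Vec<String> = std::env::args().collect();
    let argv0 = argv.first().map(String::as_str).unwrap_or("");
    let args: Vec<&str> = argv.iter().skip(1).map(String::as_str).collect();
    let env = std::env::var("RUSTY_PDFGREP_STRICT").ok();

    let strict = is_strict(argv0, &args, env.as_deref());
    println!("strict={strict} engine={:?}", resolve_engine(&args, strict));
}
```

A real parser would also need to stop scanning at `--` so that a search pattern equal to `--strict` is not mistaken for the flag; the sketch skips that for brevity.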

v0.1.0 excludes: -w/--word-regexp (not in upstream), --password-list FILE (upstream uses repeated --password), -A/-B/-C page-context, --cache, --unac, -R symlink-follow, pdfium-render backend.

The -P/--perl-regexp engine is fancy-regex instead of upstream's libpcre2: pure Rust, with no C toolchain required. Edge-case PCRE features (recursive patterns, callouts, conditional patterns) diverge from upstream.

BREAKING-CHANGE vs upstream: stdin is buffered into memory with a configurable cap (default 512 MiB). Upstream buffers stdin without a limit, which risks running out of memory on huge inputs.
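
One common way to enforce such a cap is `Read::take`, sketched below with the standard library only. This is an illustration of the technique, not the crate's implementation. `read_capped` and `DEFAULT_CAP` are hypothetical names, and the choice of error kind is an assumption.

```rust
use std::io::{self, Read};

/// Default cap from the text above: 512 MiB.
const DEFAULT_CAP: u64 = 512 * 1024 * 1024;

/// Hypothetical helper: reads `input` fully into memory, but fails once the
/// input turns out to be larger than `cap` bytes.
fn read_capped<R: Read>(input: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the cap so "exactly cap bytes" can be told apart
    // from "more than cap bytes" without reading the rest of the stream.
    input.take(cap.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("input exceeds the {cap}-byte cap"),
        ));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    let data = read_capped(io::stdin().lock(), DEFAULT_CAP)?;
    eprintln!("buffered {} bytes from stdin", data.len());
    Ok(())
}
```

Because `take` bounds the read itself, memory use stays at most one byte above the cap even when the input stream is far larger.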

MSRV

Rust 1.85 (edition 2024). Re-verified against stable-minus-two policy at each release.

License

Dual-licensed under MIT or Apache-2.0.