§rusty-pdfgrep
A Rust port of Hans-Peter Deifel’s pdfgrep(1) — grep through PDF files
using page-level text extraction and pluggable regex engines.
§Quick start
use rusty_pdfgrep::PdfGrepBuilder;
use std::path::Path;

let pdfgrep = PdfGrepBuilder::new()
    .pattern("force majeure")
    .case_insensitive(true)
    .build()
    .unwrap();

for result in pdfgrep.search_file(Path::new("contract.pdf")) {
    let m = result.unwrap();
    println!("{}:{}: {}", m.path.display(), m.page, m.text);
}
§Stability
Library and binary share a single crate version. lopdf is pinned to the
0.36 minor; the regex and fancy-regex engines are SemVer-stable. The
PdfGrepError and Match types are #[non_exhaustive]: downstream code
MUST include a wildcard _ arm when matching on PdfGrepError, and a ..
rest pattern when destructuring Match.
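The wildcard requirement can be illustrated with a minimal, self-contained sketch. `DemoError` and its variants are hypothetical stand-ins, not the actual variants of `PdfGrepError`, which this page does not list. Inside a single crate the compiler does not enforce `#[non_exhaustive]`, so the example only shows the shape that downstream code would need.

```rust
// Hypothetical stand-in for a #[non_exhaustive] error enum.
// The variant names are assumptions made for illustration only.
#[non_exhaustive]
#[derive(Debug)]
enum DemoError {
    Io(String),
    InvalidPattern(String),
    Encrypted,
}

fn describe(err: &DemoError) -> String {
    match err {
        DemoError::Io(msg) => format!("I/O failure: {msg}"),
        DemoError::InvalidPattern(p) => format!("bad pattern: {p}"),
        // Wildcard arm: a downstream crate must have one, because the
        // upstream crate may add variants in a minor release.
        _ => String::from("unhandled error"),
    }
}

fn main() {
    println!("{}", describe(&DemoError::Io("disk full".into())));
    println!("{}", describe(&DemoError::InvalidPattern("(".into())));
    println!("{}", describe(&DemoError::Encrypted));
}
```

With this shape, a new variant added upstream falls through to the wildcard arm instead of breaking the build.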
Re-exports§
pub use error::PdfGrepError;
Modules§
- engine
- Pluggable regex engine for pattern matching (FR-001..FR-005, AD-005).
- error
- Public error type for the rusty-pdfgrep library API (FR-041).
- PDF reading via lopdf (FR-024..FR-026, AD-004, AD-012, HINT-002/003).
Structs§
- Match
- A single matched occurrence in a PDF page (FR-040).
- PageIterator
- Lazy per-page iterator returned by PdfGrep::search_file.
- PdfGrep
- Configured pattern matcher. Construct via PdfGrepBuilder.
- PdfGrepBuilder
- Builder for PdfGrep (FR-039). All methods are independent and order-agnostic; password(...) appends to the retry list and is the only repeatable setter.
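The builder contract described above (order-agnostic setters, with `password(...)` as the one repeatable, appending setter) can be sketched with a toy type. `DemoBuilder` is a hypothetical stand-in and does not reproduce the internals of `PdfGrepBuilder`. In particular, what the real non-repeatable setters do when called twice is not stated on this page.

```rust
// Hypothetical stand-in builder; not the rusty-pdfgrep implementation.
#[derive(Debug, Default, Clone, PartialEq)]
struct DemoBuilder {
    pattern: Option<String>,
    case_insensitive: bool,
    passwords: Vec<String>,
}

impl DemoBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Plain setter: stores a single value.
    fn pattern(mut self, p: &str) -> Self {
        self.pattern = Some(p.to_owned());
        self
    }

    // Plain setter: stores a single flag.
    fn case_insensitive(mut self, yes: bool) -> Self {
        self.case_insensitive = yes;
        self
    }

    // Repeatable setter: every call appends to the retry list.
    fn password(mut self, pw: &str) -> Self {
        self.passwords.push(pw.to_owned());
        self
    }
}

fn main() {
    // Same setters in two different orders produce the same configuration,
    // as long as the relative order of the password calls is preserved.
    let a = DemoBuilder::new()
        .pattern("x")
        .case_insensitive(true)
        .password("one")
        .password("two");
    let b = DemoBuilder::new()
        .password("one")
        .case_insensitive(true)
        .password("two")
        .pattern("x");
    assert_eq!(a, b);
    assert_eq!(a.passwords, ["one", "two"]);
}
```

In this sketch the passwords are kept in call order, so a caller would list the most likely password first.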