Crate rusty_pdfgrep

§rusty-pdfgrep

A Rust port of Hans-Peter Deifel’s pdfgrep(1) — grep through PDF files using page-level text extraction and pluggable regex engines.

§Quick start

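use rusty_pdfgrep::PdfGrepBuilder;
use std::path::Path;

// Configure a case-insensitive matcher for the phrase.
let pdfgrep = PdfGrepBuilder::new()
    .pattern("force majeure")
    .case_insensitive(true)
    .build()
    .unwrap(); // unwrapped for brevity; handle the build error in real code

// search_file returns a lazy per-page iterator; matches are produced as it is consumed.
for result in pdfgrep.search_file(Path::new("contract.pdf")) {
    // Each item is unwrapped here for brevity; handle the failure case in real code.
    let m = result.unwrap();
    println!("{}:{}: {}", m.path.display(), m.page, m.text);
}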

§Stability

The library and the binary share a single crate version. lopdf is pinned to the 0.36 minor series; the regex and fancy-regex engines are SemVer-stable. The PdfGrepError and Match types are #[non_exhaustive]: downstream code MUST include a wildcard _ arm when matching on PdfGrepError, and MUST use .. when destructuring the Match struct.
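
To illustrate what #[non_exhaustive] asks of downstream code, here is a minimal sketch using local stand-in types. DemoError, DemoMatch, their variants and field types, and the describe and page_of helpers are all hypothetical; the variants of the real PdfGrepError are not listed on this page, and the page and text field names are borrowed from the quick start only for illustration.

```rust
// Hypothetical stand-in for an error enum marked #[non_exhaustive].
// The variant names are illustrative assumptions, not the real
// PdfGrepError variants.
#[non_exhaustive]
#[derive(Debug)]
pub enum DemoError {
    Io(String),
    InvalidPattern(String),
}

/// Maps an error to a user-facing message.
// Inside the defining crate the wildcard arm is unreachable, hence the
// allow; in a downstream crate the compiler requires it.
#[allow(unreachable_patterns)]
pub fn describe(err: &DemoError) -> String {
    match err {
        DemoError::Io(msg) => format!("I/O failure: {}", msg),
        DemoError::InvalidPattern(p) => format!("bad pattern: {}", p),
        // Covers any variant added in a later release.
        _ => String::from("unrecognized error"),
    }
}

// Hypothetical stand-in for a #[non_exhaustive] struct; the field types
// are assumptions.
#[non_exhaustive]
#[derive(Debug)]
pub struct DemoMatch {
    pub page: u32,
    pub text: String,
}

/// Reads one field by destructuring. The `..` keeps this compiling if
/// fields are added later; downstream crates must always write it.
pub fn page_of(m: &DemoMatch) -> u32 {
    let DemoMatch { page, .. } = m;
    *page
}
```

Plain field access such as m.page is unaffected by #[non_exhaustive]; only exhaustive patterns and struct-literal construction from outside the defining crate are restricted.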

Re-exports§

pub use error::PdfGrepError;

Modules§

engine
Pluggable regex engine for pattern matching (FR-001..FR-005, AD-005).
error
Public error type for the rusty-pdfgrep library API (FR-041).
pdf
PDF reading via lopdf (FR-024..FR-026, AD-004, AD-012, HINT-002/003).

Structs§

Match
A single matched occurrence in a PDF page (FR-040).
PageIterator
Lazy per-page iterator returned by PdfGrep::search_file.
PdfGrep
Configured pattern matcher. Construct via PdfGrepBuilder.
PdfGrepBuilder
Builder for PdfGrep (FR-039). All methods are independent and order-agnostic; password(...) appends to the retry list and is the only repeatable setter.
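
The builder semantics described above (order-agnostic setters, with password appending to a retry list) can be sketched with a small stand-in type. DemoBuilder, its fields, the by-value self signatures, and the passwords accessor are assumptions made for illustration; this is not the crate's actual implementation of PdfGrepBuilder.

```rust
/// Hypothetical miniature builder used only to illustrate the semantics.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DemoBuilder {
    pattern: Option<String>,
    case_insensitive: bool,
    passwords: Vec<String>,
}

impl DemoBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Plain setter: stores a single value.
    pub fn pattern(mut self, p: &str) -> Self {
        self.pattern = Some(p.to_owned());
        self
    }

    /// Plain setter: stores a single flag.
    pub fn case_insensitive(mut self, yes: bool) -> Self {
        self.case_insensitive = yes;
        self
    }

    /// Repeatable setter: each call appends to the retry list,
    /// preserving the order in which passwords were supplied.
    pub fn password(mut self, pw: &str) -> Self {
        self.passwords.push(pw.to_owned());
        self
    }

    /// Hypothetical accessor, added here so the retry list can be inspected.
    pub fn passwords(&self) -> &[String] {
        &self.passwords
    }
}
```

In this sketch, calling pattern, case_insensitive, and password in any interleaving yields the same configuration, provided the password calls keep their relative order, since that order determines the retry sequence.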