spdfdiff_cli
CLI entry point for spdfdiff diff, inspect, extract, and corpus.
Current command behavior:
- diff: runs the vertical-slice semantic diff pipeline and emits JSON/Markdown/HTML. --fail-on-changes exits with code 1 when a completed diff contains changes.
- inspect: parses a PDF with pdf_core and reports deterministic parser/object diagnostics plus simple tagged-structure and parent-tree summaries when present.
- extract: runs parse/content/text/semantic extraction across parsed page content and reports extracted paragraph text, simple aligned text-grid table row/cell evidence, diagnostics summary, and simple tagged-structure summary when present.
- corpus: scans a folder for .pdf files, runs parse/extract for each file, and writes stable aggregate totals (total, parsed, partial, failed), per-file status, extracted node counts, and diagnostic-code frequency. With --manifest <json>, it also checks required files, runs declared diff pairs, emits diff diagnostic counts, and reports a deterministic release gate; with --fail-on-gate, a failed gate exits with code 1.
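To make the corpus aggregates concrete, here is a minimal Python sketch of how stable totals and diagnostic-code frequencies could be computed from per-file results. It is an illustration only, not the tool's implementation: the record shape, the output field layout, and the diagnostic codes used in the usage note are hypothetical, and the real report schema may differ.

```python
from collections import Counter


def summarize(records):
    """Aggregate per-file results into stable totals.

    `records` is an iterable of (file_name, status, diagnostic_codes)
    tuples. This shape is an assumption made for the sketch.
    """
    # Sort by file name so the output does not depend on scan order.
    records = sorted(records, key=lambda r: r[0])
    status_counts = Counter(status for _, status, _ in records)
    code_counts = Counter(code for _, _, codes in records for code in codes)
    return {
        "total": len(records),
        "parsed": status_counts["parsed"],
        "partial": status_counts["partial"],
        "failed": status_counts["failed"],
        "files": [{"file": name, "status": status} for name, status, _ in records],
        # Sorted keys keep the frequency table byte-stable across runs.
        "diagnostic_codes": dict(sorted(code_counts.items())),
    }
```

For example, summarizing three files with statuses parsed, partial, and failed yields total 3 and one count per status, with files listed in name order regardless of the order in which they were scanned.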
The CLI compares image XObject payloads and selected annotation, attachment,
outline, and metadata objects by deterministic hash and emits object-level
changes in diff reports. It still emits stable unsupported-feature diagnostics
for native vector graphic comparison and incomplete annotation/link semantics.
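The idea of comparing objects by deterministic hash can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the CLI's actual code: SHA-256, the string keys such as "image:1", and the added/removed/modified labels are all hypothetical choices made for illustration.

```python
import hashlib


def digest(payload: bytes) -> str:
    # SHA-256 is an assumption; any deterministic hash would serve the sketch.
    return hashlib.sha256(payload).hexdigest()


def diff_objects(old: dict[str, bytes], new: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (key, change) pairs for objects that differ between two documents.

    Keys identify objects (hypothetical naming); values are raw payloads.
    """
    changes = []
    # Iterate keys in sorted order so the report is stable across runs.
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            changes.append((key, "removed"))
        elif key not in old:
            changes.append((key, "added"))
        elif digest(old[key]) != digest(new[key]):
            changes.append((key, "modified"))
    return changes
```

Hashing payloads rather than comparing decoded content keeps the comparison cheap and reproducible, at the cost of reporting only that an object changed, not how.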
For image-only PDFs, the CLI can OCR supported high-contrast image XObjects with
an external engine. Set SPDFDIFF_OCR_COMMAND to a command that accepts a PPM
path and writes recognized text to stdout, or install tesseract for the
default tesseract <image> stdout --psm 6 adapter.
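A custom OCR command only needs to honor the contract above: take a PPM path, write recognized text to stdout. The following Python sketch shows one possible adapter that checks the PPM magic number and then delegates to tesseract with the same arguments as the default adapter. The function names and exit codes are hypothetical, and it assumes the PPM path arrives as the first command-line argument; how the CLI actually invokes SPDFDIFF_OCR_COMMAND is not specified here.

```python
import subprocess
import sys


def looks_like_ppm(path):
    """Return True if the file starts with a PPM magic number (P3 ASCII or P6 binary)."""
    with open(path, "rb") as f:
        return f.read(2) in (b"P3", b"P6")


def main(argv):
    # Assumption: the PPM path is passed as the only argument.
    if len(argv) != 2:
        print("usage: ocr_adapter.py <image.ppm>", file=sys.stderr)
        return 2
    path = argv[1]
    if not looks_like_ppm(path):
        print(f"not a PPM file: {path}", file=sys.stderr)
        return 1
    # Same invocation as the default adapter: tesseract <image> stdout --psm 6
    result = subprocess.run(
        ["tesseract", path, "stdout", "--psm", "6"],
        capture_output=True,
        text=True,
    )
    # Recognized text goes to stdout, as the contract requires.
    sys.stdout.write(result.stdout)
    return result.returncode
```

To use this as a standalone script, call sys.exit(main(sys.argv)) under the usual __main__ guard and point SPDFDIFF_OCR_COMMAND at it. Replacing the tesseract call with another engine leaves the rest of the adapter unchanged.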