spdfdiff_cli 0.1.1-preview.7

Command-line interface for semantic PDF diffing.
spdfdiff_cli-0.1.1-preview.7 is not a library.

spdfdiff_cli

CLI entry point for spdfdiff diff, inspect, extract, and corpus.

Current command behavior:

  • diff: runs the vertical-slice semantic diff pipeline and emits JSON/Markdown/HTML. --fail-on-changes exits with code 1 when a completed diff contains changes.
  • inspect: parses a PDF with pdf_core and reports deterministic parser/object diagnostics plus simple tagged-structure and parent-tree summaries when present.
  • extract: runs parse/content/text/semantic extraction across parsed page content and reports extracted paragraph text, diagnostics summary, and simple tagged-structure summary when present.
  • corpus: scans a folder for .pdf files, runs parse/extract for each file, and writes stable aggregate totals (total, parsed, partial, failed), per-file status, extracted node counts, and diagnostic-code frequency.

The CLI compares image XObject payloads and selected annotation, attachment, outline, and metadata objects by deterministic hash and emits object-level changes in diff reports. It still emits stable unsupported-feature diagnostics for native vector graphic comparison, incomplete annotation/link semantics, and image-only PDFs that require OCR.