diff_core 0.1.6

Semantic PDF comparison engine for matching document blocks and reporting meaningful changes.

diff_core

Semantic PDF diff engine for matching extracted document nodes.

diff_core compares two pdf_semantic::SemanticDocument values and produces a stable spdfdiff_types::DiffDocument. It focuses on semantic block matching, text hunks, move detection, layout-only changes, confidence scoring, and neutral severity defaults for PDF comparison reports.

What This Crate Provides

  • Exact matching through deterministic semantic anchors.
  • Ordered fuzzy matching inside unmatched exact-anchor windows.
  • Resource-bounded matching with deterministic fallback when matrix limits are exceeded.
  • Changes classified as inserted, deleted, modified, moved, or layout-changed.
  • Text hunks with token ranges for modified text.
  • Character-level fallback hunks for small non-numeric word replacements.
  • Structured layout evidence for page and bounding-box movement.
  • Default severity classification that stays neutral and never assigns a legal/business Critical severity.
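
To make the first three items concrete, here is a minimal sketch of anchor-then-window matching over plain strings. It is not the crate's implementation: the anchor (lowercased, whitespace-normalized text), the token Jaccard similarity, the threshold, the max_cells limit, and the positional fallback are all assumptions chosen for illustration, and every function name is hypothetical.

```rust
use std::collections::{HashMap, HashSet};
use std::ops::Range;

/// Hypothetical anchor: lowercased, whitespace-normalized text.
fn anchor(text: &str) -> String {
    text.split_whitespace().map(|w| w.to_lowercase()).collect::<Vec<_>>().join(" ")
}

/// Maps each anchor that occurs exactly once in `blocks` to its index.
fn unique_anchors(blocks: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, (usize, usize)> = HashMap::new();
    for (i, b) in blocks.iter().enumerate() {
        counts.entry(anchor(b)).or_insert((0, i)).0 += 1;
    }
    counts.into_iter().filter(|(_, (n, _))| *n == 1).map(|(k, (_, i))| (k, i)).collect()
}

/// Exact matches as (old index, new index), kept strictly increasing on both
/// sides so that the gaps between them form well-defined windows.
fn exact_matches(old: &[&str], new: &[&str]) -> Vec<(usize, usize)> {
    let (old_unique, new_unique) = (unique_anchors(old), unique_anchors(new));
    let mut out = Vec::new();
    let mut next_new = 0;
    for (i, b) in old.iter().enumerate() {
        let a = anchor(b);
        if !old_unique.contains_key(&a) {
            continue;
        }
        if let Some(&j) = new_unique.get(&a) {
            if j >= next_new {
                out.push((i, j));
                next_new = j + 1;
            }
        }
    }
    out
}

/// Unmatched gaps between consecutive exact matches, as (old range, new range).
fn windows(matches: &[(usize, usize)], old_len: usize, new_len: usize) -> Vec<(Range<usize>, Range<usize>)> {
    let mut out = Vec::new();
    let (mut oi, mut ni) = (0, 0);
    for &(i, j) in matches {
        if oi < i || ni < j {
            out.push((oi..i, ni..j));
        }
        oi = i + 1;
        ni = j + 1;
    }
    if oi < old_len || ni < new_len {
        out.push((oi..old_len, ni..new_len));
    }
    out
}

/// Token-set Jaccard similarity in [0, 1].
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Ordered fuzzy matching inside one window. If the comparison matrix would
/// exceed `max_cells`, fall back to a linear positional comparison (assumed).
fn fuzzy_in_window(
    old: &[&str],
    new: &[&str],
    window: &(Range<usize>, Range<usize>),
    threshold: f64,
    max_cells: usize,
) -> Vec<(usize, usize)> {
    let (old_r, new_r) = (window.0.clone(), window.1.clone());
    let mut out = Vec::new();
    if old_r.len() * new_r.len() > max_cells {
        for (i, j) in old_r.clone().zip(new_r.clone()) {
            if jaccard(old[i], new[j]) >= threshold {
                out.push((i, j));
            }
        }
        return out;
    }
    let mut cursor = new_r.start;
    for i in old_r {
        // Best candidate at or after the cursor; ties go to the earliest index.
        let mut best: Option<(usize, f64)> = None;
        for j in cursor..new_r.end {
            let s = jaccard(old[i], new[j]);
            if s >= threshold && best.map_or(true, |(_, bs)| s > bs) {
                best = Some((j, s));
            }
        }
        if let Some((j, _)) = best {
            out.push((i, j));
            cursor = j + 1;
        }
    }
    out
}
```

In this sketch, blocks left unmatched after both passes would be reported as inserted or deleted. Because every step iterates in index order and breaks ties toward the earliest index, the same inputs always yield the same pairs, which is the property the resource-bounded fallback is meant to preserve.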

Pipeline Context

diff_core is the comparison stage:

old SemanticDocument + new SemanticDocument -> diff_core -> DiffDocument

Reports are generated by diff_report; this crate stays independent of JSON, Markdown, HTML, and CLI rendering.
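
As a rough illustration of what a comparison stage can emit as plain data, with rendering left to a later stage, the sketch below classifies one already-matched pair of blocks. Block, Change, classify_pair, and the tolerance parameter are hypothetical stand-ins, not the real pdf_semantic or spdfdiff_types definitions; the order field is assumed to be a block's rank among matched blocks.

```rust
/// Hypothetical stand-in for an extracted block; not the real pdf_semantic type.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    text: String,
    page: u32,
    /// Bounding box as (x, y, width, height) in page units.
    bbox: (f32, f32, f32, f32),
    /// Rank of this block among matched blocks, in reading order (assumed).
    order: usize,
}

/// Hypothetical result for one matched pair, carrying layout evidence as data.
#[derive(Debug, PartialEq)]
enum Change {
    Unchanged,
    Modified,
    Moved,
    LayoutChanged { page_changed: bool, dx: f32, dy: f32 },
}

/// Classifies a matched pair. Text differences win over reordering, and
/// reordering wins over pure geometry changes.
fn classify_pair(old: &Block, new: &Block, tolerance: f32) -> Change {
    if old.text != new.text {
        return Change::Modified;
    }
    if old.order != new.order {
        return Change::Moved;
    }
    let dx = new.bbox.0 - old.bbox.0;
    let dy = new.bbox.1 - old.bbox.1;
    let page_changed = old.page != new.page;
    if page_changed || dx.abs() > tolerance || dy.abs() > tolerance {
        return Change::LayoutChanged { page_changed, dx, dy };
    }
    Change::Unchanged
}
```

Nothing in the result refers to JSON, Markdown, or HTML; a reporting stage can decide how to present the dx, dy, and page_changed evidence.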

Current Compatibility Boundary

Matching quality depends on the semantic nodes produced upstream. This crate can separate moved content and layout-only changes when text anchors and geometry support it, but it does not solve OCR, full visual diffing, arbitrary table-cell semantics, or legal/business classification. Domain-specific severity can be provided by a caller-supplied classifier.
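
One way a caller-supplied classifier could be layered over a neutral default is sketched below. The trait, the enums, and both classifiers are hypothetical and do not reflect the crate's actual API; only the idea that the default never yields Critical comes from the description above. The digit check in ContractClassifier is a deliberately naive placeholder for real domain rules.

```rust
/// Hypothetical severity scale; only `Critical` is named in the crate description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Minor,
    Major,
    Critical,
}

/// Hypothetical change kinds mirroring the categories listed earlier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChangeKind {
    Inserted,
    Deleted,
    Modified,
    Moved,
    LayoutChanged,
}

/// Hypothetical extension point for domain-specific severity.
trait SeverityClassifier {
    fn classify(&self, kind: ChangeKind, old_text: &str, new_text: &str) -> Severity;
}

/// Neutral default: looks only at the change kind and never returns `Critical`.
struct NeutralClassifier;

impl SeverityClassifier for NeutralClassifier {
    fn classify(&self, kind: ChangeKind, _old_text: &str, _new_text: &str) -> Severity {
        match kind {
            ChangeKind::LayoutChanged => Severity::Info,
            ChangeKind::Moved => Severity::Minor,
            ChangeKind::Inserted | ChangeKind::Deleted | ChangeKind::Modified => Severity::Major,
        }
    }
}

/// Example domain classifier: escalates modified text that contains digits
/// (a naive proxy for amounts or dates) and defers to the neutral default otherwise.
struct ContractClassifier {
    fallback: NeutralClassifier,
}

impl SeverityClassifier for ContractClassifier {
    fn classify(&self, kind: ChangeKind, old_text: &str, new_text: &str) -> Severity {
        let has_digits = |t: &str| t.chars().any(|c| c.is_ascii_digit());
        if kind == ChangeKind::Modified && (has_digits(old_text) || has_digits(new_text)) {
            Severity::Critical
        } else {
            self.fallback.classify(kind, old_text, new_text)
        }
    }
}
```

Keeping the escalation rule in the caller's classifier leaves the comparison itself free of legal or business judgment, which matches the boundary described above.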