crw-diff 0.13.4

Stateless change-tracking diff engine for the CRW web scraper

Stateless change-tracking diff engine for CRW monitors.

Pure, synchronous, no I/O, no LLM. Given the current scrape (markdown plus, optionally, extracted JSON) and a caller-supplied previous snapshot, the engine classifies the page (same / changed), computes the requested diff surfaces, and returns the current snapshot for the caller to persist as the next baseline.
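The stateless flow described above can be sketched roughly as follows. This is an illustration only, not the crate's actual API: `Snapshot`, `Status`, `track`, and `hash_markdown` are hypothetical names, the snapshot is reduced to a single hash, and std's `DefaultHasher` stands in for whatever hash the crate really uses. Diff surfaces are omitted, and the sketch assumes a previous snapshot is always available.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical snapshot: only the content hash is modelled in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Snapshot {
    pub content_hash: u64,
}

/// Page classification relative to the previous snapshot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Same,
    Changed,
}

/// Stand-in hash (assumption): std's DefaultHasher, not the crate's real algorithm.
fn hash_markdown(markdown: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    markdown.hash(&mut hasher);
    hasher.finish()
}

/// Pure function: no I/O and no stored state. The caller passes the previous
/// snapshot in and receives the current one back to persist as the next baseline.
pub fn track(current_markdown: &str, previous: &Snapshot) -> (Status, Snapshot) {
    let current = Snapshot {
        content_hash: hash_markdown(current_markdown),
    };
    let status = if current == *previous {
        Status::Same
    } else {
        Status::Changed
    };
    (status, current)
}
```

Because the function owns no state, the caller decides where snapshots live and feeds the returned snapshot back in on the next run.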

Caller-supplied JSON invariant

current_json is the already-extracted structured JSON supplied by the orchestration layer. This crate NEVER extracts JSON itself and does not depend on crw-extract; the LLM and judge live upstream.
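A minimal sketch of what this invariant implies for the call boundary, under stated assumptions: `DiffInput` and `json_changed` are hypothetical names, JSON is represented as a raw string to keep the example dependency-free, and the comparison is plain string equality rather than any canonicalized comparison the crate may perform. The point is only that the engine consumes `current_json` when the caller provides it and never tries to derive it from the markdown.

```rust
/// Hypothetical input type: everything the engine sees is supplied by the caller.
pub struct DiffInput<'a> {
    pub current_markdown: &'a str,
    /// Already extracted upstream by the orchestration layer; `None` if no
    /// extraction was performed for this scrape.
    pub current_json: Option<&'a str>,
}

/// Returns `None` when the caller supplied no JSON. The engine does not fall
/// back to extracting JSON from `current_markdown`.
/// Assumption: raw string comparison stands in for the real JSON comparison.
pub fn json_changed(input: &DiffInput<'_>, previous_json: Option<&str>) -> Option<bool> {
    input.current_json.map(|current| previous_json != Some(current))
}
```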

Mode-aware hashing

content_hash is the normalized-markdown hash in gitDiff/mixed mode, and the canonicalized tracked-JSON hash in json-only mode. The SaaS store-skip short-circuit keys off this hash.
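The mode-dependent choice of hash input might look like the sketch below. Everything here is an assumption made for illustration: the `Mode` enum and function names are hypothetical, normalization is reduced to trimming trailing whitespace and dropping blank lines, the tracked JSON is modelled as a flat `BTreeMap` whose sorted keys give a canonical order, and std's `DefaultHasher` stands in for the real hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical mode enum mirroring the three modes named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    GitDiff,
    Mixed,
    JsonOnly,
}

/// Assumed normalization: trim trailing whitespace and drop blank lines.
fn normalize_markdown(markdown: &str) -> String {
    markdown
        .lines()
        .map(str::trim_end)
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}

/// Stand-in hash (assumption): std's DefaultHasher.
fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Picks the hash input by mode: normalized markdown for gitDiff/mixed,
/// canonicalized tracked JSON for json-only.
pub fn content_hash(mode: Mode, markdown: &str, tracked_json: &BTreeMap<String, String>) -> u64 {
    match mode {
        Mode::GitDiff | Mode::Mixed => hash_str(&normalize_markdown(markdown)),
        Mode::JsonOnly => {
            // BTreeMap iterates in key order, so the serialization is stable
            // regardless of insertion order.
            let canonical = tracked_json
                .iter()
                .map(|(key, value)| format!("{}={}", key, value))
                .collect::<Vec<_>>()
                .join("\n");
            hash_str(&canonical)
        }
    }
}
```

Under these assumptions, a markdown-only edit leaves the json-only hash untouched, so a caller comparing hashes in that mode would see no change to the tracked fields.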