Stateless change-tracking diff engine for CRW monitors.
Pure, synchronous, no I/O, no LLM. Given the current scrape (markdown +
optionally extracted JSON) and a caller-supplied previous snapshot, it
classifies the page (same / changed), computes the requested diff
surfaces, and returns the current snapshot to persist as the next baseline.
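The stateless contract described above can be illustrated with a minimal sketch. Everything named here (`Snapshot`, `Status`, `hash_text`, `track`) is hypothetical and chosen for illustration; the crate's real types and signatures may differ, and the real engine also computes diff surfaces that this sketch leaves out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical snapshot shape; the crate's real type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    content_hash: u64,
    markdown: String,
}

/// The two classifications named in the description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Same,
    Changed,
}

/// Illustrative hash only. A production engine would more likely use a
/// hash that is stable across toolchain versions (for example SHA-256).
fn hash_text(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Pure function: no I/O and no stored state. The caller supplies the
/// previous snapshot and persists the returned one as the next baseline.
fn track(current_markdown: &str, previous: &Snapshot) -> (Status, Snapshot) {
    let next = Snapshot {
        content_hash: hash_text(current_markdown),
        markdown: current_markdown.to_string(),
    };
    let status = if next.content_hash == previous.content_hash {
        Status::Same
    } else {
        Status::Changed
    };
    (status, next)
}
```

Because `track` holds no state of its own, calling it twice with the same inputs gives the same result, and the storage layer stays entirely with the caller.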
§Caller-supplied JSON invariant
current_json is the already-extracted structured JSON supplied by the
orchestration layer. This crate NEVER extracts JSON itself and does not
depend on crw-extract — the LLM/judge live upstream.
§Mode-aware hashing
content_hash is the normalized-markdown hash in gitDiff/mixed mode, and
the canonicalized tracked-JSON hash in json-only mode. The SaaS store-skip
short-circuit keys off this hash.
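A rough sketch of mode-aware hashing, under stated assumptions: the `Mode` enum, `normalize_markdown`, `hash_str`, and this `content_hash` signature are hypothetical names, not the crate's API. Normalization is assumed to cover the cosmetic churn the snapshot module lists (trailing whitespace, blank-line runs, CRLF), and the tracked JSON is assumed to arrive already canonicalized as a string.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical mode enum mirroring the modes named above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    GitDiff,
    Json,
    Mixed,
}

/// Assumed normalization: strip trailing whitespace, collapse runs of
/// blank lines to one, and drop leading and trailing blank lines.
/// `str::lines` splits on both "\n" and "\r\n", which removes CRLF.
fn normalize_markdown(input: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in input.lines() {
        let trimmed = line.trim_end();
        // An empty `out` counts as "previous line blank" so that
        // leading blank lines are skipped as well.
        let prev_blank = out.last().map_or(true, |l| l.is_empty());
        if trimmed.is_empty() && prev_blank {
            continue;
        }
        out.push(trimmed);
    }
    while out.last().map_or(false, |l| l.is_empty()) {
        out.pop();
    }
    out.join("\n")
}

/// Illustrative hash only; a real store key would likely use a hash
/// that is stable across toolchain versions.
fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// json-only mode hashes the canonical tracked JSON; gitDiff and mixed
/// modes hash the normalized markdown.
fn content_hash(mode: Mode, markdown: &str, canonical_tracked_json: &str) -> u64 {
    match mode {
        Mode::Json => hash_str(canonical_tracked_json),
        Mode::GitDiff | Mode::Mixed => hash_str(&normalize_markdown(markdown)),
    }
}
```

With this shape, a json-only monitor ignores markdown churn entirely, while a gitDiff or mixed monitor ignores changes that only affect the extracted JSON.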
Modules§
- git_diff - Git-diff (markdown) surface: a unified text diff plus a parse-diff-style AST, BOTH derived from the same similar op stream so they can never disagree. There is no parse-diff crate in Rust; the AST is synthesized directly from similar's DiffOp/ChangeTag stream.
- json_diff - JSON-mode per-field diff. Walks two extractions and emits a map keyed by field path (plans[0].price, Firecrawl style) to {previous, current} pairs. Added fields have previous: null; removed fields have current: null.
- snapshot - Markdown normalization + content hashing. Single source of truth for the content_hash so cosmetic churn (trailing whitespace, blank-line runs, CRLF) never flips a page from same to changed.
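The per-field JSON diff described for the json_diff module could look roughly like the sketch below. It is an assumption-laden illustration: the `Json` enum is a minimal stand-in for a real JSON value type, and `FieldChange`, `diff_json`, `walk`, and `obj` are hypothetical names. How the real module treats a whole object or array that appears or disappears is not stated in the docs; this sketch reports it as a single entry at its path.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Minimal stand-in for a JSON value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// One changed field. `Json::Null` stands in for the missing side.
#[derive(Debug, Clone, PartialEq)]
struct FieldChange {
    previous: Json,
    current: Json,
}

/// Small helper for building objects in examples.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Diff two extractions into a map keyed by field path.
fn diff_json(prev: &Json, cur: &Json) -> BTreeMap<String, FieldChange> {
    let mut out = BTreeMap::new();
    walk("", Some(prev), Some(cur), &mut out);
    out
}

fn walk(
    path: &str,
    prev: Option<&Json>,
    cur: Option<&Json>,
    out: &mut BTreeMap<String, FieldChange>,
) {
    match (prev, cur) {
        // Both sides are objects: recurse over the union of their keys.
        (Some(Json::Obj(p)), Some(Json::Obj(c))) => {
            let keys: BTreeSet<&String> = p.keys().chain(c.keys()).collect();
            for k in keys {
                let child = if path.is_empty() {
                    k.clone()
                } else {
                    format!("{}.{}", path, k)
                };
                walk(&child, p.get(k), c.get(k), out);
            }
        }
        // Both sides are arrays: recurse index by index.
        (Some(Json::Arr(p)), Some(Json::Arr(c))) => {
            for i in 0..p.len().max(c.len()) {
                walk(&format!("{}[{}]", path, i), p.get(i), c.get(i), out);
            }
        }
        // Identical values produce no entry.
        (p, c) if p == c => {}
        // Anything else is a change; a missing side becomes null.
        (p, c) => {
            out.insert(
                path.to_string(),
                FieldChange {
                    previous: p.cloned().unwrap_or(Json::Null),
                    current: c.cloned().unwrap_or(Json::Null),
                },
            );
        }
    }
}
```

For two extractions that differ only in the first plan's price, `diff_json` yields a single entry keyed `plans[0].price`. A `BTreeMap` keeps the output order deterministic, which is convenient when the diff itself gets stored or compared.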
Structs§
- DiffLimits - Tunable limits for diff computation.
Constants§
- DEFAULT_MAX_DIFF_CHANGES - Default cap on AST change-lines before the diff AST is truncated.
Functions§
- compute_change_tracking - Compute change tracking with default limits. See module docs for the caller-supplied-JSON invariant.
- compute_change_tracking_with_limits - Compute change tracking with explicit limits.