Module fingerprint

Content fingerprinting and change detection for halldyll-parser

This module handles:

  • Content hashing for change detection
  • Structural fingerprinting
  • AMP page detection
  • Content comparison
  • Cache control hints
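This page does not describe how the hashing is done internally. The following is a minimal sketch of the first two items, assuming two separate hashes: a content hash over whitespace-normalized HTML, and a structural hash over the sequence of opening tag names. `FingerprintSketch`, `fingerprint_sketch`, `tag_sequence_hash`, and `content_changed` are hypothetical names, not the halldyll-parser API. The tag scan is deliberately naive and does not parse HTML. The sketch uses std's `DefaultHasher`, whose algorithm may change between Rust releases, so fingerprints that need to be persisted would require a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `ContentFingerprint`; field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FingerprintSketch {
    /// Hash of the HTML with whitespace runs collapsed.
    pub content_hash: u64,
    /// Hash of the sequence of opening tag names (text and attributes ignored).
    pub structure_hash: u64,
}

/// Hashes whitespace-separated tokens so cosmetic reformatting is ignored.
fn normalized_hash(html: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    for token in html.split_whitespace() {
        // `str`'s Hash impl appends a terminator byte, so "a b" and "ab" differ.
        token.hash(&mut hasher);
    }
    hasher.finish()
}

/// Naive scan for opening tag names; not a real HTML parser.
fn tag_sequence_hash(html: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    let mut rest = html;
    while let Some(pos) = rest.find('<') {
        // '<' is one byte, so pos + 1 is always a char boundary.
        rest = &rest[pos + 1..];
        // Skip closing tags, comments, and doctypes.
        if rest.starts_with('/') || rest.starts_with('!') {
            continue;
        }
        let name: String = rest
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric())
            .map(|c| c.to_ascii_lowercase())
            .collect();
        if !name.is_empty() {
            name.hash(&mut hasher);
        }
    }
    hasher.finish()
}

/// Hypothetical fingerprint builder combining both hashes.
pub fn fingerprint_sketch(html: &str) -> FingerprintSketch {
    FingerprintSketch {
        content_hash: normalized_hash(html),
        structure_hash: tag_sequence_hash(html),
    }
}

/// Hypothetical change check: compares content hashes only.
pub fn content_changed(old_html: &str, new_html: &str) -> bool {
    fingerprint_sketch(old_html).content_hash != fingerprint_sketch(new_html).content_hash
}
```

Keeping the two hashes separate lets a caller distinguish a text edit from a layout change. A text edit changes only the content hash, while a change to the tag sequence changes both hashes.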

Structs

AmpInfo
AMP (Accelerated Mobile Pages) information
CacheHints
Cache hints extracted from the page
ContentFingerprint
Content fingerprint for change detection
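An AMP page marks itself with an `amp` or `⚡` attribute on its root `<html>` element. A canonical page can advertise its AMP version with `<link rel="amphtml" href="...">`. The following is a minimal sketch of how data like that in `AmpInfo` could be derived from those two markers. It assumes plain string scanning rather than a DOM. `AmpSketch`, `amp_sketch`, `tags_named`, and `attr_value` are hypothetical names, and the actual fields of `AmpInfo` are not documented on this page.

```rust
/// Hypothetical stand-in for `AmpInfo`; the real fields are not documented here.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AmpSketch {
    /// True if the first `<html>` tag carries an `amp` or `⚡` attribute.
    pub is_amp: bool,
    /// `href` of the first `<link rel="amphtml">`, if any.
    pub amp_url: Option<String>,
}

/// Returns the inner text of each `<name ...>` opening tag (without `<` and `>`).
/// `name` must be lowercase. Naive: ignores `>` inside attribute values, comments, scripts.
fn tags_named<'a>(html: &'a str, name: &str) -> Vec<&'a str> {
    // ASCII lowercasing keeps byte offsets identical, so indices map back to `html`.
    let lower = html.to_ascii_lowercase();
    let open = format!("<{name}");
    let mut out = Vec::new();
    let mut from = 0;
    while let Some(rel) = lower[from..].find(&open) {
        let start = from + rel + 1; // position just after '<'
        let Some(len) = lower[start..].find('>') else { break };
        let end = start + len;
        // Require whitespace, '/', or '>' after the name so `<link` does not match `<linkx`.
        let after = &html[start + name.len()..end];
        if after.is_empty() || after.starts_with(|c: char| c.is_ascii_whitespace() || c == '/') {
            out.push(&html[start..end]);
        }
        from = end;
    }
    out
}

/// Naive lookup of `name="value"` or `name='value'` inside one tag; `name` must be lowercase.
fn attr_value(tag: &str, name: &str) -> Option<String> {
    let lower = tag.to_ascii_lowercase();
    for quote in ['"', '\''] {
        let needle = format!("{name}={quote}");
        if let Some(pos) = lower.find(&needle) {
            let start = pos + needle.len();
            let end = start + tag[start..].find(quote)?;
            return Some(tag[start..end].to_string());
        }
    }
    None
}

/// Hypothetical AMP detector built on the two markers described above.
pub fn amp_sketch(html: &str) -> AmpSketch {
    // AMP marker: a bare `amp` or `⚡` attribute on the root element.
    let is_amp = tags_named(html, "html").first().map_or(false, |tag| {
        tag["html".len()..]
            .split_whitespace()
            .map(|t| t.trim_end_matches('/'))
            .any(|t| t.eq_ignore_ascii_case("amp") || t == "⚡")
    });
    // Discovery link: exact `rel="amphtml"` match (multi-valued rel lists are not handled).
    let amp_url = tags_named(html, "link")
        .into_iter()
        .filter(|tag| attr_value(tag, "rel").map_or(false, |r| r.eq_ignore_ascii_case("amphtml")))
        .find_map(|tag| attr_value(tag, "href"));
    AmpSketch { is_amp, amp_url }
}
```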

Functions

content_similarity
Compute a similarity score between two HTML strings, in the range 0.0 to 1.0
extract_amp_info
Extract AMP information from a document
extract_cache_hints
Extract cache hints from meta tags
fingerprint_document
Generate a fingerprint from a parsed document
generate_fingerprint
Generate a content fingerprint from HTML
get_amp_url
Get the AMP URL, if one is available
has_content_changed
Check if content has changed between two HTML strings
is_amp_page
Check whether a page is an AMP page
quick_hash
Compute a quick hash of HTML content
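This page does not say how `content_similarity` computes its 0.0 to 1.0 score. One common choice is Jaccard similarity over the sets of visible words, |A ∩ B| / |A ∪ B|. It is shown here as a swapped-in technique, not as the crate's method. `jaccard_similarity` and `visible_words` are hypothetical names. The tag stripper is naive and does not handle scripts, comments, or entities.

```rust
use std::collections::HashSet;

/// Naive tag stripper: drops everything between '<' and '>', then lowercases words.
fn visible_words(html: &str) -> HashSet<String> {
    let mut text = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            // Replace each tag with a space so "<p>a</p><p>b</p>" yields "a" and "b".
            '<' => {
                in_tag = true;
                text.push(' ');
            }
            '>' => in_tag = false,
            _ if !in_tag => text.push(c),
            _ => {}
        }
    }
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Hypothetical similarity score: |A ∩ B| / |A ∪ B| over visible word sets.
pub fn jaccard_similarity(a_html: &str, b_html: &str) -> f64 {
    let a = visible_words(a_html);
    let b = visible_words(b_html);
    if a.is_empty() && b.is_empty() {
        return 1.0; // two documents with no visible words are treated as identical
    }
    let intersection = a.intersection(&b).count();
    let union = a.union(&b).count();
    intersection as f64 / union as f64
}
```

Because the score works on word sets, it ignores markup, word order, and repetition. For example, "a b c" and "a b d" share 2 of 4 distinct words and score 0.5.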