Content fingerprinting and change detection for halldyll-parser
This module handles:
- Content hashing for change detection
- Structural fingerprinting
- AMP page detection
- Content comparison
- Cache control hints
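As a rough illustration of hash-based change detection, the sketch below uses only the Rust standard library. It is not the crate's implementation: the names `normalize`, `sketch_hash`, and `sketch_changed` are hypothetical, and the whitespace-collapsing step is an assumption about what a content hash might want to ignore.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: collapse runs of whitespace so that
/// formatting-only edits do not alter the hash.
fn normalize(html: &str) -> String {
    html.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: hash the normalized markup.
/// `DefaultHasher` is not guaranteed to be stable across Rust releases,
/// so a persisted fingerprint would need a fixed algorithm instead.
fn sketch_hash(html: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    normalize(html).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: two documents count as changed when their hashes differ.
fn sketch_changed(old: &str, new: &str) -> bool {
    sketch_hash(old) != sketch_hash(new)
}
```

Under these assumptions, `sketch_changed("<p>a  b</p>", "<p>a b</p>")` reports no change because only whitespace differs, while any edit to the text itself produces a different hash.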
Structs
- AmpInfo - AMP (Accelerated Mobile Pages) information
- CacheHints - Cache hints extracted from the page
- ContentFingerprint - Content fingerprint for change detection
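AMP documents mark their root element with an `amp` or `⚡` attribute, so a detection check can be approximated by inspecting the `<html>` start tag. The sketch below is a plain string scan with a hypothetical name (`looks_like_amp`); it does not use a real HTML parser and makes no claim about how this module performs the check.

```rust
/// Hypothetical helper: returns true when the `<html>` start tag carries
/// an `amp` or `⚡` attribute. This is a naive string scan, so markup inside
/// comments or scripts can fool it.
fn looks_like_amp(html: &str) -> bool {
    let lower = html.to_lowercase();

    // Locate the start of the root tag.
    let Some(start) = lower.find("<html") else {
        return false;
    };
    let rest = &lower[start + "<html".len()..];

    // Everything up to the closing `>` is the attribute list.
    let Some(end) = rest.find('>') else {
        return false;
    };

    rest[..end].split_whitespace().any(|attr| {
        // Strip an optional `="..."` value before comparing the name.
        let name = attr.split('=').next().unwrap_or("");
        name == "amp" || name == "⚡"
    })
}
```

A production check would more likely operate on the parsed document than on the raw string.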
Functions
- content_similarity - Get content similarity between two HTML strings (0.0 to 1.0)
- extract_amp_info - Extract AMP information from document
- extract_cache_hints - Extract cache hints from meta tags
- fingerprint_document - Generate fingerprint from parsed document
- generate_fingerprint - Generate content fingerprint from HTML
- get_amp_url - Get AMP URL if available
- has_content_changed - Check if content has changed between two HTML strings
- is_amp_page - Check if page is an AMP page
- quick_hash - Quick hash of HTML content
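A similarity score in the range 0.0 to 1.0 can be obtained in several ways. The sketch below uses Jaccard similarity over the sets of visible words, which is an assumed technique chosen for illustration and not necessarily the one this module uses. The names `visible_words` and `jaccard_similarity` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop everything between `<` and `>` and return
/// the lowercase words that remain.
fn visible_words(html: &str) -> HashSet<String> {
    let mut text = String::new();
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' => {
                in_tag = false;
                // Keep words on either side of a tag separate.
                text.push(' ');
            }
            c if !in_tag => text.push(c),
            _ => {}
        }
    }
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Hypothetical helper: |A ∩ B| / |A ∪ B| over the two word sets.
/// Two documents with no visible words are treated as identical.
fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let (wa, wb) = (visible_words(a), visible_words(b));
    if wa.is_empty() && wb.is_empty() {
        return 1.0;
    }
    let shared = wa.intersection(&wb).count() as f64;
    let total = wa.union(&wb).count() as f64;
    shared / total
}
```

With this definition, two pages whose visible words match score 1.0 even if their tags differ, and pages with no words in common score 0.0. Because the measure is set-based, it ignores word order and repetition.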