Link extraction for halldyll-parser
This module handles:
- Link extraction from anchor tags
- URL normalization and resolution
- Rel attribute parsing (nofollow, ugc, sponsored, etc.)
- Internal/external link classification
- Link deduplication
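As an illustration of the normalization and deduplication steps listed above, here is a minimal standard-library sketch. The names `normalize_url_sketch` and `dedupe_urls_sketch` are hypothetical and are not part of this module's API. The sketch assumes normalization means dropping the fragment and trimming trailing slashes from non-root paths; the crate's actual rules may differ.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the crate's API): drop the fragment and trim
/// trailing slashes from a non-root path, leaving any query string intact.
fn normalize_url_sketch(url: &str) -> String {
    // Everything from the first '#' onward is the fragment.
    let without_fragment = url.split('#').next().unwrap_or("");
    // Split off the query so that only the path part is trimmed.
    let (base, query) = match without_fragment.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (without_fragment, None),
    };
    // Find where the path starts: after "scheme://host" for absolute URLs,
    // or at index 0 for relative ones. `None` means there is no path at all.
    let path_start = match base.find("://") {
        Some(i) => base[i + 3..].find('/').map(|j| i + 3 + j),
        None => Some(0),
    };
    let mut out = String::from(base);
    if let Some(start) = path_start {
        // Keep a bare root path "/" but trim trailing slashes elsewhere.
        while out.len() > start + 1 && out.ends_with('/') {
            out.pop();
        }
    }
    if let Some(q) = query {
        out.push('?');
        out.push_str(q);
    }
    out
}

/// Hypothetical helper: deduplicate URLs by their normalized form,
/// keeping the order of first occurrence.
fn dedupe_urls_sketch(urls: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for url in urls {
        let normalized = normalize_url_sketch(url);
        // `insert` returns false when the value was already present.
        if seen.insert(normalized.clone()) {
            out.push(normalized);
        }
    }
    out
}
```

Under these assumptions, `https://example.com/a/` and `https://example.com/a#top` collapse to a single entry, while the root URL `https://example.com/` keeps its slash.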
Structs
- LinkStats - Count links by type
Functions
- calculate_link_stats - Calculate link statistics
- extract_link - Extract a single link from an anchor element
- extract_links - Extract all links from an HTML document
- filter_external_links - Get all external links
- filter_followable_links - Get all followable links (not nofollow, not sponsored, not ugc)
- filter_internal_links - Get all internal links
- get_external_domains - Get unique domains from external links
- is_nofollow - Check if rel indicates nofollow
- is_sponsored - Check if rel indicates sponsored
- is_ugc - Check if rel indicates user-generated content
- normalize_url - Normalize a URL (remove fragments, trailing slashes for paths)
- parse_rel_attribute - Parse rel attribute into LinkRel values
- resolve_url - Resolve a relative URL to absolute
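The rel-related functions above suggest a simple model: split the rel attribute into tokens, map each token to a variant, and treat a link as followable only when none of nofollow, sponsored, or ugc is present. The sketch below shows that idea using only the standard library. `RelSketch`, `parse_rel_sketch`, and `is_followable_sketch` are hypothetical stand-ins, not the crate's `LinkRel` type or its real signatures, and case-insensitive token matching is an assumption.

```rust
/// Hypothetical stand-in for the crate's `LinkRel`; the real variants may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
enum RelSketch {
    NoFollow,
    Sponsored,
    Ugc,
    Other(String),
}

/// Split a rel attribute on ASCII whitespace and map each token to a variant.
/// Tokens are matched case-insensitively (an assumption of this sketch).
fn parse_rel_sketch(rel: &str) -> Vec<RelSketch> {
    rel.split_ascii_whitespace()
        .map(|token| match token.to_ascii_lowercase().as_str() {
            "nofollow" => RelSketch::NoFollow,
            "sponsored" => RelSketch::Sponsored,
            "ugc" => RelSketch::Ugc,
            other => RelSketch::Other(other.to_string()),
        })
        .collect()
}

/// A link counts as followable when it carries none of
/// nofollow, sponsored, or ugc.
fn is_followable_sketch(rels: &[RelSketch]) -> bool {
    !rels.iter().any(|r| {
        matches!(
            r,
            RelSketch::NoFollow | RelSketch::Sponsored | RelSketch::Ugc
        )
    })
}
```

For example, `parse_rel_sketch("NoFollow noopener")` yields a `NoFollow` variant plus an `Other("noopener")`, so the link is not followable, while a link with only `noopener` or with an empty rel attribute is.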