Module links

Link extraction for halldyll-parser

This module handles:

  • Link extraction from anchor tags
  • URL normalization and resolution
  • Rel attribute parsing (nofollow, ugc, sponsored, etc.)
  • Internal/external link classification
  • Link deduplication
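Below is a minimal, std-only sketch of the normalization and deduplication bullets. It assumes URLs have already been resolved to absolute form. The names normalize_url_sketch and dedup_urls_sketch are hypothetical and are not the crate's normalize_url. The sketch mirrors only what the normalize_url summary states: the fragment is dropped and a trailing slash on a non-root path is removed. Any further rules the crate may apply, such as lowercasing hosts or stripping default ports, are not covered.

```rust
use std::collections::HashSet;

/// Hypothetical normalizer (not the crate's `normalize_url`).
/// Assumes `url` is absolute, e.g. "https://example.com/docs/#intro".
fn normalize_url_sketch(url: &str) -> String {
    // Drop the fragment: everything from the first '#' onward.
    let without_fragment = match url.find('#') {
        Some(i) => &url[..i],
        None => url,
    };
    // Split off the query so only the path is touched below.
    let (base, query) = match without_fragment.find('?') {
        Some(i) => (&without_fragment[..i], &without_fragment[i..]),
        None => (without_fragment, ""),
    };
    // Index where the path starts: the first '/' after "scheme://host".
    // With no "://" or no path, the whole string is treated as the prefix.
    let path_start = base
        .find("://")
        .map(|i| i + 3)
        .and_then(|start| base[start..].find('/').map(|j| start + j))
        .unwrap_or(base.len());
    let (prefix, mut path) = base.split_at(path_start);
    // Remove one trailing slash, but keep the root path "/".
    if path.len() > 1 && path.ends_with('/') {
        path = &path[..path.len() - 1];
    }
    format!("{prefix}{path}{query}")
}

/// Hypothetical dedup: normalizes each URL and keeps the first occurrence,
/// preserving document order.
fn dedup_urls_sketch(urls: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for url in urls {
        let normalized = normalize_url_sketch(url);
        // `insert` returns false if this normalized URL was already seen.
        if seen.insert(normalized.clone()) {
            unique.push(normalized);
        }
    }
    unique
}
```

Deduplicating after normalization means "https://example.com/a/" and "https://example.com/a#top" collapse to one entry. Resolving relative hrefs is deliberately left out of this sketch. A full RFC 3986 resolver, such as `Url::join` in the `url` crate, handles dot segments and the other edge cases.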

Structs

LinkStats
Counts of links by type
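LinkStats's fields are not listed on this page, so the struct below is a hypothetical stand-in. LinkCounts, SimpleLink, host_of and count_links are all illustrative names. The sketch shows one plausible way to classify links as internal or external by comparing each link's host with the page's base URL, and to count nofollow links in the same pass. It makes these assumptions:

  • hrefs are already absolute and contain no userinfo;
  • hosts are compared ASCII case-insensitively, with the port ignored;
  • subdomains count as external.

```rust
/// Hypothetical stand-in for `LinkStats`; the real field set is not shown on this page.
#[derive(Debug, Default, PartialEq)]
struct LinkCounts {
    total: usize,
    internal: usize,
    external: usize,
    nofollow: usize,
}

/// Hypothetical minimal link: an already-absolute href plus the raw rel value.
struct SimpleLink<'a> {
    href: &'a str,
    rel: &'a str,
}

/// Host between "://" and the next '/', '?', '#' or ':' (so the port is ignored).
/// Returns None for URLs without "://", such as "mailto:" links.
/// Assumes no userinfo ("user:pass@") in the URL.
fn host_of(url: &str) -> Option<&str> {
    let rest = &url[url.find("://")? + 3..];
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .unwrap_or(rest.len());
    Some(&rest[..end])
}

fn count_links(base_url: &str, links: &[SimpleLink<'_>]) -> LinkCounts {
    let base_host = host_of(base_url);
    let mut counts = LinkCounts::default();
    for link in links {
        counts.total += 1;
        // Internal only when both hosts exist and match case-insensitively;
        // subdomains and host-less hrefs count as external.
        let internal = match (base_host, host_of(link.href)) {
            (Some(a), Some(b)) => a.eq_ignore_ascii_case(b),
            _ => false,
        };
        if internal {
            counts.internal += 1;
        } else {
            counts.external += 1;
        }
        // rel is a whitespace-separated, case-insensitive token list.
        if link.rel.split_ascii_whitespace().any(|t| t.eq_ignore_ascii_case("nofollow")) {
            counts.nofollow += 1;
        }
    }
    counts
}
```

Whether the crate treats subdomains such as "blog.example.com" as internal is not stated here. A stricter or looser host comparison is a design choice a real implementation would have to make explicitly.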

Functions

calculate_link_stats
Calculate link statistics
extract_link
Extract a single link from an anchor element
extract_links
Extract all links from an HTML document
filter_external_links
Get all external links
filter_followable_links
Get all followable links (not nofollow, not sponsored, not ugc)
filter_internal_links
Get all internal links
get_external_domains
Get unique domains from external links
is_nofollow
Check if rel indicates nofollow
is_sponsored
Check if rel indicates sponsored
is_ugc
Check if rel indicates user-generated content
normalize_url
Normalize a URL (removes fragments and trailing slashes on paths)
parse_rel_attribute
Parse a rel attribute into LinkRel values
resolve_url
Resolve a relative URL to an absolute URL
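The rel helpers can be sketched as token parsing. Under the HTML standard, rel holds a set of space-separated tokens, and those tokens are matched ASCII case-insensitively. RelToken below is a hypothetical enum standing in for the crate's LinkRel, whose variants are not listed here. is_followable applies the rule given in the filter_followable_links summary: a link is followable if it carries no nofollow, sponsored or ugc token.

```rust
/// Hypothetical rel token type; the crate's own `LinkRel` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
enum RelToken {
    Nofollow,
    Ugc,
    Sponsored,
    Noopener,
    Noreferrer,
    Other(String),
}

/// Splits on ASCII whitespace and lowercases each token before matching,
/// since rel keywords are ASCII case-insensitive.
fn parse_rel(rel: &str) -> Vec<RelToken> {
    rel.split_ascii_whitespace()
        .map(|tok| match tok.to_ascii_lowercase().as_str() {
            "nofollow" => RelToken::Nofollow,
            "ugc" => RelToken::Ugc,
            "sponsored" => RelToken::Sponsored,
            "noopener" => RelToken::Noopener,
            "noreferrer" => RelToken::Noreferrer,
            other => RelToken::Other(other.to_string()),
        })
        .collect()
}

/// Followable means: not nofollow, not sponsored, not ugc.
fn is_followable(rels: &[RelToken]) -> bool {
    !rels
        .iter()
        .any(|r| matches!(r, RelToken::Nofollow | RelToken::Sponsored | RelToken::Ugc))
}
```

Parsing into an enum once, rather than re-scanning the raw string in each is_* check, keeps the nofollow, sponsored and ugc checks consistent with one another. An Other variant preserves unknown tokens instead of silently dropping them.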