Module utils

Modules

dns_cache: DNS caching for improved latency
etld: eTLD+1 (Effective Top-Level Domain + 1) extraction using Public Suffix List
retry
robots
robots_enhanced: Enhanced robots.txt parser with caching and crawl-delay support
ssrf_protection
url_rewrites
user_agents

Functions

extract_domain: Extract domain from URL
is_valid_scrape_url: Check if URL is valid for scraping
normalize_url: Parse and normalize URL (legacy function for backward compatibility)
normalize_url_string: Normalize URL to prevent duplicates from trailing slashes, fragments, etc.
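
The summary for normalize_url_string mentions trailing slashes and fragments as sources of duplicate URLs. The sketch below illustrates that idea using only the standard library. It is not the crate's implementation: the function name `normalize_url_sketch`, its `Option<String>` return type, and the exact rules (lowercase scheme and host, drop the fragment, trim trailing slashes, accept only http and https) are assumptions made for illustration, and the real function's signature and behavior may differ.

```rust
/// Hypothetical sketch, NOT the crate's `normalize_url_string`.
/// Maps equivalent-looking URLs to one canonical string so they
/// can be deduplicated by simple string comparison.
/// Limitation: userinfo (`user:pass@host`) is not handled specially.
fn normalize_url_sketch(input: &str) -> Option<String> {
    let trimmed = input.trim();

    // Drop the fragment: it is never sent to the server.
    let without_fragment = trimmed.split('#').next().unwrap_or("");

    // Split off the scheme; this sketch accepts only http and https.
    let (scheme, rest) = without_fragment.split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    if scheme != "http" && scheme != "https" {
        return None;
    }

    // Separate the query string so the path can be cleaned on its own.
    let (before_query, query) = match rest.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (rest, None),
    };

    // The authority (host and optional port) ends at the first '/'.
    let (authority, path) = match before_query.find('/') {
        Some(i) => (&before_query[..i], &before_query[i..]),
        None => (before_query, ""),
    };
    if authority.is_empty() {
        return None;
    }
    // Host names are case-insensitive, so lowercase them.
    let authority = authority.to_ascii_lowercase();

    // Remove trailing slashes so "/a/" and "/a" collapse to one form.
    let path = path.trim_end_matches('/');

    let mut out = format!("{scheme}://{authority}{path}");
    // Keep a non-empty query string unchanged.
    if let Some(q) = query.filter(|q| !q.is_empty()) {
        out.push('?');
        out.push_str(q);
    }
    Some(out)
}
```

Under these assumed rules, `normalize_url_sketch("HTTP://Example.COM/docs/#intro")` and `normalize_url_sketch("http://example.com/docs")` yield the same string, so a crawler's visited set would treat them as one page. A production version would more likely build on a full URL parser than on string splitting.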