Modules
- dns_cache - DNS caching for improved latency
- etld - eTLD+1 (Effective Top-Level Domain + 1) extraction using the Public Suffix List
- retry
- robots
- robots_enhanced - Enhanced robots.txt parser with caching and crawl-delay support
- ssrf_protection
- url_rewrites
- user_agents
Functions
- extract_domain - Extract the domain from a URL
- is_valid_scrape_url - Check whether a URL is valid for scraping
- normalize_url - Parse and normalize a URL (legacy function kept for backward compatibility)
- normalize_url_string - Normalize a URL to prevent duplicates caused by trailing slashes, fragments, etc.
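
The kind of normalization described for the string-based function can be illustrated with a minimal, std-only sketch. This is not the crate's implementation: the name `normalize_url_sketch`, its `Option<String>` return type, and the exact rules (drop the fragment, lowercase scheme and host, strip trailing slashes, keep the query) are assumptions chosen for illustration, and the real function may differ on every point.

```rust
/// Hypothetical helper for illustration only; NOT the crate's
/// `normalize_url_string`. Assumed rules: drop the fragment, lowercase the
/// scheme and host, strip trailing slashes from the path, keep the query.
/// Simplification: userinfo and ports are treated as part of the host.
fn normalize_url_sketch(input: &str) -> Option<String> {
    // Drop everything from the first '#' onward (the fragment).
    let without_fragment = input.trim().split('#').next().unwrap_or("");

    // Require a "scheme://rest" shape; anything else is rejected.
    let (scheme, rest) = without_fragment.split_once("://")?;
    if scheme.is_empty() || rest.is_empty() {
        return None;
    }

    // Separate the query string so slashes inside it are left untouched.
    let (before_query, query) = match rest.split_once('?') {
        Some((before, q)) => (before, Some(q)),
        None => (rest, None),
    };

    // Split the authority (host) from the path at the first '/'.
    let (host, path) = match before_query.find('/') {
        Some(i) => (&before_query[..i], &before_query[i..]),
        None => (before_query, ""),
    };
    if host.is_empty() {
        return None;
    }

    // "/docs/" and "/docs" should map to the same key; "/" becomes "".
    let path = path.trim_end_matches('/');

    let mut out = format!(
        "{}://{}{}",
        scheme.to_ascii_lowercase(),
        host.to_ascii_lowercase(),
        path
    );
    if let Some(q) = query {
        if !q.is_empty() {
            out.push('?');
            out.push_str(q);
        }
    }
    Some(out)
}
```

Under these assumed rules, `https://example.com/docs/` and `HTTPS://Example.com/docs#intro` both map to the same string, which is what lets a crawler treat them as one page when deduplicating its queue.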