Module crawler

Crawler infrastructure module

Technical implementations for web crawling:

  • HTTP client with rate limiting
  • Link extraction from HTML
  • Concurrent URL queue
  • Sitemap parsing (Phase 3)
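The page does not say how the HTTP client limits its request rate. As an illustration of the general idea only, here is a minimal sketch that enforces a minimum interval between requests using the standard library. `MinIntervalLimiter` and its methods are hypothetical names, not part of this crate, and the real `http_client` module may use a different strategy entirely.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not this crate's API): enforces a minimum
/// delay between consecutive requests.
pub struct MinIntervalLimiter {
    min_interval: Duration,
    last_request: Option<Instant>,
}

impl MinIntervalLimiter {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_request: None }
    }

    /// How long a caller arriving at `now` still has to wait.
    pub fn delay_needed(&self, now: Instant) -> Duration {
        match self.last_request {
            // Remaining part of the interval, clamped at zero.
            Some(last) => self
                .min_interval
                .saturating_sub(now.saturating_duration_since(last)),
            // No request has been made yet, so no wait is needed.
            None => Duration::ZERO,
        }
    }

    /// Blocks until the interval has elapsed, then records the request time.
    pub fn wait(&mut self) {
        let delay = self.delay_needed(Instant::now());
        if !delay.is_zero() {
            thread::sleep(delay);
        }
        self.last_request = Some(Instant::now());
    }
}
```

A caller would invoke `wait()` immediately before each fetch. An async client would replace the blocking sleep with its runtime's timer.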

Re-exports

pub use http_client::create_rate_limited_client;
pub use http_client::fetch_url;
pub use link_extractor::normalize_url;
pub use sitemap_parser::SitemapConfig;
pub use sitemap_parser::SitemapError;
pub use sitemap_parser::SitemapParser;
pub use url_queue::UrlQueue;
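
The re-exports include `normalize_url`, but its signature and exact rules are not shown on this page. To illustrate what URL normalization commonly involves in a crawler, the sketch below drops the fragment, lowercases the scheme and authority, and supplies a missing root path. The function name `normalize_url_sketch` and the chosen rules are assumptions, not the crate's behavior.

```rust
/// Hypothetical illustration only; the crate's `normalize_url` may differ.
/// Rules assumed here: strip `#fragment`, lowercase scheme and authority,
/// and use `/` when an absolute URL has no path.
pub fn normalize_url_sketch(url: &str) -> String {
    // Everything after '#' is a client-side fragment and never reaches the server.
    let without_fragment = url.split('#').next().unwrap_or(url);

    match without_fragment.split_once("://") {
        Some((scheme, rest)) => {
            // The authority ends at the first '/' or '?'.
            let (authority, tail) = match rest.find(|c: char| c == '/' || c == '?') {
                Some(idx) => rest.split_at(idx),
                None => (rest, ""),
            };
            let tail = if tail.is_empty() { "/" } else { tail };
            // Note: this also lowercases any userinfo in the authority,
            // which a stricter implementation would leave untouched.
            format!(
                "{}://{}{}",
                scheme.to_ascii_lowercase(),
                authority.to_ascii_lowercase(),
                tail
            )
        }
        // Relative references are returned without the fragment, otherwise unchanged.
        None => without_fragment.to_string(),
    }
}
```

Normalizing before enqueueing helps a crawler avoid fetching the same page under several spellings of its URL.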

Modules

http_client
HTTP client with rate limiting
link_extractor
Link extraction from HTML
sitemap_parser
Sitemap Parser Module
url_queue
Concurrent URL queue for crawling
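
The internals of `url_queue` are not documented on this page. As a rough sketch of what a concurrent, de-duplicating URL queue can look like, the example below shares a FIFO and a seen-set behind one mutex. `SketchQueue` and its methods are hypothetical stand-ins and should not be read as the `UrlQueue` API.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct QueueState {
    pending: VecDeque<String>, // URLs waiting to be fetched, in FIFO order
    seen: HashSet<String>,     // every URL ever enqueued
}

/// Hypothetical stand-in for illustration; not the crate's `UrlQueue`.
/// Cloning yields another handle to the same shared queue.
#[derive(Clone, Default)]
pub struct SketchQueue {
    state: Arc<Mutex<QueueState>>,
}

impl SketchQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Enqueues `url` unless it was seen before; returns `true` if it was added.
    pub fn push(&self, url: &str) -> bool {
        let mut state = self.state.lock().unwrap();
        if state.seen.insert(url.to_string()) {
            state.pending.push_back(url.to_string());
            true
        } else {
            false
        }
    }

    /// Removes and returns the oldest pending URL, if any.
    pub fn pop(&self) -> Option<String> {
        self.state.lock().unwrap().pending.pop_front()
    }
}
```

Keeping the seen-set and the pending list under a single lock makes the check-then-insert step atomic, so two workers that discover the same link cannot both enqueue it.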