
Function scrape_multiple_with_limit 

pub async fn scrape_multiple_with_limit(
    client: &ClientWithMiddleware,
    urls: &[Url],
    config: &ScraperConfig,
) -> Result<Vec<ScrapedContent>>

Scrape multiple URLs with concurrency control

Uses buffer_unordered to limit the number of in-flight requests, preventing:

  • File descriptor exhaustion
  • HDD thrashing (on systems with mechanical drives)
  • Tripping anti-bot defenses (unbounded request bursts resemble DDoS traffic)

Following config-externalize: concurrency is configurable via ScraperConfig.
Following async-concurrency-limit: uses buffer_unordered for concurrency control.

§Arguments

  • client - HTTP client with retry middleware
  • urls - URLs to scrape
  • config - Scraper configuration

§Returns

  • Vec<ScrapedContent> - All successfully scraped content

§Note

Failed URLs are logged but don’t abort the batch; only successfully scraped content is returned.