§Statistics Module
Collects and stores various metrics and statistics about the crawler’s operation.
§Overview
The StatCollector tracks important metrics throughout the crawling process,
including request counts, response statistics, item processing metrics, and
performance indicators. This data is essential for monitoring crawl progress,
diagnosing issues, and optimizing performance.
§Key Metrics Tracked
- Request Metrics: Enqueued, sent, succeeded, failed, retried, and dropped requests
- Response Metrics: Received, cached, and status code distributions
- Item Metrics: Scraped, processed, and dropped items
- Performance Metrics: Throughput, response times, and bandwidth usage
- Timing Metrics: Elapsed time and processing rates
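The metric categories above can be pictured as a plain struct of counters. This is only an illustrative sketch of how such metrics might be grouped; the field names are hypothetical and not the actual `StatCollector` layout.

```rust
// Hypothetical grouping of the tracked metrics; field names are
// illustrative only and do not mirror StatCollector's internals.
#[derive(Debug, Default, Clone)]
pub struct CrawlStats {
    // Request metrics
    pub requests_enqueued: u64,
    pub requests_sent: u64,
    pub requests_failed: u64,
    pub requests_retried: u64,
    // Response metrics
    pub responses_received: u64,
    pub responses_cached: u64,
    // Item metrics
    pub items_scraped: u64,
    pub items_dropped: u64,
    // Timing metrics
    pub elapsed_secs: f64,
}

fn main() {
    let mut stats = CrawlStats::default();
    stats.requests_sent += 1;
    stats.responses_received += 1;
    println!("{stats:?}");
}
```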
§Features
- Thread-Safe: Uses atomic operations for concurrent metric updates
- Real-Time Monitoring: Provides live statistics during crawling
- Export Formats: Supports JSON and Markdown export formats
- Snapshot Capability: Captures consistent state for reporting
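The thread-safe counter and snapshot features above follow a common pattern: each metric is an atomic integer that any worker thread can increment without locking, and a snapshot is simply a plain copy of the current values. The sketch below, assuming hypothetical `Counters` and `snapshot` names, illustrates that pattern; it is not the actual `StatCollector` implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Illustrative sketch of the atomic-counter pattern; the names
// Counters, increment_requests_sent, and snapshot are assumptions.
#[derive(Default)]
struct Counters {
    requests_sent: AtomicU64,
    items_scraped: AtomicU64,
}

impl Counters {
    fn increment_requests_sent(&self) {
        // Relaxed ordering suffices for independent statistics counters.
        self.requests_sent.fetch_add(1, Ordering::Relaxed);
    }

    fn increment_items_scraped(&self) {
        self.items_scraped.fetch_add(1, Ordering::Relaxed);
    }

    // A "snapshot" is a consistent-enough plain copy of the counters,
    // suitable for reporting while crawling continues.
    fn snapshot(&self) -> (u64, u64) {
        (
            self.requests_sent.load(Ordering::Relaxed),
            self.items_scraped.load(Ordering::Relaxed),
        )
    }
}

fn main() {
    let counters = Arc::new(Counters::default());
    // Four worker threads update the same counters concurrently.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..1000 {
                    c.increment_requests_sent();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let (sent, _) = counters.snapshot();
    println!("requests_sent = {sent}");
}
```

Because every update is a single atomic `fetch_add`, no counter increment is ever lost, even under heavy concurrency.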
§Example
use spider_core::StatCollector;
let stats = StatCollector::new();
// During crawling, metrics are automatically updated
stats.increment_requests_sent();
stats.increment_items_scraped();
// Export statistics in various formats
println!("{}", stats.to_json_string_pretty().unwrap());
println!("{}", stats.to_markdown_string());
§Structs
- StatCollector - Collects and stores various statistics about the crawler’s operation.