§State Module
Provides state tracking primitives for the spider-lib framework.
§Overview
This module offers two categories of state management:
- Crawler Internal State: CrawlerState for tracking operational metrics
- Thread-Safe Primitives: Ready-to-use types for building custom Spider state
§Thread-Safe Primitives
The following types are designed for building custom Spider state structures with safe concurrent access:
- Counter: Thread-safe atomic counter
- Counter64: 64-bit thread-safe counter for large counts
- Flag: Thread-safe boolean flag
- VisitedUrls: Thread-safe URL tracking with DashMap
- ConcurrentMap<K, V>: Thread-safe key-value map
- ConcurrentVec<T>: Thread-safe dynamic vector
- StateAccessMetrics: Metrics for tracking state access patterns
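To illustrate the idea behind an atomic counter and a boolean flag, here is a minimal sketch built only on the standard library. `SketchCounter`, `SketchFlag`, and `count_in_parallel` are hypothetical names invented for this illustration. They are not the spider-lib types, and the crate's actual implementation and method set may differ.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a thread-safe counter (not spider-lib's `Counter`).
/// Cloning shares the same underlying atomic value.
#[derive(Clone, Default)]
struct SketchCounter(Arc<AtomicUsize>);

impl SketchCounter {
    /// Atomically adds one to the counter.
    fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Reads the current value.
    fn get(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for a thread-safe boolean flag (not spider-lib's `Flag`).
#[derive(Clone, Default)]
struct SketchFlag(Arc<AtomicBool>);

impl SketchFlag {
    /// Sets the flag to true.
    fn set(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Returns whether the flag has been set.
    fn is_set(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Spawns `threads` workers that each increment a shared counter
/// `per_thread` times, then returns the final count.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = SketchCounter::default();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = counter.clone();
            thread::spawn(move || {
                for _ in 0..per_thread {
                    c.inc();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.get()
}
```

Because every clone points at the same atomic, no lock is needed and no increment is lost, which is the property that makes this kind of primitive convenient as a field in shared spider state.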
§Example
use spider_core::{Counter, VisitedUrls, CrawlerState};
use std::sync::Arc;

// Custom spider state composed of thread-safe primitives.
#[derive(Clone, Default)]
struct MySpiderState {
    page_count: Counter,
    visited_urls: VisitedUrls,
}

impl MySpiderState {
    // Takes &self: the counter handles synchronization internally.
    fn increment_page_count(&self) {
        self.page_count.inc();
    }

    // Records the URL as visited.
    fn mark_url_visited(&self, url: String) {
        self.visited_urls.mark(url);
    }
}
Structs§
- ConcurrentMap: A thread-safe key-value map using DashMap.
- ConcurrentVec: A thread-safe vector using RwLock.
- Counter: A thread-safe counter using atomic operations.
- Counter64: A 64-bit thread-safe counter for large counts.
- CrawlerState: Represents the shared state of the crawler’s various actors.
- Flag: A thread-safe boolean flag.
- StateAccessMetrics: Metrics for tracking state access patterns.
- VisitedUrls: A thread-safe URL tracker using DashMap.
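As a rough illustration of how lock-backed collections like those listed above can be shared between workers, the sketch below uses only the standard library. `SketchVisited`, `SketchVec`, and `record_urls` are hypothetical names for this example. A `Mutex<HashSet<String>>` stands in for DashMap here purely to keep the example dependency-free, so this is not how spider-lib implements `VisitedUrls` or `ConcurrentVec`.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical URL tracker; a Mutex-guarded HashSet stands in for DashMap.
#[derive(Clone, Default)]
struct SketchVisited(Arc<Mutex<HashSet<String>>>);

impl SketchVisited {
    /// Records the URL and returns true if it had not been seen before.
    fn mark(&self, url: &str) -> bool {
        self.0.lock().unwrap().insert(url.to_owned())
    }

    /// Number of distinct URLs recorded so far.
    fn len(&self) -> usize {
        self.0.lock().unwrap().len()
    }
}

/// Hypothetical RwLock-backed vector: many readers or one writer at a time.
struct SketchVec<T>(Arc<RwLock<Vec<T>>>);

impl<T> SketchVec<T> {
    fn new() -> Self {
        SketchVec(Arc::new(RwLock::new(Vec::new())))
    }

    /// Returns another handle to the same underlying vector.
    fn handle(&self) -> Self {
        SketchVec(Arc::clone(&self.0))
    }

    /// Appends an item under the write lock.
    fn push(&self, item: T) {
        self.0.write().unwrap().push(item);
    }

    /// Reads the length under the read lock.
    fn len(&self) -> usize {
        self.0.read().unwrap().len()
    }
}

/// Spawns one worker per URL. Each worker marks its URL as visited and,
/// if it was new, appends it to a shared vector.
/// Returns (distinct URLs seen, URLs appended).
fn record_urls(urls: &[&str]) -> (usize, usize) {
    let visited = SketchVisited::default();
    let fresh = SketchVec::new();
    let handles: Vec<_> = urls
        .iter()
        .map(|u| {
            let url = u.to_string();
            let visited = visited.clone();
            let fresh = fresh.handle();
            thread::spawn(move || {
                if visited.mark(&url) {
                    fresh.push(url);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    (visited.len(), fresh.len())
}
```

The check-and-insert in `mark` happens under a single lock acquisition, so two workers handling the same URL cannot both see it as new. A sharded map such as DashMap serves the same purpose with less contention.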