pub struct Scheduler { /* private fields */ }
Manages the request queue and tracks visited URLs to prevent duplicate crawling.
The Scheduler is responsible for:
- Maintaining a queue of pending requests
- Tracking which URLs have been visited using a Bloom Filter and LRU cache
- Providing backpressure when too many requests are pending
- Supporting checkpoint-based state restoration
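The backpressure responsibility can be illustrated with a bounded channel. This is a minimal sketch using only the standard library; how Scheduler actually limits pending requests is not documented here, and the capacity of 2 and the function name `demo_backpressure` are assumptions made for illustration.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Hypothetical demonstration: a bounded queue rejects new work once it is full,
// which is one common way to signal backpressure to producers.
fn demo_backpressure() -> (bool, bool, bool) {
    // Capacity of 2 is an arbitrary value chosen for this sketch.
    let (tx, rx) = sync_channel::<String>(2);

    // The first two requests fit in the buffer.
    let first = tx.try_send("https://example.com/1".to_string()).is_ok();
    let _ = tx.try_send("https://example.com/2".to_string());

    // The third request is rejected because the buffer is full.
    let rejected = matches!(
        tx.try_send("https://example.com/3".to_string()),
        Err(TrySendError::Full(_))
    );

    // Once a consumer drains one item, there is room again.
    let _ = rx.recv();
    let accepted_after_drain = tx.try_send("https://example.com/3".to_string()).is_ok();

    (first, rejected, accepted_after_drain)
}
```

A producer that sees the `Full` error can wait, retry later, or drop the request, depending on the crawl policy.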
§Architecture
The scheduler runs as a separate async task and communicates via message passing. This design ensures thread-safe access without requiring explicit locks.
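The message-passing design described above can be sketched as follows. This is not the crate's implementation: it uses a standard-library thread and `std::sync::mpsc` in place of an async task, and the `Msg` enum, `spawn_scheduler`, and `demo_actor` are hypothetical names. The point is only that a single task owns the state, so callers never take a lock.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::mpsc;
use std::thread;

// Hypothetical message type; the real Scheduler's internal messages are private.
enum Msg {
    Enqueue(String),
    MarkVisited(String),
    Len(mpsc::Sender<usize>),
    Shutdown,
}

// Spawns the task that exclusively owns the queue and the visited set.
// The join handle yields the number of visited URLs at shutdown.
fn spawn_scheduler() -> (mpsc::Sender<Msg>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<Msg>();
    let handle = thread::spawn(move || {
        let mut queue: VecDeque<String> = VecDeque::new();
        let mut visited: HashSet<String> = HashSet::new();
        for msg in rx {
            match msg {
                // Only enqueue URLs that have not been visited yet.
                Msg::Enqueue(url) => {
                    if !visited.contains(&url) {
                        queue.push_back(url);
                    }
                }
                Msg::MarkVisited(url) => {
                    visited.insert(url);
                }
                // Queries are answered over a reply channel.
                Msg::Len(reply) => {
                    let _ = reply.send(queue.len());
                }
                Msg::Shutdown => break,
            }
        }
        visited.len()
    });
    (tx, handle)
}

// Returns (pending requests, visited URLs) after a short exchange.
fn demo_actor() -> (usize, usize) {
    let (tx, handle) = spawn_scheduler();
    tx.send(Msg::MarkVisited("https://example.com/a".into())).unwrap();
    tx.send(Msg::Enqueue("https://example.com/a".into())).unwrap(); // skipped as a duplicate
    tx.send(Msg::Enqueue("https://example.com/b".into())).unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Len(reply_tx)).unwrap();
    let pending = reply_rx.recv().unwrap();

    tx.send(Msg::Shutdown).unwrap();
    let visited = handle.join().unwrap();
    (pending, visited)
}
```

Because messages are processed one at a time in arrival order, the duplicate check and the enqueue happen atomically from the caller's point of view.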
§Duplicate Detection
The scheduler uses a two-tier approach for duplicate detection:
- Bloom Filter: Fast, memory-efficient probabilistic check (may have false positives)
- LRU Cache: Definitive check with TTL-based eviction
Requests are first checked against the Bloom Filter. If it indicates a possible duplicate, the LRU cache is consulted for confirmation.
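The two-tier lookup can be sketched as below, under stated assumptions: the `Bloom` and `Dedup` types are hypothetical, the Bloom Filter is a toy built on `DefaultHasher`, and a plain `HashSet` stands in for the LRU cache (no capacity limit or TTL is modelled). The real Scheduler's internals are private and may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Toy Bloom Filter: `hashes` seeded hash functions over a fixed bit array.
struct Bloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl Bloom {
    fn new(size: usize, hashes: u64) -> Self {
        Bloom { bits: vec![false; size], hashes }
    }

    // Derives one bit position from the item and a seed.
    fn index(&self, item: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, item: &str) {
        for seed in 0..self.hashes {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    // false means "definitely not seen"; true means "possibly seen".
    fn may_contain(&self, item: &str) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.index(item, seed)])
    }
}

// Two tiers: the Bloom Filter answers first, the exact set confirms.
struct Dedup {
    bloom: Bloom,
    confirmed: HashSet<String>, // stand-in for the LRU cache
}

impl Dedup {
    fn new() -> Self {
        Dedup { bloom: Bloom::new(1024, 3), confirmed: HashSet::new() }
    }

    fn mark_visited(&mut self, fingerprint: &str) {
        self.bloom.insert(fingerprint);
        self.confirmed.insert(fingerprint.to_string());
    }

    fn is_visited(&self, fingerprint: &str) -> bool {
        // Tier 1: a negative from the Bloom Filter is conclusive.
        if !self.bloom.may_contain(fingerprint) {
            return false;
        }
        // Tier 2: a positive may be false, so confirm against the exact set.
        self.confirmed.contains(fingerprint)
    }
}

// Returns (seen before marking, seen after marking, unrelated fingerprint seen).
fn demo_dedup() -> (bool, bool, bool) {
    let mut dedup = Dedup::new();
    let before = dedup.is_visited("fp-a");
    dedup.mark_visited("fp-a");
    (before, dedup.is_visited("fp-a"), dedup.is_visited("fp-b"))
}
```

Most lookups for new URLs stop at the first tier, so the more expensive exact check only runs when the Bloom Filter reports a possible duplicate.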
Implementations§
impl Scheduler
pub fn new(_initial_state: Option<()>) -> (Arc<Scheduler>, AsyncReceiver<Request>)
pub async fn snapshot(&self) -> Result<(), SpiderError>
pub async fn enqueue_request(&self, request: Request) -> Result<(), SpiderError>
pub async fn shutdown(&self) -> Result<(), SpiderError>
pub async fn mark_visited(&self, fingerprint: String) -> Result<(), SpiderError>
pub async fn mark_visited_batch(&self, fingerprints: Vec<String>) -> Result<(), SpiderError>
pub fn is_visited(&self, fingerprint: &str) -> bool
pub fn should_enqueue(&self, request: &Request) -> bool
pub fn len(&self) -> usize
pub fn is_empty(&self) -> bool
pub fn is_idle(&self) -> bool
Auto Trait Implementations§
impl !Freeze for Scheduler
impl !RefUnwindSafe for Scheduler
impl Send for Scheduler
impl Sync for Scheduler
impl Unpin for Scheduler
impl UnsafeUnpin for Scheduler
impl !UnwindSafe for Scheduler
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.