Module state

§State Module

Provides state tracking primitives for the spider-lib framework.

§Overview

This module offers two categories of state management:

  1. Crawler Internal State: CrawlerState for tracking operational metrics
  2. Thread-Safe Primitives: Ready-to-use types for building custom Spider state

§Thread-Safe Primitives

The types listed under Structs below, such as Counter, Flag, VisitedUrls, ConcurrentMap, and ConcurrentVec, are designed for building custom Spider state structures with safe concurrent access.

§Example

use spider_core::{Counter, VisitedUrls};

// Custom spider state composed of thread-safe primitives.
#[derive(Clone, Default)]
struct MySpiderState {
    page_count: Counter,
    visited_urls: VisitedUrls,
}

impl MySpiderState {
    // Both methods take `&self`: the primitives handle synchronization
    // internally, so no `&mut` access is needed.
    fn increment_page_count(&self) {
        self.page_count.inc();
    }

    fn mark_url_visited(&self, url: String) {
        self.visited_urls.mark(url);
    }
}
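
The example above relies on state that can be cloned and updated through `&self` from several tasks at once. As a rough illustration of why that works, the sketch below rebuilds the same shape using only the standard library. Everything in it is hypothetical: `SketchCounter`, `SketchVisited`, `SketchState`, and `crawl_with_workers` are stand-ins written for this illustration, and the assumption that clones share one underlying value through an `Arc` is ours, not something this page states about `Counter` or `VisitedUrls`.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a shared counter; NOT spider_core's Counter.
#[derive(Clone, Default)]
struct SketchCounter(Arc<AtomicUsize>);

impl SketchCounter {
    fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    fn get(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

// Hypothetical stand-in for a visited-URL set. It uses a Mutex<HashSet>
// from std here, whereas the real VisitedUrls is documented as using DashMap.
#[derive(Clone, Default)]
struct SketchVisited(Arc<Mutex<HashSet<String>>>);

impl SketchVisited {
    // Returns true if the URL had not been recorded before.
    fn mark(&self, url: String) -> bool {
        self.0.lock().unwrap().insert(url)
    }

    fn len(&self) -> usize {
        self.0.lock().unwrap().len()
    }
}

#[derive(Clone, Default)]
struct SketchState {
    page_count: SketchCounter,
    visited: SketchVisited,
}

// Spawns `workers` threads; each records `pages` page visits on its own
// clone of the state. Returns (total pages counted, unique URLs seen).
fn crawl_with_workers(workers: usize, pages: usize) -> (usize, usize) {
    let state = SketchState::default();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning copies the Arc handles, not the underlying data.
            let state = state.clone();
            thread::spawn(move || {
                for p in 0..pages {
                    state.page_count.inc();
                    // Every worker visits the same URLs, so duplicates collapse.
                    state.visited.mark(format!("https://example.com/page/{p}"));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    (state.page_count.get(), state.visited.len())
}

fn main() {
    let (pages_counted, unique_urls) = crawl_with_workers(4, 100);
    println!("pages counted: {pages_counted}, unique URLs: {unique_urls}");
}
```

In this sketch the counter records every visit while the set deduplicates URLs, which is the usual reason to keep both in a spider's state.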

Structs§

ConcurrentMap
A thread-safe key-value map using DashMap.
ConcurrentVec
A thread-safe vector using RwLock.
Counter
A thread-safe counter using atomic operations.
Counter64
A 64-bit thread-safe counter for large counts.
CrawlerState
Represents the shared state of the crawler’s various actors.
Flag
A thread-safe boolean flag.
StateAccessMetrics
Metrics for tracking state access patterns.
VisitedUrls
A thread-safe URL tracker using DashMap.
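
The one-line descriptions above name the building blocks (atomic operations, RwLock) but not how they are used. The following is a minimal standard-library sketch of the general technique behind a boolean flag and a lock-protected vector. `SketchFlag`, `SketchVec`, their methods, and `demo` are hypothetical names invented for this illustration; they are not the API of `Flag` or `ConcurrentVec`, whose actual methods are not shown on this page.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::RwLock;
use std::thread;

// Hypothetical flag built on an atomic boolean; NOT the crate's Flag type.
struct SketchFlag(AtomicBool);

impl SketchFlag {
    fn new() -> Self {
        SketchFlag(AtomicBool::new(false))
    }

    fn set(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_set(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

// Hypothetical vector guarded by an RwLock; NOT the crate's ConcurrentVec.
struct SketchVec<T>(RwLock<Vec<T>>);

impl<T: Clone> SketchVec<T> {
    fn new() -> Self {
        SketchVec(RwLock::new(Vec::new()))
    }

    // Writers take the exclusive lock.
    fn push(&self, item: T) {
        self.0.write().unwrap().push(item);
    }

    // Readers take the shared lock and copy the contents out.
    fn snapshot(&self) -> Vec<T> {
        self.0.read().unwrap().clone()
    }
}

// Pushes three values from three scoped threads, then raises the flag.
fn demo() -> (bool, Vec<u32>) {
    let done = SketchFlag::new();
    let results = SketchVec::new();
    thread::scope(|s| {
        for i in 0..3u32 {
            let results = &results;
            s.spawn(move || results.push(i));
        }
    });
    // All scoped threads have finished by the time `scope` returns.
    done.set();
    let mut items = results.snapshot();
    items.sort(); // push order across threads is not deterministic
    (done.is_set(), items)
}

fn main() {
    let (finished, items) = demo();
    println!("finished: {finished}, items: {items:?}");
}
```

An RwLock suits collections that are read far more often than they are written, since any number of readers can hold the shared lock at once while a writer waits for exclusive access.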