//! # spider-util
//!
//! Utility types, traits, and implementations for the `spider-lib` framework.
//!
//! ## Overview
//!
//! The `spider-util` crate contains the core data structures, error types,
//! and utility functions shared by every component of the spider framework.
//! The other spider crates build on it, so it supplies the basic building
//! blocks for web scraping operations.
//!
//! ## Key Components
//!
//! - **Request**: Represents an HTTP request with URL, method, headers, and body
//! - **Response**: Represents an HTTP response with status, headers, and body
//! - **ScrapedItem**: Trait, plus an accompanying macro (`#[spider_macro::scraped_item]`),
//!   for defining data structures that hold scraped data
//! - **Error Handling**: Error types used across the framework's operations
//! - **Bloom Filter**: Space-efficient probabilistic data structure for duplicate
//!   detection; it can report false positives but never false negatives
//! - **Utilities**: Helper functions and extensions for common operations
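//!
//! To illustrate how a Bloom filter detects duplicates, here is a minimal,
//! std-only sketch. It is hypothetical and is *not* the
//! `spider_util::bloom_filter` API: the type `SketchBloomFilter`, its methods,
//! and the sizing parameters are assumptions made for this example. It sets
//! `k` bit positions per item using double hashing, `(h1 + i * h2) mod m`.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::hash::{Hash, Hasher};
//!
//! /// Hypothetical illustration only; not the `spider_util::bloom_filter` API.
//! pub struct SketchBloomFilter {
//!     bits: Vec<bool>,
//!     num_hashes: u64,
//! }
//!
//! impl SketchBloomFilter {
//!     /// `num_bits` (m) and `num_hashes` (k) are caller-chosen sizing parameters.
//!     pub fn new(num_bits: usize, num_hashes: u64) -> Self {
//!         assert!(num_bits > 0 && num_hashes > 0);
//!         Self { bits: vec![false; num_bits], num_hashes }
//!     }
//!
//!     /// Two base hashes of the item; the second mixes in a fixed salt so it differs.
//!     fn base_hashes<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
//!         let mut h1 = DefaultHasher::new();
//!         item.hash(&mut h1);
//!         let mut h2 = DefaultHasher::new();
//!         let salt: u64 = 0x9E37_79B9_7F4A_7C15;
//!         salt.hash(&mut h2);
//!         item.hash(&mut h2);
//!         (h1.finish(), h2.finish())
//!     }
//!
//!     /// Bit positions via double hashing: (h1 + i * h2) mod m, for i in 0..k.
//!     fn positions<T: Hash + ?Sized>(&self, item: &T) -> Vec<usize> {
//!         let (h1, h2) = Self::base_hashes(item);
//!         let m = self.bits.len() as u64;
//!         (0..self.num_hashes)
//!             .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
//!             .collect()
//!     }
//!
//!     /// Records the item by setting all of its k bits.
//!     pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
//!         for p in self.positions(item) {
//!             self.bits[p] = true;
//!         }
//!     }
//!
//!     /// `false` means definitely never inserted; `true` means possibly inserted.
//!     pub fn might_contain<T: Hash + ?Sized>(&self, item: &T) -> bool {
//!         self.positions(item).into_iter().all(|p| self.bits[p])
//!     }
//! }
//! ```
//!
//! A crawler could call `might_contain` on a URL before scheduling it and
//! `insert` afterwards, trading a small false-positive rate (a few URLs
//! wrongly skipped) for far less memory than storing every URL in a set.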
//!
//! ## Architecture
//!
//! This crate is designed to be lightweight and reusable, containing only the
//! essential types and utilities needed by other spider components. It keeps
//! external dependencies to a minimum for stability and compatibility.
//!
//! ## Example
//!
//! ```rust,ignore
//! use spider_util::{request::Request, response::Response, item::ScrapedItem};
//! use url::Url;
//!
//! // Create a request
//! let url = Url::parse("https://example.com").unwrap();
//! let request = Request::new(url);
//!
//! // Define a scraped item
//! #[spider_macro::scraped_item]
//! struct Article {
//!     title: String,
//!     content: String,
//! }
//! ```

pub mod bloom_filter;
pub mod error;
pub mod item;
pub mod request;
pub mod response;
pub mod utils;

// Re-export serde and serde_json so macro-generated code can reach them through this crate
pub use serde;
pub use serde_json;