§Selector Cache Module
Provides a global cache for compiled CSS selectors to improve parsing performance.
§Overview
The selector cache module implements a global caching mechanism for compiled CSS selectors used in HTML parsing. Since selector compilation can be expensive, especially when the same selectors are used repeatedly during crawling, this module caches compiled selectors to avoid repeated compilation overhead. The cache uses a thread-safe approach to allow concurrent access from multiple crawler threads.
§Key Components
- SELECTOR_CACHE: Global static cache using Lazy initialization
- get_cached_selector: Main function to retrieve or compile selectors
- prewarm_cache: Function to pre-populate the cache with common selectors
- Thread Safety: Uses RwLock for concurrent read/write access
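The components above could fit together roughly as follows. This is a minimal sketch using only the standard library, not the module's actual source: `CompiledSelector`, `compile`, `cache`, and `get_cached` are hypothetical stand-ins, and `OnceLock` is assumed here in place of whatever lazy-initialization type the real `SELECTOR_CACHE` uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical stand-in for a compiled selector; the real module
/// presumably stores a selector type from an HTML parsing crate.
#[derive(Debug, PartialEq)]
pub struct CompiledSelector {
    pub source: String,
}

/// Stand-in for the expensive compile step. It only rejects blank input.
fn compile(css: &str) -> Option<CompiledSelector> {
    let trimmed = css.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(CompiledSelector { source: trimmed.to_string() })
    }
}

type Cache = RwLock<HashMap<String, Arc<CompiledSelector>>>;

/// Global cache, created on first access.
fn cache() -> &'static Cache {
    static CACHE: OnceLock<Cache> = OnceLock::new();
    CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Returns the cached selector, compiling and storing it on a miss.
pub fn get_cached(css: &str) -> Option<Arc<CompiledSelector>> {
    // Fast path: a shared read lock lets many threads look up at once.
    {
        let map = cache().read().ok()?;
        if let Some(hit) = map.get(css) {
            return Some(Arc::clone(hit));
        }
    } // read guard dropped here, before the write lock is requested

    // Slow path: compile without holding any lock, then insert under
    // the exclusive write lock. If another thread inserted the same key
    // in the meantime, its entry is kept and returned.
    let compiled = Arc::new(compile(css)?);
    let mut map = cache().write().ok()?;
    let entry = map.entry(css.to_string()).or_insert(compiled);
    Some(Arc::clone(&*entry))
}
```

In this sketch, two calls to `get_cached` with the same string return handles to the same allocation, so the compile step runs at most once per distinct selector unless two threads race on the first miss.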
§Performance Benefits
The cache pays off most when many pages share a similar HTML structure: each distinct selector expression is compiled once, and later lookups reuse the compiled form instead of parsing the expression again. Because the cache sits behind a read-write lock, any number of threads can read concurrently, and a writer takes exclusive access only while the cache is being updated.
§Example
use spider_util::selector_cache::get_cached_selector;
// Get a cached selector (compiles and caches if not already present)
if let Some(selector) = get_cached_selector("div.content > p") {
    // Use the selector for parsing HTML
    // The selector is now cached for future use
}
// Pre-warm the cache with commonly used selectors
spider_util::selector_cache::prewarm_cache();
§Functions
- get_cached_selector: Get a compiled selector from the cache, or compile and store it if not present
- prewarm_cache: Pre-warm the selector cache with commonly used selectors
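As an illustration of what pre-warming might involve, the sketch below fills a cache with a fixed list of selectors under a single write lock. It is an assumption about the approach, not the body of `prewarm_cache`: the `prewarm` function, the `COMMON_SELECTORS` list, and the placeholder `String` value type are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical list of selectors a crawler might use often.
pub const COMMON_SELECTORS: &[&str] = &["a[href]", "title", "img[src]", "meta[name]"];

/// Inserts every selector that is not yet cached and returns how many
/// entries were added. The `String` value is a placeholder for a
/// compiled selector.
pub fn prewarm(cache: &RwLock<HashMap<String, String>>, selectors: &[&str]) -> usize {
    // Take the write lock once for the whole batch; recover the map if
    // a previous holder panicked.
    let mut map = match cache.write() {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner(),
    };
    let mut added = 0;
    for css in selectors {
        if !map.contains_key(*css) {
            map.insert(css.to_string(), format!("compiled:{css}"));
            added += 1;
        }
    }
    added
}
```

Running a step like this once at startup moves the compile cost out of the first page's processing, and calling it a second time adds nothing.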