pub struct CrawlerConfig {
    pub max_concurrent_downloads: usize,
    pub max_pending_requests: usize,
    pub parser_workers: usize,
    pub max_concurrent_pipelines: usize,
    pub channel_capacity: usize,
    pub output_batch_size: usize,
    pub response_backpressure_threshold: usize,
    pub item_backpressure_threshold: usize,
    pub retry_release_permit: bool,
    pub live_stats: bool,
    pub live_stats_interval: Duration,
    pub live_stats_preview_fields: Option<Vec<String>>,
    pub shutdown_grace_period: Duration,
    pub item_limit: Option<usize>,
}
Core runtime configuration for the crawler.
Fields
max_concurrent_downloads: usize
The maximum number of concurrent downloads.
max_pending_requests: usize
The maximum number of outstanding requests tracked by the scheduler.
parser_workers: usize
The number of workers dedicated to parsing responses.
max_concurrent_pipelines: usize
The maximum number of concurrent item processing pipelines.
channel_capacity: usize
The capacity of communication channels between components.
output_batch_size: usize
The number of requests/items processed per parser output batch.
response_backpressure_threshold: usize
Downloader backpressure threshold for the response channel.
item_backpressure_threshold: usize
Parser backpressure threshold for the item channel.
retry_release_permit: bool
When enabled, retries are scheduled outside the downloader permit path.
live_stats: bool
Enables in-place live statistics updates on terminal stdout.
live_stats_interval: Duration
Refresh interval for live statistics output.
live_stats_preview_fields: Option<Vec<String>>
Optional item fields to show in the live-stats preview instead of full JSON.
shutdown_grace_period: Duration
Maximum time to wait for a graceful shutdown before forcing task abort.
item_limit: Option<usize>
Maximum number of scraped items to process before stopping the crawl.
Implementations
impl CrawlerConfig
pub fn with_max_concurrent_downloads(self, limit: usize) -> Self
Sets the maximum number of concurrent downloads.
pub fn with_max_pending_requests(self, limit: usize) -> Self
Sets the maximum number of outstanding requests tracked by the scheduler.
pub fn with_parser_workers(self, count: usize) -> Self
Sets the number of parser workers.
pub fn with_max_concurrent_pipelines(self, limit: usize) -> Self
Sets the maximum number of concurrent pipelines.
pub fn with_channel_capacity(self, capacity: usize) -> Self
Sets the channel capacity.
pub fn with_output_batch_size(self, batch_size: usize) -> Self
Sets the parser output batch size.
pub fn with_response_backpressure_threshold(self, threshold: usize) -> Self
Sets the downloader response-channel backpressure threshold.
pub fn with_item_backpressure_threshold(self, threshold: usize) -> Self
Sets the parser item-channel backpressure threshold.
pub fn with_retry_release_permit(self, enabled: bool) -> Self
Controls whether retry delays release the downloader permit immediately.
pub fn with_live_stats(self, enabled: bool) -> Self
Enables or disables in-place live stats updates on stdout.
pub fn with_live_stats_interval(self, interval: Duration) -> Self
Sets the refresh interval used by live stats mode.
pub fn with_live_stats_preview_fields(
    self,
    fields: impl IntoIterator<Item = impl Into<String>>,
) -> Self
Sets which item fields should be shown in live stats preview output.
Field names support dot notation for nested JSON objects, for example:
title, source_url, or metadata.Japanese.
You can also set aliases with label=path, for example:
url=source_url or jp=metadata.Japanese.
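Examples

The spec syntax described above can be illustrated with a small, self-contained sketch. It is not the crate's parser: `PreviewField`, `parse_preview_field`, and `collect_fields` are hypothetical names introduced here, and the choice to use a bare path as its own label is an assumption. The sketch only shows how a `label=path` string might split into a label and dot-separated path segments, and which argument types the documented `fields` parameter accepts.

```rust
/// Hypothetical parsed form of one preview-field spec.
#[derive(Debug, PartialEq)]
pub struct PreviewField {
    pub label: String,
    pub path: Vec<String>,
}

/// Splits `label=path` (or a bare `path`), then splits the path on dots.
/// Assumption: a spec without `=` uses the path text as its label.
pub fn parse_preview_field(spec: &str) -> PreviewField {
    let (label, path) = match spec.split_once('=') {
        Some((label, path)) => (label.trim(), path.trim()),
        None => (spec.trim(), spec.trim()),
    };
    PreviewField {
        label: label.to_string(),
        path: path.split('.').map(str::to_string).collect(),
    }
}

/// Mirrors the documented parameter type, so both `&str` and `String`
/// collections are accepted.
pub fn collect_fields(fields: impl IntoIterator<Item = impl Into<String>>) -> Vec<String> {
    fields.into_iter().map(Into::into).collect()
}

fn main() {
    // An array of string literals and a Vec<String> both satisfy the bound.
    let from_literals = collect_fields(["title", "url=source_url"]);
    let from_strings = collect_fields(vec![String::from("jp=metadata.Japanese")]);

    for spec in from_literals.iter().chain(from_strings.iter()) {
        println!("{:?}", parse_preview_field(spec));
    }
}
```

With the real type, the equivalent call would presumably pass the same collection straight to `with_live_stats_preview_fields`.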
pub fn with_shutdown_grace_period(self, grace_period: Duration) -> Self
Sets the maximum grace period for crawler shutdown.
pub fn with_item_limit(self, limit: usize) -> Self
Sets the maximum number of scraped items to process before stopping the crawl.
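Examples

Every builder method listed above takes `self` by value and returns `Self`, so calls chain. The sketch below shows that pattern under stated assumptions: this page does not show the crate path or how an initial `CrawlerConfig` is obtained (for example, whether it implements `Default`), so the block defines a local stand-in, `SketchConfig`, with a subset of the documented fields. The `starting_point` constructor, the starting values, and the `Some(limit)` wrapping in `with_item_limit` are illustrative guesses, not the crate's behavior.

```rust
use std::time::Duration;

/// Local stand-in mirroring a subset of the documented `CrawlerConfig` fields.
/// Hypothetical: the real struct lives in the crawler crate.
#[derive(Clone, Debug)]
pub struct SketchConfig {
    pub max_concurrent_downloads: usize,
    pub parser_workers: usize,
    pub live_stats: bool,
    pub shutdown_grace_period: Duration,
    pub item_limit: Option<usize>,
}

impl SketchConfig {
    /// Arbitrary starting values for this sketch, not the crate's defaults.
    pub fn starting_point() -> Self {
        Self {
            max_concurrent_downloads: 1,
            parser_workers: 1,
            live_stats: false,
            shutdown_grace_period: Duration::from_secs(1),
            item_limit: None,
        }
    }

    pub fn with_max_concurrent_downloads(mut self, limit: usize) -> Self {
        self.max_concurrent_downloads = limit;
        self
    }

    pub fn with_parser_workers(mut self, count: usize) -> Self {
        self.parser_workers = count;
        self
    }

    pub fn with_shutdown_grace_period(mut self, grace_period: Duration) -> Self {
        self.shutdown_grace_period = grace_period;
        self
    }

    /// Assumption: the `usize` argument is stored as `Some(limit)`.
    pub fn with_item_limit(mut self, limit: usize) -> Self {
        self.item_limit = Some(limit);
        self
    }
}

/// Chains several builder calls; fields that are not set keep their starting values.
pub fn example_config() -> SketchConfig {
    SketchConfig::starting_point()
        .with_max_concurrent_downloads(8)
        .with_parser_workers(4)
        .with_shutdown_grace_period(Duration::from_secs(10))
        .with_item_limit(500)
}

fn main() {
    let config = example_config();
    println!("{config:?}");
}
```

Because all fields are documented as `pub`, a struct literal is another way to build a value; the chained form mainly helps when only a few settings differ from an existing configuration.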
Trait Implementations
impl Clone for CrawlerConfig
fn clone(&self) -> CrawlerConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
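Examples

Since the builder methods consume `self`, `Clone` is what lets one base configuration be reused for several variants. The following is a minimal sketch of that idea using a hypothetical two-field stand-in, `MiniConfig`; only the `Clone` implementation and the consuming-builder shape are taken from this page, and the values are illustrative.

```rust
/// Hypothetical two-field stand-in for `CrawlerConfig`.
#[derive(Clone, Debug, PartialEq)]
pub struct MiniConfig {
    pub max_concurrent_downloads: usize,
    pub live_stats: bool,
}

impl MiniConfig {
    pub fn with_max_concurrent_downloads(mut self, limit: usize) -> Self {
        self.max_concurrent_downloads = limit;
        self
    }

    pub fn with_live_stats(mut self, enabled: bool) -> Self {
        self.live_stats = enabled;
        self
    }
}

/// Derives two variants from one base without consuming the base:
/// each builder chain starts from its own clone.
pub fn variants(base: &MiniConfig) -> (MiniConfig, MiniConfig) {
    let polite = base.clone().with_max_concurrent_downloads(2);
    let verbose = base.clone().with_live_stats(true);
    (polite, verbose)
}

fn main() {
    let base = MiniConfig {
        max_concurrent_downloads: 16,
        live_stats: false,
    };
    let (polite, verbose) = variants(&base);
    println!("{base:?}\n{polite:?}\n{verbose:?}");
}
```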