Crate vil_crawler

§VIL Web Crawler (I01)

Async concurrent web crawler with BFS traversal, robots.txt support, domain filtering, and depth limiting.

§Quick Start

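use vil_crawler::{Crawler, CrawlConfig};

// `crawl_site` is awaited, so the call must sit inside an async context
// driven by whichever async runtime the application uses.
async fn run() {
    // Limit the crawl to 10 pages, a link depth of 2, and a concurrency of 4.
    let config = CrawlConfig::new()
        .max_pages(10)
        .max_depth(2)
        .concurrency(4);

    let crawler = Crawler::new(config);
    let results = crawler.crawl_site("https://example.com").await;

    // Print each crawled URL with the length of its text.
    for r in &results {
        println!("{}: {} chars", r.url, r.text.len());
    }
}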
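
To illustrate how BFS traversal interacts with depth limiting, a page cap, and domain filtering, here is a minimal standalone sketch. It is not this crate's implementation: `Limits`, `host_of`, `bfs_crawl`, and `sample_graph` are hypothetical names, and an in-memory link graph stands in for fetching pages over HTTP. It uses only the standard library.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical limits for this sketch; not the crate's `CrawlConfig`.
struct Limits {
    max_pages: usize,
    max_depth: usize,
    allowed_host: &'static str,
}

/// Extracts the host from an absolute URL such as "https://example.com/a".
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    rest.split('/').next()
}

/// Breadth-first traversal over an in-memory link graph.
/// Returns each visited URL together with the depth at which it was reached.
fn bfs_crawl(
    start: &str,
    links: &HashMap<&str, Vec<&str>>,
    limits: &Limits,
) -> Vec<(String, usize)> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut out = Vec::new();

    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        // Page cap: stop once enough pages have been collected.
        if out.len() >= limits.max_pages {
            break;
        }
        out.push((url.clone(), depth));

        // Depth limit: do not follow links from pages at the maximum depth.
        if depth == limits.max_depth {
            continue;
        }
        for &next in links.get(url.as_str()).into_iter().flatten() {
            // Domain filter: only enqueue unseen links on the allowed host.
            let same_host = host_of(next) == Some(limits.allowed_host);
            if same_host && visited.insert(next.to_string()) {
                queue.push_back((next.to_string(), depth + 1));
            }
        }
    }
    out
}

/// A tiny made-up link graph used in place of real HTTP fetches.
fn sample_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("https://example.com", vec!["https://example.com/a", "https://other.org/x"]),
        ("https://example.com/a", vec!["https://example.com/b", "https://example.com"]),
        ("https://example.com/b", vec!["https://example.com/c"]),
    ])
}

fn main() {
    let limits = Limits { max_pages: 10, max_depth: 2, allowed_host: "example.com" };
    for (url, depth) in bfs_crawl("https://example.com", &sample_graph(), &limits) {
        println!("depth {depth}: {url}");
    }
}
```

In this sketch the queue guarantees that pages are visited in order of increasing depth, the `visited` set prevents revisiting a URL through a back-link, and the off-site link to `other.org` is never enqueued. The page at `/c` is not reached because it would sit at depth 3.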

Re-exports§

pub use config::CrawlConfig;
pub use crawler::CrawlError;
pub use crawler::Crawler;
pub use plugin::CrawlerPlugin;
pub use result::CrawlResult;
pub use robots::RobotsChecker;
pub use semantic::CrawlEvent;
pub use semantic::CrawlFault;
pub use semantic::CrawlFaultType;
pub use semantic::CrawlerState;

Modules§

config
crawler
handlers
HTTP handlers for the crawler plugin — wired to real CrawlConfig state.
pipeline_sse
SSE pipeline builders for crawler operations.
plugin
VilPlugin implementation for web crawler integration.
result
robots
semantic
Semantic types for web crawling operations.