§halldyll-robots
RFC 9309 compliant robots.txt parser and checker.
§Features
- RFC 9309 Compliance: Full support for the robots.txt standard
- Unavailable vs Unreachable: per RFC 9309, an unavailable robots.txt (4xx) means allow all, while an unreachable one (5xx) means disallow all
- Safe Mode: Optional stricter handling of 401/403 as deny
- Conditional GET: ETag/Last-Modified support for bandwidth savings
- Request-rate: support for this non-standard but widely used directive
- Caching: In-memory cache with optional file persistence
- Pattern Matching: Wildcards (*), end anchors ($), percent-encoding
- UTF-8 BOM: Automatic stripping of BOM prefix
- Observability: Detailed logging and statistics with min/max/avg metrics
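The unavailable/unreachable distinction above can be sketched as a small standalone function. The names `DefaultPolicy` and `default_policy` here are illustrative only, not this crate's actual API:

```rust
// Illustrative sketch of RFC 9309 fetch-status handling; the names here
// are hypothetical and not halldyll-robots' actual types or functions.

/// Crawl policy to apply when robots.txt could not be parsed (illustrative).
#[derive(Debug, PartialEq)]
enum DefaultPolicy {
    AllowAll,    // treat every path as crawlable
    DisallowAll, // treat every path as blocked
}

/// Map an HTTP status code to a default policy per RFC 9309:
/// 4xx ("unavailable") => allow all; 5xx ("unreachable") => disallow all.
/// With safe mode enabled, 401/403 are treated as disallow instead.
fn default_policy(status: u16, safe_mode: bool) -> DefaultPolicy {
    match status {
        401 | 403 if safe_mode => DefaultPolicy::DisallowAll,
        400..=499 => DefaultPolicy::AllowAll,
        500..=599 => DefaultPolicy::DisallowAll,
        _ => DefaultPolicy::AllowAll, // 2xx bodies go to the parser instead
    }
}

fn main() {
    assert_eq!(default_policy(404, false), DefaultPolicy::AllowAll);
    assert_eq!(default_policy(503, false), DefaultPolicy::DisallowAll);
    assert_eq!(default_policy(403, true), DefaultPolicy::DisallowAll);
    println!("ok");
}
```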
§Example
use halldyll_robots::{RobotsChecker, RobotsConfig};
use url::Url;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = RobotsConfig::default();
    let checker = RobotsChecker::new(config);
    let url = Url::parse("https://example.com/some/path")?;
    let decision = checker.is_allowed(&url).await;
    if decision.allowed {
        println!("URL is allowed");
    } else {
        println!("URL is blocked: {:?}", decision.reason);
    }
    Ok(())
}
§Re-exports
pub use cache::CacheStats;
pub use cache::CacheStatsSnapshot;
pub use cache::RobotsCache;
pub use checker::RobotsChecker;
pub use checker::RobotsDiagnostics;
pub use fetcher::FetchStats;
pub use fetcher::FetchStatsSnapshot;
pub use fetcher::RobotsFetcher;
pub use matcher::RobotsMatcher;
pub use parser::RobotsParser;
pub use types::Decision;
pub use types::DecisionReason;
pub use types::EffectiveRules;
pub use types::FetchStatus;
pub use types::Group;
pub use types::RequestRate;
pub use types::RobotsCacheKey;
pub use types::RobotsConfig;
pub use types::RobotsPolicy;
pub use types::Rule;
pub use types::RuleKind;
§Modules
- cache: robots.txt caching with TTL and optional persistence
- checker: main robots.txt checker with caching and fetching
- fetcher: RFC 9309 compliant robots.txt fetching
- matcher: RFC 9309 compliant path matching
- parser: RFC 9309 compliant robots.txt parser
- types: core types for robots.txt handling (RFC 9309)
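The matching behavior listed under Features (wildcards `*`, end anchor `$`, longest-prefix semantics) can be sketched as a self-contained backtracking matcher. This is not the crate's actual `RobotsMatcher` implementation, just an illustration of the RFC 9309 rules:

```rust
// Illustrative sketch of RFC 9309 path matching with `*` wildcards and a
// `$` end anchor; not halldyll-robots' actual matcher implementation.

/// Return true if `path` matches the robots.txt `pattern`.
/// `*` matches any sequence of characters; a trailing `$` anchors the
/// pattern to the end of the path; otherwise a prefix match suffices.
fn matches(pattern: &str, path: &str) -> bool {
    let (pattern, anchored) = match pattern.strip_suffix('$') {
        Some(p) => (p, true),
        None => (pattern, false),
    };
    // Recursive backtracking over the pattern's bytes.
    fn helper(pat: &[u8], path: &[u8], anchored: bool) -> bool {
        match pat.first() {
            None => !anchored || path.is_empty(),
            Some(&b'*') => (0..=path.len()).any(|i| helper(&pat[1..], &path[i..], anchored)),
            Some(&c) => path.first() == Some(&c) && helper(&pat[1..], &path[1..], anchored),
        }
    }
    helper(pattern.as_bytes(), path.as_bytes(), anchored)
}

fn main() {
    assert!(matches("/private/*", "/private/data.html"));
    assert!(matches("/*.php$", "/index.php"));
    assert!(!matches("/*.php$", "/index.php?x=1")); // `$` rejects trailing query
    assert!(matches("/foo", "/foobar")); // unanchored rules are prefix matches
    println!("ok");
}
```

A production matcher would avoid the exponential worst case of naive backtracking, but the semantics shown here follow the RFC.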