Crate halldyll_robots

halldyll-robots

RFC 9309-compliant robots.txt parser and checker.

Features

  • RFC 9309 Compliance: Full support for the robots.txt standard
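  • Unavailable vs. Unreachable: Handled per RFC 9309, so a 4xx response (unavailable) allows all paths and a 5xx response (unreachable) disallows all paths
  • Safe Mode: Optionally treats 401 and 403 responses as disallow-all, which is stricter than the RFC's allow-all for 4xx
  • Conditional GET: ETag/Last-Modified revalidation to save bandwidth
  • Request-rate: Support for the non-standard but widely used Request-rate directive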
  • Caching: In-memory cache with optional file persistence
  • Pattern Matching: Wildcards (*), end anchors ($), percent-encoding
  • UTF-8 BOM: Automatic stripping of a leading byte order mark
  • Observability: Detailed logging and statistics with min/max/avg metrics
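The unavailable/unreachable distinction and Safe Mode together define a mapping from fetch result to policy. Below is a minimal sketch of that mapping under stated assumptions. `FetchOutcome`, `outcome_for_status`, and the `safe_mode` flag are hypothetical names, not this crate's `FetchStatus`, `RobotsPolicy`, or `RobotsConfig` API. The sketch also assumes a network error counts as unreachable, as RFC 9309 specifies.

```rust
/// Hypothetical policy outcome for a robots.txt fetch (illustrative only,
/// not this crate's `FetchStatus` or `RobotsPolicy` types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FetchOutcome {
    /// 2xx: parse the body and apply its rules.
    ParseRules,
    /// 4xx "unavailable": RFC 9309 lets the crawler access everything.
    AllowAll,
    /// 5xx or network error "unreachable": assume complete disallow.
    DisallowAll,
}

/// Map an HTTP status (`None` = network failure) to an outcome.
/// `safe_mode` is a hypothetical flag that treats 401/403 as disallow-all.
fn outcome_for_status(status: Option<u16>, safe_mode: bool) -> FetchOutcome {
    match status {
        None => FetchOutcome::DisallowAll,
        Some(200..=299) => FetchOutcome::ParseRules,
        // Safe Mode: this guard must come before the general 4xx arm.
        Some(401) | Some(403) if safe_mode => FetchOutcome::DisallowAll,
        Some(400..=499) => FetchOutcome::AllowAll,
        Some(500..=599) => FetchOutcome::DisallowAll,
        // Assumption of this sketch: redirects were already followed by the
        // HTTP client, and any other status is treated conservatively.
        Some(_) => FetchOutcome::DisallowAll,
    }
}

fn main() {
    let cases = [
        (Some(200), false),
        (Some(404), false),
        (Some(403), true),
        (Some(503), false),
        (None, false),
    ];
    for (status, safe) in cases {
        println!("{:?} safe_mode={} -> {:?}", status, safe, outcome_for_status(status, safe));
    }
}
```

The guard on the 401/403 arm is what makes Safe Mode stricter. Without it, those codes fall through to the general 4xx allow-all arm.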
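The Request-rate directive has no formal specification. Below is a minimal sketch of parsing its value. It assumes the common `requests/period` form with an optional `s`, `m`, or `h` suffix on the period, for example `1/5` or `3/1m`. `Rate` and `parse_request_rate` are hypothetical names, not this crate's `RequestRate` API.

```rust
/// Hypothetical parsed form of a `Request-rate` value (not this crate's
/// `RequestRate` type): `requests` allowed per `period_secs` seconds.
#[derive(Debug, PartialEq, Eq)]
struct Rate {
    requests: u32,
    period_secs: u32,
}

impl Rate {
    /// Average spacing between requests implied by the rate, in seconds.
    fn interval_secs(&self) -> f64 {
        self.period_secs as f64 / self.requests as f64
    }
}

/// Parse values such as "1/5", "1/10s", "3/1m" or "2/1h".
/// Assumption: an optional s/m/h suffix scales the period; any other form
/// (for example a trailing time-of-day window) is rejected with `None`.
fn parse_request_rate(value: &str) -> Option<Rate> {
    let (req, period) = value.trim().split_once('/')?;
    let requests: u32 = req.trim().parse().ok()?;
    let period = period.trim();
    // Split off a one-byte ASCII unit suffix, if present.
    let (digits, multiplier) = match period.chars().last()? {
        's' => (&period[..period.len() - 1], 1),
        'm' => (&period[..period.len() - 1], 60),
        'h' => (&period[..period.len() - 1], 3600),
        _ => (period, 1),
    };
    let period_secs = digits.parse::<u32>().ok()?.checked_mul(multiplier)?;
    if requests == 0 || period_secs == 0 {
        return None; // a zero rate or zero-length window is meaningless
    }
    Some(Rate { requests, period_secs })
}

fn main() {
    for value in ["1/5", "3/1m", "1/10s", "bogus"] {
        match parse_request_rate(value) {
            Some(rate) => println!(
                "{}: {:?}, one request every {}s",
                value,
                rate,
                rate.interval_secs()
            ),
            None => println!("{}: unparseable", value),
        }
    }
}
```

`interval_secs` turns the parsed rate into an average spacing between requests, which is one way a crawler might throttle itself. How this crate applies `RequestRate` is not shown here.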

Example

use halldyll_robots::{RobotsChecker, RobotsConfig};
use url::Url;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = RobotsConfig::default();
    let checker = RobotsChecker::new(config);
     
    let url = Url::parse("https://example.com/some/path")?;
    let decision = checker.is_allowed(&url).await;
     
    if decision.allowed {
        println!("URL is allowed");
    } else {
        println!("URL is blocked: {:?}", decision.reason);
    }
     
    Ok(())
}

Re-exports

pub use cache::CacheStats;
pub use cache::CacheStatsSnapshot;
pub use cache::RobotsCache;
pub use checker::RobotsChecker;
pub use checker::RobotsDiagnostics;
pub use fetcher::FetchStats;
pub use fetcher::FetchStatsSnapshot;
pub use fetcher::RobotsFetcher;
pub use matcher::RobotsMatcher;
pub use parser::RobotsParser;
pub use types::Decision;
pub use types::DecisionReason;
pub use types::EffectiveRules;
pub use types::FetchStatus;
pub use types::Group;
pub use types::RequestRate;
pub use types::RobotsCacheKey;
pub use types::RobotsConfig;
pub use types::RobotsPolicy;
pub use types::Rule;
pub use types::RuleKind;
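The cache re-exported above (`RobotsCache`, keyed by `RobotsCacheKey`) is described as in-memory with a TTL. Robots.txt rules apply to a single scheme, host, and port, so keying by origin fits naturally. Below is a minimal in-memory sketch. `OriginKey` and `TtlCache` are hypothetical types, and the clock is supplied by the caller. It is not the crate's implementation and omits file persistence.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache key: one robots.txt per scheme, host and port
/// (an illustration, not this crate's `RobotsCacheKey`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct OriginKey {
    scheme: String,
    host: String,
    port: u16,
}

impl OriginKey {
    fn new(scheme: &str, host: &str, port: u16) -> Self {
        // Scheme and host are case-insensitive, so normalize them.
        OriginKey {
            scheme: scheme.to_ascii_lowercase(),
            host: host.to_ascii_lowercase(),
            port,
        }
    }
}

/// Hypothetical TTL cache storing each value with its insertion time.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<OriginKey, (Instant, V)>,
}

impl<V> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    fn insert(&mut self, key: OriginKey, value: V, now: Instant) {
        self.entries.insert(key, (now, value));
    }

    /// Return the value if it is younger than the TTL; stale entries are evicted.
    fn get(&mut self, key: &OriginKey, now: Instant) -> Option<&V> {
        let fresh = match self.entries.get(key) {
            Some((stored_at, _)) => now.saturating_duration_since(*stored_at) < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(key).map(|(_, v)| v)
        } else {
            self.entries.remove(key);
            None
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut cache = TtlCache::new(Duration::from_secs(24 * 60 * 60));
    cache.insert(OriginKey::new("HTTPS", "Example.com", 443), "User-agent: *", t0);
    let key = OriginKey::new("https", "example.com", 443);
    println!("fresh: {:?}", cache.get(&key, t0));
    println!("after 25h: {:?}", cache.get(&key, t0 + Duration::from_secs(25 * 3600)));
}
```

RFC 9309 says crawlers should not use a cached robots.txt for more than 24 hours unless the file is unreachable, so 24 hours is a reasonable upper bound for `ttl`. Passing `now` explicitly keeps expiry deterministic in tests.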

Modules

cache: Robots.txt caching with TTL and optional persistence.
checker: Main robots.txt checker with caching and fetching.
fetcher: RFC 9309-compliant robots.txt fetching.
matcher: RFC 9309-compliant path matching.
parser: RFC 9309-compliant robots.txt parser.
types: Core types for robots.txt handling (RFC 9309).
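The matcher's wildcard (`*`) and end-anchor (`$`) support combines with RFC 9309's precedence rule: the longest matching pattern wins, and allow wins a tie. Below is a sketch of both. `RuleType`, `pattern_matches`, and `is_path_allowed` are hypothetical names. Pattern length is measured in bytes, including any `*` or `$`, which is an assumption. Percent-encoding normalization is omitted.

```rust
/// Hypothetical rule type (not this crate's `RuleKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleType {
    Allow,
    Disallow,
}

/// True if `path` matches the robots.txt `pattern`. `*` matches any run of
/// bytes (including none); a trailing `$` anchors the pattern to the end of
/// the path; otherwise the pattern only needs to match a prefix.
fn pattern_matches(pattern: &str, path: &str) -> bool {
    let (pat, anchored) = match pattern.strip_suffix('$') {
        Some(p) => (p.as_bytes(), true),
        None => (pattern.as_bytes(), false),
    };
    let path = path.as_bytes();
    let (mut p, mut s) = (0usize, 0usize);
    // Position after the most recent '*' and the path index it resumes from.
    let mut star: Option<(usize, usize)> = None;
    loop {
        if p == pat.len() {
            if !anchored || s == path.len() {
                return true;
            }
        } else if pat[p] == b'*' {
            star = Some((p + 1, s));
            p += 1;
            continue;
        } else if s < path.len() && pat[p] == path[s] {
            p += 1;
            s += 1;
            continue;
        }
        // Mismatch: let the last '*' absorb one more byte and retry.
        match star {
            Some((sp, ss)) if ss < path.len() => {
                star = Some((sp, ss + 1));
                p = sp;
                s = ss + 1;
            }
            _ => return false,
        }
    }
}

/// Longest matching pattern wins; allow wins a tie; no match means allowed.
fn is_path_allowed(rules: &[(RuleType, &str)], path: &str) -> bool {
    let mut best: Option<(usize, RuleType)> = None;
    for &(kind, pattern) in rules {
        // An empty pattern (e.g. "Disallow:") matches nothing.
        if pattern.is_empty() || !pattern_matches(pattern, path) {
            continue;
        }
        let len = pattern.len();
        best = match best {
            None => Some((len, kind)),
            Some((best_len, _)) if len > best_len => Some((len, kind)),
            Some((best_len, _)) if len == best_len && kind == RuleType::Allow => Some((len, kind)),
            other => other,
        };
    }
    !matches!(best, Some((_, RuleType::Disallow)))
}

fn main() {
    let rules = [
        (RuleType::Disallow, "/private"),
        (RuleType::Allow, "/private/public"),
        (RuleType::Disallow, "/*.pdf$"),
    ];
    for path in ["/private/x", "/private/public/page", "/docs/a.pdf", "/index.html"] {
        println!("{}: allowed = {}", path, is_path_allowed(&rules, path));
    }
}
```

Matching is byte-wise and case-sensitive, as RFC 9309 requires for paths. The single-star backtracking keeps the worst case at roughly pattern length times path length.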