Struct texting_robots::Robot
pub struct Robot {
    pub delay: Option<f32>,
    pub sitemaps: Vec<String>,
    /* private fields */
}
Fields
delay: Option<f32>
The delay in seconds between requests. If Crawl-Delay is set in robots.txt this will be Some(f32), and otherwise None.
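A minimal sketch of reading this field; the robots.txt body below is illustrative, not from the crate's documentation:
use texting_robots::Robot;

// Hypothetical robots.txt body with a Crawl-Delay directive.
let txt = b"User-Agent: *\nCrawl-Delay: 2\nDisallow: /private";
// "Ferris" isn't listed, so the rules (and delay) for * apply.
let r = Robot::new("Ferris", txt).unwrap();
assert_eq!(r.delay, Some(2.0));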
sitemaps: Vec<String>
Any sitemaps found in the robots.txt file are added to this vector. According to the robots.txt specification, a sitemap found in robots.txt is accessible and available to any bot reading robots.txt.
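A short sketch of reading this field, again with an illustrative robots.txt body:
use texting_robots::Robot;

// Hypothetical robots.txt body containing a Sitemap directive.
let txt = b"Sitemap: https://example.com/sitemap.xml\nUser-Agent: *\nDisallow: /private";
let r = Robot::new("Ferris", txt).unwrap();
// Sitemap entries are collected regardless of which agent is asked for.
assert_eq!(r.sitemaps, vec!["https://example.com/sitemap.xml".to_string()]);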
Implementations
impl Robot
pub fn new(agent: &str, txt: &[u8]) -> Result<Self, Error>
Construct a new Robot object specifically processed for the given user agent. All rules relevant to that user agent are extracted from robots.txt and stored internally. If the user agent isn't found in robots.txt we default to *.
Note: The agent string is lowercased before comparison, as required by the robots.txt specification.
Errors
If there are difficulties parsing, which should be rare as the parser is quite forgiving, then an InvalidRobots error is returned.
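A brief construction sketch; the agent name and robots.txt body are hypothetical:
use texting_robots::Robot;

// "FerrisBot" matches the User-Agent line case-insensitively,
// since both sides are lowercased before comparison.
let txt = b"User-Agent: FerrisBot\nDisallow: /secret";
let r = Robot::new("FerrisBot", txt).unwrap();
assert!(!r.allowed("/secret"));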
pub fn allowed(&self, url: &str) -> bool
Check if the given URL is allowed for the agent by robots.txt. This function returns true or false according to the rules in robots.txt. The provided URL can be absolute or relative depending on user preference.
Example
use texting_robots::Robot;
let r = Robot::new("Ferris", b"Disallow: /secret").unwrap();
assert_eq!(r.allowed("https://example.com/secret"), false);
assert_eq!(r.allowed("/secret"), false);
assert_eq!(r.allowed("/everything-else"), true);