Struct texting_robots::Robot
pub struct Robot {
    pub delay: Option<f32>,
    pub sitemaps: Vec<String>,
    /* private fields */
}
Fields
delay: Option<f32>
The delay in seconds between requests. If Crawl-Delay is set in robots.txt this will be Some(f32), and otherwise None.
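A minimal sketch of reading this field; the robots.txt body below is illustrative, not from the crate's documentation:
use texting_robots::Robot;

// Hypothetical robots.txt body with a Crawl-Delay directive.
let txt = b"User-Agent: *\nCrawl-Delay: 2\nDisallow: /private";
// "Ferris" isn't listed, so the rules (and delay) for * apply.
let r = Robot::new("Ferris", txt).unwrap();
assert_eq!(r.delay, Some(2.0));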
sitemaps: Vec<String>
Any sitemaps found in the robots.txt file are added to this vector. According to the robots.txt specification, a sitemap found in robots.txt is accessible and available to any bot reading robots.txt.
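A short sketch of reading this field, again with an illustrative robots.txt body:
use texting_robots::Robot;

// Hypothetical robots.txt body containing a Sitemap directive.
let txt = b"Sitemap: https://example.com/sitemap.xml\nUser-Agent: *\nDisallow: /private";
let r = Robot::new("Ferris", txt).unwrap();
// Sitemap entries are collected regardless of which agent is asked for.
assert_eq!(r.sitemaps, vec!["https://example.com/sitemap.xml".to_string()]);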
Implementations
impl Robot
pub fn new(agent: &str, txt: &[u8]) -> Result<Self, Error>
Construct a new Robot object specifically processed for the given user agent. All rules relevant to that user agent are extracted from robots.txt and stored internally. If the user agent isn't found in robots.txt we default to *.
Note: The agent string is lowercased before comparison, as required by the robots.txt specification.
Errors
If there are difficulties parsing, which should be rare as the parser is quite forgiving, then an InvalidRobots error is returned.
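A brief construction sketch; the agent name and robots.txt body are hypothetical:
use texting_robots::Robot;

// "FerrisBot" matches the User-Agent line case-insensitively,
// since both sides are lowercased before comparison.
let txt = b"User-Agent: FerrisBot\nDisallow: /secret";
let r = Robot::new("FerrisBot", txt).unwrap();
assert!(!r.allowed("/secret"));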
pub fn allowed(&self, url: &str) -> bool
Check if the given URL is allowed for the agent by robots.txt. This function returns true or false according to the rules in robots.txt. The provided URL can be absolute or relative depending on user preference.
Example
use texting_robots::Robot;
let r = Robot::new("Ferris", b"Disallow: /secret").unwrap();
assert_eq!(r.allowed("https://example.com/secret"), false);
assert_eq!(r.allowed("/secret"), false);
assert_eq!(r.allowed("/everything-else"), true);