pub trait RobotsTxtService {
    // Required methods
    fn can_fetch(&self, user_agent: &str, url: &Url) -> bool;
    fn get_crawl_delay(&self, user_agent: &str) -> Option<Duration>;
    fn normalize_url(&self, url: &mut Url) -> bool;
    fn normalize_url_ignore_origin(&self, url: &mut Url);
    fn get_sitemaps(&self) -> &[Url];
    fn get_req_rate(&self, user_agent: &str) -> Option<RequestRate>;
}
Trait describing a robots.txt service backed by a parsed robots.txt file.
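A minimal usage sketch, assuming the trait is in scope; the fetch_policy helper and its one-second default delay are illustrative, and any concrete type implementing RobotsTxtService can be passed in:

use std::time::Duration;
use url::Url;

// Hypothetical helper combining two trait methods: returns None when
// robots.txt disallows the fetch, otherwise the delay to wait before it.
fn fetch_policy(service: &impl RobotsTxtService, user_agent: &str, url: &Url) -> Option<Duration> {
    if !service.can_fetch(user_agent, url) {
        return None; // disallowed by robots.txt
    }
    // Fall back to a default politeness delay when no Crawl-delay is declared.
    Some(service.get_crawl_delay(user_agent).unwrap_or(Duration::from_secs(1)))
}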
Required Methods
fn can_fetch(&self, user_agent: &str, url: &Url) -> bool
Using the parsed robots.txt, decides whether user_agent is allowed to fetch url.
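For example, a crawler might gate every request on this check; the user agent string "MyBot" below is a placeholder:

use url::Url;

fn try_visit(service: &impl RobotsTxtService, url: &Url) {
    if service.can_fetch("MyBot", url) {
        // ...issue the HTTP request here...
    } else {
        eprintln!("robots.txt disallows {url}");
    }
}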
fn get_crawl_delay(&self, user_agent: &str) -> Option<Duration>
Returns the crawl delay for this user agent as a Duration, or None if no crawl delay is defined.
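A sketch of honoring the delay between requests; the 500 ms fallback is an assumption, not something the trait prescribes:

use std::{thread, time::Duration};

fn polite_pause(service: &impl RobotsTxtService, user_agent: &str) {
    // Sleep for the declared Crawl-delay, or a default when none is set.
    let delay = service
        .get_crawl_delay(user_agent)
        .unwrap_or(Duration::from_millis(500));
    thread::sleep(delay);
}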
fn normalize_url(&self, url: &mut Url) -> bool
Removes from the url the query parameters listed in the Clean-param directive.
This method CHECKS that the origin of the passed url matches the origin of the robots.txt file.
Returns true if the normalization was applied to the passed url, and false otherwise.
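This makes it suitable for building URL deduplication keys, as in the sketch below; dedup_key is an illustrative helper, not part of the crate:

use url::Url;

fn dedup_key(service: &impl RobotsTxtService, url: &Url) -> Url {
    let mut key = url.clone();
    // A `false` return means the origin did not match robots.txt's origin,
    // so `key` is left unchanged and used as-is.
    let _applied = service.normalize_url(&mut key);
    key
}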
fn normalize_url_ignore_origin(&self, url: &mut Url)
Removes from the url the query parameters listed in the Clean-param directive.
This method DOES NOT CHECK that the origin of the passed url matches the origin of the robots.txt file.
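A sketch of applying the same Clean-param rules to a URL from another host, for example when one robots.txt is reused across mirrors; all names here are illustrative:

use url::Url;

fn normalize_cross_origin(service: &impl RobotsTxtService, raw: &str) -> Url {
    let mut url = Url::parse(raw).expect("valid url");
    // No origin check: the listed parameters are stripped unconditionally.
    service.normalize_url_ignore_origin(&mut url);
    url
}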
fn get_sitemaps(&self) -> &[Url]
Returns the sitemap URLs listed in the robots.txt file.
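For instance, a crawler could seed its frontier from these entries; the Vec here stands in for whatever queue structure is actually used:

use url::Url;

fn enqueue_sitemaps(service: &impl RobotsTxtService, frontier: &mut Vec<Url>) {
    // Clone each advertised sitemap URL into the crawl frontier.
    frontier.extend(service.get_sitemaps().iter().cloned());
}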
fn get_req_rate(&self, user_agent: &str) -> Option<RequestRate>
Returns the request-rate restriction for this user agent, i.e. how frequently HTTP requests may be sent to the server, or None if no such restriction is defined.
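Since the fields of RequestRate are specific to the implementing crate, this sketch only checks whether a restriction exists at all:

fn has_rate_limit(service: &impl RobotsTxtService, user_agent: &str) -> bool {
    // Some(_) means a request-rate restriction applies to this user agent.
    service.get_req_rate(user_agent).is_some()
}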