Trait RobotsTxtService

pub trait RobotsTxtService {
    // Required methods
    fn can_fetch(&self, user_agent: &str, url: &Url) -> bool;
    fn get_crawl_delay(&self, user_agent: &str) -> Option<Duration>;
    fn normalize_url(&self, url: &mut Url) -> bool;
    fn normalize_url_ignore_origin(&self, url: &mut Url);
    fn get_sitemaps(&self) -> &[Url];
    fn get_req_rate(&self, user_agent: &str) -> Option<RequestRate>;
}

Trait that describes a robots.txt service: deciding whether URLs may be fetched and exposing the Crawl-delay, Clean-param, Sitemap, and Request-rate directives of a parsed robots.txt file.

Required Methods

fn can_fetch(&self, user_agent: &str, url: &Url) -> bool

Using the parsed robots.txt, decides whether the given user agent is allowed to fetch the URL.
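
For example, a minimal sketch of filtering candidate URLs with can_fetch, generic over any type implementing this trait; the service value, the user-agent string, and the import path of the trait are assumptions, and Url is the url crate's type:

use url::Url;

// Keep only the URLs that the parsed robots.txt allows this user agent to fetch.
fn allowed_urls<S: RobotsTxtService>(service: &S, user_agent: &str, candidates: &[Url]) -> Vec<Url> {
    candidates
        .iter()
        .filter(|url| service.can_fetch(user_agent, url))
        .cloned()
        .collect()
}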

fn get_crawl_delay(&self, user_agent: &str) -> Option<Duration>

Returns the crawl delay for this user agent as a Duration, or None if no crawl delay is defined.
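
A sketch of pacing requests with the declared delay; the fallback value and the surrounding crawler loop are assumptions:

use std::thread;
use std::time::Duration;

// Sleep for the crawl delay declared for this user agent, or for a
// caller-provided default when robots.txt defines none.
fn wait_between_requests<S: RobotsTxtService>(service: &S, user_agent: &str, default_delay: Duration) {
    let delay = service.get_crawl_delay(user_agent).unwrap_or(default_delay);
    thread::sleep(delay);
}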

fn normalize_url(&self, url: &mut Url) -> bool

Removes from the URL the query parameters that were listed in the Clean-param directive. This method CHECKS that the origin of the passed URL matches the origin of the robots.txt file. Returns true if the normalization was applied to the passed URL, and false otherwise.
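
A sketch of using the returned flag to keep only URLs that were actually normalized; the wrapper function is illustrative, not part of the trait:

use url::Url;

// Applies Clean-param normalization when the URL shares the robots.txt origin;
// returns None when normalize_url reports that nothing was applied.
fn try_normalize<S: RobotsTxtService>(service: &S, mut url: Url) -> Option<Url> {
    if service.normalize_url(&mut url) {
        Some(url)
    } else {
        None
    }
}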

fn normalize_url_ignore_origin(&self, url: &mut Url)

Removes from the URL the query parameters that were listed in the Clean-param directive. This method DOES NOT CHECK that the origin of the passed URL matches the origin of the robots.txt file.
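
A sketch of normalizing a batch of URLs without the origin check; the helper is illustrative only:

use url::Url;

// Strip Clean-param parameters from every URL in place, regardless of origin.
fn normalize_all<S: RobotsTxtService>(service: &S, urls: &mut [Url]) {
    for url in urls.iter_mut() {
        service.normalize_url_ignore_origin(url);
    }
}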

fn get_sitemaps(&self) -> &[Url]

Returns the sitemap URLs that were listed in the robots.txt file.
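
A sketch of iterating over the declared sitemaps; printing is only for illustration:

// Print every sitemap URL declared in the parsed robots.txt.
fn list_sitemaps<S: RobotsTxtService>(service: &S) {
    for sitemap in service.get_sitemaps() {
        println!("sitemap: {}", sitemap);
    }
}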

fn get_req_rate(&self, user_agent: &str) -> Option<RequestRate>

Returns the restriction, if any, on the rate at which HTTP requests may be sent to the server for this user agent, or None if no request rate is defined.
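
A sketch of checking whether a request-rate restriction exists for a user agent; the fields of RequestRate depend on the crate and are not relied on here:

// True when robots.txt defines a request-rate restriction for this user agent.
fn has_request_rate_limit<S: RobotsTxtService>(service: &S, user_agent: &str) -> bool {
    service.get_req_rate(user_agent).is_some()
}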

Implementors