Module robotparser::parser
Contains robots.txt parsers.
Supported features and directives
- Removes the Unicode BOM
- Directive User-Agent
- Directive Allow
- Directive Disallow
- Directive Crawl-Delay
- Directive Request-Rate
- Directive Sitemap
- Directive Clean-Param
Example
use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// Parse the robots.txt contents against the origin it was fetched from.
let robots_txt_url = Url::parse("https://google.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\nDisallow: /search";
let robots_txt = parse_robots_txt(robots_txt_url.origin(), robots_txt);
// A well-formed file produces no parser warnings.
assert_eq!(robots_txt.get_warnings().len(), 0);
let robots_txt = robots_txt.get_result();

// URLs under the disallowed /search prefix may not be fetched; others may.
let good_url = Url::parse("https://google.com/test").unwrap();
let bad_url = Url::parse("https://google.com/search/vvv").unwrap();
assert_eq!(robots_txt.can_fetch("*", &bad_url), false);
assert_eq!(robots_txt.can_fetch("*", &good_url), true);
Structs
- ParseResult: the result of the robots.txt parser.
- ParseWarning: a warning emitted by the robots.txt parser about a problem encountered while parsing a robots.txt file.
Enums
- WarningReason: the reason a warning was emitted while parsing a robots.txt file.
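For instance, warnings collected during parsing can be surfaced like this. A minimal sketch, assuming ParseWarning implements Debug; whether a particular malformed line is reported as a warning depends on the parser's rules:

use robotparser::parser::parse_robots_txt;
use url::Url;

let robots_txt_url = Url::parse("https://example.com/robots.txt").unwrap();
// A hypothetical file with a misspelled directive.
let robots_txt = "User-agent: *\nDisalow: /search";
let parse_result = parse_robots_txt(robots_txt_url.origin(), robots_txt);
for warning in parse_result.get_warnings() {
    // Debug formatting is assumed; match on the WarningReason for structured handling.
    println!("robots.txt warning: {:?}", warning);
}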
Functions
- parse_fetched_robots_txt: parses the text of the robots.txt file located at the specified origin, taking the status code of the HTTP response into account. IMPORTANT NOTE: the origin must point to the robots.txt URL before any redirects.
- parse_robots_txt: parses the text of the robots.txt file located at the specified origin.