Contains robots.txt parsers.
§Supported features and directives
- Removes the Unicode BOM
- Directive User-Agent
- Directive Allow
- Directive Disallow
- Directive Crawl-Delay
- Directive Request-Rate
- Directive Sitemap
- Directive Clean-Param
§Example
use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;
// Parse the robots.txt body against the origin it was fetched from.
let robots_txt_url = Url::parse("https://google.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\nDisallow: /search";
let robots_txt = parse_robots_txt(robots_txt_url.origin(), robots_txt);
assert_eq!(robots_txt.get_warnings().len(), 0);
let robots_txt = robots_txt.get_result();
// Everything under /search is disallowed for every user agent; other paths are allowed.
let good_url = Url::parse("https://google.com/test").unwrap();
let bad_url = Url::parse("https://google.com/search/vvv").unwrap();
assert_eq!(robots_txt.can_fetch("*", &bad_url), false);
assert_eq!(robots_txt.can_fetch("*", &good_url), true);
Structs§
- ParseResult - The result of the robots.txt parser.
- ParseWarning - A warning emitted by the robots.txt parser about a problem encountered while parsing the robots.txt file.
Enums§
- WarningReason - The reason for a warning emitted by the robots.txt parser while parsing the robots.txt file.
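Warnings are collected alongside the parse result rather than replacing it, so they can be inspected before calling get_result. A minimal sketch, assuming ParseWarning implements Debug and that a malformed directive value produces a warning; the concrete reasons are enumerated by WarningReason.
use robotparser::parser::parse_robots_txt;
use url::Url;
let origin = Url::parse("https://example.com/robots.txt").unwrap().origin();
// A non-numeric Crawl-delay value is used here in the hope of triggering a warning.
let parsed = parse_robots_txt(origin, "User-agent: *\nCrawl-delay: not-a-number");
for warning in parsed.get_warnings() {
    // Assumes ParseWarning derives Debug; see the struct docs for its accessors.
    println!("{:?}", warning);
}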
Functions§
- parse_fetched_robots_txt - Parses the text of a robots.txt file located at the specified origin, taking the HTTP response status code into account. IMPORTANT NOTE: the origin must point to the robots.txt URL before any redirects.
- parse_robots_txt - Parses the text of a robots.txt file located at the specified origin.
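For robots.txt bodies fetched over HTTP, parse_fetched_robots_txt also considers the response status code. The sketch below is an assumption-heavy illustration: it guesses that the status code is passed as a plain integer between the origin and the body, so confirm the actual signature in the function documentation before relying on it.
use robotparser::parser::parse_fetched_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;
// The origin must come from the robots.txt URL as requested, before any redirects were followed.
let robots_txt_url = Url::parse("https://google.com/robots.txt").unwrap();
// Assumed parameter order: origin, HTTP status code, response body.
let robots_txt = parse_fetched_robots_txt(robots_txt_url.origin(), 200, "User-agent: *\nDisallow: /search").get_result();
let url = Url::parse("https://google.com/search").unwrap();
assert_eq!(robots_txt.can_fetch("*", &url), false);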