
Contains robots.txt parsers.

Supported features and directives

  • Removes the Unicode BOM
  • Directive User-Agent
  • Directive Allow
  • Directive Disallow
  • Directive Crawl-Delay
  • Directive Request-Rate
  • Directive Sitemap
  • Directive Clean-Param
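
A short sketch of how a file combining several of these directives can be handled. It uses the same parse_robots_txt and RobotsTxtService API as the example below; the expected can_fetch results assume the conventional per-user-agent group matching of robots.txt, and the host names are purely illustrative.

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

let robots_txt_url = Url::parse("https://example.com/robots.txt").unwrap();
// Two groups: a dedicated one for "examplebot" and a catch-all one.
let input = "User-agent: examplebot\nDisallow: /private\n\nUser-agent: *\nAllow: /public\nDisallow: /search";
let robots_txt = parse_robots_txt(robots_txt_url.origin(), input).get_result();
let private_url = Url::parse("https://example.com/private/data").unwrap();
let search_url = Url::parse("https://example.com/search").unwrap();
let public_url = Url::parse("https://example.com/public/page").unwrap();
// "examplebot" is governed by its own group, so /private is blocked for it.
assert_eq!(robots_txt.can_fetch("examplebot", &private_url), false);
// Other agents fall under the "*" group: /search is blocked, /public is allowed.
assert_eq!(robots_txt.can_fetch("*", &search_url), false);
assert_eq!(robots_txt.can_fetch("*", &public_url), true);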

Example

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

let robots_txt_url = Url::parse("https://google.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\nDisallow: /search";
let robots_txt = parse_robots_txt(robots_txt_url.origin(), robots_txt);
assert_eq!(robots_txt.get_warnings().len(), 0);
let robots_txt = robots_txt.get_result();
let good_url = Url::parse("https://google.com/test").unwrap();
let bad_url = Url::parse("https://google.com/search/vvv").unwrap();
assert_eq!(robots_txt.can_fetch("*", &bad_url), false);
assert_eq!(robots_txt.can_fetch("*", &good_url), true);

Structs

The result of the robots.txt parser.

Warning emitted by the robots.txt parser about problems encountered while parsing a robots.txt file.

Enums

The reason for a warning emitted by the robots.txt parser while parsing a robots.txt file.

Functions

Parses the text of a robots.txt file located at the specified origin, taking into account the response status code of the HTTP request. IMPORTANT NOTE: the origin must point to the robots.txt URL before any redirects.

Parses the text of a robots.txt file located at the specified origin.
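
As the note above says, the origin should correspond to the robots.txt URL as it was originally requested, before any redirects were followed. A minimal sketch of that with parse_robots_txt; the redirect target here is purely illustrative.

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// The URL that was requested (before redirects) and the URL it redirected to.
let requested_url = Url::parse("https://example.com/robots.txt").unwrap();
let _redirected_url = Url::parse("https://www.example.com/robots.txt").unwrap();
let input = "User-agent: *\nDisallow: /tmp";
// Derive the origin from the originally requested robots.txt URL.
let robots_txt = parse_robots_txt(requested_url.origin(), input).get_result();
let blocked_url = Url::parse("https://example.com/tmp/file").unwrap();
assert_eq!(robots_txt.can_fetch("*", &blocked_url), false);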