Module parser

Contains robots.txt parsers.

§Supported features and directives

  • Removes the Unicode BOM
  • Directive User-Agent
  • Directive Allow
  • Directive Disallow
  • Directive Crawl-Delay
  • Directive Request-Rate
  • Directive Sitemap
  • Directive Clean-Param

§Example

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// The origin of the robots.txt file; its rules apply to URLs from this origin.
let robots_txt_url = Url::parse("https://google.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\nDisallow: /search";
// Parse the text and check that no warnings were produced.
let robots_txt = parse_robots_txt(robots_txt_url.origin(), robots_txt);
assert_eq!(robots_txt.get_warnings().len(), 0);
let robots_txt = robots_txt.get_result();
// URLs under /search are disallowed for every user agent; everything else is allowed.
let good_url = Url::parse("https://google.com/test").unwrap();
let bad_url = Url::parse("https://google.com/search/vvv").unwrap();
assert_eq!(robots_txt.can_fetch("*", &bad_url), false);
assert_eq!(robots_txt.can_fetch("*", &good_url), true);
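
The parser also accepts the other directives listed above. Below is a minimal sketch of a robots.txt file that combines several of them; the file content, host names and URLs are made up for illustration, and only can_fetch is exercised here (the crawl-delay, request-rate, sitemap and clean-param values are expected to be read back through the service module's RobotsTxtService trait rather than shown in this sketch).

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// Hypothetical robots.txt combining several of the supported directives.
let robots_txt_url = Url::parse("https://example.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\n\
                  Disallow: /search\n\
                  Crawl-delay: 5\n\
                  Request-rate: 1/5\n\
                  Sitemap: https://example.com/sitemap.xml\n\
                  Clean-param: utm_source /articles/";
let parse_result = parse_robots_txt(robots_txt_url.origin(), robots_txt);
// Malformed or unsupported lines would show up here as warnings.
println!("warnings: {}", parse_result.get_warnings().len());
let robots_txt = parse_result.get_result();
// Only the Disallow rule influences can_fetch; the other directives do not.
let blocked = Url::parse("https://example.com/search/query").unwrap();
let allowed = Url::parse("https://example.com/articles/1").unwrap();
assert_eq!(robots_txt.can_fetch("*", &blocked), false);
assert_eq!(robots_txt.can_fetch("*", &allowed), true);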

Structs§

ParseResult
The result of the robots.txt parser.
ParseWarning
Warning produced by the robots.txt parser about a problem encountered while parsing the robots.txt file.

Enums§

WarningReason
The reason for a warning produced by the robots.txt parser while parsing the robots.txt file.

Functions§

parse_fetched_robots_txt
Parses the text of the robots.txt file located at the specified origin, taking the status code of the HTTP response into account. IMPORTANT NOTE: the origin must point to the robots.txt URL before any redirects.
parse_robots_txt
Parses the text of the robots.txt file located at the specified origin.
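
A short sketch of the origin handling that the note on parse_fetched_robots_txt refers to, shown here with parse_robots_txt since fetching is out of scope; the host names and the redirect scenario are made up for illustration, and the same rule applies when calling parse_fetched_robots_txt.

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// The robots.txt URL that was originally requested, before any HTTP redirects.
// Even if https://example.com/robots.txt redirects elsewhere (e.g. to a CDN host),
// the origin passed to the parser must be derived from this pre-redirect URL,
// so that the parsed rules apply to URLs on https://example.com.
let requested_url = Url::parse("https://example.com/robots.txt").unwrap();
let body = "User-agent: *\nDisallow: /private";
let parse_result = parse_robots_txt(requested_url.origin(), body);
let robots_txt = parse_result.get_result();
let url = Url::parse("https://example.com/private/data").unwrap();
assert_eq!(robots_txt.can_fetch("*", &url), false);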