Module parser

Contains robots.txt parsers.

§Supported features and directives

  • Removes the Unicode BOM
  • Directive User-Agent
  • Directive Allow
  • Directive Disallow
  • Directive Crawl-Delay
  • Directive Request-Rate
  • Directive Sitemap
  • Directive Clean-Param

§Example

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// The origin of the robots.txt file; its rules apply to URLs from this origin.
let robots_txt_url = Url::parse("https://google.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\nDisallow: /search";
// Parse the text and check that no warnings were produced.
let robots_txt = parse_robots_txt(robots_txt_url.origin(), robots_txt);
assert_eq!(robots_txt.get_warnings().len(), 0);
let robots_txt = robots_txt.get_result();
// URLs under /search are disallowed for every user agent; everything else is allowed.
let good_url = Url::parse("https://google.com/test").unwrap();
let bad_url = Url::parse("https://google.com/search/vvv").unwrap();
assert_eq!(robots_txt.can_fetch("*", &bad_url), false);
assert_eq!(robots_txt.can_fetch("*", &good_url), true);
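
The parser also accepts the other directives listed above. Below is a minimal sketch of a robots.txt file that combines several of them; the file content, host names and URLs are made up for illustration, and only can_fetch is exercised here (the crawl-delay, request-rate, sitemap and clean-param values are expected to be read back through the service module's RobotsTxtService trait rather than shown in this sketch).

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// Hypothetical robots.txt combining several of the supported directives.
let robots_txt_url = Url::parse("https://example.com/robots.txt").unwrap();
let robots_txt = "User-agent: *\n\
                  Disallow: /search\n\
                  Crawl-delay: 5\n\
                  Request-rate: 1/5\n\
                  Sitemap: https://example.com/sitemap.xml\n\
                  Clean-param: utm_source /articles/";
let parse_result = parse_robots_txt(robots_txt_url.origin(), robots_txt);
// Malformed or unsupported lines would show up here as warnings.
println!("warnings: {}", parse_result.get_warnings().len());
let robots_txt = parse_result.get_result();
// Only the Disallow rule influences can_fetch; the other directives do not.
let blocked = Url::parse("https://example.com/search/query").unwrap();
let allowed = Url::parse("https://example.com/articles/1").unwrap();
assert_eq!(robots_txt.can_fetch("*", &blocked), false);
assert_eq!(robots_txt.can_fetch("*", &allowed), true);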

Structs§

ParseResult
The result of the robots.txt parser.
ParseWarning
Warning produced by the robots.txt parser about a problem encountered while parsing the robots.txt file.

Enums§

WarningReason
The reason for a warning produced by the robots.txt parser while parsing the robots.txt file.

Functions§

parse_fetched_robots_txt
Parses the text of the robots.txt file located at the specified origin, taking the status code of the HTTP response into account. IMPORTANT NOTE: the origin must point to the robots.txt URL before any redirects.
parse_robots_txt
Parses the text of the robots.txt file located at the specified origin.
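
A short sketch of the origin handling that the note on parse_fetched_robots_txt refers to, shown here with parse_robots_txt since fetching is out of scope; the host names and the redirect scenario are made up for illustration, and the same rule applies when calling parse_fetched_robots_txt.

use robotparser::parser::parse_robots_txt;
use robotparser::service::RobotsTxtService;
use url::Url;

// The robots.txt URL that was originally requested, before any HTTP redirects.
// Even if https://example.com/robots.txt redirects elsewhere (e.g. to a CDN host),
// the origin passed to the parser must be derived from this pre-redirect URL,
// so that the parsed rules apply to URLs on https://example.com.
let requested_url = Url::parse("https://example.com/robots.txt").unwrap();
let body = "User-agent: *\nDisallow: /private";
let parse_result = parse_robots_txt(requested_url.origin(), body);
let robots_txt = parse_result.get_result();
let url = Url::parse("https://example.com/private/data").unwrap();
assert_eq!(robots_txt.can_fetch("*", &url), false);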