Crate robotstxt

A native Rust port of Google's robots.txt parser and matcher C++ library.

  • Native Rust port with no third-party crate dependencies
  • Preserves all behaviour of the original library
  • Passes 100% of Google's original test cases

Quick start

use robotstxt::DefaultMatcher;

let mut matcher = DefaultMatcher::default();
let robots_body = "user-agent: FooBot\n\
                   disallow: /\n";
assert_eq!(false, matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://foo.com/"));

Modules

matcher

Matches user agents and URLs against robots.txt rules.

parser

Parses robots.txt bodies and emits callbacks to a handler.

Traits

RobotsParseHandler

Handler for directives found in robots.txt.

Functions

get_path_params_query

Extracts the path (with params) and query part from a URL, removing the scheme, authority, and fragment. The result always starts with "/". Returns "/" if the URL has no path or is not valid.
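To illustrate the behaviour described above, here is a simplified, self-contained sketch of such a path extraction. The function name and logic are illustrative only; the crate's actual `get_path_params_query` handles more edge cases.

```rust
// Illustrative sketch of extracting "/path?query" from a URL.
// Not the crate's implementation; edge cases are simplified.
fn path_params_query(url: &str) -> String {
    // Drop the fragment first.
    let url = url.split('#').next().unwrap_or("");
    // Strip the scheme and authority if present.
    let rest = match url.find("://") {
        Some(i) => {
            let after = &url[i + 3..];
            // The path (or query) begins at the first '/' or '?' after the authority.
            match after.find(|c| c == '/' || c == '?') {
                Some(j) => &after[j..],
                None => "",
            }
        }
        None => url,
    };
    // Ensure the result always starts with "/".
    let mut out = String::new();
    if !rest.starts_with('/') {
        out.push('/');
    }
    out.push_str(rest);
    out
}
```

For example, a URL with a scheme, query, and fragment such as `"https://example.com/a/b?q=1#frag"` reduces to `"/a/b?q=1"`, and a URL with no path at all reduces to `"/"`.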

parse_robotstxt

Parses the body of a robots.txt file and emits parse callbacks. The parser accepts typical typos found in robots.txt files, such as 'disalow'.
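The callback style can be sketched as follows. This is a minimal illustration under assumed names: the trait, its methods, and the typo list here are hypothetical, not the crate's actual `RobotsParseHandler` API.

```rust
// Illustrative callback-style parser sketch (not the crate's API).
trait ParseHandler {
    fn handle_user_agent(&mut self, value: &str);
    fn handle_disallow(&mut self, value: &str);
}

// Collects parse events so the callbacks can be observed.
struct Collector {
    agents: Vec<String>,
    disallows: Vec<String>,
}

impl ParseHandler for Collector {
    fn handle_user_agent(&mut self, value: &str) {
        self.agents.push(value.to_string());
    }
    fn handle_disallow(&mut self, value: &str) {
        self.disallows.push(value.to_string());
    }
}

fn parse_lines<H: ParseHandler>(body: &str, handler: &mut H) {
    for line in body.lines() {
        // Split "key: value" and normalize the key.
        if let Some((key, value)) = line.split_once(':') {
            let key = key.trim().to_ascii_lowercase();
            let value = value.trim();
            match key.as_str() {
                "user-agent" | "useragent" => handler.handle_user_agent(value),
                // Tolerate common misspellings, as the real parser does.
                "disallow" | "disalow" | "dissallow" => handler.handle_disallow(value),
                _ => {}
            }
        }
    }
}
```

Feeding a body containing `"disalow: /tmp"` through `parse_lines` still triggers the disallow callback, mirroring the typo tolerance described above.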

Type Definitions

DefaultMatcher

A default RobotsMatcher with LongestMatchRobotsMatchStrategy.
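The longest-match strategy can be sketched in a few lines: among all allow and disallow patterns matching a URL path, the longest pattern wins, and an allow wins ties. This sketch uses plain prefix matching only; the real matcher also supports '*' and '$' wildcards. All names here are illustrative.

```rust
// Length of the longest pattern that prefix-matches the path (0 if none).
fn longest_match(path: &str, patterns: &[&str]) -> usize {
    patterns
        .iter()
        .filter(|p| path.starts_with(**p))
        .map(|p| p.len())
        .max()
        .unwrap_or(0)
}

// Longest-match precedence: the longer matching pattern wins; allow wins
// ties, and a path with no matching disallow rule is allowed by default.
fn allowed(path: &str, allows: &[&str], disallows: &[&str]) -> bool {
    longest_match(path, allows) >= longest_match(path, disallows)
}
```

With `allow: /public/` and `disallow: /`, the path `/public/page` is allowed (the 8-character allow pattern beats the 1-character disallow), while `/private` is blocked.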