Trait robotstxt_with_cache::matcher::RobotsMatchStrategy

pub trait RobotsMatchStrategy: Default {
    fn match_allow(&self, path: &str, pattern: &str) -> i32;
    fn match_disallow(&self, path: &str, pattern: &str) -> i32;

    fn matches(path: &str, pattern: &str) -> bool { ... }
}
Create a RobotsMatcher with the default matching strategy.

The default matching strategy is longest-match, as opposed to the first-match strategy specified in the former internet draft. Analysis shows that longest-match, while more restrictive for crawlers, is what webmasters assume when writing directives: when an Allow rule and a Disallow rule both match, the longest match is the one the user wants. For example, consider a robots.txt file that has the following rules

  Allow: /
  Disallow: /cgi-bin

it is clear what the webmaster wants: to allow crawling of every URI except those under /cgi-bin. Under the first-match strategy of the expired internet draft, however, Allow: / matches first, so crawlers would be allowed to crawl everything with such a rule.
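The difference between the two strategies can be sketched in a few lines. This is a simplified illustration, not this crate's implementation: the names `Verdict`, `prefix_priority`, `first_match`, and `longest_match` are hypothetical, patterns are treated as plain prefixes (no `*` or `$`), and tie-breaking between equally long rules is not modeled.

```rust
// Hypothetical illustration only; none of these names come from the crate.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Verdict {
    Allow,
    Disallow,
}

/// Priority of a plain prefix rule: the pattern length if the path starts
/// with the pattern, -1 otherwise (assumed convention for this sketch).
fn prefix_priority(path: &str, pattern: &str) -> i32 {
    if path.starts_with(pattern) {
        pattern.len() as i32
    } else {
        -1
    }
}

/// First-match: the first rule in file order that matches decides.
fn first_match(path: &str, rules: &[(Verdict, &str)]) -> Verdict {
    for &(verdict, pattern) in rules {
        if prefix_priority(path, pattern) >= 0 {
            return verdict;
        }
    }
    // No rule matched: crawling is allowed.
    Verdict::Allow
}

/// Longest-match: the matching rule with the longest pattern decides.
fn longest_match(path: &str, rules: &[(Verdict, &str)]) -> Verdict {
    let mut best = (Verdict::Allow, -1);
    for &(verdict, pattern) in rules {
        let priority = prefix_priority(path, pattern);
        if priority > best.1 {
            best = (verdict, priority);
        }
    }
    best.0
}
```

With the rules `Allow: /` and `Disallow: /cgi-bin`, `first_match` allows `/cgi-bin/script` because `Allow: /` is listed first, while `longest_match` disallows it because `/cgi-bin` is the longer matching pattern.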

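Required methods

fn match_allow(&self, path: &str, pattern: &str) -> i32

fn match_disallow(&self, path: &str, pattern: &str) -> i32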

Provided methods

fn matches(path: &str, pattern: &str) -> bool

Returns true if URI path matches the specified pattern. Pattern is anchored at the beginning of path. ‘$’ is special only at the end of pattern.

Since ‘path’ and ‘pattern’ are both externally determined (by the webmaster), we make sure to have acceptable worst-case performance.

use robotstxt_with_cache::matcher::{LongestMatchRobotsMatchStrategy, RobotsMatchStrategy};

type Target = LongestMatchRobotsMatchStrategy;
assert_eq!(true, Target::matches("/", "/"));
assert_eq!(true, Target::matches("/abc", "/"));
assert_eq!(false, Target::matches("/", "/abc"));
assert_eq!(
    true,
    Target::matches("/google/robotstxt/tree/master", "/*/*/tree/master")
);
assert_eq!(
    true,
    Target::matches(
        "/google/robotstxt/tree/master/index.html",
        "/*/*/tree/master",
    )
);
assert_eq!(
    true,
    Target::matches("/google/robotstxt/tree/master", "/*/*/tree/master$")
);
assert_eq!(
    false,
    Target::matches("/google/robotstxt/tree/master/abc", "/*/*/tree/master$")
);
assert_eq!(
    false,
    Target::matches("/google/robotstxt/tree/abc", "/*/*/tree/master")
);
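One way to obtain the bounded worst-case behavior described above is to track the set of path offsets the pattern can have reached so far, instead of backtracking. The sketch below is an assumption about how such a matcher could look, not the crate's actual code; `matches_sketch` is a hypothetical name. It does at most `path.len() + 1` units of work per pattern byte, so the total cost is proportional to the product of the two lengths.

```rust
/// Hypothetical stand-in for `matches`; the body is an illustrative sketch.
/// The pattern is anchored at the start of `path`, `*` matches any run of
/// bytes, and `$` is special only as the last pattern byte.
fn matches_sketch(path: &str, pattern: &str) -> bool {
    let path = path.as_bytes();
    let pattern = pattern.as_bytes();
    // Offsets into `path` reachable after consuming the pattern so far,
    // kept in ascending order. Matching starts at offset 0 (anchored).
    let mut positions: Vec<usize> = vec![0];

    for (i, &symbol) in pattern.iter().enumerate() {
        if symbol == b'$' && i + 1 == pattern.len() {
            // Trailing '$': succeed only if the whole path was consumed.
            return positions.last() == Some(&path.len());
        }
        if symbol == b'*' {
            // '*' may consume any number of bytes, so every offset from
            // the smallest reachable one to the end becomes reachable.
            let start = positions[0];
            positions = (start..=path.len()).collect();
        } else {
            // Literal byte (including a '$' that is not at the end):
            // keep only offsets where the path has the same byte.
            positions = positions
                .into_iter()
                .filter(|&p| p < path.len() && path[p] == symbol)
                .map(|p| p + 1)
                .collect();
            if positions.is_empty() {
                return false;
            }
        }
    }
    // Pattern exhausted without a trailing '$': a prefix match suffices.
    true
}
```

Under these assumptions the sketch gives the same results as the assertions above, for example `matches_sketch("/abc", "/")` is true and `matches_sketch("/google/robotstxt/tree/master/abc", "/*/*/tree/master$")` is false.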

Implementors