RobotsMatchStrategy

Trait RobotsMatchStrategy 

pub trait RobotsMatchStrategy: Default {
    // Required methods
    fn match_allow(&self, path: &str, pattern: &str) -> i32;
    fn match_disallow(&self, path: &str, pattern: &str) -> i32;

    // Provided method
    fn matches(path: &str, pattern: &str) -> bool { ... }
}

Create a RobotsMatcher with the default matching strategy.

The default matching strategy is longest-match, as opposed to the first-match strategy specified by the former internet draft. Analysis shows that longest-match, while more restrictive for crawlers, is what webmasters assume when writing directives: when both an Allow and a Disallow rule match, the longest match is the one the user wants. For example, consider a robots.txt file with the following rules:

  Allow: /
  Disallow: /cgi-bin

The webmaster's intent is clear: allow crawling of every URI except those under /cgi-bin. Under the first-match strategy of the expired internet draft, however, crawlers would be allowed to crawl everything with these rules.
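The difference between the two strategies can be sketched in a few lines. This is an illustrative model only, not the crate's implementation: the helper names (`prefix_priority`, `allowed_longest_match`, `allowed_first_match`) are hypothetical, patterns are treated as plain prefixes with no wildcard handling, and the tie-breaking rule (Allow wins on equal length) is an assumption of this sketch.

```rust
/// Hypothetical helper: the pattern length if `pattern` is a prefix of `path`,
/// or -1 if it does not match. Wildcards are deliberately ignored here.
fn prefix_priority(path: &str, pattern: &str) -> i32 {
    if path.starts_with(pattern) {
        pattern.len() as i32
    } else {
        -1
    }
}

/// Longest-match: compare the best Allow priority with the best Disallow priority.
fn allowed_longest_match(path: &str, allow: &[&str], disallow: &[&str]) -> bool {
    let best = |patterns: &[&str]| {
        patterns
            .iter()
            .map(|p| prefix_priority(path, p))
            .max()
            .unwrap_or(-1)
    };
    // Assumption of this sketch: a tie is resolved in favour of Allow.
    best(allow) >= best(disallow)
}

/// First-match: the first rule in file order that matches decides the outcome.
/// Each rule is (is_allow, pattern); no matching rule means the path is allowed.
fn allowed_first_match(path: &str, rules: &[(bool, &str)]) -> bool {
    rules
        .iter()
        .find(|rule| path.starts_with(rule.1))
        .map_or(true, |rule| rule.0)
}
```

With the two rules above, the longest-match sketch rejects /cgi-bin/script because the Disallow pattern (8 characters) is longer than the Allow pattern (1 character), while the first-match sketch accepts it because `Allow: /` is listed first.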

Required Methods

fn match_allow(&self, path: &str, pattern: &str) -> i32

fn match_disallow(&self, path: &str, pattern: &str) -> i32

Provided Methods

fn matches(path: &str, pattern: &str) -> bool

Returns true if the URI path matches the specified pattern. The pattern is anchored at the beginning of the path. ‘$’ is special only at the end of the pattern.

Since ‘path’ and ‘pattern’ are both externally determined (by the webmaster), the implementation is written to have acceptable worst-case performance.

use robotstxt::matcher::{LongestMatchRobotsMatchStrategy, RobotsMatchStrategy};

type Target = LongestMatchRobotsMatchStrategy;
assert_eq!(true, Target::matches("/", "/"));
assert_eq!(true, Target::matches("/abc", "/"));
assert_eq!(false, Target::matches("/", "/abc"));
assert_eq!(
    true,
    Target::matches("/google/robotstxt/tree/master", "/*/*/tree/master")
);
assert_eq!(
    true,
    Target::matches(
        "/google/robotstxt/tree/master/index.html",
        "/*/*/tree/master",
    )
);
assert_eq!(
    true,
    Target::matches("/google/robotstxt/tree/master", "/*/*/tree/master$")
);
assert_eq!(
    false,
    Target::matches("/google/robotstxt/tree/master/abc", "/*/*/tree/master$")
);
assert_eq!(
    false,
    Target::matches("/google/robotstxt/tree/abc", "/*/*/tree/master")
);
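
One way to get bounded worst-case behaviour for this kind of matching is to track, for each pattern byte, the set of path offsets that the pattern consumed so far could end at. The sketch below illustrates that idea; it is not taken from the crate, the name `matches_sketch` is hypothetical, and it assumes that ‘*’ matches any run of bytes (including none). Each pattern byte does work proportional to the path length, so the total cost is bounded by the product of the two lengths rather than growing exponentially with the number of wildcards.

```rust
/// Hypothetical stand-alone matcher illustrating the technique;
/// not the crate's own code.
fn matches_sketch(path: &str, pattern: &str) -> bool {
    let path = path.as_bytes();
    let pattern = pattern.as_bytes();

    // Sorted list of offsets into `path` at which the pattern prefix
    // consumed so far can end. Starting at [0] anchors the pattern
    // at the beginning of the path.
    let mut pos: Vec<usize> = vec![0];

    for (i, &pat) in pattern.iter().enumerate() {
        if pat == b'$' && i + 1 == pattern.len() {
            // A trailing '$' requires some candidate to end exactly at
            // the end of the path; the largest offset is the last one.
            return pos.last().copied() == Some(path.len());
        }
        if pat == b'*' {
            // '*' may absorb any number of bytes, so every offset from
            // the smallest candidate to the end of the path is reachable.
            let start = pos[0];
            pos = (start..=path.len()).collect();
        } else {
            // Literal byte (including '$' when it is not the last byte):
            // keep only candidates whose next path byte equals it.
            pos = pos
                .iter()
                .filter(|&&p| p < path.len() && path[p] == pat)
                .map(|&p| p + 1)
                .collect();
            if pos.is_empty() {
                return false;
            }
        }
    }
    // The whole pattern was consumed with at least one live candidate.
    true
}
```

Because the pattern only has to match a prefix of the path unless it ends in ‘$’, running out of pattern with any candidate left counts as a match.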

Dyn Compatibility

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
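
In practice this means a strategy is selected through a generic type parameter rather than through a trait object. The sketch below shows that pattern under stated assumptions: `StrategySketch` is a local stand-in with the same shape as the trait signature on this page (so the example compiles without the crate), its `matches` body is a simplified prefix check, and `PrefixLengthStrategy` and `allow_priority` are hypothetical names introduced for illustration.

```rust
// Local stand-in mirroring the shape of the trait shown on this page.
// The `Default` supertrait requires `Self: Sized`, and `matches` has no
// `self` receiver; both prevent use as `dyn StrategySketch`.
trait StrategySketch: Default {
    fn match_allow(&self, path: &str, pattern: &str) -> i32;
    fn match_disallow(&self, path: &str, pattern: &str) -> i32;

    // Simplified stand-in body: plain prefix check, no wildcards.
    fn matches(path: &str, pattern: &str) -> bool {
        path.starts_with(pattern)
    }
}

// Hypothetical strategy: the priority is the pattern length on a match
// and -1 otherwise.
#[derive(Default)]
struct PrefixLengthStrategy;

impl StrategySketch for PrefixLengthStrategy {
    fn match_allow(&self, path: &str, pattern: &str) -> i32 {
        if Self::matches(path, pattern) {
            pattern.len() as i32
        } else {
            -1
        }
    }

    fn match_disallow(&self, path: &str, pattern: &str) -> i32 {
        // Same scoring for both rule kinds in this sketch.
        self.match_allow(path, pattern)
    }
}

// Static dispatch: the strategy is a type parameter and is constructed
// through its `Default` implementation.
fn allow_priority<S: StrategySketch>(path: &str, pattern: &str) -> i32 {
    S::default().match_allow(path, pattern)
}
```

A caller then names the strategy at the call site, for example `allow_priority::<PrefixLengthStrategy>("/cgi-bin/x", "/cgi-bin")`.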

Implementors