Struct robotstxt_with_cache::matcher::RobotsMatcher

pub struct RobotsMatcher<S: RobotsMatchStrategy> { /* fields omitted */ }

RobotsMatcher - matches robots.txt against URLs.

The matcher uses a default match strategy for Allow/Disallow patterns, which is the official strategy used by Google's crawler to match robots.txt. It is also possible to provide a custom match strategy.

The entry point for the user is to call one of the allowed_by_robots methods, which directly report whether a URL is allowed according to the robots.txt file and the crawl agent. The RobotsMatcher can be reused across URLs and robots.txt files, but it is not thread-safe.
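A minimal usage sketch, assuming this crate mirrors the upstream robotstxt API: a DefaultMatcher alias for RobotsMatcher with the default match strategy, and a one_agent_allowed_by_robots(robots_body, user_agent, url) method (both names are assumptions here):

```rust
use robotstxt_with_cache::DefaultMatcher;

fn main() {
    // Matcher with the default (Google-style) match strategy.
    let mut matcher = DefaultMatcher::default();

    let robots_body = "user-agent: FooBot\n\
                       disallow: /\n";

    // FooBot is disallowed from fetching anything on this host.
    assert!(!matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://example.com/"));

    // The same matcher instance can be reused for another check,
    // but it must not be shared across threads.
    assert!(matcher.one_agent_allowed_by_robots(robots_body, "BarBot", "https://example.com/"));
}
```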

Implementations

Returns true if ‘url’ is allowed to be fetched by any member of the “user_agents” vector. ‘url’ must be %-encoded according to RFC3986.
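A sketch of the multi-agent check, assuming allowed_by_robots takes the robots.txt body, a Vec of user-agent product tokens, and the URL (as in the upstream robotstxt crate); the result is true if any listed agent may fetch the URL:

```rust
use robotstxt_with_cache::DefaultMatcher;

fn main() {
    let mut matcher = DefaultMatcher::default();

    let robots_body = "user-agent: FooBot\n\
                       disallow: /\n\
                       \n\
                       user-agent: BarBot\n\
                       disallow: /private\n";

    // FooBot is blocked everywhere, but BarBot may fetch /page,
    // so the URL is allowed for this group of agents.
    assert!(matcher.allowed_by_robots(robots_body, vec!["FooBot", "BarBot"], "https://example.com/page"));

    // Both agents are blocked under /private, so the result is false.
    assert!(!matcher.allowed_by_robots(robots_body, vec!["FooBot", "BarBot"], "https://example.com/private/doc"));
}
```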

Performs the robots.txt check for ‘url’ when there is only one user agent. ‘url’ must be %-encoded according to RFC3986.

Verifies that the given user agent is valid to be matched against robots.txt. Valid user agent strings only contain the characters [a-zA-Z_-].
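For illustration only, a hypothetical standalone helper (not part of this crate) that mirrors the validity rule stated above:

```rust
/// Hypothetical helper: a user agent is valid to match against robots.txt
/// only if it consists solely of the characters [a-zA-Z_-].
fn is_valid_user_agent(user_agent: &str) -> bool {
    user_agent
        .chars()
        .all(|c| c.is_ascii_alphabetic() || c == '_' || c == '-')
}

fn main() {
    assert!(is_valid_user_agent("FooBot"));
    assert!(is_valid_user_agent("foo-bot_beta"));
    // A full User-Agent header with a version is not a valid product token.
    assert!(!is_valid_user_agent("FooBot/2.1"));
}
```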

Returns the line that matched or 0 if none matched.
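A sketch of inspecting the matched line after a check, assuming the port keeps an accessor like the upstream matching_line() (the method name is an assumption):

```rust
use robotstxt_with_cache::DefaultMatcher;

fn main() {
    let mut matcher = DefaultMatcher::default();

    let robots_body = "user-agent: FooBot\n\
                       disallow: /private\n";

    let allowed =
        matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://example.com/private/x");

    // `matching_line` is assumed to return the robots.txt line number of the
    // rule that decided the last check, or 0 if no rule matched.
    println!("allowed = {}, decided by line {}", allowed, matcher.matching_line());
}
```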

Trait Implementations

Default: Returns the “default value” for a type.

Handles any other unrecognized name/value pairs encountered while parsing robots.txt (part of the crate's parse-handler trait implementation).

Auto Trait Implementations

Blanket Implementations

Any::type_id: Gets the TypeId of self.

Borrow<T>::borrow: Immutably borrows from an owned value.

BorrowMut<T>::borrow_mut: Mutably borrows from an owned value.

From<T>::from: Performs the conversion.

Into<U>::into: Performs the conversion.

TryFrom<U>::Error: The type returned in the event of a conversion error.

TryFrom<U>::try_from: Performs the conversion.

TryInto<U>::Error: The type returned in the event of a conversion error.

TryInto<U>::try_into: Performs the conversion.