pub struct RobotsMatcher<S: RobotsMatchStrategy> { /* private fields */ }
RobotsMatcher - matches robots.txt against URLs.
The Matcher uses a default match strategy for Allow/Disallow patterns, which is the official way Google's crawler matches robots.txt. It is also possible to provide a custom match strategy.
The entry point for the user is to call one of the allowed_by_robots methods, which directly return whether a URL is allowed according to the robots.txt and the crawl agent. The RobotsMatcher can be re-used across URLs and robots.txt files, but it is not thread-safe.
Implementations
impl<'a, S: RobotsMatchStrategy> RobotsMatcher<S>
pub fn allowed_by_robots(
    &mut self,
    robots_body: &str,
    user_agents: Vec<&str>,
    url: &str,
) -> bool
where
    Self: RobotsParseHandler,
Returns true if ‘url’ is allowed to be fetched by any member of the “user_agents” vector. ‘url’ must be %-encoded according to RFC3986.
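
A minimal sketch of a multi-agent check. It assumes the crate exposes a DefaultMatcher type alias (RobotsMatcher paired with the default match strategy) that implements Default; the robots.txt body and URLs are made up for illustration.

    use robotstxt::DefaultMatcher;

    // Hypothetical robots.txt body: every agent is blocked from /private/.
    let robots_body = "user-agent: *\n\
                       disallow: /private/\n";

    // True if any of the listed user agents may fetch the URL.
    let mut matcher = DefaultMatcher::default();
    assert!(matcher.allowed_by_robots(
        robots_body,
        vec!["FooBot", "BarBot"],
        "https://example.com/public/index.html",
    ));

    let mut matcher = DefaultMatcher::default();
    assert!(!matcher.allowed_by_robots(
        robots_body,
        vec!["FooBot", "BarBot"],
        "https://example.com/private/data.html",
    ));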
pub fn one_agent_allowed_by_robots(
    &mut self,
    robots_txt: &str,
    user_agent: &str,
    url: &str,
) -> bool
where
    Self: RobotsParseHandler,
Performs the robots check for ‘url’ when there is only one user agent. ‘url’ must be %-encoded according to RFC3986.
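
A sketch of the single-agent form, again assuming the DefaultMatcher alias; the robots.txt body is made up.

    use robotstxt::DefaultMatcher;

    // Hypothetical robots.txt body: FooBot is blocked from everything.
    let robots_txt = "user-agent: FooBot\n\
                      disallow: /\n";

    let mut matcher = DefaultMatcher::default();
    assert!(!matcher.one_agent_allowed_by_robots(robots_txt, "FooBot", "https://example.com/page"));

    // An agent with no matching group falls back to the default, which is to allow.
    let mut matcher = DefaultMatcher::default();
    assert!(matcher.one_agent_allowed_by_robots(robots_txt, "BarBot", "https://example.com/page"));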
pub fn is_valid_user_agent_to_obey(user_agent: &str) -> bool
Verifies that the given user agent is valid to be matched against robots.txt. Valid user agent strings only contain the characters [a-zA-Z_-].
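
For example, calling the associated function through the assumed DefaultMatcher alias; the expected results follow the upstream Google library's behavior.

    use robotstxt::DefaultMatcher;

    // Only product-token characters [a-zA-Z_-] are accepted.
    assert!(DefaultMatcher::is_valid_user_agent_to_obey("FooBot"));
    assert!(!DefaultMatcher::is_valid_user_agent_to_obey("FooBot/1.0"));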
pub fn matching_line(&self) -> u32
Returns the line number of the rule that matched, or 0 if none matched.
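
A sketch of inspecting the matched line after a check, again assuming the DefaultMatcher alias; in the upstream library line numbers are 1-based, so the disallow rule below would be reported as line 2.

    use robotstxt::DefaultMatcher;

    let robots_txt = "user-agent: FooBot\n\
                      disallow: /private/\n";

    let mut matcher = DefaultMatcher::default();
    let allowed = matcher.one_agent_allowed_by_robots(robots_txt, "FooBot", "https://example.com/private/x");
    if !allowed {
        // Reports which robots.txt line produced the decision (here the "disallow" rule).
        println!("blocked by robots.txt line {}", matcher.matching_line());
    }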