Crate robotstxt_with_cache
A native Rust port of Google’s robots.txt parser and matcher C++ library.
- Native Rust port with no third-party crate dependencies
- Preserves all behaviour of the original library
- Passes 100% of Google's original tests
Quick start
```rust
use robotstxt::DefaultMatcher;

let mut matcher = DefaultMatcher::default();
let robots_body = "user-agent: FooBot\n\
                   disallow: /\n";
assert_eq!(
    false,
    matcher.one_agent_allowed_by_robots(robots_body, "FooBot", "https://foo.com/")
);
```
Modules
matcher
parser
Traits
RobotsParseHandler | Handler for directives found in robots.txt.
Functions
get_path_params_query | Extracts the path (with params) and query part from a URL, removing the scheme, authority, and fragment. The result always starts with "/"; returns "/" if the URL has no path or is not valid.
parse_robotstxt | Parses the body of a robots.txt file and emits parse callbacks. Accepts typical typos found in robots.txt, such as "disalow".
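To illustrate the documented semantics of `get_path_params_query`, here is a minimal standalone reimplementation. This is a sketch of the described behaviour, not the crate's actual code, and the function name `path_params_query` is hypothetical:

```rust
// Illustrative only: mimics the documented behaviour of
// get_path_params_query (strip scheme, authority, and fragment;
// always return a string starting with "/").
fn path_params_query(url: &str) -> String {
    // Drop the fragment first.
    let url = url.split('#').next().unwrap_or("");
    // Skip "scheme://" if present.
    let after_scheme = match url.find("//") {
        Some(i) => &url[i + 2..],
        None => url,
    };
    // The path begins at the first '/' after the authority.
    match after_scheme.find('/') {
        Some(i) => after_scheme[i..].to_string(),
        None => "/".to_string(), // no path at all: return "/"
    }
}

fn main() {
    assert_eq!(path_params_query("https://foo.com/a/b?q=1#frag"), "/a/b?q=1");
    assert_eq!(path_params_query("https://foo.com"), "/");
}
```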
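The callback style used by `parse_robotstxt` and `RobotsParseHandler` can be sketched with a self-contained toy parser. The trait, method names, and `Collector` type below are hypothetical stand-ins for illustration; the crate's real trait is richer:

```rust
// Hypothetical sketch of callback-style robots.txt parsing.
// Not the crate's actual RobotsParseHandler API.
trait LineHandler {
    fn on_user_agent(&mut self, value: &str);
    fn on_disallow(&mut self, value: &str);
}

struct Collector {
    user_agents: Vec<String>,
    disallows: Vec<String>,
}

impl LineHandler for Collector {
    fn on_user_agent(&mut self, value: &str) { self.user_agents.push(value.to_string()); }
    fn on_disallow(&mut self, value: &str) { self.disallows.push(value.to_string()); }
}

// Walk the body line by line and emit one callback per recognized directive.
fn parse(body: &str, h: &mut impl LineHandler) {
    for line in body.lines() {
        if let Some((key, value)) = line.split_once(':') {
            let value = value.trim();
            match key.trim().to_ascii_lowercase().as_str() {
                "user-agent" => h.on_user_agent(value),
                // Accept the common typo "disalow", as the real parser does.
                "disallow" | "disalow" => h.on_disallow(value),
                _ => {}
            }
        }
    }
}

fn main() {
    let mut c = Collector { user_agents: vec![], disallows: vec![] };
    parse("user-agent: FooBot\ndisalow: /private\n", &mut c);
    assert_eq!(c.user_agents, vec!["FooBot"]);
    assert_eq!(c.disallows, vec!["/private"]);
}
```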
Type Definitions
DefaultCachingMatcher
DefaultMatcher
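The caching idea behind a caching matcher is to parse a robots.txt body once and reuse the parsed rules for subsequent queries. The sketch below shows only the memoization pattern; the struct name, fields, and the drastically simplified matching logic (disallow prefixes only, no user-agent or allow handling) are hypothetical and are not the crate's `DefaultCachingMatcher` implementation:

```rust
use std::collections::HashMap;

// Illustrative caching pattern: parse each robots.txt body once,
// then answer path queries from the cached rules.
struct CachingMatcher {
    // Parsed disallow prefixes, keyed by the robots.txt body.
    parsed: HashMap<String, Vec<String>>,
}

impl CachingMatcher {
    fn new() -> Self {
        Self { parsed: HashMap::new() }
    }

    fn allowed(&mut self, body: &str, path: &str) -> bool {
        let rules = self.parsed.entry(body.to_string()).or_insert_with(|| {
            body.lines()
                .filter_map(|l| l.strip_prefix("disallow:"))
                .map(|v| v.trim().to_string())
                .collect()
        });
        // Toy rule: a path is blocked if any non-empty disallow prefix matches.
        !rules.iter().any(|p| !p.is_empty() && path.starts_with(p.as_str()))
    }
}

fn main() {
    let mut m = CachingMatcher::new();
    let body = "user-agent: *\ndisallow: /private\n";
    assert!(!m.allowed(body, "/private/page"));
    assert!(m.allowed(body, "/public"));
}
```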