Fast, zero-copy parsing and matching for robots.txt files.
fast-robots parses the standardized User-agent, Allow, and
Disallow records used by crawlers, then evaluates paths using the RFC 9309
matching rules: exact user-agent groups are preferred over *, the longest
matching rule wins, and Allow wins ties.
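For example, a group that names a crawler exactly takes the place of the * group for that crawler; a minimal sketch using the same parse and is_allowed calls as the examples below:
use fast_robots::RobotsTxt;
// FooBot matches its own group, so the global Disallow never applies to it.
let robots = RobotsTxt::parse(
    "User-agent: *\n\
     Disallow: /\n\
     User-agent: FooBot\n\
     Allow: /\n",
);
assert!(robots.is_allowed("FooBot", "/anything"));
// Crawlers without an exact group fall back to *.
assert!(!robots.is_allowed("OtherBot", "/anything"));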
Parsed values borrow from the original input, so parsing avoids copying rule
strings, user agents, and extension metadata. Keep the input string or byte
buffer alive for as long as the returned RobotsTxt is used.
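For example, when the text comes from an owned buffer, keep the buffer in scope for the lifetime of the parsed value; a minimal sketch, assuming parse accepts any &str:
use fast_robots::RobotsTxt;
// `robots` borrows from `body`, so `body` must outlive it.
let body = String::from("User-agent: *\nDisallow: /private/\n");
let robots = RobotsTxt::parse(&body);
assert!(!robots.is_allowed("ExampleBot", "/private/file.html"));
// Dropping `body` while `robots` is still in use would not compile.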
§Quick Start
use fast_robots::RobotsTxt;
let robots = RobotsTxt::parse(
    "User-agent: *\n\
     Disallow: /private/\n\
     Allow: /private/public/\n",
);
assert!(!robots.is_allowed("ExampleBot", "/private/file.html"));
assert!(robots.is_allowed("ExampleBot", "/private/public/file.html"));
§Fallible Byte Parsing
Use the byte APIs when reading directly from files or HTTP responses. They
reject invalid UTF-8 and inputs larger than DEFAULT_MAX_BYTES by default.
use fast_robots::RobotsTxt;
let robots = RobotsTxt::parse_bytes(b"User-agent: *\nDisallow: /tmp\n")?;
assert!(!robots.is_allowed("ExampleBot", "/tmp/cache"));
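Invalid UTF-8 is reported as an error rather than repaired; a minimal check that relies only on the documented rejection behavior:
use fast_robots::RobotsTxt;
// 0xFF can never appear in well-formed UTF-8, so parsing must fail.
assert!(RobotsTxt::parse_bytes(b"\xFF\xFE").is_err());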
§Diagnostics
The parser is tolerant by default: it skips malformed lines it can recover from and keeps parsing. Use diagnostics when you want validator-style warnings alongside the parsed rules.
use fast_robots::{ParseWarningKind, RobotsTxt};
let report = RobotsTxt::parse_with_diagnostics(
    "Disallow: /\nMissing separator\nUser-agent: *\nDisallow: /private\n",
);
assert!(matches!(
    report.warnings[0].kind,
    ParseWarningKind::RuleBeforeUserAgent { .. }
));
assert!(matches!(
    report.warnings[1].kind,
    ParseWarningKind::MissingSeparator { .. }
));
assert!(!report.robots.is_allowed("ExampleBot", "/private"));
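Warnings carry one-based line numbers, so they can be surfaced like compiler diagnostics; a sketch in which the field name line and the Debug impl on the kind are assumptions (check the ParseWarning docs):
use fast_robots::RobotsTxt;
let report = RobotsTxt::parse_with_diagnostics("Missing separator\n");
for warning in &report.warnings {
    // Assumption: `line` holds the one-based line number of the offending input line.
    eprintln!("robots.txt line {}: {:?}", warning.line, warning.kind);
}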
§Extension Metadata
With the default extensions feature, non-core directives such as Sitemap
and Crawl-delay are preserved as metadata. Extension metadata never changes
RobotsTxt::is_allowed decisions.
use fast_robots::RobotsTxt;
let robots = RobotsTxt::parse(
    "Sitemap: https://example.com/sitemap.xml\n\
     User-agent: SlowBot\n\
     Crawl-delay: 5\n\
     Disallow: /slow/\n",
);
assert_eq!(robots.extensions.sitemaps, ["https://example.com/sitemap.xml"]);
assert_eq!(robots.extensions.crawl_delays[0].agents, ["SlowBot"]);
assert!(!robots.is_allowed("SlowBot", "/slow/page.html"));Structs§
Structs§
- CleanParam (extensions) - A Clean-param directive value.
- CrawlDelay (extensions) - A Crawl-delay directive and the group agents active when it appeared.
- Directive (extensions) - A non-core directive preserved as a raw key/value pair.
- Extensions (extensions) - Feature-gated metadata for common non-standard directives.
- Group - A robots.txt group containing one or more user agents and their rules.
- ParseOptions - Options shared by fallible parsing APIs.
- ParseReport - Parsed rules plus any diagnostics collected during parsing.
- ParseWarning - A recoverable parse issue with its one-based line number.
- RobotsMatcher - Precompiled matcher for repeated access checks against one RobotsTxt; see the sketch after this list.
- RobotsTxt - Parsed robots.txt data.
- Rule - A single Allow or Disallow rule.
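When one crawler checks many paths against the same file, RobotsMatcher above is the precompiled entry point. A hypothetical sketch: the constructor name matcher and the single-argument is_allowed below are assumptions for illustration, not the crate's confirmed API:
use fast_robots::RobotsTxt;
let robots = RobotsTxt::parse("User-agent: *\nDisallow: /private/\n");
// Hypothetical API: resolve the group for one agent once, then reuse it per path.
let matcher = robots.matcher("ExampleBot");
assert!(!matcher.is_allowed("/private/page.html"));
assert!(matcher.is_allowed("/public/page.html"));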
Enums§
- ParseError - Errors returned by fallible parsing APIs.
- ParseWarningKind - Recoverable parse warning categories.
- RuleKind - Access-control directive kind.
Constants§
- DEFAULT_MAX_BYTES - Default maximum accepted input size for fallible parsing APIs.