Crate fast_robots

Fast, zero-copy parsing and matching for robots.txt files.

fast-robots parses the standardized User-agent, Allow, and Disallow records used by crawlers, then evaluates paths using the RFC 9309 matching rules: exact user-agent groups are preferred over *, the longest matching rule wins, and Allow wins ties.

Parsed values borrow from the original input, so parsing avoids copying rule strings, user agents, and extension metadata. Keep the input string or byte buffer alive for as long as the returned RobotsTxt is used.
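
The lifetime requirement is easiest to see with an owned buffer. Below is a minimal sketch (assuming RobotsTxt::parse accepts any &str) of keeping the backing string alive while the parsed value is in use:

use fast_robots::RobotsTxt;

// `robots` borrows rule and user-agent strings from `body`, so `body`
// must outlive every use of `robots`; dropping it earlier would not compile.
let body = String::from("User-agent: *\nDisallow: /private/\n");
let robots = RobotsTxt::parse(&body);

assert!(!robots.is_allowed("ExampleBot", "/private/data"));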

§Quick Start

use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse(
    "User-agent: *\n\
     Disallow: /private/\n\
     Allow: /private/public/\n",
);

assert!(!robots.is_allowed("ExampleBot", "/private/file.html"));
assert!(robots.is_allowed("ExampleBot", "/private/public/file.html"));

§Fallible Byte Parsing

Use the byte APIs when reading directly from files or HTTP responses. They reject invalid UTF-8 and inputs larger than DEFAULT_MAX_BYTES by default.

use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse_bytes(b"User-agent: *\nDisallow: /tmp\n")?;
assert!(!robots.is_allowed("ExampleBot", "/tmp/cache"));

§Diagnostics

The parser is lenient by default: it skips malformed lines it can recover from and keeps parsing. Use diagnostics when you want validator-style warnings alongside the parsed rules.

use fast_robots::{ParseWarningKind, RobotsTxt};

let report = RobotsTxt::parse_with_diagnostics(
    "Disallow: /\nMissing separator\nUser-agent: *\nDisallow: /private\n",
);

assert!(matches!(
    report.warnings[0].kind,
    ParseWarningKind::RuleBeforeUserAgent { .. }
));
assert!(matches!(
    report.warnings[1].kind,
    ParseWarningKind::MissingSeparator { .. }
));
assert!(!report.robots.is_allowed("ExampleBot", "/private"));

§Extension Metadata

With the extensions feature (enabled by default), non-core directives such as Sitemap and Crawl-delay are preserved as metadata. Extension metadata never changes RobotsTxt::is_allowed decisions.

use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse(
    "Sitemap: https://example.com/sitemap.xml\n\
     User-agent: SlowBot\n\
     Crawl-delay: 5\n\
     Disallow: /slow/\n",
);

assert_eq!(robots.extensions.sitemaps, ["https://example.com/sitemap.xml"]);
assert_eq!(robots.extensions.crawl_delays[0].agents, ["SlowBot"]);
assert!(!robots.is_allowed("SlowBot", "/slow/page.html"));

Structs§

CleanParam (extensions)
A Clean-param directive value.
CrawlDelay (extensions)
A Crawl-delay directive and the group agents active when it appeared.
Directive (extensions)
A non-core directive preserved as a raw key/value pair.
Extensions (extensions)
Feature-gated metadata for common non-standard directives.
Group
A robots.txt group containing one or more user agents and their rules.
ParseOptions
Options shared by fallible parsing APIs.
ParseReport
Parsed rules plus any diagnostics collected during parsing.
ParseWarning
A recoverable parse issue with its one-based line number.
RobotsMatcher
Precompiled matcher for repeated access checks against one RobotsTxt.
RobotsTxt
Parsed robots.txt data.
Rule
A single Allow or Disallow rule.

Enums§

ParseError
Errors returned by fallible parsing APIs.
ParseWarningKind
Recoverable parse warning categories.
RuleKind
Access-control directive kind.

Constants§

DEFAULT_MAX_BYTES
Default maximum accepted input size for fallible parsing APIs.