Struct RobotsTxt 

pub struct RobotsTxt<'a> {
    pub groups: Vec<Group<'a>>,
    pub extensions: Extensions<'a>,
}

Parsed robots.txt data.

Values inside this type borrow from the original input. Use RobotsTxt::is_allowed for access checks and inspect RobotsTxt::groups when you need the parsed rule structure.
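
As a rough illustration of what borrowing from the input means, the std-only sketch below keeps string slices into the source text instead of copying them. `BorrowedRule` and `first_rule` are hypothetical names and are not part of fast_robots; the crate's own types may be organized differently.

```rust
/// Hypothetical zero-copy rule: both fields are slices of the input string.
#[derive(Debug, PartialEq)]
pub struct BorrowedRule<'a> {
    pub name: &'a str,
    pub value: &'a str,
}

/// Returns the first `name: value` line, borrowing both halves from `input`.
pub fn first_rule<'a>(input: &'a str) -> Option<BorrowedRule<'a>> {
    input.lines().find_map(|line| {
        // Lines without a ':' separator are skipped.
        let (name, value) = line.split_once(':')?;
        Some(BorrowedRule {
            name: name.trim(),
            value: value.trim(),
        })
    })
}
```

Because the returned value holds `&'a str` slices, the input string has to stay alive for as long as the parsed value is in use. `RobotsTxt<'a>` places the same kind of constraint on its input.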

§Examples

use fast_robots::{RobotsTxt, RuleKind};

let robots = RobotsTxt::parse("User-agent: *\nDisallow: /admin\n");

assert_eq!(robots.groups[0].agents, ["*"]);
assert_eq!(robots.groups[0].rules[0].kind, RuleKind::Disallow);
assert_eq!(robots.groups[0].rules[0].pattern, "/admin");

Fields§

§groups: Vec<Group<'a>>

Standard access-control groups in source order.

§extensions: Extensions<'a>
Available on crate feature extensions only.

Non-core metadata collected when the extensions feature is enabled.

Implementations§

impl<'a> RobotsTxt<'a>

pub fn parse(input: &'a str) -> Self

Parses a UTF-8 robots.txt string into access rules.

Parsing is tolerant and infallible: malformed lines are skipped wherever the parser can recover, and no error is returned. Use RobotsTxt::parse_with_diagnostics to collect warnings about the skipped lines, or RobotsTxt::parse_with_options to enforce a size limit.
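
The idea of tolerant parsing can be sketched without the crate. The helper below is hypothetical (`tolerant_directives` is not a fast_robots function) and far simpler than a real parser: it only shows how unusable lines can be skipped instead of turning into errors.

```rust
/// Splits robots.txt text into (directive, value) pairs, skipping lines it
/// cannot use. Hypothetical helper for illustration; not the crate's parser.
pub fn tolerant_directives(input: &str) -> Vec<(&str, &str)> {
    let mut out = Vec::new();
    for raw in input.lines() {
        // Drop a trailing comment, then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        // A line without ':' is malformed: skip it rather than fail.
        if let Some((name, value)) = line.split_once(':') {
            out.push((name.trim(), value.trim()));
        }
    }
    out
}
```

The function never returns an error. The caller always gets whatever directives could be recovered, which is the trade-off an infallible `parse` makes.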

§Examples
use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse("User-agent: *\nDisallow: /private\n");

assert!(!robots.is_allowed("ExampleBot", "/private/file.html"));
assert!(robots.is_allowed("ExampleBot", "/public/file.html"));

pub fn parse_bytes(input: &'a [u8]) -> Result<Self, ParseError>

Parses UTF-8 bytes into access rules using ParseOptions::default.

Returns ParseError::Utf8 for invalid UTF-8 and ParseError::TooLarge when the input is larger than DEFAULT_MAX_BYTES.
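
The two failure modes can be pictured with a std-only sketch. `CheckError` and `check_input` are hypothetical stand-ins, not the crate's `ParseError` or its internals. The order of the checks (size first, then UTF-8) is an assumption, and the limit is passed in by the caller because the value of DEFAULT_MAX_BYTES is not stated here.

```rust
use std::str::Utf8Error;

/// Hypothetical error type mirroring the two failure modes described above.
#[derive(Debug)]
pub enum CheckError {
    TooLarge { len: usize, max: usize },
    Utf8(Utf8Error),
}

/// Validates raw bytes against a size limit, then decodes them as UTF-8.
/// `max_bytes` is a caller-chosen limit; `None` disables the size check.
pub fn check_input(input: &[u8], max_bytes: Option<usize>) -> Result<&str, CheckError> {
    if let Some(max) = max_bytes {
        if input.len() > max {
            return Err(CheckError::TooLarge { len: input.len(), max });
        }
    }
    // `from_utf8` borrows the bytes, so no copy is made on success.
    std::str::from_utf8(input).map_err(CheckError::Utf8)
}
```

Checking the length before decoding keeps the cost of rejecting an oversized body independent of its contents.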

§Examples
use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse_bytes(b"User-agent: *\nDisallow: /tmp\n")?;
assert!(!robots.is_allowed("ExampleBot", "/tmp/cache"));

pub fn parse_bytes_with_options(
    input: &'a [u8],
    options: ParseOptions,
) -> Result<Self, ParseError>

Parses UTF-8 bytes into access rules with explicit options.

Use this when reading raw bytes and you need a custom size limit.

§Examples
use fast_robots::{ParseOptions, RobotsTxt};

let robots = RobotsTxt::parse_bytes_with_options(
    b"User-agent: *\nDisallow: /cache\n",
    ParseOptions { max_bytes: Some(1024) },
)?;

assert!(!robots.is_allowed("ExampleBot", "/cache/file"));

pub fn parse_with_options(
    input: &'a str,
    options: ParseOptions,
) -> Result<Self, ParseError>

Parses a UTF-8 string into access rules with explicit options.

This is useful when the input is already a str but should still be checked against a maximum size.

§Examples
use fast_robots::{ParseOptions, RobotsTxt};

let robots = RobotsTxt::parse_with_options(
    "User-agent: *\nDisallow: /private\n",
    ParseOptions { max_bytes: Some(1024) },
)?;

assert!(!robots.is_allowed("ExampleBot", "/private"));

pub fn parse_with_diagnostics(input: &'a str) -> ParseReport<'a>

Parses a UTF-8 string and records recoverable syntax warnings.

Diagnostics do not change parser recovery behavior; they only expose the issues that tolerant parsing skipped.
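
A minimal sketch of how warnings might be collected alongside tolerant parsing is shown below. It is not the crate's implementation: `WarningKind`, `Warning`, and `collect_warnings` are hypothetical. Only the name `RuleBeforeUserAgent` echoes a variant that appears in these docs, and its real fields are not reproduced. `MissingSeparator` is an invented label for a line without `:`.

```rust
/// Hypothetical warning kinds used only in this sketch.
#[derive(Debug, PartialEq)]
pub enum WarningKind {
    RuleBeforeUserAgent,
    MissingSeparator,
}

#[derive(Debug, PartialEq)]
pub struct Warning {
    /// 1-based line number of the offending line.
    pub line: usize,
    pub kind: WarningKind,
}

/// Scans the input and reports lines a tolerant parser would skip.
pub fn collect_warnings(input: &str) -> Vec<Warning> {
    let mut warnings = Vec::new();
    let mut seen_user_agent = false;
    for (idx, raw) in input.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (name, _value) = match line.split_once(':') {
            Some(pair) => pair,
            None => {
                warnings.push(Warning { line: idx + 1, kind: WarningKind::MissingSeparator });
                continue;
            }
        };
        let name = name.trim().to_ascii_lowercase();
        if name == "user-agent" {
            seen_user_agent = true;
        } else if (name == "allow" || name == "disallow") && !seen_user_agent {
            // A rule with no preceding User-agent line belongs to no group.
            warnings.push(Warning { line: idx + 1, kind: WarningKind::RuleBeforeUserAgent });
        }
    }
    warnings
}
```

The scan only records problems and never aborts, so adding diagnostics in this style leaves the recovery behavior unchanged.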

§Examples
use fast_robots::{ParseWarningKind, RobotsTxt};

let report = RobotsTxt::parse_with_diagnostics(
    "Disallow: /\nMissing separator\nUser-agent: *\nDisallow: /private\n",
);

assert_eq!(report.warnings.len(), 2);
assert!(matches!(
    report.warnings[0].kind,
    ParseWarningKind::RuleBeforeUserAgent { .. }
));
assert!(!report.robots.is_allowed("ExampleBot", "/private"));

pub fn parse_with_diagnostics_options(
    input: &'a str,
    options: ParseOptions,
) -> Result<ParseReport<'a>, ParseError>

Parses a UTF-8 string with diagnostics and explicit options.

§Examples
use fast_robots::{ParseOptions, RobotsTxt};

let report = RobotsTxt::parse_with_diagnostics_options(
    "User-agent: *\nDisallow: /private\n",
    ParseOptions { max_bytes: Some(1024) },
)?;

assert!(report.warnings.is_empty());
assert!(!report.robots.is_allowed("ExampleBot", "/private"));

pub fn parse_bytes_with_diagnostics(
    input: &'a [u8],
) -> Result<ParseReport<'a>, ParseError>

Parses UTF-8 bytes and records recoverable syntax warnings.

Uses ParseOptions::default for size checking.

§Examples
use fast_robots::RobotsTxt;

let report = RobotsTxt::parse_bytes_with_diagnostics(
    b"User-agent: *\nDisallow: /private\n",
)?;

assert!(report.warnings.is_empty());
assert!(!report.robots.is_allowed("ExampleBot", "/private"));

pub fn parse_bytes_with_diagnostics_options(
    input: &'a [u8],
    options: ParseOptions,
) -> Result<ParseReport<'a>, ParseError>

Parses UTF-8 bytes with diagnostics and explicit options.

§Examples
use fast_robots::{ParseOptions, RobotsTxt};

let report = RobotsTxt::parse_bytes_with_diagnostics_options(
    b"User-agent: *\nDisallow: /private\n",
    ParseOptions { max_bytes: Some(1024) },
)?;

assert!(report.warnings.is_empty());

pub fn matcher(&'a self) -> RobotsMatcher<'a>

Builds an indexed matcher for repeated access checks.

The returned matcher borrows this parsed file, indexes user-agent groups, and precomputes rule metadata. Use it when checking many URLs against the same robots.txt; for one-off checks, RobotsTxt::is_allowed avoids the upfront allocation cost.
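
One way to picture the user-agent index is a map from lowercased agent token to the positions of the groups that name it. The sketch below is an assumption about the general technique, not a description of `RobotsMatcher` internals. `AgentIndex`, `build`, and `groups_for` are hypothetical, and the caller is assumed to pass a bare product token such as `ExampleBot`.

```rust
use std::collections::HashMap;

/// Hypothetical index from lowercased user-agent token to group positions.
pub struct AgentIndex {
    exact: HashMap<String, Vec<usize>>,
    wildcard: Vec<usize>,
}

impl AgentIndex {
    /// `groups[i]` holds the user-agent tokens declared by group `i`.
    pub fn build(groups: &[Vec<&str>]) -> Self {
        let mut exact: HashMap<String, Vec<usize>> = HashMap::new();
        let mut wildcard = Vec::new();
        for (i, agents) in groups.iter().enumerate() {
            for agent in agents {
                if *agent == "*" {
                    wildcard.push(i);
                } else {
                    // Lowercase the key so lookups are case-insensitive.
                    exact.entry(agent.to_ascii_lowercase()).or_default().push(i);
                }
            }
        }
        Self { exact, wildcard }
    }

    /// Returns every group naming `agent`, or the `*` groups if none does.
    pub fn groups_for(&self, agent: &str) -> &[usize] {
        match self.exact.get(&agent.to_ascii_lowercase()) {
            Some(ids) => ids,
            None => &self.wildcard,
        }
    }
}
```

Building the map costs one pass and some allocation up front. Each later lookup is a single hash probe instead of a scan over all groups, which is why an index of this kind only pays off for repeated checks.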

§Examples
use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse("User-agent: *\nDisallow: /private\n");
let matcher = robots.matcher();

assert!(!matcher.is_allowed("ExampleBot", "/private/file"));
assert!(matcher.is_allowed("ExampleBot", "/public/file"));

pub fn is_allowed(&self, user_agent: &str, path: &str) -> bool

Returns whether user_agent may crawl path.

The matcher implements the core RFC 9309 access semantics used by this crate: exact user-agent groups are considered before the * fallback, matching exact groups are merged, the longest matching pattern wins, and Allow wins ties. /robots.txt is always allowed.
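
The precedence rules (longest match wins, Allow wins ties, /robots.txt always allowed) can be sketched with plain prefix matching. This is a simplified illustration, not the crate's matcher: `Kind` and `allowed_by_longest_match` are hypothetical, the rules are assumed to be already merged for the selected user agent, and `*` and `$` wildcards and percent-encoding are ignored.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    Allow,
    Disallow,
}

/// Decides access for `path` from prefix-only rules.
/// Simplification: no `*` or `$` wildcard handling, no percent-decoding.
pub fn allowed_by_longest_match(rules: &[(Kind, &str)], path: &str) -> bool {
    // The robots.txt file itself is always fetchable.
    if path == "/robots.txt" {
        return true;
    }
    let mut best: Option<(usize, Kind)> = None;
    for &(kind, pattern) in rules {
        // An empty pattern matches nothing in this sketch.
        if pattern.is_empty() || !path.starts_with(pattern) {
            continue;
        }
        let len = pattern.len();
        best = match best {
            // A strictly longer pattern replaces the current winner.
            Some((best_len, _)) if len > best_len => Some((len, kind)),
            // On equal length, Allow beats Disallow.
            Some((best_len, Kind::Disallow)) if len == best_len && kind == Kind::Allow => {
                Some((len, kind))
            }
            None => Some((len, kind)),
            keep => keep,
        };
    }
    // No matching rule means the path is allowed.
    !matches!(best, Some((_, Kind::Disallow)))
}
```

With the rules `Disallow: /private` and `Allow: /private/public`, the path `/private/public/file` matches both patterns, and the longer Allow pattern decides the outcome.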

path should be the URL path and optional query string, not a full URL.
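
If the caller only has an absolute URL, the path and query have to be extracted first. The std-only helper below is a hypothetical sketch (`path_and_query` is not part of fast_robots) and does no validation. A production crawler would more likely use a dedicated URL parser.

```rust
/// Extracts the path and optional query from an absolute URL.
/// Hypothetical helper: no validation, and a URL with a query but no path
/// (for example `https://example.com?x=1`) is reduced to "/".
pub fn path_and_query(url: &str) -> &str {
    // Drop the fragment, which is never sent to the server.
    let url = url.split('#').next().unwrap_or(url);
    // Input that already starts with '/' is treated as a path.
    if url.starts_with('/') {
        return url;
    }
    // Skip "scheme://" if present.
    let rest = match url.find("://") {
        Some(i) => &url[i + 3..],
        None => return url,
    };
    // The path starts at the first '/' after the authority.
    match rest.find('/') {
        Some(i) => &rest[i..],
        None => "/",
    }
}
```

The returned slice borrows from the input, so it can be passed to an access check without allocating.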

§Examples
use fast_robots::RobotsTxt;

let robots = RobotsTxt::parse(
    "User-agent: *\n\
     Disallow: /private\n\
     Allow: /private/public\n",
);

assert!(!robots.is_allowed("ExampleBot", "/private/file"));
assert!(robots.is_allowed("ExampleBot", "/private/public/file"));
assert!(robots.is_allowed("ExampleBot", "/robots.txt"));

Trait Implementations§

impl<'a> Clone for RobotsTxt<'a>

fn clone(&self) -> RobotsTxt<'a>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<'a> Debug for RobotsTxt<'a>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'a> PartialEq for RobotsTxt<'a>

fn eq(&self, other: &RobotsTxt<'a>) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl<'a> Eq for RobotsTxt<'a>

impl<'a> StructuralPartialEq for RobotsTxt<'a>

Auto Trait Implementations§

impl<'a> Freeze for RobotsTxt<'a>

impl<'a> RefUnwindSafe for RobotsTxt<'a>

impl<'a> Send for RobotsTxt<'a>

impl<'a> Sync for RobotsTxt<'a>

impl<'a> Unpin for RobotsTxt<'a>

impl<'a> UnsafeUnpin for RobotsTxt<'a>

impl<'a> UnwindSafe for RobotsTxt<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.