Struct RobotFileParser

pub struct RobotFileParser {
    pub disallow_all: bool,
    pub allow_all: bool,
    pub last_checked: i64,
    /* private fields */
}

A parser for robots.txt files.

Fields

disallow_all: bool

Disallow all links, regardless of robots.txt.

allow_all: bool

Allow all links, regardless of robots.txt.

last_checked: i64

Time the robots.txt file was last checked.

Implementations

impl RobotFileParser

pub fn new() -> Box<RobotFileParser>

Creates a new robots.txt parser for a website domain.

pub fn mtime(&self) -> i64

Returns the time the robots.txt file was last fetched.

This is useful for long-running web spiders that need to check for new robots.txt files periodically.

pub fn modified(&mut self)

Sets the time the robots.txt file was last fetched to the current time.
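
A long-running crawler might use this timestamp to decide when to fetch robots.txt again. The following is a minimal sketch, not part of the crate: `now_secs`, `is_stale`, and `max_age_secs` are hypothetical names, and it assumes the timestamp is in whole seconds since the Unix epoch, with 0 meaning "never fetched".

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current time as whole seconds since the Unix epoch (0 if the clock is
/// set before the epoch).
fn now_secs() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0)
}

/// Returns true when a robots.txt copy fetched at `last_checked` is older
/// than `max_age_secs` at time `now`.
/// Assumption: all values are seconds, and 0 means "never fetched".
fn is_stale(last_checked: i64, now: i64, max_age_secs: i64) -> bool {
    last_checked == 0 || now.saturating_sub(last_checked) > max_age_secs
}
```

A caller would pass the value returned by mtime as `last_checked` and the result of `now_secs()` as `now`, then fetch the file again when `is_stale` returns true.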

pub fn get_entries(&self) -> &Vec<Entry>

Returns the entries that have been inserted.

pub fn get_base_entry(&self) -> &Entry

Returns the base entry that has been inserted.

pub async fn read(&mut self, client: &Client, url: &str)

Reads the robots.txt URL and feeds it to the parser.

pub async fn from_response(&mut self, response: Response)

Reads the HTTP response and feeds it to the parser.

pub fn parse<T: AsRef<str>>(&mut self, lines: &[T])

Parses the input lines from a robots.txt file.

A user-agent: line does not have to be preceded by one or more blank lines.
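
The blank-line tolerance can be illustrated with a simplified, standalone grouping routine. This is a sketch under stated assumptions, not the crate's parser: `Group` and `group_lines` are hypothetical names, blank lines are simply skipped, and rules are kept as raw (directive, value) pairs.

```rust
/// One group of rules: the user agents it names and its raw directives.
#[derive(Debug, Default, PartialEq)]
struct Group {
    agents: Vec<String>,
    /// (directive, value) pairs, e.g. ("disallow", "/private").
    rules: Vec<(String, String)>,
}

/// Splits robots.txt lines into groups. A `user-agent` line that follows at
/// least one rule starts a new group, even with no blank line before it.
fn group_lines<T: AsRef<str>>(lines: &[T]) -> Vec<Group> {
    let mut groups: Vec<Group> = Vec::new();
    let mut current = Group::default();

    for raw in lines {
        // Drop comments and surrounding whitespace; skip empty lines.
        let line = raw.as_ref().split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        // Directives have the form `name: value`; ignore anything else.
        let (name, value) = match line.split_once(':') {
            Some(pair) => pair,
            None => continue,
        };
        let name = name.trim().to_ascii_lowercase();
        let value = value.trim().to_string();

        if name == "user-agent" {
            // Rules already collected means the previous group is complete.
            if !current.rules.is_empty() {
                groups.push(std::mem::take(&mut current));
            }
            current.agents.push(value);
        } else if !current.agents.is_empty() {
            // Rules that appear before any user-agent line are ignored.
            current.rules.push((name, value));
        }
    }
    if !current.agents.is_empty() {
        groups.push(current);
    }
    groups
}
```

Given the lines `User-agent: a`, `Disallow: /x`, `User-agent: b`, `Disallow: /y`, this sketch yields two groups, although no blank line separates them.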

pub fn set_disallow_list(&mut self, _path: &str)

Include the disallow paths in the regex set. This does nothing without the ‘regex’ feature.

pub fn set_disallow_agents_list(&mut self, _agent: &str)

Include the disallow agents in the regex set. This does nothing without the ‘regex’ feature.

pub fn build_disallow_list(&mut self)

Build the regex disallow list. This does nothing without the ‘regex’ feature.

pub fn can_fetch<T: AsRef<str>>(&self, useragent: T, url: &str) -> bool

Uses the parsed robots.txt to decide whether useragent can fetch url.
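
How such a decision could combine the override flags with path rules is sketched below. This is an illustration only, not the crate's implementation: `Rule` and `Policy` are hypothetical types, the sketch assumes disallow_all is checked before allow_all, and it uses first-match prefix rules with "allowed" as the default.

```rust
/// A single path rule: `allow == false` means the prefix is disallowed.
struct Rule {
    prefix: &'static str,
    allow: bool,
}

/// Simplified stand-in for the parser state relevant to a fetch decision.
struct Policy {
    disallow_all: bool,
    allow_all: bool,
    rules: Vec<Rule>,
}

impl Policy {
    /// Decides whether `path` may be fetched.
    fn can_fetch(&self, path: &str) -> bool {
        // Assumption: the blanket overrides win over any parsed rule,
        // and disallow_all is consulted first.
        if self.disallow_all {
            return false;
        }
        if self.allow_all {
            return true;
        }
        // The first rule whose prefix matches decides; no match means allowed.
        self.rules
            .iter()
            .find(|rule| path.starts_with(rule.prefix))
            .map(|rule| rule.allow)
            .unwrap_or(true)
    }
}
```

Because the first match wins in this sketch, a more specific allow rule has to be listed before the broader disallow rule it carves an exception from.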

pub fn entry_allowed<T: AsRef<str>>(&self, useragent: &T, url_str: &str) -> bool

Does the entry apply to the robots.txt?

pub fn get_crawl_delay(&self, useragent: &Option<Box<CompactString>>) -> Option<Duration>

Returns the crawl delay for this user agent as a Duration, or None if no crawl delay is defined.
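
A caller has to choose what to do in the None case. One possible policy is sketched here with standard-library types only; `effective_delay`, `fallback`, and `max` are hypothetical names and do not come from the crate.

```rust
use std::time::Duration;

/// Picks the pause between requests: the site's crawl delay when one is
/// defined (capped at `max`), otherwise `fallback`.
fn effective_delay(crawl_delay: Option<Duration>, fallback: Duration, max: Duration) -> Duration {
    crawl_delay.map(|delay| delay.min(max)).unwrap_or(fallback)
}
```

The cap guards against a robots.txt file that declares an extremely long delay. Whether to cap at all is a design choice for the crawler, not something the parser decides.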

pub fn get_req_rate<T: AsRef<str>>(&self, useragent: T) -> Option<RequestRate>

Returns the request rate for this user agent as a RequestRate, or None if no request rate is defined.
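
A request rate in robots.txt is conventionally written as `requests/seconds`. The sketch below shows how such a value could be turned into a minimum spacing between requests. `Rate`, `parse_rate`, and `min_interval` are hypothetical stand-ins; the fields of the crate's RequestRate are not shown on this page and may differ.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a `Request-rate: requests/seconds` value.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rate {
    requests: u32,
    seconds: u32,
}

/// Parses a value such as "3/15" (3 requests per 15 seconds).
/// Returns None when the value does not have that shape.
fn parse_rate(value: &str) -> Option<Rate> {
    let (requests, seconds) = value.trim().split_once('/')?;
    Some(Rate {
        requests: requests.trim().parse().ok()?,
        seconds: seconds.trim().parse().ok()?,
    })
}

/// Minimum spacing between requests implied by the rate, or None when the
/// rate permits zero requests.
fn min_interval(rate: Rate) -> Option<Duration> {
    if rate.requests == 0 {
        return None;
    }
    Some(Duration::from_secs(u64::from(rate.seconds)) / rate.requests)
}
```

For example, a rate of 3 requests per 15 seconds corresponds to one request every 5 seconds.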

Trait Implementations

impl Clone for RobotFileParser

fn clone(&self) -> RobotFileParser

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for RobotFileParser

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for RobotFileParser

fn eq(&self, other: &RobotFileParser) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for RobotFileParser

impl StructuralPartialEq for RobotFileParser

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MaybeSendSync for T