pub struct RobotFileParser {
pub disallow_all: bool,
pub allow_all: bool,
pub last_checked: i64,
/* private fields */
}
robots.txt file parser
Fields
disallow_all: bool
Disallow links regardless of robots.txt.
allow_all: bool
Allow links regardless of robots.txt.
last_checked: i64
Time the robots.txt file was last checked.
Implementations
impl RobotFileParser
pub fn new() -> Box<RobotFileParser>
Establish a new robotparser for a website domain
pub fn mtime(&self) -> i64
Returns the time the robots.txt file was last fetched.
This is useful for long-running web spiders that need to check for new robots.txt files periodically.
pub fn modified(&mut self)
Sets the time the robots.txt file was last fetched to the current time.
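The page does not state the unit of the i64 timestamp. The sketch below assumes whole seconds since the Unix epoch (Python's `urllib.robotparser`, which this API closely resembles, stores `time.time()` there) and shows how a long-running spider might pair `mtime()`/`modified()` with a staleness check. `Freshness` and `is_stale` are hypothetical names, and treating `0` as "never fetched" is an assumption.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for the parser's timestamp bookkeeping.
/// Assumes `last_checked` holds whole seconds since the Unix epoch.
#[derive(Debug, Default)]
struct Freshness {
    last_checked: i64,
}

impl Freshness {
    /// Mirrors `mtime`: report when robots.txt was last fetched.
    fn mtime(&self) -> i64 {
        self.last_checked
    }

    /// Mirrors `modified`: stamp the current time.
    fn modified(&mut self) {
        self.last_checked = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs() as i64)
            .unwrap_or(0);
    }

    /// True when robots.txt was never fetched (assumed to be 0)
    /// or is at least `max_age_secs` old at time `now`.
    fn is_stale(&self, now: i64, max_age_secs: i64) -> bool {
        self.last_checked == 0 || now.saturating_sub(self.last_checked) >= max_age_secs
    }
}
```

A crawler loop would call `is_stale` before each batch and re-read robots.txt (then call `modified`) when it returns true.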
pub fn get_entries(&self) -> &Vec<Entry>
Get the entries that have been inserted.
pub fn get_base_entry(&self) -> &Entry
Get the base entry that has been inserted.
pub async fn read(&mut self, client: &Client, url: &str)
Reads the robots.txt URL and feeds it to the parser.
pub async fn from_response(&mut self, response: Response)
Reads the HTTP response and feeds it to the parser.
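This page does not say how `read` and `from_response` react to HTTP errors. Python's `urllib.robotparser.RobotFileParser.read()`, which this API closely resembles, sets `disallow_all` on 401/403, sets `allow_all` on any other 4xx, and parses the body otherwise. The sketch below encodes that policy as a pure function, assuming (not confirming) that spider behaves similarly. `FetchOutcome` and `classify_response` are hypothetical, and no HTTP client is used so it runs on std alone.

```rust
/// Hypothetical result of inspecting a robots.txt response.
#[derive(Debug, PartialEq)]
enum FetchOutcome<'a> {
    /// 401/403: the site refuses access, so treat every URL as disallowed.
    DisallowAll,
    /// Other 4xx (e.g. 404): no robots.txt, so treat every URL as allowed.
    AllowAll,
    /// 2xx: feed the body's lines to the parser.
    Parse(Vec<&'a str>),
    /// Anything else (e.g. 5xx): leave the parser state unchanged.
    Skip,
}

/// Map a status code and body to a parser action, following the
/// policy of Python's urllib.robotparser (an assumption for spider).
fn classify_response(status: u16, body: &str) -> FetchOutcome<'_> {
    match status {
        200..=299 => FetchOutcome::Parse(body.lines().collect()),
        401 | 403 => FetchOutcome::DisallowAll,
        400..=499 => FetchOutcome::AllowAll,
        _ => FetchOutcome::Skip,
    }
}
```

`str::lines` splits on both `\n` and `\r\n`, so Windows-style robots.txt files yield clean lines for the parser.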
pub fn parse<T: AsRef<str>>(&mut self, lines: &[T])
Parse the input lines from a robots.txt file.
A user-agent: line is accepted even when it is not preceded by one or more blank lines.
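The note about `user-agent:` lines describes a small state machine. The sketch below is modeled on the one in Python's `urllib.robotparser`, not on spider's actual code: comments are stripped, keys are matched case-insensitively, a blank line closes the current group, and a `user-agent:` line that follows rule lines starts a new group even without a blank line in between. `Group` and `parse_groups` are hypothetical, and keys other than `allow`/`disallow` are ignored here.

```rust
/// Hypothetical rule group: agents it names and (is_allow, path) lines.
#[derive(Debug, Default, Clone, PartialEq)]
struct Group {
    agents: Vec<String>,
    rules: Vec<(bool, String)>,
}

/// Sketch of robots.txt line parsing in the style of urllib.robotparser.
fn parse_groups<T: AsRef<str>>(lines: &[T]) -> Vec<Group> {
    // 0 = expecting user-agent, 1 = saw user-agent, 2 = saw a rule line
    let mut state = 0u8;
    let mut current = Group::default();
    let mut groups = Vec::new();

    for raw in lines {
        let raw = raw.as_ref();
        // A blank line ends the current group.
        if raw.trim().is_empty() {
            if state == 2 {
                groups.push(std::mem::take(&mut current));
            } else if state == 1 {
                current = Group::default(); // agents with no rules are dropped
            }
            state = 0;
            continue;
        }
        // Strip comments, then split "key: value" on the first colon.
        let line = raw.split('#').next().unwrap_or("").trim();
        let Some((key, value)) = line.split_once(':') else { continue };
        let key = key.trim().to_ascii_lowercase();
        let value = value.trim().to_string();

        match key.as_str() {
            "user-agent" => {
                // A user-agent line right after rules starts a new group,
                // even without a separating blank line.
                if state == 2 {
                    groups.push(std::mem::take(&mut current));
                }
                current.agents.push(value);
                state = 1;
            }
            "allow" | "disallow" if state != 0 => {
                current.rules.push((key == "allow", value));
                state = 2;
            }
            _ => {} // crawl-delay, sitemap, etc. are ignored in this sketch
        }
    }
    if state == 2 {
        groups.push(current);
    }
    groups
}
```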
pub fn set_disallow_list(&mut self, _path: &str)
Include the disallow paths in the regex set. This does nothing without the ‘regex’ feature.
pub fn set_disallow_agents_list(&mut self, _agent: &str)
Include the disallow agents in the regex set. This does nothing without the ‘regex’ feature.
pub fn build_disallow_list(&mut self)
Build the regex disallow list. This does nothing without the ‘regex’ feature.
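The exact pattern syntax these regex methods generate is not documented here. As an assumption, the sketch below shows one way robots.txt path patterns could be turned into anchored regex source strings before being compiled together (for example with the `regex` crate's `RegexSet::new`), treating `*` as "any sequence" and a trailing `$` as an end anchor, as RFC 9309 specifies. It uses only std, and `robots_pattern_to_regex`/`build_patterns` are hypothetical helpers.

```rust
/// Turn a robots.txt path pattern into an anchored regex source string.
/// `*` becomes `.*`, a trailing `$` anchors the end, and every other
/// regex metacharacter is escaped so it matches literally.
fn robots_pattern_to_regex(path: &str) -> String {
    let (body, anchored_end) = match path.strip_suffix('$') {
        Some(rest) => (rest, true),
        None => (path, false),
    };
    let mut out = String::from("^");
    for ch in body.chars() {
        match ch {
            '*' => out.push_str(".*"),
            '.' | '+' | '?' | '(' | ')' | '[' | ']' | '{' | '}' | '|' | '^' | '$' | '\\' => {
                out.push('\\');
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    if anchored_end {
        out.push('$');
    }
    out
}

/// Collect patterns for several disallow paths. An empty `Disallow:`
/// blocks nothing, so empty paths are skipped.
fn build_patterns(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .filter(|p| !p.is_empty())
        .map(|p| robots_pattern_to_regex(p))
        .collect()
}
```

Compiling all patterns into one set lets a single match call test a URL path against every disallow rule at once, which is presumably why the paths are collected before `build_disallow_list` runs.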
pub fn can_fetch<T: AsRef<str>>(&self, useragent: T, url: &str) -> bool
Using the parsed robots.txt, decide whether useragent can fetch url.
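The decision order below follows Python's `urllib.robotparser.can_fetch`, which this API closely resembles; it is a sketch, not spider's implementation. `disallow_all` and `allow_all` win first, a parser that has never read a file (`last_checked == 0`) refuses, then the first group naming the agent decides, with the `*` group as the fallback. `Decider`, `Group`, and `path_of` are hypothetical, and the URL handling is deliberately crude (no percent-decoding or fragment stripping).

```rust
/// Hypothetical rule group: agents it names and (is_allow, path) lines.
struct Group {
    agents: Vec<String>,
    rules: Vec<(bool, String)>,
}

impl Group {
    /// The first rule whose path prefixes `path` decides; no match allows.
    /// An empty rule path always allows, as in urllib.robotparser.
    fn allowance(&self, path: &str) -> bool {
        self.rules
            .iter()
            .find(|(_, p)| p.is_empty() || p == "*" || path.starts_with(p.as_str()))
            .map(|(allow, p)| *allow || p.is_empty())
            .unwrap_or(true)
    }
}

/// Hypothetical parser state mirroring RobotFileParser's public fields.
struct Decider {
    disallow_all: bool,
    allow_all: bool,
    last_checked: i64,
    groups: Vec<Group>,
}

/// Crude path extraction: everything from the first '/' after the host.
fn path_of(url: &str) -> &str {
    let rest = match url.find("://") {
        Some(i) => &url[i + 3..],
        None => return if url.is_empty() { "/" } else { url },
    };
    rest.find('/').map(|i| &rest[i..]).unwrap_or("/")
}

impl Decider {
    fn can_fetch(&self, useragent: &str, url: &str) -> bool {
        // The override flags win before any rule is consulted.
        if self.disallow_all {
            return false;
        }
        if self.allow_all {
            return true;
        }
        // Nothing has been read yet, so refuse rather than guess.
        if self.last_checked == 0 {
            return false;
        }
        let path = path_of(url);
        // Match only the product token before "/version", case-insensitively.
        let ua = useragent.split('/').next().unwrap_or("").to_ascii_lowercase();
        let specific = self.groups.iter().find(|g| {
            g.agents
                .iter()
                .any(|a| a != "*" && ua.contains(a.to_ascii_lowercase().as_str()))
        });
        let default = self.groups.iter().find(|g| g.agents.iter().any(|a| a == "*"));
        match specific.or(default) {
            Some(g) => g.allowance(path),
            None => true,
        }
    }
}
```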
pub fn entry_allowed<T: AsRef<str>>(&self, useragent: &T, url_str: &str) -> bool
Does the entry apply to the robots.txt?
pub fn get_crawl_delay(
    &self,
    useragent: &Option<Box<CompactString>>,
) -> Option<Duration>
Returns the crawl delay for this user agent as a Duration, or None if no crawl delay is defined.
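A sketch of the two steps behind a crawl-delay lookup: turning a `Crawl-delay:` value (seconds, possibly fractional) into a `Duration`, and choosing the delay for an agent. The real method takes `&Option<Box<CompactString>>`; this std-only version uses `Option<&str>` and a slice of `(agent, Duration)` pairs instead, and falling back to the `*` group is an assumption. `parse_crawl_delay` and `crawl_delay_for` are hypothetical; `Duration::try_from_secs_f64` needs Rust 1.66 or later.

```rust
use std::time::Duration;

/// Parse a Crawl-delay value into a Duration. Negative, NaN, infinite,
/// or out-of-range values are rejected by `try_from_secs_f64`.
fn parse_crawl_delay(value: &str) -> Option<Duration> {
    let secs: f64 = value.trim().parse().ok()?;
    Duration::try_from_secs_f64(secs).ok()
}

/// Look up the delay for `useragent` (case-insensitive), falling back to
/// the `*` entry; `None` goes straight to the `*` entry.
fn crawl_delay_for(delays: &[(&str, Duration)], useragent: Option<&str>) -> Option<Duration> {
    let find = |name: &str| {
        delays
            .iter()
            .find(|(agent, _)| agent.eq_ignore_ascii_case(name))
            .map(|(_, d)| *d)
    };
    useragent.and_then(|ua| find(ua)).or_else(|| find("*"))
}
```

A spider would typically sleep for the returned `Duration` between requests to the same host.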
pub fn get_req_rate<T: AsRef<str>>(&self, useragent: T) -> Option<RequestRate>
Returns the request rate for this user agent as a RequestRate, or None if no request rate is defined.
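The fields of `RequestRate` are not shown on this page. Assuming it holds a `requests / seconds` pair like Python's `urllib.robotparser.RequestRate`, the sketch below parses a `Request-rate: 3/60` value and derives the minimum spacing between requests. This `RequestRate` is a hypothetical stand-in with assumed field names, not spider's type.

```rust
use std::time::Duration;

/// Hypothetical stand-in: `requests` allowed per `seconds`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RequestRate {
    requests: u32,
    seconds: u32,
}

/// Parse "N/S" (e.g. "3/60"). Values with units such as "1/5s" are
/// rejected, and a zero request count is refused to avoid dividing by zero.
fn parse_request_rate(value: &str) -> Option<RequestRate> {
    let (r, s) = value.trim().split_once('/')?;
    let requests: u32 = r.trim().parse().ok()?;
    let seconds: u32 = s.trim().parse().ok()?;
    if requests == 0 {
        return None;
    }
    Some(RequestRate { requests, seconds })
}

impl RequestRate {
    /// Minimum spacing between requests that stays within the rate.
    /// Panics if `requests` is 0, which `parse_request_rate` never produces.
    fn min_interval(&self) -> Duration {
        Duration::from_secs(u64::from(self.seconds)) / self.requests
    }
}
```

Spacing requests evenly by `min_interval` is the simplest policy; a token bucket would allow short bursts while keeping the same average rate.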
Trait Implementations
impl Clone for RobotFileParser
fn clone(&self) -> RobotFileParser
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for RobotFileParser
impl PartialEq for RobotFileParser
impl Eq for RobotFileParser
impl StructuralPartialEq for RobotFileParser
Auto Trait Implementations
impl Freeze for RobotFileParser
impl RefUnwindSafe for RobotFileParser
impl Send for RobotFileParser
impl Sync for RobotFileParser
impl Unpin for RobotFileParser
impl UnwindSafe for RobotFileParser
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.