robotxt
Also check out the other spire-rs projects.
An implementation of the robots.txt (or URL exclusion) protocol in the Rust
programming language, with support for the crawl-delay, sitemap, and universal
`*` match extensions (per the RFC specification).
Features
- `parser` to enable `robotxt::{Robots}`. Enabled by default.
- `builder` to enable `robotxt::{RobotsBuilder, GroupBuilder}`. Enabled by default.
- `optimal` to optimize overlapping and global rules, potentially improving matching speed at the cost of longer parsing times.
- `serde` to enable the `serde::{Deserialize, Serialize}` implementation, allowing the caching of related rules (see the sketch after this list).
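
As a rough illustration of the `serde` feature, the sketch below serializes a parsed rule set so it can be cached. It assumes the feature is enabled and uses the external `serde_json` crate purely for illustration; it is not a dependency of robotxt.

```rust
use robotxt::Robots;

/// Minimal sketch: turns the parsed rules into a JSON string that can be
/// stored in a cache. Relies on `Robots` implementing `serde::Serialize`,
/// which the `serde` feature provides; `serde_json` is an external crate
/// chosen here only as an example serializer.
fn cache_rules(robots: &Robots) -> serde_json::Result<String> {
    serde_json::to_string(robots)
}
```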
Examples
- parse the most specific `user-agent` in the provided `robots.txt` file with `robotxt::Robots` (see the sketch below):
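
A minimal sketch of the parser, assuming the constructor and query methods documented for the crate (`Robots::from_bytes`, `is_relative_allowed`); verify the exact names against the version in use:

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
        User-Agent: foobot
        Disallow: *
        Allow: /example/
        Disallow: /example/nope.txt
    "#;

    // Parse the file for the `foobot` user-agent; the most specific
    // matching group is selected.
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_relative_allowed("/example/yeah.txt"));
    assert!(!r.is_relative_allowed("/example/nope.txt"));
    assert!(!r.is_relative_allowed("/invalid/path.txt"));
}
```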
- build a new `robots.txt` file in a declarative manner with `robotxt::RobotsBuilder` (see the sketch below):
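
A sketch of the declarative builder, assuming the method names documented for the crate (`group`, `allow`, `disallow`, `crawl_delay`, `sitemap`, `header`, `footer`) and a `url::Url` argument for `sitemap`; the `url` crate is pulled in here only for the error type, and exact signatures may differ between versions:

```rust
use robotxt::RobotsBuilder;

fn main() -> Result<(), url::ParseError> {
    // Build groups of rules per user-agent, then render the file.
    let txt = RobotsBuilder::default()
        .header("Robots.txt: Start")
        .group(["foobot"], |u| {
            u.crawl_delay(5)
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .group(["barbot", "nombot"], |u| {
            u.crawl_delay(2)
                .disallow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .sitemap("https://example.com/sitemap_1.xml".try_into()?)
        .footer("Robots.txt: End");

    println!("{txt}");
    Ok(())
}
```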
Links
- Request for Comments: 9309 on RFC-Editor.com
- Introduction to Robots.txt on Google.com
- How Google interprets Robots.txt on Google.com
- What is a Robots.txt file on Moz.com
Notes
- The parser is based on Smerity/texting_robots.
- The `Host` directive is not supported.