# robotxt
Also check out other `xwde` projects.
The implementation of the robots.txt (or URL exclusion) protocol in the Rust
programming language, with support for the `crawl-delay` and `sitemap`
directives and the universal `*` match extension (per the RFC 9309
specification).
## Features
- `builder` to enable `robotxt::{RobotsBuilder, GroupBuilder}`. Enabled by default.
- `parser` to enable `robotxt::{Robots}`. Enabled by default.
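Both features can be toggled from `Cargo.toml`; a minimal sketch (the version
requirement below is illustrative, not a pinned release):

```toml
[dependencies]
# Pulls in only the parser; drop `default-features = false` to keep both.
robotxt = { version = "*", default-features = false, features = ["parser"] }
```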
## Examples
- parse the most specific `user-agent` in the provided `robots.txt` file, as sketched below:
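A minimal sketch of the parser, assuming the `Robots::from_bytes` constructor
and the `is_relative_allowed` matcher exposed by the `parser` feature:

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
        User-Agent: foobot
        Disallow: *
        Allow: /example/
        Disallow: /example/nope.txt
    "#;

    // Retains only the rules of the most specific matching user-agent.
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_relative_allowed("/example/yeah.txt"));
    assert!(!r.is_relative_allowed("/example/nope.txt"));
}
```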
- build the new `robots.txt` file in a declarative manner, as sketched below:
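A minimal sketch of the builder, assuming `RobotsBuilder::default`, the
`group` closure API, and `Display` output; the `sitemap` call is assumed to
take a `url::Url`, hence the `TryInto` conversion:

```rust
use robotxt::RobotsBuilder;

fn main() -> Result<(), url::ParseError> {
    let txt = RobotsBuilder::default()
        .header("Robots.txt: Start")
        .group(["foobot"], |group| {
            group
                .crawl_delay(5)
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .group(["barbot", "nombot"], |group| {
            group.crawl_delay(2).disallow("/example/")
        })
        // `sitemap` expects a parsed `url::Url`.
        .sitemap("https://example.com/sitemap.xml".try_into()?)
        .footer("Robots.txt: End");

    println!("{txt}");
    Ok(())
}
```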
## Links
- Request for Comments: 9309 on RFC-Editor.com
- Introduction to Robots.txt on Google.com
- How Google interprets Robots.txt on Google.com
- What is Robots.txt file on Moz.com
## Notes
- The parser is based on Smerity/texting_robots.
- The `Host` directive is not supported.