robotxt
Also check out other spire-rs projects here.
An implementation of the robots.txt (or URL exclusion) protocol in the Rust
programming language, with support for the crawl-delay, sitemap, and universal
`*` match extensions (according to the RFC specification).
Features
- `builder` to enable `robotxt::{RobotsBuilder, GroupBuilder}`. This feature is enabled by default.
- `parser` to enable `robotxt::{Robots}`. This feature is enabled by default.
- `optimal` to enable overlapping rule eviction and global rule optimizations (this may result in longer parsing times but potentially faster matching).
- `serde` to enable a custom `serde::{Deserialize, Serialize}` implementation, allowing for the caching of related rules.
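As a hypothetical example, opting into the non-default features might look like this in `Cargo.toml` (the version number is an assumption; pin whichever release you actually use):

```toml
[dependencies]
# Version is illustrative; check crates.io for the current release.
# The default features (builder, parser) remain enabled.
robotxt = { version = "0.6", features = ["optimal", "serde"] }
```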
Examples
- parse the most specific `user-agent` in the provided `robots.txt` file, as in the sketch below:
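A minimal sketch, assuming a `Robots::from_bytes` constructor and an `is_relative_allowed` matcher (the exact method names are assumptions; consult the crate documentation):

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
        User-Agent: foobot
        Disallow: *
        Allow: /example/
        Disallow: /example/nope.txt
    "#;

    // Select the rule group that most specifically matches "foobot",
    // then check paths relative to the site root.
    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_relative_allowed("/example/yeah.txt"));
    assert!(!r.is_relative_allowed("/example/nope.txt"));
}
```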
- build the new `robots.txt` file in a declarative manner, as in the sketch below:
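A minimal sketch, assuming `RobotsBuilder` exposes chainable `header`, `group`, `crawl_delay`, `allow`, `disallow`, `sitemap`, and `footer` methods, and that sitemap URLs come from the `url` crate (these are assumptions; consult the crate documentation):

```rust
use robotxt::RobotsBuilder;

fn main() -> Result<(), url::ParseError> {
    let txt = RobotsBuilder::default()
        .header("Robots.txt: Start")
        // One group per set of user-agents; rules are chained inside.
        .group(["foobot"], |u| {
            u.crawl_delay(5)
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        // `try_into` parses the &str into a `url::Url`.
        .sitemap("https://example.com/sitemap_1.xml".try_into()?)
        .footer("Robots.txt: End");

    println!("{txt}");
    Ok(())
}
```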
Links
- Request for Comments: 9309 on RFC-Editor.com
- Introduction to Robots.txt on Google.com
- How Google interprets Robots.txt on Google.com
- What is Robots.txt file on Moz.com
Notes
- The parser is based on Smerity/texting_robots.
- The `Host` directive is not supported.