# robots_txt

robots_txt is a lightweight robots.txt parser and generator written in Rust.
Nothing extra.
## Unstable
The implementation is a work in progress.
## Installation
robots_txt is available on crates.io and can be included in your Cargo-enabled project like this:
Cargo.toml:
```toml
[dependencies]
robots_txt = "0.7"
```
## Parsing & matching paths against rules
```rust
use robots_txt::Robots;

static ROBOTS: &'static str = r#"
# robots.txt for http://www.site.com

User-Agent: *
Disallow: /cyberworld/map/ # this is an infinite virtual URL space

# Cybermapper knows where to go
User-Agent: cybermapper
Disallow:
"#;
```
## Building & rendering
main.rs:

```rust
extern crate robots_txt;

use robots_txt::Robots;

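fn main() {
    // NOTE: the builder calls below are reconstructed from the rendered
    // output shown under "As a result we get"; the exact method names
    // (`builder`, `start_section`, `end_section`, `build`, ...) are an
    // assumption to be checked against the crate's documentation.

    // First document: an unrestricted section for cybermapper and a
    // restricted catch-all section for everyone else.
    let robots1 = Robots::builder()
        .start_section("cybermapper")
        .disallow("")
        .end_section()
        .start_section("*")
        .disallow("/cyberworld/map/")
        .end_section()
        .build();

    // Second document: a single `*` section with crawl hints, a sitemap
    // and a preferred host.
    let robots2 = Robots::builder()
        .host("example.com")
        .start_section("*")
        .disallow("/private")
        .disallow("")
        .crawl_delay(4.5)
        .request_rate(9, 20)
        .sitemap("http://example.com/sitemap.xml".parse().expect("sitemap url"))
        .end_section()
        .build();

    println!("# robots.txt for http://cyber.example.com/\n");
    println!("{}", robots1);
    println!("# robots.txt for http://example.com/\n");
    println!("{}", robots2);
}
```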
As a result we get:
```
# robots.txt for http://cyber.example.com/

User-agent: cybermapper
Disallow:

User-agent: *
Disallow: /cyberworld/map/

# robots.txt for http://example.com/

User-agent: *
Disallow: /private
Disallow:
Crawl-delay: 4.5
Request-rate: 9/20
Sitemap: http://example.com/sitemap.xml
Host: example.com
```
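Both documents above are rendered through a `Display` implementation (assumed when passing them to `println!`), so `robots1.to_string()` produces the same text if you need to write it to a file or serve it over HTTP.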
## Alternatives
- [messense/robotparser-rs](https://github.com/messense/robotparser-rs), a robots.txt parser for Rust
## License
Licensed under either of

- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.