sitemap-rs
A Rust library to generate URL, Index, Image, Video, and News sitemaps.
Features
Generates sitemaps
Validates sitemap data
There are a bunch of restrictions on what data your sitemaps can hold. This library surfaces these validation issues at struct instantiation time, so you don't have to wait for Google Search Console or Bing Webmaster Tools to alert you to sitemap issues before you can fix data problems.
Validations
- URL Sitemap
  - LocationTooLong - A <loc> must be less than 2,048 characters.
  - TooManyUrls - Can only contain as many as 50,000 <url>.
  - TooMuchNews - Can only contain as many as 1,000 <news:news>.
  - PriorityTooLow and PriorityTooHigh - A <priority> must be between 0.0 and 1.0 (inclusive).
  - TooManyImages - Can only contain as many as 1,000 <image:image>.
- Index Sitemap
  - TooManySitemaps - Can only contain as many as 50,000 <sitemap>.
- Video Sitemap
  - DescriptionTooLong - A <description> must be no longer than 2,048 characters.
  - DurationTooShort and DurationTooLong - A <duration> must be between 1 and 28,800 seconds (inclusive).
  - RatingTooLow and RatingTooHigh - A <rating> must be between 0.0 and 5.0 (inclusive).
  - UploaderNameTooLong - An <uploader>'s <name> must be no longer than 255 characters.
  - TooManyTags - Must contain no more than 32 <tag>.
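To illustrate what "validation at struct instantiation time" can look like, here is a minimal, std-only sketch. The `Url` and `UrlError` types below are hypothetical stand-ins written for this example; only the variant names and limits are taken from the list above, and the library's real types and signatures may differ.

```rust
// Hypothetical error type; variant names mirror the validation list above.
#[derive(Debug, PartialEq)]
enum UrlError {
    LocationTooLong(usize),
    PriorityTooLow(f32),
    PriorityTooHigh(f32),
}

// Hypothetical URL entry; not the library's actual struct.
#[derive(Debug, PartialEq)]
struct Url {
    location: String,
    priority: Option<f32>,
}

impl Url {
    // The constructor is the only way to build a `Url`, so an invalid value
    // is rejected before it can ever reach a sitemap.
    fn new(location: String, priority: Option<f32>) -> Result<Self, UrlError> {
        // A <loc> must be less than 2,048 characters.
        let length = location.chars().count();
        if length >= 2048 {
            return Err(UrlError::LocationTooLong(length));
        }
        // A <priority> must be between 0.0 and 1.0 (inclusive).
        if let Some(p) = priority {
            if p < 0.0 {
                return Err(UrlError::PriorityTooLow(p));
            }
            if p > 1.0 {
                return Err(UrlError::PriorityTooHigh(p));
            }
        }
        Ok(Self { location, priority })
    }
}

fn main() {
    let ok = Url::new("https://www.toddgriffin.me/".to_string(), Some(0.69));
    println!("{:?}", ok);

    let bad = Url::new("https://www.toddgriffin.me/".to_string(), Some(1.5));
    println!("{:?}", bad);
}
```

Returning a `Result` from the constructor means the caller has to handle a bad value where it is created, rather than discovering it later in a search console report.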
Restrictions
This library cannot parse sitemaps of any kind (yet! - pull requests welcome! See Feature Requests section below).
Examples
URL Sitemap
cargo run --example generate_url_sitemap
https://www.toddgriffin.me/
1998-01-15T04:20:00+00:00
monthly
0.69
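For readers who want to see how the values above map onto the Sitemap protocol's XML, here is a std-only sketch. The `UrlEntry` struct and the `escape_xml` and `render_url_set` functions are hypothetical names made up for this example. This is not how the library generates its output (it uses xml-builder), and the exact whitespace of the real example output may differ.

```rust
// Entity-escape the five characters the Sitemap protocol requires.
fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

// Hypothetical entry type holding the four values from the example above.
struct UrlEntry<'a> {
    loc: &'a str,
    lastmod: &'a str,
    changefreq: &'a str,
    priority: f32,
}

// Render a <urlset> document as described on sitemaps.org.
fn render_url_set(entries: &[UrlEntry]) -> String {
    let mut xml = String::from("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    xml.push_str("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
    for entry in entries {
        xml.push_str("  <url>\n");
        xml.push_str(&format!("    <loc>{}</loc>\n", escape_xml(entry.loc)));
        xml.push_str(&format!("    <lastmod>{}</lastmod>\n", escape_xml(entry.lastmod)));
        xml.push_str(&format!("    <changefreq>{}</changefreq>\n", escape_xml(entry.changefreq)));
        xml.push_str(&format!("    <priority>{:.2}</priority>\n", entry.priority));
        xml.push_str("  </url>\n");
    }
    xml.push_str("</urlset>\n");
    xml
}

fn main() {
    let entries = [UrlEntry {
        loc: "https://www.toddgriffin.me/",
        lastmod: "1998-01-15T04:20:00+00:00",
        changefreq: "monthly",
        priority: 0.69,
    }];
    print!("{}", render_url_set(&entries));
}
```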
Index Sitemap
cargo run --example generate_index_sitemap
https://www.toddgriffin.me/sitemap1.xml.gz
1998-01-15T04:20:00+00:00
https://www.toddgriffin.me/sitemap2.xml.gz
2000-01-31T04:20:00+00:00
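The index example pairs each sitemap location with a last-modified timestamp. A rough, std-only sketch of that shape, combined with the 50,000 `<sitemap>` limit, might look like the following. `render_index` and `IndexError` are hypothetical names for this example, not the library's API, and the sketch assumes the URLs passed in need no XML escaping.

```rust
// Limit taken from the TooManySitemaps validation.
const MAX_SITEMAPS: usize = 50_000;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum IndexError {
    TooManySitemaps(usize),
}

// Render a <sitemapindex> from (loc, lastmod) pairs.
// Assumption: both strings are already safe to embed in XML.
fn render_index(sitemaps: &[(&str, &str)]) -> Result<String, IndexError> {
    if sitemaps.len() > MAX_SITEMAPS {
        return Err(IndexError::TooManySitemaps(sitemaps.len()));
    }
    let mut xml = String::from("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    xml.push_str("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
    for (loc, lastmod) in sitemaps {
        xml.push_str("  <sitemap>\n");
        xml.push_str(&format!("    <loc>{}</loc>\n", loc));
        xml.push_str(&format!("    <lastmod>{}</lastmod>\n", lastmod));
        xml.push_str("  </sitemap>\n");
    }
    xml.push_str("</sitemapindex>\n");
    Ok(xml)
}

fn main() {
    let sitemaps = [
        ("https://www.toddgriffin.me/sitemap1.xml.gz", "1998-01-15T04:20:00+00:00"),
        ("https://www.toddgriffin.me/sitemap2.xml.gz", "2000-01-31T04:20:00+00:00"),
    ];
    match render_index(&sitemaps) {
        Ok(xml) => print!("{}", xml),
        Err(error) => eprintln!("invalid index: {:?}", error),
    }
}
```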
Image Sitemap
cargo run --example generate_image_sitemap
https://www.toddgriffin.me/sample1.html
https://www.toddgriffin.me/image.jpg
https://www.toddgriffin.me/photo.jpg
https://www.toddgriffin.me/sample2.html
https://www.toddgriffin.me/picture.jpg
Video Sitemap
cargo run --example generate_video_sitemap
https://www.toddgriffin.me/videos/some_video_landing_page.html
https://www.toddgriffin.me/thumbs/123.jpg
Grilling steaks for summer
Alkis shows you how to get perfectly done steaks every time
https://www.toddgriffin.me/video123.mp4
https://www.toddgriffin.me/videoplayer.php?video=123
600
2021-11-05T19:20:30+08:00
4.2
8633
1998-01-15T12:20:00+08:00
yes
CA GB IE US
tv web
yes
GrillyMcGrillserson
no
steak
meat
summer
outdoor
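The sample values above (a duration of 600 seconds, a rating of 4.2, and four tags) all sit inside the video limits listed under Validations. As a sketch of how such range checks could be expressed, assuming a hypothetical `validate_video` function and `VideoError` enum that are not part of the library's API:

```rust
// Hypothetical error type; variant names mirror the Video Sitemap validations.
#[derive(Debug, PartialEq)]
enum VideoError {
    DurationTooShort(u32),
    DurationTooLong(u32),
    RatingTooLow(f32),
    RatingTooHigh(f32),
    TooManyTags(usize),
}

// Check the numeric and count limits for a single video entry.
fn validate_video(duration: u32, rating: f32, tags: &[&str]) -> Result<(), VideoError> {
    // A <duration> must be between 1 and 28,800 seconds (inclusive).
    if duration < 1 {
        return Err(VideoError::DurationTooShort(duration));
    }
    if duration > 28_800 {
        return Err(VideoError::DurationTooLong(duration));
    }
    // A <rating> must be between 0.0 and 5.0 (inclusive).
    if rating < 0.0 {
        return Err(VideoError::RatingTooLow(rating));
    }
    if rating > 5.0 {
        return Err(VideoError::RatingTooHigh(rating));
    }
    // A video must contain no more than 32 <tag> elements.
    if tags.len() > 32 {
        return Err(VideoError::TooManyTags(tags.len()));
    }
    Ok(())
}

fn main() {
    let tags = ["steak", "meat", "summer", "outdoor"];
    println!("{:?}", validate_video(600, 4.2, &tags));
    println!("{:?}", validate_video(28_801, 4.2, &tags));
}
```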
News Sitemap
cargo run --example generate_news_sitemap
https://www.toddgriffin.me/business/article55.html
The Example Times
en
1998-01-15T04:20:00+00:00
Companies A, B in Merger Talks
Alternative libraries
The rust-sitemap and sitewriter libraries are by far the best
alternatives.
This pro/con list is accurate as of the most recent update to this document.
rust-sitemap
Pros:
- Supports URL, Index sitemaps
- Supports reading files
- Supports writing files
Cons:
- Doesn't support Image, Video, News sitemaps
- Only supports some validations
- Low struct/method documentation
sitewriter
Pros:
- Supports URL sitemaps
- Supports writing files
- Supports builder pattern
- Uses quick-xml, so it should be quite fast
- Some struct/method documentation
Cons:
- Doesn't support Index, Image, Video, News sitemaps
- Doesn't support reading files
- Zero data validations
sitemap-iter
Pros:
- Supports URL sitemaps
- Supports reading files
Cons:
- Doesn't support Index, Image, Video, News sitemaps
- Doesn't support writing files
- Zero data validations
- Low struct/method documentation
rust-sitemap-writer
Pros:
- Supports URL sitemaps
- Supports writing files
Cons:
- Doesn't support Index, Image, Video, News sitemaps
- Doesn't support reading files
- Zero data validations
- Zero struct/method documentation
mdbook-sitemap-generator
Pros:
- Semi-supports URL sitemaps
- Supports writing files
Cons:
- Not a general-purpose sitemap library
- Doesn't support every possible tag of URL sitemaps
- Doesn't support Index, Image, Video, News sitemaps
- Doesn't support reading files
- Zero data validations
- Zero struct/method documentation
Developers
This project is under active maintenance, even if there are no recent commits! Please submit an issue or bug report if the library needs updating for any reason!
Philosophy
This library should be fast, efficient, strictly adhere to the Sitemap specification, and strictly adhere to Google Search Console's best practices.
A feature request will be accepted if it exists in the specification and if it is a best practice to use it.
For example, here are some deprecated Image Sitemap fields: <image:caption>,
<image:geo_location>, <image:title>, <image:license>. While the
specification technically describes these fields, Google Search Console's best
practice is to omit them.
Google's documentation explains:
"Over the years, we introduced a number of tags and tag attributes for Google sitemap extensions, specifically the Image and Video extensions.
Most of these tags were added to allow site owners to deliver data more easily to Search. Upon evaluating the value of the Google sitemap extension tags, we decided to officially deprecate some tags and attributes, and remove them from our documentation. The deprecated tags will have no effect on indexing and search features after August 6, 2022.
If you are a sitemap plugin developer or manage your own sitemaps, there's no immediate action required; you can leave these tags and attributes in place without drawbacks. In the future, Search Console may show warnings once these updates are included in the next schema versions of the Image and Video extensions."
Source: https://developers.google.com/search/docs/crawling-indexing/sitemaps/image-sitemaps
Any contribution which doesn't follow this philosophy will unfortunately be closed.
On the flip side, if this library is missing any feature of the Sitemap spec, it must be implemented!
Specification
- https://www.sitemaps.org/protocol.html
- https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap#xml
- https://developers.google.com/search/docs/specialty/international/localized-versions#sitemap
Feature Requests
Reading sitemap files (+ possible speed boost)
I would love to have this library use quick-xml instead of xml-builder.
The quick-xml library is built for speed and supports not only writing files,
but reading them too. I haven't benchmarked xml-builder or its use in this
library, so I cannot state the impact quick-xml will have there.
I originally went with xml-builder due to how extremely easy it is to learn
and use. It is by far fast enough for my use-cases, so I didn't have to reach
for anything else.
If you like what this library provides, but simply need the ability to parse
sitemaps and could also use a speed boost - please consider pushing a pull
request! (Preferably one that replaces xml-builder with quick-xml lol)
Codified country codes
In video sitemaps, there is a tag called <video:restriction> where the text
is a space-delimited list of country codes in
ISO 3166 format.
Currently, the country codes are typed as merely a HashSet<String>. It
would be awesome if there was an enum/struct that codified each ISO 3166 country
code as a separate entity, so this library could have extra assurances that each
code was valid.
The isocountry-rs and rust_iso_3166 libraries look promising.
This hasn't been prioritized yet as I am currently satisfied with
HashSet<String> for my use cases. Pull requests are welcome!
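To make the idea concrete, here is a small sketch of what codified country codes could look like. The `CountryCode` enum, `parse_restriction`, and `render_restriction` are all hypothetical names for this example. The enum covers only the four codes from the video example above, whereas a real implementation would need every ISO 3166 code (or would delegate to one of the crates mentioned).

```rust
use std::collections::BTreeSet;
use std::str::FromStr;

// Hypothetical enum; only a tiny subset of ISO 3166 codes for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CountryCode {
    Ca,
    Gb,
    Ie,
    Us,
}

impl CountryCode {
    fn as_str(self) -> &'static str {
        match self {
            CountryCode::Ca => "CA",
            CountryCode::Gb => "GB",
            CountryCode::Ie => "IE",
            CountryCode::Us => "US",
        }
    }
}

impl FromStr for CountryCode {
    type Err = String;

    // Unknown codes are rejected instead of being stored as arbitrary strings.
    fn from_str(code: &str) -> Result<Self, Self::Err> {
        match code {
            "CA" => Ok(CountryCode::Ca),
            "GB" => Ok(CountryCode::Gb),
            "IE" => Ok(CountryCode::Ie),
            "US" => Ok(CountryCode::Us),
            other => Err(format!("unknown country code: {}", other)),
        }
    }
}

// Parse the space-delimited text of a <video:restriction> tag.
fn parse_restriction(text: &str) -> Result<BTreeSet<CountryCode>, String> {
    text.split_whitespace()
        .map(|code| code.parse::<CountryCode>())
        .collect()
}

// Render a set of codes back into space-delimited text.
fn render_restriction(codes: &BTreeSet<CountryCode>) -> String {
    codes.iter().map(|code| code.as_str()).collect::<Vec<_>>().join(" ")
}

fn main() {
    match parse_restriction("CA GB IE US") {
        Ok(codes) => println!("{}", render_restriction(&codes)),
        Err(error) => eprintln!("{}", error),
    }
    println!("{:?}", parse_restriction("CA XX"));
}
```

The sketch uses a `BTreeSet` rather than a `HashSet` purely so the rendered output has a stable order; that is a choice made for this example, not something the library does.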
Commands
make lint
make test
make fix
Credits
Made by Todd Everett Griffin.