urls2disk is a Rust crate that downloads a series of webpages in parallel
and saves them to disk. Depending on the option you choose, it either writes
the raw bytes of each webpage to disk or first converts the page to PDF and
writes that instead. It's useful for general web scraping as well as for
converting a batch of webpages to PDF.

A key feature of urls2disk is that you can set a maximum number of requests
per second while downloading webpages, so you can throttle yourself and avoid
being blocked by servers that reject clients sending too many requests at once.
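To make the idea of a requests-per-second cap concrete, here is a minimal sketch of one way such throttling can work, using only the standard library. It is not urls2disk's actual implementation or API: the `Throttle` type and its `wait` method are hypothetical names invented for this illustration, and a real parallel downloader would need to share the limiter across threads.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical rate limiter for illustration only (not part of urls2disk).
/// It spaces calls to `wait` at least `min_interval` apart.
struct Throttle {
    min_interval: Duration,
    next_slot: Instant,
}

impl Throttle {
    /// A cap of N requests per second means at least 1/N seconds between requests.
    fn new(max_requests_per_second: u32) -> Throttle {
        assert!(max_requests_per_second > 0, "rate must be positive");
        Throttle {
            min_interval: Duration::from_secs(1) / max_requests_per_second,
            next_slot: Instant::now(),
        }
    }

    /// Blocks the current thread until the next request is allowed.
    fn wait(&mut self) {
        let now = Instant::now();
        if self.next_slot > now {
            thread::sleep(self.next_slot - now);
        }
        // Reserve the next slot relative to the moment this request starts.
        self.next_slot = Instant::now() + self.min_interval;
    }
}

fn main() {
    // Assumed cap of 5 requests per second, so requests are 200 ms apart.
    let mut throttle = Throttle::new(5);
    let start = Instant::now();
    for i in 0..3 {
        throttle.wait();
        // A real program would issue the HTTP request here.
        println!("request {} sent after {:?}", i, start.elapsed());
    }
}
```

The first call returns immediately and each later call sleeps just long enough to keep the spacing, which is the simplest way to stay under a per-second limit when requests are issued from one place.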

Under the hood, urls2disk uses wkhtmltopdf to convert webpages to PDF, so
you'll need wkhtmltopdf installed on your machine if you choose that option.
Installing wkhtmltopdf on macOS with Homebrew is simple. Just run
brew install Caskroom/cask/wkhtmltopdf
in your terminal. On other systems, or if you don't have Homebrew, you'll need
to install wkhtmltopdf yourself; I may look up instructions for other setups
and include them here at some point. As for versions, I've only tested with
wkhtmltopdf 0.12.4.
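Since PDF conversion depends on an external binary, it can help to confirm that wkhtmltopdf is reachable before starting a long batch. The sketch below is one way to do that from Rust by running `wkhtmltopdf --version`; the `wkhtmltopdf_version` helper is a hypothetical name for this example and is not something urls2disk is known to provide.

```rust
use std::process::Command;

/// Hypothetical helper (not part of urls2disk): returns the version line
/// printed by `wkhtmltopdf --version`, or None if the binary cannot be run.
fn wkhtmltopdf_version() -> Option<String> {
    // `output()` fails if the binary is not on PATH; treat that as "not installed".
    let output = Command::new("wkhtmltopdf").arg("--version").output().ok()?;
    if !output.status.success() {
        return None;
    }
    let text = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if text.is_empty() {
        None
    } else {
        Some(text)
    }
}

fn main() {
    match wkhtmltopdf_version() {
        Some(version) => println!("found: {}", version),
        None => eprintln!("wkhtmltopdf not found on PATH; PDF conversion will not work"),
    }
}
```

If the reported version differs from 0.12.4, the only tested version, treat any conversion problems with that in mind.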
Here's an example of downloading Apple, Inc.'s annual reports from 2010-2017
from the SEC website using