§httpdirectory’s readme
This project is in an early stage of development.
§Description
This library provides a convenient way to scrape directory indexes (like the ones created by mod_autoindex with Apache or autoindex with nginx) and get a structure that abstracts them. For instance, one may have the following website:
The library will store in an HttpDirectory structure all the information it finds, that is to say the name, link, size and date of each file or directory. Printing it will produce the following output:
https://cloud.debian.org/images/cloud/
DIR - ..
DIR - 2024-07-01 23:19 OpenStack/
DIR - 2025-04-28 21:33 bookworm-backports/
DIR - 2025-04-28 20:53 bookworm/
DIR - 2025-05-12 23:57 bullseye-backports/
DIR - 2025-05-12 23:22 bullseye/
DIR - 2024-07-03 21:46 buster-backports/
DIR - 2024-07-03 21:46 buster/
DIR - 2024-04-01 14:20 sid/
DIR - 2019-07-18 10:40 stretch-backports/
DIR - 2019-07-18 10:40 stretch/
DIR - 2023-07-25 07:43 trixie/
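To make the shape of such a listing concrete, here is a minimal, self-contained sketch of how an entry could be modelled and printed. The `ListingEntry` type, its fields and the `FILE` line layout are hypothetical illustrations, not the crate's actual types or output format; only the `DIR - <date> <name>` and `DIR - ..` shapes are taken from the listing above.

```rust
use std::fmt;

// Hypothetical model of one listing line; NOT the crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum ListingEntry {
    // The ".." link back to the parent directory.
    Parent,
    // A directory: no size, and the date may be missing or unparsable.
    Directory { name: String, date: Option<String> },
    // A file: an apparent size in bytes plus an optional date.
    File { name: String, size: u64, date: Option<String> },
}

impl fmt::Display for ListingEntry {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ListingEntry::Parent => write!(f, "DIR - .."),
            ListingEntry::Directory { name, date } => {
                // A missing date is rendered as "-" (an assumption).
                write!(f, "DIR - {} {}", date.as_deref().unwrap_or("-"), name)
            }
            ListingEntry::File { name, size, date } => {
                // The FILE layout is assumed for illustration only.
                write!(f, "FILE {} {} {}", size, date.as_deref().unwrap_or("-"), name)
            }
        }
    }
}
```

Calling `to_string()` on a `Directory` entry named `sid/` with the date `2024-04-01 14:20` yields a line of the same shape as the one shown in the listing.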
§Usage
First obtain a directory from a URL using the HttpDirectory::new(url) method. You can then use dirs(), files(), parent_directory(), filter_by_name(), cd(), sort_by_name(), sort_by_date() or sort_by_size() to, respectively, get all directories, get all files, get the ParentDirectory, filter entries by name (with a Regex), change directory, or sort the entries of this HttpDirectory listing by name, by date or by size:
use httpdirectory::httpdirectory::HttpDirectory;

async fn first_example() {
    if let Ok(httpdir) = HttpDirectory::new("https://cloud.debian.org/images/cloud/").await {
        println!("{:?}", httpdir.dirs());
    }
}
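The filtering and sorting operations listed above can be pictured with a small standard-library sketch. Everything here is hypothetical: `Item`, `filter_by_substring` and this `sort_by_size` are illustrative stand-ins, not the crate's API, and plain substring matching replaces the Regex matching that filter_by_name() is described as using. The `ascending` flag is likewise an assumption.

```rust
// Hypothetical stand-in for a listing entry; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    name: String,
    size: u64,
}

// Keeps the entries whose name contains `needle`.
// (Substring matching is used here instead of a regular expression.)
fn filter_by_substring(items: &[Item], needle: &str) -> Vec<Item> {
    items
        .iter()
        .filter(|item| item.name.contains(needle))
        .cloned()
        .collect()
}

// Sorts entries in place by size, ascending or descending.
fn sort_by_size(items: &mut [Item], ascending: bool) {
    items.sort_by(|a, b| {
        if ascending {
            a.size.cmp(&b.size)
        } else {
            b.size.cmp(&a.size)
        }
    });
}
```

Filtering first and sorting afterwards keeps the sort working on the smaller set of entries.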
In addition, you can get some Stats about an HttpDirectory listing using the stats() method. It returns a Stats structure containing the number of directories, the number of files, the total apparent size, the number of files or directories with a valid date, the number of files or directories without a valid date, and the number of parents (which should always be at most 1).
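As an illustration of what these counters mean, the following sketch computes them over a hypothetical list of entries. `ListingStats`, `Kind`, `Entry` and `compute_stats` are invented names for this example and do not reflect the crate's Stats structure or its internals. Excluding the parent entry from the date counters is also an assumption.

```rust
// Hypothetical counters mirroring the description above; not the crate's Stats.
#[derive(Debug, Default, PartialEq)]
struct ListingStats {
    dirs: usize,
    files: usize,
    total_size: u64,
    with_date: usize,
    without_date: usize,
    parents: usize,
}

// Hypothetical entry kinds.
enum Kind {
    Parent,
    Directory,
    File,
}

// Hypothetical entry: size is only meaningful for files.
struct Entry {
    kind: Kind,
    size: u64,
    has_valid_date: bool,
}

fn compute_stats(entries: &[Entry]) -> ListingStats {
    let mut stats = ListingStats::default();
    for entry in entries {
        match entry.kind {
            Kind::Parent => {
                // Assumption: the parent link is not counted in the date totals.
                stats.parents += 1;
                continue;
            }
            Kind::Directory => stats.dirs += 1,
            Kind::File => {
                stats.files += 1;
                stats.total_size += entry.size;
            }
        }
        if entry.has_valid_date {
            stats.with_date += 1;
        } else {
            stats.without_date += 1;
        }
    }
    stats
}
```

In this sketch, `with_date + without_date` always equals `dirs + files`, which gives a quick sanity check on a listing.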
§Examples
You can see some examples in the example directory:
- onedir example for a small example with a call to the cd() method
- mirrors example that will try to crawl a list of 422 Debian mirrors and print in red those that were possibly not correctly interpreted
- debug me that is used in debugging sessions to try to improve the program by being able to interpret more websites
Modules§
- entry
- Module that helps store all information about an entry (name, date, size and link)
- error
- All errors that you might get from the httpdirectory library
- httpdirectory
- Module that allows one to get http directories in a structure with convenient methods
- httpdirectoryentry
- Module to deal with the HttpDirectoryEntry enum that tells whether the Entry is a parent directory, a directory or a file.
- stats
- Module that will give access to a structure that will contain some statistics about an HttpDirectory by calling its stats() method
Constants§
- HTTPDIR_USER_AGENT
- User Agent used by httpdirectory that should be formatted “httpdirectory/{}” where {} is the version of the library
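The format described for this constant can be sketched as below. The `user_agent` helper is hypothetical and only illustrates the “httpdirectory/{}” pattern. How the crate actually builds the constant is not shown here.

```rust
// Hypothetical helper illustrating the documented "httpdirectory/{}" pattern.
// `version` stands for the library version, passed in by the caller.
fn user_agent(version: &str) -> String {
    format!("httpdirectory/{}", version)
}
```

For an illustrative version string such as `0.1.0`, this produces `httpdirectory/0.1.0`.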