Crate cvmfs_server_scraper


A library for scraping CVMFS servers and extracting their metadata.

CVMFS servers provide a number of public metadata files that can be scraped to extract information about the server and its repositories. However, depending on the server backend or its administrators, some of these files may not be present, and even in files that are present, a number of keys are optional. This library provides a way to scrape these files and extract the metadata in a structured way.

The following files are currently supported:

  • cvmfs/info/v1/repositories.json: The list of repositories and replicas hosted on the server (not present on servers with S3 backends)
  • cvmfs/info/v1/meta.json: Contact points and human-generated metadata about the server (optional)

And for each repository, it fetches:

  • cvmfs/<repo>/.cvmfs_status.json: Information about the last garbage collection and snapshot.
  • cvmfs/<repo>/.cvmfspublished: Manifest of the repository.
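
As a point of reference, the sketch below spells out the URLs these paths correspond to for a single host and repository. It is illustrative only: it assumes the files are served over plain HTTP directly under the host root, which may not hold for every deployment, and the scraper itself takes care of the actual fetching and parsing.

// Illustrative only: print the metadata URLs that correspond to the paths
// listed above for one host and one repository, assuming plain HTTP.
fn main() {
    let host = "aws-eu-central-s1.eessi.science";
    let repo = "software.eessi.io";

    // Server-level metadata
    println!("http://{host}/cvmfs/info/v1/repositories.json");
    println!("http://{host}/cvmfs/info/v1/meta.json");

    // Per-repository metadata
    println!("http://{host}/cvmfs/{repo}/.cvmfs_status.json");
    println!("http://{host}/cvmfs/{repo}/.cvmfspublished");
}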

Because repositories.json may be missing (for example on servers with S3 backends), repositories can be forced to be scraped by providing an explicit list of repository names.

Examples

use cvmfs_server_scraper::{Hostname, Server, ServerBackendType, ServerType,
    ScrapedServer, ScraperCommon, Scraper, CVMFSScraperError, DEFAULT_GEOAPI_SERVERS};

#[tokio::main]
async fn main() -> Result<(), CVMFSScraperError> {
    let servers = vec![
        Server::new(
            ServerType::Stratum1,
            ServerBackendType::CVMFS,
            Hostname::try_from("azure-us-east-s1.eessi.science").unwrap(),
        ),
        Server::new(
            ServerType::Stratum1,
            ServerBackendType::AutoDetect,
            Hostname::try_from("aws-eu-central-s1.eessi.science").unwrap(),
        ),
        Server::new(
            ServerType::SyncServer,
            ServerBackendType::S3,
            Hostname::try_from("aws-eu-west-s1-sync.eessi.science").unwrap(),
        ),
    ];

    let repolist = vec!["software.eessi.io", "dev.eessi.io", "riscv.eessi.io"];
    let ignored_repos = vec!["nope.eessi.io"];

    // Build a Scraper and scrape all servers in parallel
    let scraped_servers = Scraper::new()
        .forced_repositories(repolist)
        .ignored_repositories(ignored_repos)
        .geoapi_servers(DEFAULT_GEOAPI_SERVERS.clone())? // This is the default list
        .with_servers(servers) // Transitions to a WithServer state.
        .validate()? // Transitions to a ValidatedAndReady state, now immutable.
        .scrape().await; // Perform the scrape, return servers.

    for server in scraped_servers {
        match server {
            ScrapedServer::Populated(populated_server) => {
                println!("{}", populated_server);
                populated_server.output();
                println!();
            }
            ScrapedServer::Failed(failed_server) => {
                panic!("Error! {} failed scraping: {:?}", failed_server.hostname, failed_server.error);
            }
        }
    }
    Ok(())
}
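
The example above unwraps hostname parsing and panics on the first scrape failure. The sketch below shows a gentler way to build the server list from plain strings, skipping any hostname that fails to parse. It uses only the constructors shown above; the helper name stratum1_servers is hypothetical, as is the choice to log and skip invalid entries rather than abort.

use cvmfs_server_scraper::{Hostname, Server, ServerBackendType, ServerType};

// Hypothetical helper: build Stratum1 servers with backend auto-detection
// from plain hostname strings, skipping entries that fail to parse.
fn stratum1_servers(hosts: &[&str]) -> Vec<Server> {
    let mut servers = Vec::new();
    for host in hosts {
        match Hostname::try_from(*host) {
            Ok(hostname) => servers.push(Server::new(
                ServerType::Stratum1,
                ServerBackendType::AutoDetect,
                hostname,
            )),
            // The concrete error type is not assumed here; invalid entries
            // are simply reported and skipped.
            Err(_) => eprintln!("skipping invalid hostname: {host}"),
        }
    }
    servers
}

The resulting list can then be handed to Scraper::new()....with_servers(...) exactly as in the example above.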
