Crate docbox_web_scraper


§Docbox Web Scraper

Web-scraping client for fetching website metadata, favicons, etc., and maintaining an internal cache of the results

§Environment Variables

  • DOCBOX_WEB_SCRAPE_HTTP_PROXY - Proxy server address to use for HTTP requests
  • DOCBOX_WEB_SCRAPE_HTTPS_PROXY - Proxy server address to use for HTTPS requests
  • DOCBOX_WEB_SCRAPE_METADATA_CACHE_DURATION - Time before cached metadata is considered expired
  • DOCBOX_WEB_SCRAPE_METADATA_CACHE_CAPACITY - Maximum number of metadata entries to cache at once
  • DOCBOX_WEB_SCRAPE_METADATA_CONNECT_TIMEOUT - Timeout when connecting while scraping
  • DOCBOX_WEB_SCRAPE_METADATA_READ_TIMEOUT - Timeout when reading responses from scraping
  • DOCBOX_WEB_SCRAPE_IMAGE_CACHE_DURATION - Time before cached images are considered expired
  • DOCBOX_WEB_SCRAPE_IMAGE_CACHE_CAPACITY - Maximum images to cache at once
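
As a rough illustration of how variables like these might be read, the sketch below collects a few of them using only the standard library. It is not the crate's own loading code: the `ScrapeEnvSketch` struct and the `parse_secs`, `parse_count`, `load_sketch`, and `load_from_process_env` helpers are hypothetical names, and the value formats (whole seconds for durations, plain integers for capacities) are assumptions that this page does not confirm.

```rust
use std::env;
use std::time::Duration;

/// Hypothetical holder for a few of the settings listed above.
/// This is NOT the crate's `WebsiteMetaServiceConfig`.
#[derive(Debug, PartialEq)]
struct ScrapeEnvSketch {
    http_proxy: Option<String>,
    https_proxy: Option<String>,
    metadata_cache_duration: Option<Duration>,
    metadata_cache_capacity: Option<u64>,
}

/// Assumption: durations are written as whole seconds (e.g. "60").
/// Missing or unparsable values become `None`.
fn parse_secs(raw: Option<String>) -> Option<Duration> {
    raw.and_then(|v| v.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
}

/// Assumption: capacities are plain non-negative integers.
fn parse_count(raw: Option<String>) -> Option<u64> {
    raw.and_then(|v| v.trim().parse::<u64>().ok())
}

/// Builds the sketch from any key lookup, so it can be exercised
/// without touching the real process environment.
fn load_sketch(lookup: impl Fn(&str) -> Option<String>) -> ScrapeEnvSketch {
    ScrapeEnvSketch {
        http_proxy: lookup("DOCBOX_WEB_SCRAPE_HTTP_PROXY"),
        https_proxy: lookup("DOCBOX_WEB_SCRAPE_HTTPS_PROXY"),
        metadata_cache_duration: parse_secs(lookup(
            "DOCBOX_WEB_SCRAPE_METADATA_CACHE_DURATION",
        )),
        metadata_cache_capacity: parse_count(lookup(
            "DOCBOX_WEB_SCRAPE_METADATA_CACHE_CAPACITY",
        )),
    }
}

/// Reads the same keys from the process environment.
#[allow(dead_code)]
fn load_from_process_env() -> ScrapeEnvSketch {
    load_sketch(|key| env::var(key).ok())
}
```

Taking the lookup as a closure keeps the parsing separate from the environment itself. The remaining timeout and image-cache variables could be handled the same way under the same assumptions.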

Structs§

ResolvedImage
Represents an image that has been resolved, so both its contents and its content type are now known
ResolvedWebsiteMetadata
Metadata resolved from a scraped website
Url
A parsed URL record.
WebsiteMetaService
Service for looking up website metadata and storing a cached value
WebsiteMetaServiceConfig
Configuration for the website metadata service

Enums§

WebsiteMetaServiceConfigError
Errors that could occur when loading the configuration