Crate docbox_web_scraper

§Docbox Web Scraper

Web-scraping client that retrieves website metadata, favicons, etc., and maintains an internal cache

§Environment Variables

  • DOCBOX_WEB_SCRAPE_HTTP_PROXY - Proxy server address to use for HTTP requests
  • DOCBOX_WEB_SCRAPE_HTTPS_PROXY - Proxy server address to use for HTTPS requests
  • DOCBOX_WEB_SCRAPE_METADATA_CACHE_DURATION - Time before cached metadata is considered expired
  • DOCBOX_WEB_SCRAPE_METADATA_CACHE_CAPACITY - Maximum amount of metadata to cache at once
  • DOCBOX_WEB_SCRAPE_METADATA_CONNECT_TIMEOUT - Timeout when connecting while scraping
  • DOCBOX_WEB_SCRAPE_METADATA_READ_TIMEOUT - Timeout when reading responses from scraping
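
As a rough illustration of how the variables above could be consumed, here is a minimal sketch using only the standard library. It is not the crate's own loader: the `ScrapeSettings` struct and the `load_settings` and `load_settings_from_env` functions are hypothetical names invented for this example, and the value formats are assumptions (durations and timeouts as whole seconds, capacity as an unsigned integer), since this page does not state them. For the real behavior, refer to `WebsiteMetaServiceConfig` and `WebsiteMetaServiceConfigError`.

```rust
use std::time::Duration;

/// Hypothetical settings struct for illustration only;
/// this is NOT the crate's `WebsiteMetaServiceConfig`.
#[derive(Debug, PartialEq)]
pub struct ScrapeSettings {
    pub http_proxy: Option<String>,
    pub https_proxy: Option<String>,
    pub cache_duration: Option<Duration>,
    pub cache_capacity: Option<u64>,
    pub connect_timeout: Option<Duration>,
    pub read_timeout: Option<Duration>,
}

/// Parse an optional value as whole seconds.
/// The unit (seconds) is an assumption, not documented on this page.
fn parse_secs(name: &str, value: Option<String>) -> Result<Option<Duration>, String> {
    match value {
        None => Ok(None),
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map(Duration::from_secs)
            .map(Some)
            .map_err(|e| format!("{name}: {e}")),
    }
}

/// Build settings from any lookup function, so the parsing logic can be
/// exercised without touching the real process environment.
pub fn load_settings<F>(lookup: F) -> Result<ScrapeSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    const CAPACITY: &str = "DOCBOX_WEB_SCRAPE_METADATA_CACHE_CAPACITY";
    const DURATION: &str = "DOCBOX_WEB_SCRAPE_METADATA_CACHE_DURATION";
    const CONNECT: &str = "DOCBOX_WEB_SCRAPE_METADATA_CONNECT_TIMEOUT";
    const READ: &str = "DOCBOX_WEB_SCRAPE_METADATA_READ_TIMEOUT";

    // Capacity is assumed to be a plain unsigned integer.
    let cache_capacity = match lookup(CAPACITY) {
        None => None,
        Some(raw) => Some(
            raw.trim()
                .parse::<u64>()
                .map_err(|e| format!("{CAPACITY}: {e}"))?,
        ),
    };

    Ok(ScrapeSettings {
        http_proxy: lookup("DOCBOX_WEB_SCRAPE_HTTP_PROXY"),
        https_proxy: lookup("DOCBOX_WEB_SCRAPE_HTTPS_PROXY"),
        cache_duration: parse_secs(DURATION, lookup(DURATION))?,
        cache_capacity,
        connect_timeout: parse_secs(CONNECT, lookup(CONNECT))?,
        read_timeout: parse_secs(READ, lookup(READ))?,
    })
}

/// Convenience wrapper that reads from the process environment.
/// Unset (or non-UTF-8) variables are treated as absent.
pub fn load_settings_from_env() -> Result<ScrapeSettings, String> {
    load_settings(|name| std::env::var(name).ok())
}
```

Calling `load_settings_from_env()` at startup would yield `None` for every unset variable, leaving the choice of defaults to the caller. Passing the lookup as a closure is a design choice made for this sketch so that the parsing can be checked with fixed values.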

Structs§

Favicon
Favicon extracted from a website
ResolvedImage
Represents an image that has been resolved, so its contents and content type are now known
ResolvedWebsiteMetadata
Metadata resolved from a scraped website
Url
A parsed URL record.
WebsiteMetaService
Service for looking up website metadata and storing a cached value
WebsiteMetaServiceConfig
Configuration for the website metadata service

Enums§

WebsiteMetaServiceConfigError
Errors that could occur when loading the configuration