§Docbox Web Scraper
Web-scraping client for fetching website metadata, favicons, etc., and maintaining an internal cache of the results
§Environment Variables
- DOCBOX_WEB_SCRAPE_HTTP_PROXY - Proxy server address to use for HTTP requests
- DOCBOX_WEB_SCRAPE_HTTPS_PROXY - Proxy server address to use for HTTPS requests
- DOCBOX_WEB_SCRAPE_METADATA_CACHE_DURATION - Time before cached metadata is considered expired
- DOCBOX_WEB_SCRAPE_METADATA_CACHE_CAPACITY - Maximum amount of metadata to cache at once
- DOCBOX_WEB_SCRAPE_METADATA_CONNECT_TIMEOUT - Timeout when connecting while scraping
- DOCBOX_WEB_SCRAPE_METADATA_READ_TIMEOUT - Timeout when reading responses from scraping
- DOCBOX_WEB_SCRAPE_IMAGE_CACHE_DURATION - Time before cached images are considered expired
- DOCBOX_WEB_SCRAPE_IMAGE_CACHE_CAPACITY - Maximum images to cache at once
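As a rough illustration of how an application might read a few of these variables itself, here is a minimal sketch using only the standard library. It is not the crate's own loading code: the helpers `parse_seconds` and `parse_capacity` are hypothetical, the fallback values are made up for the example, and it assumes durations are given as whole seconds, which this page does not specify.

```rust
use std::env;
use std::time::Duration;

/// Hypothetical helper: parse an optional raw value as whole seconds,
/// falling back to `default` when the value is missing or not a number.
/// ASSUMPTION: the real crate may accept a different duration format.
fn parse_seconds(raw: Option<&str>, default: Duration) -> Duration {
    raw.and_then(|value| value.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(default)
}

/// Hypothetical helper: parse an optional raw value as a cache capacity,
/// falling back to `default` when the value is missing or not a number.
fn parse_capacity(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|value| value.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

fn main() {
    // The fallback values below are illustrative, not the crate's defaults.
    let metadata_cache_duration = parse_seconds(
        env::var("DOCBOX_WEB_SCRAPE_METADATA_CACHE_DURATION").ok().as_deref(),
        Duration::from_secs(3600),
    );
    let metadata_cache_capacity = parse_capacity(
        env::var("DOCBOX_WEB_SCRAPE_METADATA_CACHE_CAPACITY").ok().as_deref(),
        50,
    );
    // Proxy variables are optional; an unset variable simply yields `None`.
    let http_proxy = env::var("DOCBOX_WEB_SCRAPE_HTTP_PROXY").ok();

    println!("metadata cache duration: {:?}", metadata_cache_duration);
    println!("metadata cache capacity: {}", metadata_cache_capacity);
    println!("http proxy: {:?}", http_proxy);
}
```

Keeping the parsing separate from the `env::var` lookup makes the fallback behaviour easy to check without touching the process environment.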
Structs§
- ResolvedImage - Represents an image that has been resolved, meaning its contents and content type are now known
- ResolvedWebsiteMetadata - Metadata resolved from a scraped website
- Url - A parsed URL record.
- WebsiteMetaService - Service for looking up website metadata and storing a cached value
- WebsiteMetaServiceConfig - Configuration for the website metadata service
Enums§
- WebsiteMetaServiceConfigError - Errors that could occur when loading the configuration