scrapyard
Automatic web scraper and RSS generator library
Quickstart
Get started by creating an event loop and running scrapyard inside an async context, as sketched below.
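A minimal sketch of what that can look like, assuming Tokio as the async runtime; `scrapyard::init()` is a placeholder for the library's entry point (the excerpt only hints at an `init` being awaited):

```rust
// Sketch only: Tokio is assumed as the event loop, and `scrapyard::init()`
// is a placeholder for the library's real entry point.
#[tokio::main]
async fn main() {
    scrapyard::init().await;
}
```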
Configuration
By default, config files can be found in ~/.config/scrapyard (Linux),
/Users/[Username]/Library/Application Support/scrapyard (macOS), or
C:\Users\[Username]\AppData\Roaming\scrapyard (Windows).
To change the config directory location, specify the path when initializing (see the sketch below).
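A reconstruction of the fragment above; `PathBuf::from` is standard Rust, but passing the path to `init` this way is an assumption, since the excerpt only shows `init` being awaited:

```rust
use std::path::PathBuf;

#[tokio::main]
async fn main() {
    // Assumption: the exact argument type and `init` signature are not
    // shown in this README excerpt.
    let config_path = PathBuf::from("/path/to/custom/config");
    scrapyard::init(config_path).await;
}
```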
Here are all the options in the main configuration file scrapyard.json.
Adding feeds
To add feeds, edit feeds.json.
You can also include additional fields from PseudoChannel to override the default empty values, as in the hypothetical sketch below.
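A hypothetical feeds.json sketch; the actual schema is not shown in this excerpt, so the field names below are assumptions (title and description stand in for PseudoChannel fields that override otherwise empty values):

```json
[
  {
    "url": "https://example.com/blog",
    "extractor": "./extractors/blog.js",
    "title": "Example blog",
    "description": "Overrides the default empty description"
  }
]
```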
Getting feeds
Among the functions under FeedOption, there are two types of fetch functions.
Force fetching always requests a new copy of the feed, ignoring the fetch interval. Lazy fetching only fetches a new copy when the existing copy is out of date. The distinction is particularly relevant when the library is used without the auto-fetch loop.
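As a self-contained illustration of the rule rather than scrapyard's API (the fetch function names are not given in this excerpt), the decision boils down to:

```rust
use std::time::{Duration, Instant};

/// Lazy-fetch rule: only fetch a new copy when the existing one is older
/// than the fetch interval. Force fetching skips the check entirely.
/// (Names here are illustrative, not scrapyard's API.)
fn should_fetch(last_fetched: Option<Instant>, interval: Duration, force: bool) -> bool {
    if force {
        return true; // force fetch ignores the interval
    }
    match last_fetched {
        None => true,                       // nothing cached yet
        Some(t) => t.elapsed() >= interval, // cached copy out of date?
    }
}

fn main() {
    let interval = Duration::from_secs(3600);
    // No cached copy yet: lazy fetch still requests one.
    assert!(should_fetch(None, interval, false));
    // Fresh copy: lazy fetch does nothing, force fetch still requests.
    assert!(!should_fetch(Some(Instant::now()), interval, false));
    assert!(should_fetch(Some(Instant::now()), interval, true));
}
```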
Extractor scripts
An extractor script must accept one command-line argument and print a single JSON response to stdout; a normal console.log() in JavaScript will do.
The first argument specifies a file path; that file contains the arguments for the scraper.
Command line input:
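A hypothetical invocation; the script name and argument-file path are placeholders:

```sh
node extractor.js /tmp/scrapyard-args.json
```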
Expected output:
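A hypothetical response; the exact JSON schema scrapyard expects is not shown in this excerpt, so the fields below (modelled on common RSS item fields) are assumptions:

```json
{
  "items": [
    {
      "title": "Example article",
      "link": "https://example.com/article",
      "description": "Summary extracted from the page"
    }
  ]
}
```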
License: AGPL-3.0