scrapyard
Automatic web scraper and RSS generator library
Quickstart
Get started by creating an async event loop:
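A minimal sketch of such an entry point, assuming the tokio runtime (an assumption on my part; any runtime that can drive the library's futures should work):

```rust
// Minimal async entry point. The tokio runtime is an assumption here,
// not something the library mandates.
#[tokio::main]
async fn main() {
    // Initialize scrapyard and start fetching here
    // (see the Configuration section below).
}
```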
Configuration
By default, config files can be found in:
- ~/.config/scrapyard (Linux)
- /Users/[Username]/Library/Application Support/scrapyard (macOS)
- C:\Users\[Username]\AppData\Roaming\scrapyard (Windows)
To change the config directory location, specify the path:
// Assumed shape of the initialization call, reconstructed from this snippet;
// check the crate documentation for the exact function name.
let config_path = std::path::PathBuf::from("path/to/config");
scrapyard::init(config_path).await;
Here are all the options in the main configuration file, scrapyard.json.
Adding feeds
To add feeds, edit feeds.json.
You can also include additional fields in PseudoChannel to override the default empty values.
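For illustration only, a feed entry might look like the sketch below. Every field name here is hypothetical; the actual feeds.json schema and the PseudoChannel fields it accepts are defined by the library.

```json
[
  {
    "url": "https://example.com/blog",
    "title": "Example Blog",
    "description": "Overrides the otherwise empty channel description"
  }
]
```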
Getting feeds
Referencing the functions under FeedOption, there are two types of fetch functions. Force fetching always requests a new copy of the feed, ignoring the fetch interval, while lazy fetching only fetches a new copy when the existing one is out of date. This distinction is particularly relevant when scrapyard is used without the auto-fetch loop.
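Conceptually, the difference looks like the following sketch. This is not scrapyard's actual API; the names and the cache shape are illustrative only.

```rust
use std::time::{Duration, Instant};

/// A cached copy of a feed and when it was last fetched (illustrative only).
struct CachedFeed {
    fetched_at: Instant,
}

/// Lazy fetching re-downloads the feed only when the cached copy is older
/// than the fetch interval; force fetching ignores the interval entirely.
fn needs_refetch(cache: Option<&CachedFeed>, interval: Duration, force: bool) -> bool {
    match cache {
        None => true,                                  // nothing cached yet
        Some(_) if force => true,                      // force fetch: always refresh
        Some(c) => c.fetched_at.elapsed() >= interval, // lazy fetch: only when stale
    }
}

fn main() {
    let cache = Some(CachedFeed { fetched_at: Instant::now() });
    let interval = Duration::from_secs(30 * 60);
    // A fresh copy with a 30-minute interval: lazy fetch is skipped,
    // force fetch still refreshes.
    assert!(!needs_refetch(cache.as_ref(), interval, false));
    assert!(needs_refetch(cache.as_ref(), interval, true));
}
```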
Extractor scripts
The extractor scripts must accept one command line argument and print a single JSON response to stdout; a normal console.log() in JS will do. The first argument specifies a file path, and that file contains the arguments for the scraper.
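A sketch of such a script in Rust (illustrative only: it assumes the serde_json crate, and the fields of the printed JSON are placeholders, not the schema scrapyard actually expects):

```rust
use std::{env, fs};
use serde_json::json;

fn main() {
    // The single command line argument is the path to a file holding
    // the arguments prepared for this scraper.
    let args_path = env::args().nth(1).expect("expected a file path argument");
    let scraper_args = fs::read_to_string(&args_path).expect("failed to read argument file");

    // Do the actual scraping here, then print exactly one JSON response
    // to stdout. The fields below are placeholders.
    let response = json!({
        "input": scraper_args.trim(),
        "items": []
    });
    println!("{}", response);
}
```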
Command line input: the path to the argument file, for example extractor /tmp/scrapyard_args.json (a hypothetical path).
Expected output: a single JSON response printed to stdout, like the one produced by the sketch above.
License: AGPL-3.0