reddit-search
A tool, written in Rust, for searching the Pushshift Reddit dumps. Available from crates.io via cargo install reddit-search
The dumps are available via torrent from here: https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee
usage
To see the command-line parameters, run reddit-search -h or reddit-search --help.
Sample usage commands
Basic Usage
Iterating over the whole dataset
Unix-like Shell (Bash, ZSH, etc)
for f in /path/to/dumps/*; do reddit-search --input "$f" -f KEY:DATA -o OUTPUT_FILENAME.json --append; done
Powershell
Get-ChildItem C:\path\to\dumps | ForEach-Object { reddit-search.exe --input $_.FullName -f KEY:DATA -o OUTPUT_FILENAME.json --append }
Descriptions of the fields contained within reddit dumps
Note that not every record contains all of these fields. For example, a comment from 2007 would not have the "gilded" field, since that system was not implemented until later.
Boolean values are saved numerically (0 is false, 1 is true).
| Field | Description |
|---|---|
| archived | Boolean indicating if the item is archived |
| id | Unique identifier of the item |
| controversiality | Boolean indicating if the item is controversial |
| body | Text content of the item |
| ups | Number of upvotes |
| score_hidden | Boolean indicating if the score is hidden |
| edited | Boolean indicating if the item has been edited |
| distinguished | Status of the item (e.g., null, moderator) |
| created_utc | UTC timestamp of item creation |
| name | Full name of the item: a type prefix followed by its id (e.g., t1_ plus the id for a comment) |
| gilded | Number indicating how many times the item was gilded |
| score | Total score of the item |
| subreddit_id | Identifier of the subreddit |
| link_id | Identifier of the submission (link) that the comment belongs to |
| author_flair_text | Text of the author's flair |
| subreddit | Name of the subreddit |
| retrieved_on | UTC timestamp of when the item was retrieved |
| parent_id | Identifier of the parent item |
| downs | Number of downvotes |
| author_flair_css_class | CSS class of the author's flair |
| author | Name of the author |
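
As a rough illustration of working with these fields, the sketch below reads one record, converts the numerically stored booleans to real booleans, and tolerates fields that are missing from older data. It assumes each record is a single JSON object on its own line, as in the Pushshift dumps; whether the tool's output file has exactly this shape is an assumption. The sample record, the `normalize_record` helper, and the added `created` key are hypothetical and exist only for this example.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record; real records may carry more or fewer fields.
SAMPLE_LINE = (
    '{"id": "c0001", "author": "example_user", "body": "hello", '
    '"created_utc": "1199145600", "archived": 1, "score_hidden": 0, "score": 5}'
)

# Fields the table above describes as booleans (stored as 0 or 1).
BOOLEAN_FIELDS = ("archived", "controversiality", "score_hidden", "edited")


def to_bool(value):
    """Convert a numerically stored boolean (0/1, possibly as a string) to bool."""
    if value is None:
        return None
    return bool(float(value))


def normalize_record(line):
    """Parse one JSON line and normalize the fields that are present."""
    record = json.loads(line)

    # Only touch boolean fields that exist; older records may lack some.
    for field in BOOLEAN_FIELDS:
        if field in record:
            record[field] = to_bool(record[field])

    # "gilded" did not exist in early data, so fall back to None when absent.
    record.setdefault("gilded", None)

    # created_utc may be stored as a string or a number; int() accepts both.
    if "created_utc" in record:
        record["created"] = datetime.fromtimestamp(
            int(record["created_utc"]), tz=timezone.utc
        )
    return record


if __name__ == "__main__":
    rec = normalize_record(SAMPLE_LINE)
    print(rec["author"], rec["archived"], rec["gilded"], rec["created"].isoformat())
```

Checking `field in record` before converting keeps the helper usable across the whole date range of the dumps, where the available fields vary from record to record.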