# reddit-search
A tool for searching the Pushshift Reddit dumps, written in Rust. Available from crates.io via `cargo install reddit-search`.

The dumps are available via torrent here: https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee
# Usage
To see the command-line parameters, run `reddit-search -h` or `reddit-search --help`.
# Sample usage commands
## Basic Usage
```sh
reddit-search --input <input file path> --output <output file path> --fields <field:value> ...
```
## Iterating over the whole dataset
### Unix-like Shell (Bash, Zsh, etc.)
```sh
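# The trailing /* expands to every entry in the directory; quoting "$file" keeps paths with spaces intact.
for file in /path/to/dumps/*; do reddit-search --append -i "$file" -o output.json -f field:value; done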
```
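
For long runs it can help to wrap the loop in a small function that skips anything that is not a regular file and can be previewed before it touches the data. The following is a sketch, not part of reddit-search: the function name `search_all` and the `REDDIT_SEARCH_BIN` variable are made up for illustration, and it relies only on the `--append`, `-i`, `-o`, and `-f` flags shown above.

```sh
# Hypothetical helper: run reddit-search over every regular file in a directory.
# REDDIT_SEARCH_BIN is an assumed override (not a reddit-search feature), so the
# command can be swapped for `echo` to preview the arguments of each run.
search_all() {
    dumps_dir=$1   # directory holding the dump files
    output=$2      # file that every run appends to
    filter=$3      # a single field:value pair, passed to -f
    for file in "$dumps_dir"/*; do
        [ -f "$file" ] || continue   # skip subdirectories and unmatched globs
        "${REDDIT_SEARCH_BIN:-reddit-search}" --append -i "$file" -o "$output" -f "$filter"
    done
}
```

Running `REDDIT_SEARCH_BIN=echo search_all /path/to/dumps output.json field:value` prints the arguments each invocation would receive instead of running the search; dropping the override runs `reddit-search` itself.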
### PowerShell
```powershell
Get-ChildItem C:\path\to\dumps | ForEach-Object { reddit-search.exe --input $_.FullName -f KEY:DATA -o OUTPUT_FILENAME.json --append }
```
# Descriptions of the fields contained within Reddit dumps
Note that not all data contains all of these fields (for example, a comment from 2007 would not have the "gilded" field since that system was not implemented until later).

Boolean values are saved numerically (0 is false, 1 is true).

| Field | Description |
| --- | --- |
| archived | Boolean indicating if the item is archived |
| id | Unique identifier of the item |
| controversiality | Boolean indicating if the item is controversial |
| body | Text content of the item |
| ups | Number of upvotes |
| score_hidden | Boolean indicating if the score is hidden |
| edited | Indicates if the item has been edited (in Reddit's API this is either false or the UTC timestamp of the edit) |
| distinguished | Status of the item (e.g., null, moderator) |
| created_utc | UTC timestamp of item creation |
| name | Reddit "fullname" of the item: a type prefix followed by the `id` (e.g., `t1_` for a comment) |
| gilded | Number indicating how many times the item was gilded |
| score | Total score of the item |
| subreddit_id | Identifier of the subreddit |
| link_id | Fullname of the submission (link) that the comment belongs to |
| author_flair_text | Text of the author's flair |
| subreddit | Name of the subreddit |
| retrieved_on | UTC timestamp of when the item was retrieved |
| parent_id | Identifier of the parent item |
| downs | Number of downvotes |
| author_flair_css_class | CSS class of the author's flair |
| author | Name of the author |
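
In Reddit's API, the values of `name`, `link_id`, `parent_id`, and `subreddit_id` are "fullnames": a type prefix, an underscore, and a base-36 id. When reading search results it is often useful to split these apart. The helpers below are a minimal sketch using plain shell parameter expansion; the function names `fullname_kind` and `fullname_id` are made up for illustration and are not part of reddit-search.

```sh
# Hypothetical helper: print the kind of item a Reddit fullname refers to.
fullname_kind() {
    case ${1%%_*} in       # everything before the first underscore
        t1) echo comment ;;
        t2) echo account ;;
        t3) echo link ;;
        t4) echo message ;;
        t5) echo subreddit ;;
        t6) echo award ;;
        *)  echo unknown ;;
    esac
}

# Hypothetical helper: strip the type prefix, leaving the bare id.
fullname_id() {
    echo "${1#*_}"
}
```

For example, given a `parent_id` of `t3_2qyr1a` (an illustrative value), `fullname_kind t3_2qyr1a` prints `link`, meaning the comment is a top-level reply to a submission, and `fullname_id t3_2qyr1a` prints `2qyr1a`.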