# reddit-search
A tool for searching the Pushshift Reddit dumps, written in Rust. Available from crates.io via `cargo install reddit-search`.

If you do not have cargo or rustc, please follow the steps outlined in the official Rust documentation:
https://www.rust-lang.org/tools/install

The dumps are available via torrent from here: https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee
# Usage
To see the command line parameters, run `reddit-search -h` or `reddit-search --help`.
# Sample usage commands
## Basic Usage
```sh
reddit-search --input <input file path> --output <output file path> --fields <field:value> ...
```
## Presets
| Preset | Description |
| --- | --- |
| `en_news` | Subreddits focused on global and regional news and current events. |
| `en_politics` | A range of subreddits covering various political discussions, humor, and memes, including general politics and specific political orientations. |
| `en_science` | Subreddits dedicated to general science, scientific inquiries, and discussions on scientific advancements. |
| `en_hate_speech` | Subreddits known for promoting hate speech and controversial content. |
| `controversial` | Content with high levels of controversy across various themes. |

Each preset is a collection of filters designed to target specific themes. Should you be interested in using this and would like additional filters to be added, do not hesitate to contact me.
# Descriptions of the fields contained within reddit dumps
Note that not all data contains all of these fields. For example, a comment from 2007 would not have the "gilded" field, since that system was not implemented until later.
Boolean values are saved numerically (0 is false, 1 is true).

| Field | Description |
| --- | --- |
| archived | Boolean indicating if the item is archived |
| id | Unique identifier of the item |
| controversiality | Boolean indicating if the item is controversial |
| body | Text content of the item |
| ups | Number of upvotes |
| score_hidden | Boolean indicating if the score is hidden |
| edited | Boolean indicating if the item has been edited |
| distinguished | Status of the item (e.g., null, moderator) |
| created_utc | UTC timestamp of item creation |
| name | Fullname of the item: a type prefix followed by `id` (e.g. `t1_` for comments) |
| gilded | Number indicating how many times the item was gilded |
| score | Total score of the item |
| subreddit_id | Identifier of the subreddit |
| link_id | Identifier of the submission (link) the comment belongs to |
| author_flair_text | Text of the author's flair |
| subreddit | Name of the subreddit |
| retrieved_on | UTC timestamp of when the item was retrieved |
| parent_id | Identifier of the parent item |
| downs | Number of downvotes |
| author_flair_css_class | CSS class of the author's flair |
| author | Name of the author |
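
The notes above (fields may be missing, booleans are stored as 0 or 1, timestamps are in UTC) can be illustrated with a minimal Python sketch. It is not part of reddit-search. It assumes each record is available as a single JSON object, and the sample record, the `decode_record` helper, and the `BOOLEAN_FIELDS` tuple are hypothetical names and values made up for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record; the values are invented for illustration.
SAMPLE_LINE = (
    '{"id": "c0001", "subreddit": "science", "archived": 1, '
    '"score_hidden": 0, "created_utc": 1167609600, "body": "example"}'
)

# Fields described as booleans in the table above (stored as 0 or 1).
BOOLEAN_FIELDS = ("archived", "controversiality", "score_hidden", "edited")


def decode_record(line):
    """Parse one JSON record and convert numeric flags and timestamps."""
    record = json.loads(line)
    for field in BOOLEAN_FIELDS:
        # Not every record has every field, so only convert what is present.
        if field in record:
            record[field] = bool(int(record[field]))
    if "created_utc" in record:
        # int() also accepts timestamps stored as strings (an assumption
        # about how some records may be encoded).
        record["created_utc"] = datetime.fromtimestamp(
            int(record["created_utc"]), tz=timezone.utc
        )
    return record


record = decode_record(SAMPLE_LINE)
print(record["archived"], record["score_hidden"])
print(record["created_utc"].isoformat())
# Use .get() for fields that may be absent, such as "gilded" on old comments.
print(record.get("gilded"))
```

Using `.get()` rather than direct indexing avoids a `KeyError` on older records that predate a given field.
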
# Versioning
Older versions of the program can be installed by passing a version override to cargo, e.g. `cargo install reddit-search --version <version>`. Tags are not carried over to GitHub.