reddit-search 0.8.0

A search tool for the pushshift.io Reddit dumps.
# reddit-search

a tool for searching the pushshift reddit dumps written in rust. Available from crates.io via `cargo install reddit-search`

The dumps are available via torrent from here: https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee

# usage

To see command line parameters, use reddit-search -h or --help

# Sample usage commands


## Basic Usage

```sh
reddit-search --input <input file path> --output <output file path> --fields <field:value> ...
```

## Iterating over the whole dataset


### Unix-like Shell (Bash, ZSH, etc)

```sh
for file in /path/to/dumps; do; reddit-search --append -i $file -o output.json -f field:value; done
```

### Powershell

```powershell
Get-ChildItem C:\path\to\dumps | ForEach-Object { reddit-search.exe --input $_.FullName -f KEY:DATA -o OUTPUT_FILENAME.json --append }
```

# Descriptions of the fields contained within reddit dumps


Note that not all data contains all of these fields (for example, a comment from 2007 would not have the "gilded" field since that system was not implemented until later.)

Boolean values are saved numerically (0 is false, 1 is true)

| Field                   | Description |
|-------------------------|-------------|
| archived                | Boolean indicating if the item is archived |
| id                      | Unique identifier of the item |
| controversiality        | Boolean indicating if the item is controversial |
| body                    | Text content of the item |
| ups                     | Number of upvotes |
| score_hidden            | Boolean indicating if the score is hidden |
| edited                  | Boolean indicating if the item has been edited |
| distinguished           | Status of the item (e.g., null, moderator) |
| created_utc             | UTC timestamp of item creation |
| name                    | Another unique identifier (?) |
| gilded                  | Number indicating how many times the item was gilded |
| score                   | Total score of the item |
| subreddit_id            | Identifier of the subreddit |
| link_id                 | Identifier of the link to the comment |
| author_flair_text       | Text of the author's flair |
| subreddit               | Name of the subreddit |
| retrieved_on            | UTC timestamp of when the item was retrieved |
| parent_id               | Identifier of the parent item |
| downs                   | Number of downvotes |
| author_flair_css_class  | CSS class of the author's flair |
| author                  | Name of the author |