runiq 1.0.0

An efficient way to filter duplicate lines from input, à la uniq.
# runiq
[![Crates.io](https://img.shields.io/crates/v/runiq.svg)](https://crates.io/crates/runiq) [![Unix Build Status](https://img.shields.io/travis/whitfin/runiq.svg?label=unix)](https://travis-ci.org/whitfin/runiq) [![Windows Build Status](https://img.shields.io/appveyor/ci/whitfin/runiq.svg?label=win)](https://ci.appveyor.com/project/whitfin/runiq)

This project offers an efficient way (in both time and space) to filter duplicate entries (lines) from texual input. This project was born from [neek](https://github.com/whitfin/neek), but optimized for both speed and memory. Several filtering options are supported depending on your data and tradeoffs you wish to make between speed and memory usage. For a more detailed explanation, see the relevant [blog post](https://whitfin.io/filtering-unique-logs-using-rust/).

### Installation

This tool will be available via [Crates.io](https://crates.io/crates/runiq), so you can install it directly with `cargo`:

```shell
$ cargo install runiq
```

If you'd rather just grab a pre-built binary, you might be able to download the correct binary for your architecture directly from the latest release on GitHub [here](https://github.com/whitfin/runiq/releases). The list of binaries may not be complete, so please file an issue if your setup is missing (bonus points if you attach the appropriate binary).

### Comparisons

Here are some comparisons of `runiq` against other methods of filtering uniques:

| Tool  | Flags     | Time Taken     | Peak Memory     |
|-------|-----------|----------------|-----------------|
| neek  | N/A       | 55.8s          | 313MB           |
| sort  | -u        | 595s           | 9.07GB          |
| uq    | N/A       | 32.3s          | 1.66GB          |
| runiq | -f digest | **25.3s**      | 64.6MB          |
| runiq | -f naive  | 32.2s          | 1.62GB          |
| runiq | -f bloom  | 44.5s          | **13MB**        |

The numbers above are based on filtering unique values out of the following file:

```
File size:     3,290,971,321 (~3.29GB)
Line count:        5,784,383
Unique count:      2,715,727
Duplicates:        3,068,656
```