# runiq
[Crates.io](https://crates.io/crates/runiq) · [Build Status](https://github.com/whitfin/runiq/actions)
This project offers an efficient way (in both time and space) to filter duplicate entries (lines) from textual input. This project was born from [neek](https://github.com/whitfin/neek), but optimized for both speed and memory. Several filtering options are supported depending on your data and the tradeoffs you wish to make between speed and memory usage. For a more detailed explanation, see the relevant [blog post](https://whitfin.io/filtering-unique-logs-using-rust/).
### Installation
This tool is available via [Crates.io](https://crates.io/crates/runiq), so you can install it directly with `cargo`:
```shell
$ cargo install runiq
```
If you'd rather just grab a pre-built binary, you may be able to download the correct binary for your architecture directly from the latest [release](https://github.com/whitfin/runiq/releases) on GitHub. The list of binaries may not be complete, so please file an issue if your platform is missing (bonus points if you attach the appropriate binary).
### Examples
```shell
$ cat << EOF >> input.txt
> this is a unique line
> this is a duplicate line
> this is another unique line
> this is a duplicate line
> this is a duplicate line
> EOF
$ cat input.txt
this is a unique line
this is a duplicate line
this is another unique line
this is a duplicate line
this is a duplicate line
$ runiq input.txt
this is a unique line
this is a duplicate line
this is another unique line
```
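The comparisons below exercise runiq's different filters, selected via the `-f` flag shown in the table. As a quick sketch against the sample input above (the comments describe the general idea behind each filter's name, not guaranteed implementation details):

```shell
$ runiq -f digest input.txt   # hash each line and track only the digests
$ runiq -f naive input.txt    # track full line contents in a set
$ runiq -f bloom input.txt    # probabilistic filter with the smallest footprint
```

Note that Bloom filters are probabilistic by nature, so the `bloom` filter trades a small chance of misclassifying a unique line as a duplicate for its very low memory usage.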
### Comparisons
Here are some comparisons of `runiq` against other methods of filtering uniques:
| Tool | Flags | Time Taken | Peak Memory |
|-------|-----------|-----------|-------------|
| neek | N/A | 55.8s | 313MB |
| sort | -u | 595s | 9.07GB |
| uq | N/A | 32.3s | 1.66GB |
| runiq | -f digest | **17.8s** | 64.6MB |
| runiq | -f naive | 26.3s | 1.62GB |
| runiq | -f bloom | 36.8s | **13MB** |
The numbers above are based on filtering unique values out of the following file:
```
File size: 3,290,971,321 (~3.29GB)
Line count: 5,784,383
Unique count: 2,715,727
Duplicates: 3,068,656
```
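If you'd like to gather similar numbers for your own data, one simple approach on Linux (assuming GNU time is installed; this is not necessarily how the figures above were produced) is:

```shell
# "-v" asks GNU time for verbose stats, including wall clock time and
# "Maximum resident set size" (peak memory) for the run
$ /usr/bin/time -v runiq -f digest input.txt > /dev/null
```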