rjoin 0.2.0 - Docs.rs

## generate data
```bash
$ cat /dev/urandom | tr -dc 'a-z' | fold -w 5 | head -n 1000000 | sort | awk -vOFS=',' '{print $1, "L"}' > left-li
$ cat /dev/urandom | tr -dc 'a-z' | fold -w 5 | head -n 1000000 | sort | awk -vOFS=',' '{print $1, "R"}' > right-li
$ cat /dev/urandom | tr -dc 'a-z' | fold -w 4 | head -n 1000000 | sort | awk -vOFS=',' '{print $1, "L"}' > left-hi
$ cat /dev/urandom | tr -dc 'a-z' | fold -w 4 | head -n 1000000 | sort | awk -vOFS=',' '{print $1, "R"}' > right-hi
```

Notes: we generate two sets of data. The first one having the intersection between the left and right file of less than 10% (left-li and right-li). The second one with the intersetion of about 90% (left-hi and right-hi).


## inner join

The inner join prints only the matched records.

##### low intersection performance

| Tool | Command | Line Count | Time |
| ---- | ------- | ---------- | ---- |
| rjoin | `rj left-li right-li` | 84107 | **0.18s** |
| GNU join | `LC_COLLATE=C join -t ','  left-li right-li` | 84107 | 0.19s |

##### high intersection performance

| Tool | Command | Line Count | Time |
| ---- | ------- | ---------- | ---- |
| rjoin | `rj left-hi right-hi` | 2187458 | 0.36s |
| GNU join | `LC_COLLATE=C join -t ','  left-hi right-hi` | 2187458 | 0.36s |

## left outer join

The left outer join prints the matched records as well as unmatched records from the left file.

##### low intersection performance

| Tool | Command | Line Count | Time |
| ---- | ------- | ---------- | ---- |
| rjoin | `rj -lb left-li right-li` | 1003428 | 0.25s |
| GNU join | `LC_COLLATE=C join -t ',' -a 1  left-li right-li` | 1003428 | 0.25s |

##### high intersection performance

| Tool | Command | Line Count | Time |
| ---- | ------- | ---------- | ---- |
| rjoin | `rj -lb left-hi right-hi` | 2299563 | 0.41s |
| GNU join | `LC_COLLATE=C join -t ',' -a 1  left-hi right-hi` | 2299563 | **0.39s** |

## full outer join

The full outer join prints the matched records as well as unmatched records from both files.

##### low intersection performance

| Tool | Command | Line Count | Time |
| ---- | ------- | ---------- | ---- |
| rjoin | `rj -lbr left-li right-li` | 1922744 | 0.34s |
| GNU join | `LC_COLLATE=C join -t ',' -a 1 -a 2  left-li right-li` | 1922744 | **0.30s** |

##### high intersection performance

| Tool | Command | Line Count | Time |
| ---- | ------- | ---------- | ---- |
| rjoin | `rj -lbr left-hi right-hi` | 2411874 | 0.42s |
| GNU join | `LC_COLLATE=C join -t ',' -a 1 -a 2  left-hi right-hi` | 2411874 | **0.40s** |

## observations

`rj` is slightly faster when the output is small but loses with medium-large outputs. There are two factors in play here:
-  `rj` uses a faster parser, based on AVX2 instructions rather than
a simple `memchr` based one found in GNU join.
-  `rj` pays a penalty for multiple fields join by introducing
another indirection for them. This seems to have a non-trivial cost when
printing the output. However it can be mitigated by specializing for the
single field joins.


## final notes

GNU join uses `LC_COLLATE=C` environmental variable to force the byte ordering when comparing the keys. The same ordering is used in `rjoin`.
Both tools check whether the input is ordered correctly.