A tool for joining CSV data on the command line.
Dual-licensed under MIT or the Unlicense.
Add this to your Cargo.toml:

    [dependencies]
    rjoin = "0.1.0"

and add this to your crate root:

    extern crate rjoin;
The binary name for rjoin is rj.

    $ cargo install rjoin

(don't forget to add $HOME/.cargo/bin to your path).
Why should you use rjoin?
- it can perform the join on multiple fields
- it is more flexible than GNU join in specifying field separators and record terminators
- it has a very flexible CSV parser which can recognize quotes, escape characters and even comments (currently based on BurntSushi's excellent CSV library)
- it is likely faster than GNU join when checking the correct order of records
- it has a (subjectively) cleaner CLI.
Why shouldn't you use rjoin?
- you need a specific output format. GNU join is more flexible on this, but it can be mitigated by piping the output to
- you need a case insensitive join. This can be mitigated by preprocessing data with
- you need to perform the join as fast as possible by not checking the correct order of the input.
Let's suppose we have the following data:
    $ cat left
    color,blue
    color,green
    color,red
    shape,circle
    shape,square

    $ cat right
    altitude,low
    altitude,high
    color,orange
    color,purple
To get the lines with the common key:
    $ rj left right
    color,blue,orange
    color,blue,purple
    color,green,orange
    color,green,purple
    color,red,orange
    color,red,purple
- by default, the first field is the key. If you wish to use another field, you can specify it using the --key/-k option (even per file). rjoin supports multiple fields as the key, but the number of key fields in both files must be equal.
- by default, only the lines with the common key are printed. If you wish to also print unmatched lines from the left or right file, use any combination of -l, -r and --show-both/-b. Note, however, that if you use any of these options, the default behavior of showing matches of both files is reset and must be made explicit, if desired, by adding -b.
- there are multiple lines with the same key in both files, resulting in a Cartesian product.
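The Cartesian product mentioned in the last note can be illustrated with a minimal merge-join sketch over inputs that are already sorted by key. This is not rjoin's actual implementation: the function names (key, rest, join_sorted) are hypothetical, the key is assumed to be the first field, and a plain comma split stands in for a real CSV parser.

```rust
use std::cmp::Ordering;

/// Returns the key of a record, assumed here to be its first comma-separated field.
fn key(record: &str) -> &str {
    record.split(',').next().unwrap_or("")
}

/// Returns everything after the key, or "" if the record has a single field.
fn rest(record: &str) -> &str {
    record.split_once(',').map_or("", |(_, r)| r)
}

/// Merge-joins two slices of records that are already sorted by key.
/// Every left record is paired with every right record sharing its key.
fn join_sorted(left: &[&str], right: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        match key(left[i]).cmp(key(right[j])) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                let k = key(left[i]);
                // Find the end of the run of equal keys on each side.
                let i_end = i + left[i..].iter().take_while(|r| key(r) == k).count();
                let j_end = j + right[j..].iter().take_while(|r| key(r) == k).count();
                // Emit the Cartesian product of the two runs.
                for l in &left[i..i_end] {
                    for r in &right[j..j_end] {
                        out.push(format!("{},{},{}", k, rest(l), rest(r)));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out
}

fn main() {
    let left = ["color,blue", "color,green", "color,red", "shape,circle", "shape,square"];
    let right = ["altitude,low", "altitude,high", "color,orange", "color,purple"];
    for line in join_sorted(&left, &right) {
        println!("{}", line);
    }
}
```

With the sample data, the three left records and two right records keyed by color yield six joined lines, in the same order as the rj output shown above.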
To get the lines with the unmatched key in both files:
    $ rj -lr left right
    altitude,low
    altitude,high
    shape,circle
    shape,square
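The unmatched-key output can be reproduced with a similar single pass over both sorted inputs. Again, this is only a sketch under assumptions, not rjoin's code: unmatched_sorted and key are hypothetical names, the key is taken to be the first field, and records are split on a plain comma.

```rust
use std::cmp::Ordering;

/// Returns the key of a record, assumed here to be its first comma-separated field.
fn key(record: &str) -> &str {
    record.split(',').next().unwrap_or("")
}

/// Returns, in key order, the records whose key occurs in only one of the inputs.
/// Both inputs must already be sorted by key.
fn unmatched_sorted<'a>(left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        match key(left[i]).cmp(key(right[j])) {
            // The left key is smaller, so it cannot appear later on the right.
            Ordering::Less => {
                out.push(left[i]);
                i += 1;
            }
            // The right key is smaller, so it cannot appear later on the left.
            Ordering::Greater => {
                out.push(right[j]);
                j += 1;
            }
            // The key is present on both sides: skip the whole run in each input.
            Ordering::Equal => {
                let k = key(left[i]);
                while i < left.len() && key(left[i]) == k {
                    i += 1;
                }
                while j < right.len() && key(right[j]) == k {
                    j += 1;
                }
            }
        }
    }
    // Whatever remains on either side has no partner in the other input.
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}

fn main() {
    let left = ["color,blue", "color,green", "color,red", "shape,circle", "shape,square"];
    let right = ["altitude,low", "altitude,high", "color,orange", "color,purple"];
    for line in unmatched_sorted(&left, &right) {
        println!("{}", line);
    }
}
```

Because both inputs are consumed in key order, the unmatched records from the two files come out interleaved by key, which is why the altitude lines from the right file precede the shape lines from the left file.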
Any kind of contribution (e.g. comments, suggestions, questions, bug reports and pull requests) is welcome.
A big thanks to BurntSushi for his excellent work.