csv-diff
==========================
### Find the difference between two CSVs - with ludicrous speed!🚀
---------
<a href="https://gitlab.com/janriemer/csv-diff">
<img src="https://gitlab.com/janriemer/csv-diff/badges/main/pipeline.svg" />
</a>
<a href="https://crates.io/crates/csv-diff">
<img src="https://img.shields.io/crates/v/csv-diff.svg?style=flat-square"
alt="Crates.io version" />
</a>
<a href="https://crates.io/crates/csv-diff">
<img src="https://img.shields.io/crates/d/csv-diff.svg?style=flat-square"
alt="Download" />
</a>
<a href="https://docs.rs/csv-diff">
<img src="https://img.shields.io/badge/docs-latest-blue.svg?style=flat-square"
alt="docs.rs docs" />
</a>
----------
## Documentation
https://docs.rs/csv-diff
### ⚠️Warning⚠️
This crate is still in its infancy. Expect breaking changes (and dragons🐉) in the beginning.
## Highlights ✨
- fastest CSV-diffing library in the world🚀
  - compare two CSVs with 1,000,000 rows x 9 columns in __under 500ms__
- thread-pool agnostic 🧵🧶
  - use your existing thread-pool (e.g. [rayon][rayon]) or use threads directly (via [crossbeam][crossbeam-scope]); configurable via [Cargo features](#getting-started)
  - _caveat: thread-spawning must support [scoped threads][crossbeam-scope]_
[rayon]: https://docs.rs/rayon/1.5.0/rayon/
[crossbeam-scope]: https://docs.rs/crossbeam/0.8.0/crossbeam/thread/fn.scope.html
## Example
```rust
use std::io::Cursor;
use csv_diff::{csv_diff::CsvByteDiff, csv::Csv};
use csv_diff::diff_row::{ByteRecordLineInfo, DiffByteRecord};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // some csv data with a header, where the first column is a unique id
    let csv_data_left = "id,name,kind\n\
                         1,lemon,fruit\n\
                         2,strawberry,fruit";
    let csv_data_right = "id,name,kind\n\
                          1,lemon,fruit\n\
                          2,strawberry,nut";

    let csv_byte_diff = CsvByteDiff::new()?;

    let mut diff_byte_records = csv_byte_diff.diff(
        // we need to wrap our bytes in a cursor, because it needs to be `Seek`able
        Csv::new(Cursor::new(csv_data_left.as_bytes())),
        Csv::new(Cursor::new(csv_data_right.as_bytes())),
    )?;

    // sort the differences by line, so that the result is deterministic
    diff_byte_records.sort_by_line();

    let diff_byte_rows = diff_byte_records.as_slice();

    // only the record with id `2` differs: its third field (index 2) changed
    assert_eq!(
        diff_byte_rows,
        &[DiffByteRecord::Modify {
            delete: ByteRecordLineInfo::new(
                csv::ByteRecord::from(vec!["2", "strawberry", "fruit"]),
                3
            ),
            add: ByteRecordLineInfo::new(
                csv::ByteRecord::from(vec!["2", "strawberry", "nut"]),
                3
            ),
            field_indices: vec![2]
        }]
    );
    Ok(())
}
```
## Getting Started
In your `Cargo.toml` file, add the following line under `[dependencies]`:
```toml
csv-diff = "0.1.0-alpha"
```
This uses a rayon thread-pool by default. You can opt out of it and, for example, use threads without a thread-pool by disabling the default features and enabling the `crossbeam-threads` feature:
```toml
csv-diff = { version = "0.1.0-alpha", default-features = false, features = ["crossbeam-threads"] }
```
## Use Case
This crate should be used on CSV data that has some sort of *primary key* for uniquely identifying a record.
It is __not__ a general line-by-line diffing crate.
For example, you can dump a database table in CSV format from your *test* and your *production* system and compare the two dumps with each other to find differences.
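
To illustrate what diffing by *primary key* means (as opposed to line by line), here is a minimal sketch that uses only the standard library. It is __not__ how this crate is implemented: `RowDiff`, `index_by_key` and `diff_by_key` are hypothetical names made up for this illustration, the first column is assumed to be the key, and the naive `split(',')` does not handle quoted fields.

```rust
use std::collections::HashMap;

/// Outcome for a single primary key (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum RowDiff<'a> {
    Add(&'a str),
    Delete(&'a str),
    Modify { old: &'a str, new: &'a str },
}

/// Index every data row by its first column; the header line is skipped.
/// Assumption: no quoted fields, so a plain `split(',')` finds the key.
fn index_by_key(csv: &str) -> HashMap<&str, &str> {
    csv.lines()
        .skip(1)
        .filter_map(|line| line.split(',').next().map(|key| (key, line)))
        .collect()
}

/// Compare rows that share the same key, regardless of their position.
fn diff_by_key<'a>(left: &'a str, right: &'a str) -> Vec<RowDiff<'a>> {
    let left_rows = index_by_key(left);
    let right_rows = index_by_key(right);
    let mut diffs = Vec::new();

    for (&key, &old) in &left_rows {
        match right_rows.get(key) {
            // key only exists on the left side
            None => diffs.push(RowDiff::Delete(old)),
            // key exists on both sides, but the content differs
            Some(&new) if new != old => diffs.push(RowDiff::Modify { old, new }),
            // identical rows are not reported
            Some(_) => {}
        }
    }
    for (&key, &new) in &right_rows {
        // key only exists on the right side
        if !left_rows.contains_key(key) {
            diffs.push(RowDiff::Add(new));
        }
    }
    diffs
}

fn main() {
    let test_dump = "id,name,kind\n1,lemon,fruit\n2,strawberry,fruit";
    // same records in a different order, with one changed field
    let prod_dump = "id,name,kind\n2,strawberry,nut\n1,lemon,fruit";

    let diffs = diff_by_key(test_dump, prod_dump);
    assert_eq!(
        diffs,
        vec![RowDiff::Modify {
            old: "2,strawberry,fruit",
            new: "2,strawberry,nut"
        }]
    );
}
```

A line-by-line diff would report both rows as changed here, because their order differs. Matching on the key reports only the record with id `2`.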
## Caveats
Because this crate is still in its infancy, there are still some caveats, which we _might_ resolve in the near future:
- resulting CSV records/lines that have differences are provided as [raw bytes][ByteRecord]; you can use [`StringRecord::from_byte_record`](https://docs.rs/csv/1.1.6/csv/struct.StringRecord.html#method.from_byte_record), provided by the [csv crate][csv], to try converting them into UTF-8 encoded records.
- CSVs must be [`Seek`][Seek]able
  - `Seek` is implemented for the most important types like:
    - [`File`][File]s
    - and when wrapped in a [`Cursor`][Cursor]
      - `String`s and `&str`
      - `[u8]`
- when using your own custom thread-pool, thread-spawning must support [scoped threads][crossbeam-scope]
- documentation must be improved
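
The `Seek` requirement from the list above can be tried out with the standard library alone. The following is a minimal sketch: `read_twice` is a hypothetical helper made up for this illustration (it is not part of this crate) and only shows that in-memory data becomes rewindable once it is wrapped in a `Cursor`.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Reads the whole source, rewinds it and reads it again
/// (hypothetical helper, only here to exercise the `Read + Seek` bound).
fn read_twice<R: Read + Seek>(mut source: R) -> std::io::Result<(String, String)> {
    let mut first = String::new();
    source.read_to_string(&mut first)?;

    // jump back to the beginning; this is what `Seek` makes possible
    source.seek(SeekFrom::Start(0))?;

    let mut second = String::new();
    source.read_to_string(&mut second)?;
    Ok((first, second))
}

fn main() -> std::io::Result<()> {
    let csv_data = "id,name,kind\n1,lemon,fruit";

    // a bare `&[u8]` implements `Read`, but not `Seek`;
    // wrapping it in a `Cursor` adds a position that can be moved
    let (first, second) = read_twice(Cursor::new(csv_data.as_bytes()))?;
    assert_eq!(first, second);

    // an owned `String` works the same way
    let (first, _) = read_twice(Cursor::new(String::from(csv_data)))?;
    assert_eq!(first, csv_data);
    Ok(())
}
```

A `std::fs::File` already implements `Seek`, so it could be passed to `read_twice` without a `Cursor`.
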
[csv]: https://docs.rs/csv/1.1.6/csv/
[Seek]: https://doc.rust-lang.org/std/io/trait.Seek.html
[ByteRecord]: https://docs.rs/csv/1.1.6/csv/struct.ByteRecord.html
[Cursor]: https://doc.rust-lang.org/std/io/struct.Cursor.html
[File]: https://doc.rust-lang.org/std/fs/struct.File.html
## Benchmarks
You can run benchmarks with the following command:
```shell
cargo bench
```
## Safety
This crate is implemented in __100% Safe Rust__, which is ensured by using `#![forbid(unsafe_code)]`.
## MSRV
The Minimum Supported Rust Version for this crate is __1.49__. An increase of MSRV will be indicated by a breaking change (according to SemVer).
## Credits
This crate is inspired by the CLI tool [csvdiff](https://github.com/aswinkarthik/csvdiff) by Aswin Karthik, which is written in Go. Definitely check it out. It is a great tool.
Additionally, this crate would not exist without the awesome Rust community and these fantastic crates 🦀:
- [ahash](https://docs.rs/ahash/0.7.6/ahash/)
- [crossbeam](https://docs.rs/crossbeam/0.8.1/crossbeam/)
- [crossbeam-channel](https://docs.rs/crossbeam-channel/0.5.1/crossbeam_channel/)
- [csv][csv]
- [mown](https://docs.rs/mown/0.2.1/mown/)
- [rayon][rayon]
- [smallvec](https://docs.rs/smallvec/1.7.0/smallvec/)
- [thiserror](https://docs.rs/thiserror/1.0.30/thiserror/)
<br>
-------
<br>
#### License
<sup>
Licensed under either of <a href="LICENSE-APACHE">Apache License, Version
2.0</a> or <a href="LICENSE-MIT">MIT license</a> at your option.
</sup>
<br>
<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.
</sub>