⛓️gzp
Multithreaded encoding.
Why?
This crate provides a near drop-in replacement for Write that compresses chunks of data in parallel and writes them
to an underlying writer in the same order that the bytes were handed to the writer. On multi-core machines this allows
for much faster compression of data.
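The core idea (compress chunks in parallel, write them out in input order) can be sketched with the standard library alone. This is a minimal illustration under our own assumptions, not gzp's implementation: compress_parallel_ordered, rle_encode and rle_decode are hypothetical, the toy run-length encoder stands in for a real codec, and a real implementation would bound the number of worker threads instead of spawning one per chunk.

```rust
use std::io::{self, Write};
use std::thread;

/// Toy stand-in "compressor": run-length encoding as (count, byte) pairs.
/// It is NOT a codec gzp uses; it only keeps the sketch dependency-free.
fn rle_encode(chunk: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = chunk.iter().peekable();
    while let Some(&byte) = iter.next() {
        let mut run: u8 = 1;
        // Extend the run while the next byte matches (capped at 255).
        while run < u8::MAX && iter.peek() == Some(&&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// Decoder for the toy format, used to check round-trips.
fn rle_decode(data: &[u8]) -> Vec<u8> {
    data.chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Compress `input` in `chunk_size` pieces on separate threads and write the
/// results to `writer` in the same order as the input chunks.
fn compress_parallel_ordered<W: Write>(
    input: &[u8],
    chunk_size: usize,
    writer: &mut W,
) -> io::Result<()> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    thread::scope(|scope| -> io::Result<()> {
        // One scoped worker per chunk; the handles are kept in input order.
        let handles: Vec<_> = input
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || rle_encode(chunk)))
            .collect();
        // Joining in spawn order keeps the output ordered even if a later
        // chunk finishes compressing first.
        for handle in handles {
            let compressed = handle.join().expect("worker panicked");
            writer.write_all(&compressed)?;
        }
        Ok(())
    })
}
```

Because each chunk is encoded independently, decoding the concatenated output reproduces the input; a run that straddles a chunk boundary simply becomes two pairs.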
Supported Encodings:
- Gzip via flate2
- Zlib via flate2
- Raw Deflate via flate2
- Snappy via rust-snappy
Usage / Features
By default gzp has the deflate_default feature enabled, which brings in the best-performing zlib implementation as
the backend for flate2.
Examples
- Deflate default
[dependencies]
gzp = { version = "*" }
- Rust backend (this means that the Zlib format will not be available)
[dependencies]
gzp = { version = "*", default-features = false, features = ["deflate_rust"] }
- Snap only
[dependencies]
gzp = { version = "*", default-features = false, features = ["snap_default"] }
Examples
Simple example
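use std::io::Write;
use gzp::{deflate::Gzip, ZBuilder, ZWriter};

let writer: Vec<u8> = vec![];
// ZBuilder will return a trait object that is transparent over `ParZ` or `SyncZ`
// (4 threads, following the recommendation in the Benchmarks section)
let mut parz = ZBuilder::<Gzip, _>::new()
    .num_threads(4)
    .from_writer(writer);
parz.write_all(b"This is a first test line\n").unwrap();
parz.write_all(b"This is a second test line\n").unwrap();
// finish() flushes the remaining chunks and must be called before dropping the writer
parz.finish().unwrap();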
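The comment in the example says ZBuilder returns a trait object that hides whether the writer is parallel (ParZ) or synchronous (SyncZ). That builder pattern can be sketched with the standard library alone. Every name below (FinishWrite, SyncWriter, ParallelWriter, WriterBuilder) is hypothetical, both writers copy bytes instead of compressing, and the rule that more than one thread selects the parallel writer is our assumption rather than gzp's documented behavior.

```rust
use std::io::{self, Write};

/// Hypothetical "finishable" writer trait, in the spirit of a ZWriter.
trait FinishWrite: Write {
    /// Complete all pending work and hand back the underlying bytes.
    fn finish(&mut self) -> io::Result<Vec<u8>>;
    /// Which concrete implementation sits behind the trait object.
    fn mode(&self) -> &'static str;
}

/// Single-threaded variant: all work happens on the calling thread.
struct SyncWriter {
    inner: Vec<u8>,
}

impl Write for SyncWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // A real implementation would compress here; the sketch copies bytes.
        self.inner.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl FinishWrite for SyncWriter {
    fn finish(&mut self) -> io::Result<Vec<u8>> {
        Ok(std::mem::take(&mut self.inner))
    }
    fn mode(&self) -> &'static str {
        "sync"
    }
}

/// "Parallel" variant: buffers input that a real one would hand to workers.
struct ParallelWriter {
    inner: Vec<u8>,
    pending: Vec<u8>,
}

impl Write for ParallelWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.pending.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl FinishWrite for ParallelWriter {
    fn finish(&mut self) -> io::Result<Vec<u8>> {
        // Move the buffered data to the output, then return it.
        self.inner.append(&mut self.pending);
        Ok(std::mem::take(&mut self.inner))
    }
    fn mode(&self) -> &'static str {
        "parallel"
    }
}

/// Builder that picks the concrete writer; callers only see the trait.
struct WriterBuilder {
    num_threads: usize,
}

impl WriterBuilder {
    fn new() -> Self {
        WriterBuilder { num_threads: 1 }
    }
    fn num_threads(mut self, n: usize) -> Self {
        self.num_threads = n;
        self
    }
    fn from_writer(self, inner: Vec<u8>) -> Box<dyn FinishWrite> {
        if self.num_threads > 1 {
            Box::new(ParallelWriter { inner, pending: Vec::new() })
        } else {
            Box::new(SyncWriter { inner })
        }
    }
}
```

Calling code is identical for both variants; only the builder decides which concrete type sits behind the Box, so a single-threaded configuration can skip the worker machinery entirely.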
An updated version of pgz.
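A pgz-style compressor typically streams standard input through the compressing writer in fixed-size chunks. The std-only sketch below shows such a loop under that assumption; copy_in_chunks is a hypothetical helper and the chunk size is left to the caller. In a real program the reader would be io::stdin().lock() and the writer a gzp writer wrapping stdout, on which finish() still has to be called afterwards.

```rust
use std::io::{self, Read, Write};

/// Stream `reader` into `writer` in pieces of at most `chunk_size` bytes.
/// Returns the total number of bytes copied. (Hypothetical helper.)
fn copy_in_chunks<R: Read, W: Write>(
    mut reader: R,
    writer: &mut W,
    chunk_size: usize,
) -> io::Result<u64> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buffer = Vec::with_capacity(chunk_size);
    let mut total = 0u64;
    loop {
        buffer.clear();
        // `take` caps this read at `chunk_size` bytes; `read_to_end` fills the buffer.
        let n = (&mut reader).take(chunk_size as u64).read_to_end(&mut buffer)?;
        if n == 0 {
            break; // end of input
        }
        // Hand the whole chunk to the (compressing) writer in one call.
        writer.write_all(&buffer)?;
        total += n as u64;
    }
    Ok(total)
}
```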
Same thing but using Snappy instead.
Acknowledgements
- Many of the ideas for this crate were directly inspired by
pigz, including implementation details for some functions.
Contributing
PRs are very welcome! Please run the tests locally and make sure they pass. Many tests are ignored in CI because the CI instances don't have enough threads to run them or are too slow.
cargo test && cargo test -- --ignored
Note that the tests take 30-60 seconds to run.
Future todos
- Pull in an Adler-32 crate to replace the zlib implementation (it needs to support combining checksums; probably implement COMB from pigz).
- Add more metadata to the headers
- Add a BGZF mode + tabix index generation (or create that as its own crate)
- Try LZ4 HC compression via lzzzz: https://docs.rs/lzzzz/0.8.0/lzzzz/lz4_hc/fn.compress.html
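The Adler-32 item hinges on combining checksums: in a pigz-style design each worker checksums its own chunk, and the zlib trailer needs the checksum of the whole stream. The sketch below uses the same arithmetic as zlib's adler32_combine (the Rust function names are our own). With A and B the two running sums modulo 65521, concatenating chunk 1 and chunk 2 gives A = A1 + A2 - 1 and B = B1 + B2 + len2 * (A1 - 1), because every byte of chunk 2 sees A1 - 1 extra in its running A sum.

```rust
/// Largest prime below 2^16, the Adler-32 modulus.
const MOD_ADLER: u64 = 65_521;

/// Straightforward Adler-32 (one modulo per byte; clear rather than fast).
fn adler32(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u64, 0u64);
    for &byte in data {
        a = (a + byte as u64) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    ((b << 16) | a) as u32
}

/// Checksum of `first ++ second` from the checksums of each part and the
/// length of the second part.
fn adler32_combine(adler1: u32, adler2: u32, len2: u64) -> u32 {
    let rem = len2 % MOD_ADLER;
    let a1 = (adler1 & 0xffff) as u64;
    let b1 = (adler1 >> 16) as u64;
    let a2 = (adler2 & 0xffff) as u64;
    let b2 = (adler2 >> 16) as u64;
    // Both A sums start at 1, so one of those 1s must be removed.
    let a = (a1 + a2 + MOD_ADLER - 1) % MOD_ADLER;
    // Each byte of the second part adds (A1 - 1) to its running A value.
    let b = (b1 + b2 + rem * ((a1 + MOD_ADLER - 1) % MOD_ADLER)) % MOD_ADLER;
    ((b << 16) | a) as u32
}
```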
Benchmarks
All benchmarks were run on the file in ./bench-data/shakespeare.txt concatenated 100 times, which creates a roughly
550 MB file.
The primary benchmark takeaway is that with 2 threads gzp is about as fast as single-threaded compression. With 4
threads it is 2-3x faster than single-threaded, and it improves further with more threads. It is recommended to use at
least 4 threads.
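To reproduce this kind of measurement, the helpers below build the repeated input and turn timings into throughput and speedup figures. They are a hypothetical harness, not the one used for the numbers above; the compression work itself is passed in as a closure, and megabytes are taken as 10^6 bytes.

```rust
use std::time::{Duration, Instant};

/// Concatenate `data` `times` times, e.g. shakespeare.txt repeated 100 times.
fn repeat_input(data: &[u8], times: usize) -> Vec<u8> {
    data.repeat(times)
}

/// Throughput in megabytes (10^6 bytes) per second.
fn throughput_mb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / 1_000_000.0 / elapsed.as_secs_f64()
}

/// How many times faster the parallel run was than the single-threaded one.
fn speedup(single_threaded: Duration, parallel: Duration) -> f64 {
    single_threaded.as_secs_f64() / parallel.as_secs_f64()
}

/// Time one run of `work` over `input` and return the wall-clock duration.
fn time_run<F: FnOnce(&[u8])>(input: &[u8], work: F) -> Duration {
    let start = Instant::now();
    work(input);
    start.elapsed()
}
```

For example, a 6 s single-threaded run against a 2 s four-thread run is a 3x speedup, the upper end of the range reported above.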