⛓️gzp
Multithreaded encoding.
Why?
This crate provides a near drop in replacement for Write that has will compress chunks of data in parallel and write
to an underlying writer in the same order that the bytes were handed to the writer. This allows for much faster
compression of data.
Supported Encodings:
- Gzip via flate2
- Snappy via rust-snappy
Usage / Features
There are no features enabled by default. pargz_default enables pargz and flate2_default. flate2_default in turn
uses the default backed for flate2, which, as of this writing is the pure rust backend. parsnap_default is just a
wrapper over parsnap, which pulls in the snap dependency.
The following demonstrate common ways to override the features:
Simple way to enable both pargz and parsnap:
[]
= { = "*", = ["pargz_default", "parsnap_default"] }
Use pargz to pull in flate2 and zlib-ng-compat to use the zlib-ng-compat backend for flate2 (most performant).
[]
= { = "*", = ["pargz", "zlib-ng-compat"] }
To use Snap (could also use parsnap_default):
[]
= { = "*", = true, = ["parsnap"] }
To use both Snap and Gzip with specific backend:
[]
= { = "*", = true, = ["parsnap_default", "pargz", "zlib-ng-compat"] }
Examples
Simple example
use ;
use ParGz;
An updated version of pgz.
use ParGz;
use ;
Same thing but using Snappy instead.
use ParSnap;
use ;
Notes
- Files written with this are just Gzipped blocks catted together and must be read
with
flate2::bufread::MultiGzDecoder.
Future todos
- Explore removing
Bytesin favor of raw vec - Check that block is actually smaller than when it started
- Update the CRC value with each block written
- Add a BGZF mode + tabix index generation (or create that as its own crate)
- Try with https://docs.rs/lzzzz/0.8.0/lzzzz/lz4_hc/fn.compress.html
Benchmarks
All benchmarks were run on the file in ./bench-data/shakespeare.txt catted together 100 times which creats a rough
550Mb file.
Note that there are far more comprehensive comparisons of the tradeoffs in differet compression algorithms / compression levels elsewhere on the interent. This is meant to give rough understanding of the tradoffs involved.
| Name | Num Threads | Compression Level | Buffer Size | Time | File Size |
|---|---|---|---|---|---|
| Gzip Only | NA | 3 | 128 Kb | 6.6s | 218 Mb |
| Gzip | 1 | 3 | 128 Kb | 2.4s | 223 Mb |
| Gzip | 4 | 3 | 128 Kb | 1.2s | 223 Mb |
| Gzip | 8 | 3 | 128 Kb | 0.8s | 223 Mb |
| Gzip | 16 | 3 | 128 Kb | 0.6s | 223 Mb |
| Gzip | 30 | 3 | 128 Kb | 0.6s | 223 Mb |
| Snap Only | NA | NA | 128 Kb | 1.6s | 333 Mb |
| Snap | 1 | NA | 128 Kb | 0.7s | 333 Mb |
| Snap | 4 | NA | 128 Kb | 0.5s | 333 Mb |
| Snap | 8 | NA | 128 Kb | 0.4s | 333 Mb |
| Snap | 16 | NA | 128 Kb | 0.4s | 333 Mb |
| Snap | 30 | NA | 128 Kb | 0.4s | 333 Mb |