parallel_bzip2_decoder
A high-performance, parallel bzip2 decoder for Rust.
This crate provides a Bz2Decoder that implements std::io::Read, allowing you to decompress bzip2 files in parallel using multiple CPU cores. It is designed to work efficiently with both single-stream (standard) and multi-stream (e.g., pbzip2) bzip2 files by scanning for block boundaries and decompressing them concurrently.
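To illustrate what scanning for block boundaries involves: every bzip2 compressed block begins with the 48-bit magic number 0x314159265359, and blocks are bit-aligned rather than byte-aligned, so a scanner has to test every bit offset. The following is a minimal sketch of that idea using only the standard library. The function name `find_block_starts` is hypothetical and this is not necessarily how the crate's scanner is written; a real scanner would also need to cope with the magic appearing by chance inside compressed data.

```rust
/// 48-bit magic that starts every bzip2 compressed block.
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
/// Mask that keeps only the low 48 bits of the sliding window.
const MASK_48: u64 = (1 << 48) - 1;

/// Returns the bit offset (from the start of `data`) of every position
/// where the block magic begins. Hypothetical helper for illustration.
fn find_block_starts(data: &[u8]) -> Vec<u64> {
    let mut starts = Vec::new();
    let mut window: u64 = 0; // last 48 bits seen
    let mut bits_seen: u64 = 0;
    for &byte in data {
        // bzip2 packs bits most-significant-bit first within each byte.
        for shift in (0..8).rev() {
            let bit = ((byte >> shift) & 1) as u64;
            window = ((window << 1) | bit) & MASK_48;
            bits_seen += 1;
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                starts.push(bits_seen - 48);
            }
        }
    }
    starts
}
```

Each offset found this way marks a candidate block that can be handed to a separate worker, which is what makes concurrent decompression possible.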
Features
- Parallel Decompression: Utilizes rayon to decompress blocks in parallel.
- Standard API: Implements std::io::Read for easy integration.
- Memory Mapped: Efficiently handles large files using memory mapping.
- Flexible: Supports opening files directly or working with in-memory buffers (via Arc).
- Full bzip2 format support: Handles both single-stream and multi-stream bzip2 files.
- Error handling: Comprehensive error reporting with anyhow integration.
- Memory efficient: Bounded channels and buffer reuse to minimize memory usage.
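As a rough illustration of how a bounded channel can sit behind a std::io::Read implementation, the sketch below has a producer thread push chunks through `std::sync::mpsc::sync_channel` while a reader drains them in order. Everything here is an assumption made for the example: `ChunkReader` and `spawn_pipeline` are hypothetical names, the chunks stand in for decompressed blocks, and a plain thread is used instead of rayon. It is not the crate's actual implementation.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical reader that drains chunks from a bounded channel.
struct ChunkReader {
    rx: Receiver<Vec<u8>>,
    current: Vec<u8>, // chunk currently being served
    pos: usize,       // read position inside `current`
}

impl Read for ChunkReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Refill from the channel once the current chunk is exhausted.
        while self.pos == self.current.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.current = chunk;
                    self.pos = 0;
                }
                // Sender dropped: no more data, report end of stream.
                Err(_) => return Ok(0),
            }
        }
        let n = buf.len().min(self.current.len() - self.pos);
        buf[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Spawns a producer that sends `chunks` in order through a channel that
/// buffers at most `capacity` pending chunks, and returns the reading end.
fn spawn_pipeline(chunks: Vec<Vec<u8>>, capacity: usize) -> ChunkReader {
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for chunk in chunks {
            // `send` blocks while the channel is full, which caps memory use.
            if tx.send(chunk).is_err() {
                break; // the reader was dropped
            }
        }
    });
    ChunkReader { rx, current: Vec::new(), pos: 0 }
}
```

Because the channel is bounded, a fast producer cannot run arbitrarily far ahead of a slow consumer, so the number of chunks held in memory stays limited by `capacity`.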
Usage
Add this to your Cargo.toml:
[dependencies]
parallel_bzip2_decoder = "0.1"
Decompressing a File
The easiest way to use parallel_bzip2_decoder is to call Bz2Decoder::open, which handles memory mapping internally:
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;
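Since the decoder implements std::io::Read, its output can be streamed to a destination without first collecting it in memory. The sketch below is generic over any reader and uses only the standard library; `stream_to` is a hypothetical helper name and not part of the crate.

```rust
use std::io::{self, BufWriter, Read, Write};

/// Copies everything from `decoder` into `sink` and returns the number of
/// bytes written. Hypothetical helper; works with any `Read` implementation.
fn stream_to<R: Read, W: Write>(mut decoder: R, sink: W) -> io::Result<u64> {
    // Buffer writes so that many small reads do not become many small writes.
    let mut out = BufWriter::new(sink);
    let written = io::copy(&mut decoder, &mut out)?;
    // Flush explicitly so that write errors are reported instead of being
    // silently dropped when the BufWriter goes out of scope.
    out.flush()?;
    Ok(written)
}
```

A Bz2Decoder could be passed as `decoder` and a `std::fs::File` as `sink`, which keeps peak memory use independent of the decompressed size.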
Decompressing from Memory
If you already have the data in memory (e.g., an Arc<[u8]> or Arc<Mmap>), you can use Bz2Decoder::new:
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;
use std::sync::Arc;
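To show why an Arc-wrapped buffer suits parallel work, the sketch below builds an `Arc<[u8]>` and hands cheap clones of it to several threads, each of which reads its own byte range without copying the data. The function `checksum_in_parallel` is a hypothetical stand-in for per-block work and does not use the crate.

```rust
use std::sync::Arc;
use std::thread;

/// Splits a shared buffer into `parts` byte ranges and sums each range on
/// its own thread. Hypothetical example of sharing one buffer across workers.
fn checksum_in_parallel(data: Arc<[u8]>, parts: usize) -> u64 {
    let parts = parts.max(1);
    // Ceiling division so that the ranges cover the whole buffer.
    let chunk = (data.len() + parts - 1) / parts;
    let handles: Vec<_> = (0..parts)
        .map(|i| {
            // Cloning the Arc only bumps a reference count; bytes are not copied.
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let start = (i * chunk).min(data.len());
                let end = ((i + 1) * chunk).min(data.len());
                data[start..end].iter().map(|&b| b as u64).sum::<u64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

An `Arc<[u8]>` can be created from an existing `Vec<u8>` with `Arc::from(vec)`.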
Performance
Because blocks are decompressed independently, parallel_bzip2_decoder is designed to scale with the number of available CPU cores. For large files it is significantly faster than standard single-threaded decoders.
Benchmarking and Profiling
This crate includes comprehensive benchmarks and profiling tools:
- Decode benchmarks: Test decompression with various file sizes (1MB, 10MB, 50MB)
- Scanner benchmarks: Measure block scanning performance
- End-to-end benchmarks: Test the full decompression pipeline
- CPU profiling: Generate flamegraphs to identify performance bottlenecks
- Memory profiling: Track memory usage and detect leaks
Running Benchmarks
# Run all benchmarks
# Run specific benchmark suite
Profiling
# CPU profiling with flamegraphs
# Memory profiling with valgrind
For detailed instructions, see BENCHMARKING.md.
API Stability
This crate follows semantic versioning. Breaking changes will only occur with major version updates.
License
MIT
Contributing
See the main repository's CONTRIBUTING.md for details on how to contribute.
Changelog
See CHANGELOG.md for a history of changes (when available).