High-performance parallel bzip2 decompression library.
This library provides efficient parallel decompression of bzip2 files by processing multiple blocks concurrently. It achieves significant speedups on multi-core systems compared to sequential decompression.
§Features
- Parallel block decompression: Utilizes all available CPU cores
- Streaming API: Implements std::io::Read for easy integration
- Memory-efficient: Uses bounded channels to limit memory usage
- Zero-copy where possible: Memory-mapped I/O for file access
- Full bzip2 format support: Handles both single-stream and multi-stream bzip2 files
- Error handling: Comprehensive error reporting with anyhow integration
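Because the decoder is exposed through std::io::Read, code written against that trait should work with it unchanged. The following is a minimal sketch of such a consumer; `count_newlines` is a hypothetical helper written for illustration and is not part of this crate. It streams through a fixed-size buffer, so the whole decompressed output never has to be held in memory.

```rust
use std::io::{self, Read};

/// Counts `\n` bytes in any `Read` source using a fixed-size buffer.
/// Hypothetical helper for illustration; not part of the crate's API.
fn count_newlines<R: Read>(mut source: R) -> io::Result<usize> {
    let mut buf = [0u8; 8192];
    let mut count = 0;
    loop {
        let n = match source.read(&mut buf) {
            // A zero-length read signals end of stream.
            Ok(0) => return Ok(count),
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Any other error (including one from a decoder) is propagated.
            Err(e) => return Err(e),
        };
        count += buf[..n].iter().filter(|&&b| b == b'\n').count();
    }
}
```

Assuming Bz2Decoder::open behaves as shown in the Quick Start, the returned decoder could be passed directly as `source`; the same function also accepts a std::fs::File or a std::io::Cursor.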
§Architecture
The library uses a multi-stage pipeline:
- Scanning: Identifies block boundaries using parallel pattern matching
- Decompression: Processes blocks in parallel using Rayon
- Reordering: Ensures output maintains correct block order
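The three stages above can be sketched with the standard library alone. This is an illustration of the pipeline shape, not the crate's implementation: plain threads and std::sync::mpsc channels stand in for Rayon, `fake_decompress` is a hypothetical placeholder for real block decompression, and `run_pipeline` is a name introduced here.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for block decompression: uppercases ASCII bytes.
fn fake_decompress(block: &[u8]) -> Vec<u8> {
    block.to_ascii_uppercase()
}

/// Runs a scan -> parallel "decompress" -> reorder pipeline over `blocks`.
fn run_pipeline(blocks: Vec<Vec<u8>>, workers: usize) -> Vec<u8> {
    // Bounded channels keep the number of in-flight blocks small.
    let (job_tx, job_rx) = mpsc::sync_channel::<(usize, Vec<u8>)>(4);
    let (out_tx, out_rx) = mpsc::sync_channel::<(usize, Vec<u8>)>(4);
    let job_rx = Arc::new(Mutex::new(job_rx));

    // Stage 1 (scanning): tag each block with its sequence number.
    let scanner = thread::spawn(move || {
        for (index, block) in blocks.into_iter().enumerate() {
            if job_tx.send((index, block)).is_err() {
                break;
            }
        }
    });

    // Stage 2 (decompression): workers pull jobs and may finish out of order.
    let mut handles = Vec::new();
    for _ in 0..workers.max(1) {
        let job_rx = Arc::clone(&job_rx);
        let out_tx = out_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock guard is released at the end of this statement.
            let job = job_rx.lock().unwrap().recv();
            match job {
                Ok((index, block)) => {
                    if out_tx.send((index, fake_decompress(&block))).is_err() {
                        break;
                    }
                }
                Err(_) => break, // scanner finished and the queue is empty
            }
        }));
    }
    drop(out_tx); // only the workers hold senders now

    // Stage 3 (reordering): hold early arrivals until their turn comes.
    let mut pending = BTreeMap::new();
    let mut next = 0usize;
    let mut output = Vec::new();
    for (index, data) in out_rx {
        pending.insert(index, data);
        while let Some(ready) = pending.remove(&next) {
            output.extend_from_slice(&ready);
            next += 1;
        }
    }

    scanner.join().unwrap();
    for handle in handles {
        handle.join().unwrap();
    }
    output
}
```

Calling `run_pipeline(blocks, n)` returns the transformed blocks concatenated in their original order, regardless of which worker finished first. In this sketch the reorder buffer itself is unbounded, so a single slow block can cause later results to accumulate in it.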
§Quick Start
The easiest way to use this library is through the Bz2Decoder:
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;
let mut decoder = Bz2Decoder::open("file.bz2").unwrap();
let mut data = Vec::new();
decoder.read_to_end(&mut data).unwrap();
§Advanced Usage
For more control, you can use the lower-level functions:
use parallel_bzip2_decoder::{scan_blocks, decompress_block};
let compressed_data = std::fs::read("file.bz2").unwrap();
let block_receiver = scan_blocks(&compressed_data);
for (start_bit, end_bit) in block_receiver {
    let decompressed = decompress_block(&compressed_data, start_bit, end_bit).unwrap();
    // Process decompressed block...
}
§Performance
Performance scales nearly linearly with the number of CPU cores. On an 8-core system, expect 6-7x speedup compared to single-threaded bzip2 decompression.
§Thread Safety
All public types are thread-safe. The library uses Rayon’s global thread pool by default, but creates dedicated pools where needed to avoid deadlocks.
§Error Handling
This crate uses anyhow for comprehensive error handling. Most functions return Result<T, anyhow::Error> for easy error propagation using the ? operator.
§Memory Usage
The library is designed with memory efficiency in mind:
- Memory-mapped I/O for large files
- Bounded channels to prevent unbounded memory growth
- Buffer reuse in block processing
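To illustrate how a bounded channel caps memory, here is a small sketch using std::sync::mpsc::sync_channel. Whether the crate uses this channel type or one from another library is not stated here, so treat the choice as an assumption; `fill_bounded_channel` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to queue `n` 1 KiB buffers into a channel with `capacity` slots
/// while nothing is consuming, and returns how many were accepted.
/// Hypothetical helper for illustration only.
fn fill_bounded_channel(capacity: usize, n: usize) -> usize {
    // `_rx` keeps the receiver alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for _ in 0..n {
        match tx.try_send(vec![0u8; 1024]) {
            Ok(()) => accepted += 1,
            // The channel is full: a blocking `send` would wait here,
            // which is what throttles a fast producer.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With a capacity of 4, no more than 4 buffers are ever queued, however many the producer wants to send. A producer using the blocking `send` simply pauses until the consumer catches up.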
§Benchmarks
Run benchmarks with cargo bench to measure performance on your system.
Various benchmark suites test different aspects of performance:
- Decode benchmarks with various file sizes
- Scanner performance
- End-to-end pipeline performance
Re-exports§
- pub use decoder::Bz2Decoder;
- pub use error::Bz2Error;
- pub use error::Result;
- pub use scanner::extract_bits;
- pub use scanner::MarkerType;
- pub use scanner::Scanner;
Modules§
- decoder
- Parallel bzip2 decoder with streaming output.
- error
- Error types for parallel bzip2 decompression.
- scanner
- High-performance parallel scanner for bzip2 block boundaries.
Constants§
- MAX_BLOCK_SIZE
- Maximum allowed uncompressed size for a single bzip2 block (2 MB). This protects against decompression bomb attacks.
Functions§
- decompress_block
- Decompresses a single bzip2 block and returns the decompressed data.
- decompress_block_into
- Decompresses a single bzip2 block into provided buffers (zero-allocation).
- decompress_file
- Decompresses an entire bzip2 file into memory.
- parallel_bzip2_cat (Deprecated)
- Decompresses an entire bzip2 file and returns the decompressed data.
- scan_blocks
- Scans bzip2 data for block boundaries and returns them via a channel.
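Block boundaries are reported as bit offsets because bzip2 blocks are not byte-aligned: each block starts with the 48-bit magic number 0x314159265359, which can begin at any bit position. The sketch below shows a simple sequential bit-level search for that marker. It is not the crate's scanner, which is described as parallel; `find_block_magic_bits` is a hypothetical name.

```rust
/// 48-bit magic number that opens every bzip2 block.
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
/// Mask that keeps only the low 48 bits of the sliding window.
const MASK_48: u64 = (1 << 48) - 1;

/// Returns the bit offsets (MSB-first, counted from the start of `data`)
/// at which the block magic begins. Sequential sketch for illustration.
fn find_block_magic_bits(data: &[u8]) -> Vec<u64> {
    let mut hits = Vec::new();
    let mut window: u64 = 0;
    for (byte_idx, &byte) in data.iter().enumerate() {
        // Feed bits into the window most-significant bit first.
        for bit in (0..8u32).rev() {
            window = ((window << 1) | u64::from((byte >> bit) & 1)) & MASK_48;
            let bits_seen = byte_idx as u64 * 8 + u64::from(7 - bit) + 1;
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                hits.push(bits_seen - 48);
            }
        }
    }
    hits
}
```

A match only marks a candidate boundary. The same 48-bit pattern could in principle occur inside compressed data, so a robust scanner would validate candidates, for example by attempting to decode the block.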