# parallel_bzip2_decoder
A high-performance, parallel bzip2 decoder for Rust.
This crate provides a `Bz2Decoder` that implements `std::io::Read`, allowing you to decompress bzip2 files in parallel using multiple CPU cores. It is designed to work efficiently with both single-stream (standard) and multi-stream (e.g., `pbzip2`) bzip2 files by scanning for block boundaries and decompressing them concurrently.
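To give a feel for the boundary-scanning step: every bzip2 compressed block begins with the 48-bit magic `0x314159265359`. The sketch below finds byte-aligned occurrences of that magic in a buffer. It is a simplified illustration, not this crate's API: `find_block_starts` is a hypothetical helper, and real bzip2 blocks are *bit*-aligned, so a production scanner must also check the seven non-byte-aligned bit offsets.

```rust
/// 48-bit magic that begins every bzip2 compressed block.
const BLOCK_MAGIC: [u8; 6] = [0x31, 0x41, 0x59, 0x26, 0x53, 0x59];

/// Return the byte offsets of all byte-aligned block magics in `data`.
/// (Simplified: ignores bit-aligned blocks and false positives that can
/// occur when the magic bytes appear inside compressed payload.)
fn find_block_starts(data: &[u8]) -> Vec<usize> {
    data.windows(BLOCK_MAGIC.len())
        .enumerate()
        .filter_map(|(i, w)| if w == BLOCK_MAGIC { Some(i) } else { None })
        .collect()
}

fn main() {
    // Fake input: a "BZh9" stream header, then two "blocks" whose
    // payloads are placeholder zero bytes.
    let mut data = Vec::new();
    data.extend_from_slice(b"BZh9");
    data.extend_from_slice(&BLOCK_MAGIC);
    data.extend_from_slice(&[0u8; 16]);
    data.extend_from_slice(&BLOCK_MAGIC);

    let starts = find_block_starts(&data);
    assert_eq!(starts, vec![4, 26]);
    println!("block starts: {:?}", starts);
}
```

Once candidate offsets are known, each block can be handed to a worker thread and decompressed independently, which is what makes parallel decoding possible.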
## Features
- **Parallel decompression**: Uses `rayon` to decompress blocks concurrently across CPU cores.
- **Standard API**: Implements `std::io::Read` for easy integration.
- **Memory mapped**: Handles large files efficiently via memory mapping.
- **Flexible input**: Supports opening files directly or working with in-memory buffers (via `Arc`).
- **Full bzip2 format support**: Handles both single-stream and multi-stream bzip2 files.
- **Error handling**: Comprehensive error reporting with `anyhow` integration.
- **Memory efficient**: Bounded channels and buffer reuse minimize peak memory usage.
## Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
parallel_bzip2_decoder = "0.1"
```
### Decompressing a File
The easiest way to get started is `Bz2Decoder::open`, which handles memory mapping internally:
```rust
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;

fn main() -> anyhow::Result<()> {
    let mut decoder = Bz2Decoder::open("input.bz2")?;
    let mut buffer = Vec::new();
    decoder.read_to_end(&mut buffer)?;
    println!("Decompressed {} bytes", buffer.len());
    Ok(())
}
```
### Decompressing from Memory
If you already have the data in memory (e.g., an `Arc<[u8]>` or `Arc<Mmap>`), you can use `Bz2Decoder::new`:
```rust
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;
use std::sync::Arc;

fn main() -> anyhow::Result<()> {
    let data: Vec<u8> = vec![/* ... bzip2 data ... */];
    let data_arc = Arc::new(data);
    let mut decoder = Bz2Decoder::new(data_arc);
    let mut buffer = Vec::new();
    decoder.read_to_end(&mut buffer)?;
    Ok(())
}
```
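Because `Bz2Decoder` implements `std::io::Read`, it also composes with the rest of `std::io`: for instance, `std::io::copy` streams decompressed bytes into any `Write` sink without buffering the whole output in memory. A minimal sketch of that pattern (using an in-memory `Cursor` as a stand-in source, since the decoder itself needs real bzip2 data):

```rust
use std::io::{self, Read, Write};

/// Stream all bytes from any reader into any writer, returning the
/// byte count. `Bz2Decoder` would work as `src` just like the Cursor.
fn stream_out<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<u64> {
    io::copy(src, dst)
}

fn main() -> io::Result<()> {
    let mut src = io::Cursor::new(b"hello".to_vec());
    let mut dst = Vec::new();
    let n = stream_out(&mut src, &mut dst)?;
    assert_eq!(n, 5);
    assert_eq!(dst, b"hello");
    Ok(())
}
```

In real use, `dst` could be a `File` or socket, keeping memory usage flat regardless of the decompressed size.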
## Performance
`parallel_bzip2_decoder` scales with the number of available CPU cores, approaching linear speedup on large files where block decompression dominates the runtime. In that regime it is significantly faster than standard single-threaded decoders.
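Since decompression runs on `rayon`'s global thread pool, parallelism can be bounded without code changes through rayon's standard `RAYON_NUM_THREADS` environment variable (the binary name below is a placeholder):

```shell
# Limit the decoder to 4 worker threads for this invocation.
RAYON_NUM_THREADS=4 ./my_decompress_tool input.bz2
```

Programmatic control is also possible by configuring rayon's global pool with `rayon::ThreadPoolBuilder` before the first decode.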
## Benchmarking and Profiling
This crate includes comprehensive benchmarks and profiling tools:
- **Decode benchmarks**: Test decompression with various file sizes (1MB, 10MB, 50MB)
- **Scanner benchmarks**: Measure block scanning performance
- **End-to-end benchmarks**: Test the full decompression pipeline
- **CPU profiling**: Generate flamegraphs to identify performance bottlenecks
- **Memory profiling**: Track memory usage and detect leaks
### Running Benchmarks
```bash
# Run all benchmarks
cargo bench
# Run specific benchmark suite
cargo bench --bench decode_benchmark
cargo bench --bench scanner_benchmark
cargo bench --bench e2e_benchmark
```
### Profiling
```bash
# CPU profiling with flamegraphs
cd ../scripts
./profile_cpu.sh
# Memory profiling with valgrind
./profile_memory.sh
```
For detailed instructions, see [BENCHMARKING.md](../BENCHMARKING.md).
## API Stability
This crate follows semantic versioning. Breaking changes will only occur with major version updates.
## License
MIT
## Contributing
See the main repository's [CONTRIBUTING.md](../CONTRIBUTING.md) for details on how to contribute.
## Changelog
See [CHANGELOG.md](../CHANGELOG.md) for a history of changes (when available).