Crate parallel_bzip2_decoder

High-performance parallel bzip2 decompression library.

This library provides efficient parallel decompression of bzip2 files by processing multiple blocks concurrently. It achieves significant speedups on multi-core systems compared to sequential decompression.

§Features

  • Parallel block decompression: Utilizes all available CPU cores
  • Streaming API: Implements std::io::Read for easy integration
  • Memory-efficient: Uses bounded channels to limit memory usage
  • Zero-copy where possible: Memory-mapped I/O for file access
  • Full bzip2 format support: Handles both single-stream and multi-stream bzip2 files
  • Error handling: Comprehensive error reporting with anyhow integration

§Architecture

The library uses a multi-stage pipeline:

  1. Scanning: Identifies block boundaries using parallel pattern matching
  2. Decompression: Processes blocks in parallel using Rayon
  3. Reordering: Ensures output maintains correct block order
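
The reordering stage can be illustrated with a small std-only sketch. It assumes each decompressed block is tagged with its position in the stream; `Reorderer` is a hypothetical type for illustration and is not taken from the crate, whose internal implementation may differ.

```rust
use std::collections::BTreeMap;

/// Emits items in index order even when they arrive out of order.
/// `Reorderer` is a hypothetical type for illustration only.
struct Reorderer<T> {
    /// Index of the next item that may be emitted.
    next: usize,
    /// Items that finished early and are waiting for their turn.
    pending: BTreeMap<usize, T>,
}

impl<T> Reorderer<T> {
    fn new() -> Self {
        Reorderer { next: 0, pending: BTreeMap::new() }
    }

    /// Accepts one finished item and returns every item that is now
    /// ready to be emitted, in stream order.
    fn push(&mut self, index: usize, item: T) -> Vec<T> {
        self.pending.insert(index, item);
        let mut ready = Vec::new();
        // Drain the run of consecutive indices starting at `next`.
        while let Some(item) = self.pending.remove(&self.next) {
            ready.push(item);
            self.next += 1;
        }
        ready
    }
}
```

A block that finishes early stays in `pending` until all earlier blocks have been emitted, so the output order never depends on which worker finishes first.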

§Quick Start

The easiest way to use this library is through Bz2Decoder:

use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;

let mut decoder = Bz2Decoder::open("file.bz2").unwrap();
let mut data = Vec::new();
decoder.read_to_end(&mut data).unwrap();
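
Because the decoder implements std::io::Read, it can also be handed to any code that is generic over Read, which avoids collecting the whole output in a Vec. The following is a minimal sketch; `count_lines` is a hypothetical helper, not part of the crate, and it works with any reader.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts lines from any `Read` source while holding only one line
/// in memory at a time. A final line without a trailing newline is
/// counted as well.
/// `count_lines` is a hypothetical helper, not part of the crate.
fn count_lines<R: Read>(source: R) -> io::Result<usize> {
    let mut reader = BufReader::new(source);
    let mut line = Vec::new();
    let mut lines = 0;
    loop {
        line.clear();
        // `read_until` returns Ok(0) once the stream is exhausted.
        if reader.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        lines += 1;
    }
    Ok(lines)
}
```

With the crate, the equivalent call would presumably be `count_lines(Bz2Decoder::open("file.bz2").unwrap())`, assuming the opened decoder is passed by value as in the example above.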

§Advanced Usage

For more control, you can use the lower-level functions:

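use parallel_bzip2_decoder::{scan_blocks, decompress_block};

// Read the whole compressed file into memory.
let compressed_data = std::fs::read("file.bz2").unwrap();
// Block boundaries arrive over a channel as the scanner finds them.
let block_receiver = scan_blocks(&compressed_data);

// Offsets are in bits because bzip2 blocks are not byte-aligned.
for (start_bit, end_bit) in block_receiver {
    let decompressed = decompress_block(&compressed_data, start_bit, end_bit).unwrap();
    // Process decompressed block...
}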
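
The boundaries are expressed in bits because bzip2 packs blocks back to back without byte alignment, and every block begins with the 48-bit magic number 0x314159265359. As a rough illustration of what a boundary scan involves, here is a sequential, std-only sketch. `find_block_magic` is a hypothetical helper; the crate's scanner is parallel and its implementation is not shown in this documentation.

```rust
/// 48-bit magic number that starts every bzip2 compressed block.
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
/// Keeps only the low 48 bits of the sliding window.
const MAGIC_MASK: u64 = 0xFFFF_FFFF_FFFF;

/// Returns the bit offset of every occurrence of the block magic.
/// Sequential sketch; a hypothetical helper, not the crate's scanner.
fn find_block_magic(data: &[u8]) -> Vec<u64> {
    let mut hits = Vec::new();
    let mut window: u64 = 0;
    let mut bits_seen: u64 = 0;
    for &byte in data {
        // bzip2 stores bits most-significant first.
        for shift in (0..8).rev() {
            let bit = (byte >> shift) & 1;
            window = ((window << 1) | bit as u64) & MAGIC_MASK;
            bits_seen += 1;
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                // The match ends at `bits_seen`, so it starts 48 bits earlier.
                hits.push(bits_seen - 48);
            }
        }
    }
    hits
}
```

The same 48-bit pattern can occur by chance inside compressed data, so a real scanner has to treat each hit as a candidate that still needs validation, for example by attempting to decode the block.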

§Performance

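Throughput scales close to linearly with the number of CPU cores, provided the input contains enough blocks to keep every core busy. On an 8-core system, expect roughly a 6-7x speedup over single-threaded bzip2 decompression; actual results depend on the input and the hardware.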
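
One rough way to interpret the 8-core figure is Amdahl's law, which bounds the speedup when only part of the work runs in parallel. This model is an assumption used here for illustration and is not something the crate documents; `amdahl_speedup` is a hypothetical helper.

```rust
/// Amdahl's law: speedup on `cores` cores when a fraction `parallel`
/// (between 0.0 and 1.0) of the work can run in parallel.
/// Hypothetical helper for illustration; not part of the crate.
fn amdahl_speedup(parallel: f64, cores: u32) -> f64 {
    1.0 / ((1.0 - parallel) + parallel / f64::from(cores))
}
```

Under this model, a parallel fraction of 0.95 gives about 5.9x on 8 cores and 0.98 gives about 7.0x, so a 6-7x speedup is consistent with a few percent of the work (such as scanning setup and output reordering) remaining serial.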

§Thread Safety

All public types are thread-safe. The library uses Rayon’s global thread pool by default, but creates dedicated pools where needed to avoid deadlocks.

§Error Handling

This crate uses anyhow for comprehensive error handling. Most functions return Result<T, anyhow::Error> for easy error propagation using the ? operator.

§Memory Usage

The library is designed with memory efficiency in mind:

  • Memory-mapped I/O for large files
  • Bounded channels to prevent unbounded memory growth
  • Buffer reuse in block processing
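
The effect of a bounded channel can be shown with the standard library alone. In this minimal sketch the producer blocks as soon as `capacity` blocks are queued, so memory stays bounded no matter how far ahead the producer could run. `run_bounded_pipeline` is a hypothetical helper, and the channel type the crate actually uses may differ.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends `count` placeholder blocks through a channel that holds at
/// most `capacity` items and returns the block sizes as received.
/// Hypothetical helper using std only; the crate's channel may differ.
fn run_bounded_pipeline(capacity: usize, count: usize) -> Vec<usize> {
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
    let producer = thread::spawn(move || {
        for i in 0..count {
            // Blocks here while `capacity` blocks are already queued.
            tx.send(vec![0u8; i]).expect("receiver dropped");
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    // The consumer drains blocks in the order they were sent.
    let sizes: Vec<usize> = rx.iter().map(|block| block.len()).collect();
    producer.join().expect("producer panicked");
    sizes
}
```

At most `capacity` blocks wait in the queue at any time, plus the one the producer is currently trying to send.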

§Benchmarks

Run cargo bench to measure performance on your system. The benchmark suites cover:

  • Decode benchmarks with various file sizes
  • Scanner performance
  • End-to-end pipeline performance

Re-exports§

pub use decoder::Bz2Decoder;
pub use scanner::extract_bits;
pub use scanner::MarkerType;
pub use scanner::Scanner;

Modules§

decoder
Parallel bzip2 decoder with streaming output.
scanner
High-performance parallel scanner for bzip2 block boundaries.

Functions§

decompress_block
Decompresses a single bzip2 block and returns the decompressed data.
decompress_block_into
Decompresses a single bzip2 block into provided buffers (zero-allocation).
parallel_bzip2_cat
Decompresses an entire bzip2 file and returns the decompressed data.
scan_blocks
Scans bzip2 data for block boundaries and returns them via a channel.