
Crate parallel_bzip2_decoder

High-performance parallel bzip2 decompression library.

This library provides efficient parallel decompression of bzip2 files by processing multiple blocks concurrently. It achieves significant speedups on multi-core systems compared to sequential decompression.

§Features

  • Parallel block decompression: Utilizes all available CPU cores
  • Streaming API: Implements std::io::Read for easy integration
  • Memory-efficient: Uses bounded channels to limit memory usage
  • Zero-copy where possible: Memory-mapped I/O for file access
  • Full bzip2 format support: Handles both single-stream and multi-stream bzip2 files
  • Error handling: Comprehensive error reporting with anyhow integration

§Architecture

The library uses a multi-stage pipeline:

  1. Scanning: Identifies block boundaries using parallel pattern matching
  2. Decompression: Processes blocks in parallel using Rayon
  3. Reordering: Ensures output maintains correct block order
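
The reordering stage can be pictured with a small std-only sketch. This is an illustration under stated assumptions, not the crate's actual code: `reorder` is a hypothetical helper, and worker results are assumed to arrive as `(block_index, bytes)` pairs in arbitrary order.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::Receiver;

/// Hypothetical helper: drains `(block_index, bytes)` pairs that may arrive
/// in any order and returns the bytes concatenated in index order.
fn reorder(results: Receiver<(usize, Vec<u8>)>) -> Vec<u8> {
    // Blocks that arrived early and are waiting for their predecessors.
    let mut pending: BTreeMap<usize, Vec<u8>> = BTreeMap::new();
    let mut next: usize = 0; // index of the next block allowed to be emitted
    let mut out = Vec::new();

    for (index, bytes) in results {
        pending.insert(index, bytes);
        // Flush every block that is now contiguous with what was emitted.
        while let Some(bytes) = pending.remove(&next) {
            out.extend_from_slice(&bytes);
            next += 1;
        }
    }
    out
}
```

Buffering only the out-of-order blocks keeps memory proportional to how far workers run ahead of the slowest block, rather than to the whole file.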

§Quick Start

The easiest way to use this library is through the Bz2Decoder:

use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;

let mut decoder = Bz2Decoder::open("file.bz2").unwrap();
let mut data = Vec::new();
decoder.read_to_end(&mut data).unwrap();
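
Because the decoder implements std::io::Read, code written against that trait needs no bzip2-specific logic. The sketch below uses a hypothetical helper, `count_bytes`, that consumes any reader in fixed-size chunks instead of buffering the whole output with read_to_end. The helper name and the 64 KiB chunk size are assumptions made for illustration.

```rust
use std::io::{self, Read};

/// Hypothetical helper: consumes any reader in 64 KiB chunks and returns the
/// total number of bytes read, without holding the whole output in memory.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(total), // end of stream
            Ok(n) => total += n as u64,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Passing a Bz2Decoder to such a function should work the same way as passing a file or a cursor, assuming it behaves like any other Read implementation.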

§Advanced Usage

For more control, you can use the lower-level functions:

use parallel_bzip2_decoder::{scan_blocks, decompress_block};

let compressed_data = std::fs::read("file.bz2").unwrap();
let block_receiver = scan_blocks(&compressed_data);

for (start_bit, end_bit) in block_receiver {
    let decompressed = decompress_block(&compressed_data, start_bit, end_bit).unwrap();
    // Process decompressed block...
}
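
Block boundaries are expressed as bit offsets because bzip2 blocks are not byte-aligned: the 48-bit block magic 0x314159265359 can begin at any bit position. The following sketch shows a naive sequential search for that magic. It is not the crate's scanner, which is described as parallel; `find_block_starts` is a hypothetical name, and a production scanner would also need to handle the end-of-stream marker and coincidental matches inside compressed data.

```rust
/// 48-bit magic number that starts every bzip2 compressed block.
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
const MAGIC_BITS: u32 = 48;

/// Naive sequential scan: returns the bit offset of every position where the
/// block magic starts. Offsets count from the most significant bit of byte 0.
fn find_block_starts(data: &[u8]) -> Vec<u64> {
    let mask = (1u64 << MAGIC_BITS) - 1;
    let mut window = 0u64; // the most recent 48 bits, newest in the low bit
    let mut starts = Vec::new();

    for (byte_index, &byte) in data.iter().enumerate() {
        for bit in 0..8u64 {
            // Shift the next bit (most significant first) into the window.
            let value = (byte >> (7 - bit)) & 1;
            window = ((window << 1) | u64::from(value)) & mask;
            let bits_seen = byte_index as u64 * 8 + bit + 1;
            if bits_seen >= u64::from(MAGIC_BITS) && window == BLOCK_MAGIC {
                starts.push(bits_seen - u64::from(MAGIC_BITS));
            }
        }
    }
    starts
}
```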

§Performance

Performance scales nearly linearly with the number of CPU cores. On an 8-core system, expect 6-7x speedup compared to single-threaded bzip2 decompression.
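
One way to interpret this figure is through Amdahl's law, which bounds the speedup by the fraction of work that can run in parallel. This is an interpretation offered for illustration, not a measurement or a claim from the crate authors; `amdahl_speedup` is a hypothetical helper.

```rust
/// Amdahl's law: speedup on `cores` cores when a fraction
/// `parallel_fraction` (between 0.0 and 1.0) of the work runs in parallel.
fn amdahl_speedup(parallel_fraction: f64, cores: u32) -> f64 {
    let serial = 1.0 - parallel_fraction;
    1.0 / (serial + parallel_fraction / f64::from(cores))
}
```

Under this model, a parallel fraction of 0.95 gives roughly 5.9x on 8 cores and 0.98 gives roughly 7.0x, so a 6-7x speedup would correspond to a workload that is about 95-98% parallel.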

§Thread Safety

All public types are thread-safe. The library uses Rayon’s global thread pool by default, but creates dedicated pools where needed to avoid deadlocks.

§Error Handling

This crate uses anyhow for comprehensive error handling. Most functions return Result<T, anyhow::Error> for easy error propagation using the ? operator.

§Memory Usage

The library is designed with memory efficiency in mind:

  • Memory-mapped I/O for large files
  • Bounded channels to prevent unbounded memory growth
  • Buffer reuse in block processing
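
To illustrate how a bounded channel caps memory, here is a minimal sketch using the standard library's sync_channel. The crate's own channel type and capacities are not stated in this documentation, so the function name and parameters below are assumptions.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends `blocks` chunks of `block_len` bytes through a channel that holds at
/// most `capacity` chunks, and returns the total number of bytes received.
fn pipe_through_bounded_channel(blocks: usize, block_len: usize, capacity: usize) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);

    let producer = thread::spawn(move || {
        for _ in 0..blocks {
            // `send` blocks while the channel is full, so at most `capacity`
            // chunks are queued no matter how slow the consumer is.
            tx.send(vec![0u8; block_len]).expect("receiver dropped");
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });

    let total: usize = rx.iter().map(|chunk| chunk.len()).sum();
    producer.join().expect("producer panicked");
    total
}
```

With an unbounded channel, a fast producer could queue every decompressed block before the consumer reads any of them; the bound turns that into backpressure on the producer.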

§Benchmarks

Run benchmarks with cargo bench to measure performance on your system. Various benchmark suites test different aspects of performance:

  • Decode benchmarks with various file sizes
  • Scanner performance
  • End-to-end pipeline performance

Re-exports§

pub use decoder::Bz2Decoder;
pub use error::Bz2Error;
pub use error::Result;
pub use scanner::extract_bits;
pub use scanner::MarkerType;
pub use scanner::Scanner;

Modules§

decoder
Parallel bzip2 decoder with streaming output.
error
Error types for parallel bzip2 decompression.
scanner
High-performance parallel scanner for bzip2 block boundaries.

Constants§

MAX_BLOCK_SIZE
Maximum allowed uncompressed size for a single bzip2 block (2 MB). This protects against decompression bomb attacks.
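
The general idea behind such a cap can be sketched with std::io::Read::take. This is not how the crate enforces MAX_BLOCK_SIZE, which this page does not describe; `read_capped` is a hypothetical helper that rejects any input producing more than `limit` bytes.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads everything from `reader`, but fails with
/// `InvalidData` if the stream yields more than `limit` bytes.
fn read_capped<R: Read>(reader: R, limit: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // Read at most one byte past the limit so an oversized input is
    // detectable without ever buffering much more than `limit` bytes.
    reader.take(limit as u64 + 1).read_to_end(&mut out)?;
    if out.len() > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "output exceeds size limit",
        ));
    }
    Ok(out)
}
```

Even an endless source such as std::io::repeat is rejected after `limit + 1` bytes instead of exhausting memory.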

Functions§

decompress_block
Decompresses a single bzip2 block and returns the decompressed data.
decompress_block_into
Decompresses a single bzip2 block into provided buffers (zero-allocation).
decompress_file
Decompresses an entire bzip2 file into memory.
parallel_bzip2_cat (Deprecated)
Decompresses an entire bzip2 file and returns the decompressed data.
scan_blocks
Scans bzip2 data for block boundaries and returns them via a channel.