High-performance parallel scanner for bzip2 block boundaries.
This module provides efficient scanning of bzip2 compressed data to locate block and end-of-stream markers. The scanner uses the Aho-Corasick algorithm for fast pattern matching and processes data in parallel chunks for maximum throughput.
§Algorithm
The scanner searches for two 48-bit magic numbers defined in the bzip2 specification:
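- Block marker: 0x314159265359 (the decimal digits of π, 3.14159265359, written as binary-coded decimal)
- End-of-stream marker: 0x177245385090 (the decimal digits of √π, 1.77245385090, written as binary-coded decimal)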
Since these markers can appear at any bit offset (not just byte boundaries), the scanner generates 8 shifted patterns for each magic number and uses Aho-Corasick for efficient multi-pattern matching. Candidates are then verified by extracting and comparing the full 48-bit value.
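The shift-and-verify idea above can be sketched with the standard library alone. This is an illustration under stated assumptions, not this module's implementation: the names `shifted_patterns`, `read_u48`, and `find_marker_bits` are hypothetical, each shifted pattern is reduced to the five bytes that the magic number fully determines at that shift, and a naive byte comparison stands in for the Aho-Corasick automaton.

```rust
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
const EOS_MAGIC: u64 = 0x1772_4538_5090;

/// For each bit shift 0..8, returns the five bytes that are fully determined
/// by the 48-bit magic when it starts `shift` bits into a byte (MSB first).
/// Hypothetical helper: the real scanner may choose its patterns differently.
fn shifted_patterns(magic: u64) -> [[u8; 5]; 8] {
    let mut out = [[0u8; 5]; 8];
    for s in 0..8 {
        // Place the magic in a 56-bit (7-byte) window, `s` bits below the top.
        let window = magic << (8 - s);
        let bytes = window.to_be_bytes(); // bytes[1..8] hold the 7-byte window
        // Window bytes 1..=5 contain only magic bits for every shift.
        out[s].copy_from_slice(&bytes[2..7]);
    }
    out
}

/// Reads 48 bits starting at `bit_pos` (MSB-first bit order), if in range.
fn read_u48(data: &[u8], bit_pos: usize) -> Option<u64> {
    if bit_pos + 48 > data.len() * 8 {
        return None;
    }
    let mut value = 0u64;
    for i in 0..48 {
        let bit = bit_pos + i;
        let b = (data[bit / 8] >> (7 - bit % 8)) & 1;
        value = (value << 1) | u64::from(b);
    }
    Some(value)
}

/// Returns the bit offsets at which `magic` occurs in `data`.
/// A naive comparison replaces the multi-pattern automaton in this sketch.
fn find_marker_bits(data: &[u8], magic: u64) -> Vec<usize> {
    let patterns = shifted_patterns(magic);
    let mut hits = Vec::new();
    // A pattern found at byte `p` means the 7-byte window starts at `p - 1`.
    for p in 1..data.len().saturating_sub(4) {
        for (s, pat) in patterns.iter().enumerate() {
            if data[p..p + 5] == pat[..] {
                let bit_pos = (p - 1) * 8 + s;
                // Verify the candidate against the full 48-bit value.
                if read_u48(data, bit_pos) == Some(magic) {
                    hits.push(bit_pos);
                }
            }
        }
    }
    hits
}
```

Calling `find_marker_bits(&data, BLOCK_MAGIC)` yields candidate block starts as bit offsets, and the same call with `EOS_MAGIC` locates end-of-stream markers. The verification step matters because the five-byte pattern covers only 40 of the 48 bits, so a pattern hit alone can be a false positive.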
§Performance
- Parallel processing using Rayon for multi-core utilization
- 1MB chunks for optimal cache locality
- Aho-Corasick automaton for O(n) pattern matching
- Minimal memory allocation through buffer reuse
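The chunked parallel scan can be illustrated as follows. This is a rough sketch rather than the scanner's actual code: it uses `std::thread::scope` in place of Rayon so that it needs no external crates, searches for a byte-aligned needle only, and the names `parallel_find` and `CHUNK_SIZE` are hypothetical. The point it shows is that each chunk is extended by `needle.len() - 1` bytes, so a marker straddling a chunk boundary is still found, and found exactly once.

```rust
use std::thread;

/// Chunk size for this sketch: 1 << 20 bytes, assuming "1MB" means 1 MiB.
const CHUNK_SIZE: usize = 1 << 20;

/// Returns the ascending byte offsets at which `needle` starts in `data`,
/// scanning overlapping chunks on separate threads.
fn parallel_find(data: &[u8], needle: &[u8], chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0 && !needle.is_empty());
    // Extra bytes read past each chunk so boundary-straddling matches survive.
    let overlap = needle.len() - 1;
    thread::scope(|scope| {
        let handles: Vec<_> = (0..data.len())
            .step_by(chunk_size)
            .map(|start| {
                let end = (start + chunk_size + overlap).min(data.len());
                let chunk = &data[start..end];
                scope.spawn(move || {
                    // Match starts fall in 0..chunk_size, so no offset is
                    // reported by two different chunks.
                    chunk
                        .windows(needle.len())
                        .enumerate()
                        .filter(|(_, w)| *w == needle)
                        .map(|(i, _)| start + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the offsets sorted.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<usize>>()
    })
}
```

A call such as `parallel_find(&data, &needle, CHUNK_SIZE)` spawns one thread per chunk, which is acceptable for a demonstration but wasteful for large inputs. A work-stealing pool such as Rayon bounds the thread count to the available cores instead.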
Structs§
- Scanner
- Parallel scanner for bzip2 block boundaries.
Enums§
- MarkerType
- Marker type found in bzip2 streams.
Functions§
- extract_bits
- Extracts a range of bits from a byte slice and appends them to the output buffer.