
Module scanner


High-performance parallel scanner for bzip2 block boundaries.

This module scans bzip2-compressed data to locate block and end-of-stream markers. It uses the Aho-Corasick algorithm for fast multi-pattern matching and splits the input into chunks that are scanned in parallel across multiple cores.

§Algorithm

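The scanner searches for the two 48-bit magic numbers used by the bzip2 format:

  • Block marker: 0x314159265359 (the decimal digits of π, 3.14159265359, packed as binary-coded decimal)
  • End-of-stream marker: 0x177245385090 (the decimal digits of √π, 1.77245385090, packed as binary-coded decimal)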
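Each hex nibble of these constants is one decimal digit, which is why they read like π and √π when printed in hex. The short sketch below decodes the nibbles back into digits to make that visible; the helper `bcd_digits` is hypothetical and not part of this module.

```rust
/// 48-bit block marker: decimal digits of π, one digit per hex nibble (BCD).
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
/// 48-bit end-of-stream marker: decimal digits of √π, one digit per hex nibble (BCD).
const EOS_MAGIC: u64 = 0x1772_4538_5090;

/// Hypothetical helper: decodes the low `nibbles` BCD nibbles of `value`
/// (most significant first) into a string of decimal digits.
fn bcd_digits(value: u64, nibbles: u32) -> String {
    (0..nibbles)
        .rev()
        .map(|i| {
            let nibble = ((value >> (4 * i)) & 0xF) as u32;
            char::from_digit(nibble, 10).expect("nibble is not a decimal digit")
        })
        .collect()
}

fn main() {
    // Both markers fit in 48 bits, i.e. 12 nibbles.
    assert!(BLOCK_MAGIC < 1u64 << 48 && EOS_MAGIC < 1u64 << 48);
    println!("{}", bcd_digits(BLOCK_MAGIC, 12)); // 314159265359
    println!("{}", bcd_digits(EOS_MAGIC, 12)); // 177245385090
}
```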

Because bzip2 blocks are bit-aligned rather than byte-aligned, a marker can start at any of the 8 bit offsets within a byte. The scanner therefore generates 8 shifted patterns for each magic number and uses Aho-Corasick to match all 16 patterns in a single pass. Each candidate is then verified by extracting the full 48-bit value at the candidate bit offset and comparing it with the magic number.
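The shift-and-verify idea can be sketched as follows. This is not the module's implementation: a naive `windows` scan stands in for the Aho-Corasick automaton so the example needs only the standard library, and `read_bits`, `shifted_patterns`, and `find_marker_bits` are hypothetical names. As an assumption, each shifted pattern keeps only the bytes the marker covers completely (6 bytes at shift 0, 5 at shifts 1 to 7), since the partially covered edge bytes also contain neighbouring bits; the full 48-bit comparison then rules out false candidates.

```rust
/// Block marker (48 bits).
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;

/// Reads `n` bits (n <= 64), most significant bit first, starting at absolute
/// bit offset `bit`. Returns `None` if the range runs past the end of `data`.
fn read_bits(data: &[u8], bit: usize, n: usize) -> Option<u64> {
    if n > 64 || bit + n > data.len() * 8 {
        return None;
    }
    let mut value = 0u64;
    for i in bit..bit + n {
        let b = (data[i / 8] >> (7 - i % 8)) & 1;
        value = (value << 1) | u64::from(b);
    }
    Some(value)
}

/// For each shift `s` in 0..8, returns the bytes completely covered by the
/// 48-bit `magic` when it starts `s` bits into a byte.
fn shifted_patterns(magic: u64) -> Vec<Vec<u8>> {
    (0..8usize)
        .map(|s| {
            // Place the marker at bits s..s+48 of a 64-bit big-endian window.
            let bytes = (magic << (16 - s)).to_be_bytes();
            let first_full = if s == 0 { 0 } else { 1 };
            let end_full = (s + 48) / 8; // exclusive
            bytes[first_full..end_full].to_vec()
        })
        .collect()
}

/// Returns every bit offset at which `magic` occurs in `data`, in ascending order.
/// A naive byte-window search stands in for Aho-Corasick here.
fn find_marker_bits(data: &[u8], magic: u64) -> Vec<usize> {
    let mut hits = Vec::new();
    for (s, pattern) in shifted_patterns(magic).iter().enumerate() {
        let first_full = if s == 0 { 0 } else { 1 };
        for (pos, window) in data.windows(pattern.len()).enumerate() {
            if window != pattern.as_slice() || pos < first_full {
                continue;
            }
            // For s > 0 the marker starts `s` bits into the byte before the first full byte.
            let start_bit = (pos - first_full) * 8 + s;
            // Verify all 48 bits, since the partial edge bytes were not matched.
            if read_bits(data, start_bit, 48) == Some(magic) {
                hits.push(start_bit);
            }
        }
    }
    hits.sort_unstable();
    hits
}

fn main() {
    // Embed the marker at bit offset 13 of a zeroed 16-byte buffer.
    let data = (u128::from(BLOCK_MAGIC) << (128 - 48 - 13)).to_be_bytes();
    println!("{:?}", find_marker_bits(&data, BLOCK_MAGIC)); // [13]
}
```

With a real automaton, all 16 shifted patterns (8 per magic number) would be added to one matcher, and each reported match would carry its pattern index, from which the shift `s` and the marker type follow.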

§Performance

  • Parallel processing with Rayon to use all available cores
  • 1 MB chunks to improve cache locality
  • Aho-Corasick automaton for linear-time (O(n)) pattern matching
  • Buffer reuse to minimize memory allocation
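Chunked parallel scanning has to handle matches that straddle a chunk boundary. The sketch below shows one common approach, offered as an assumption rather than a description of this module: each chunk is extended by `pattern.len() - 1` bytes and a match is kept only by the chunk in which it starts. To stay std-only it uses `std::thread::scope` (Rust 1.63+) with one thread per chunk instead of Rayon's work-stealing pool; `parallel_find` and `CHUNK_SIZE` are hypothetical names, and the pattern is byte-aligned for simplicity. A bit-shifted 48-bit marker can span 7 bytes, so a scanner using this scheme for markers would need at least 6 bytes of overlap.

```rust
use std::thread;

/// Hypothetical chunk size of 1 MiB, matching the "1 MB chunks" above.
const CHUNK_SIZE: usize = 1 << 20;

/// Finds every byte offset of `pat` in `data`, scanning `chunk_size` chunks in parallel.
/// Each chunk is extended by `pat.len() - 1` bytes so boundary-straddling matches are found.
fn parallel_find(data: &[u8], pat: &[u8], chunk_size: usize) -> Vec<usize> {
    assert!(!pat.is_empty() && chunk_size > 0);
    let overlap = pat.len() - 1;
    let starts: Vec<usize> = (0..data.len()).step_by(chunk_size).collect();
    thread::scope(|scope| {
        // Spawn every chunk first, then join, so the chunks actually run concurrently.
        let handles: Vec<_> = starts
            .iter()
            .map(|&start| {
                scope.spawn(move || {
                    let end = (start + chunk_size + overlap).min(data.len());
                    data[start..end]
                        .windows(pat.len())
                        .enumerate()
                        .filter(|(_, w)| *w == pat)
                        .map(|(i, _)| start + i)
                        // A match belongs to the chunk in which it starts, so none is counted twice.
                        .filter(|&pos| pos < start + chunk_size)
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in chunk order yields offsets in ascending order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let mut data = vec![0u8; 2 * CHUNK_SIZE];
    // Place the pattern so it straddles the first chunk boundary.
    data[CHUNK_SIZE - 1..CHUNK_SIZE + 2].copy_from_slice(b"XYZ");
    println!("{:?}", parallel_find(&data, b"XYZ", CHUNK_SIZE)); // [1048575]
}
```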

Structs§

Scanner
Parallel scanner for bzip2 block boundaries.

Enums§

MarkerType
Marker type found in bzip2 streams.
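A marker type naturally pairs each kind of marker with its 48-bit magic number, so a verified candidate can be classified in one step. The enum below is a hypothetical stand-in: the variant names `Block` and `EndOfStream` and the methods `magic` and `from_magic` are assumptions for illustration, not this crate's API.

```rust
/// Hypothetical stand-in for `MarkerType`; variant and method names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MarkerType {
    /// Start of a compressed block (0x314159265359).
    Block,
    /// End of the bzip2 stream (0x177245385090).
    EndOfStream,
}

impl MarkerType {
    /// The 48-bit magic number that identifies this marker.
    const fn magic(self) -> u64 {
        match self {
            MarkerType::Block => 0x3141_5926_5359,
            MarkerType::EndOfStream => 0x1772_4538_5090,
        }
    }

    /// Classifies a 48-bit value read from the stream; returns `None` for non-markers.
    fn from_magic(value: u64) -> Option<Self> {
        [MarkerType::Block, MarkerType::EndOfStream]
            .iter()
            .copied()
            .find(|m| m.magic() == value)
    }
}

fn main() {
    let candidate = 0x1772_4538_5090;
    println!("{:?}", MarkerType::from_magic(candidate)); // Some(EndOfStream)
}
```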

Functions§

extract_bits
Extracts a range of bits from a byte slice and appends them to the output buffer.
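A bit-range extractor like this is what lets a bit-aligned block be copied out into byte-aligned storage. The sketch below assumes a hypothetical signature `extract_bits(data, start_bit, end_bit, out)` with a half-open bit range, MSB-first bit order (the order bzip2 uses), and zero-padding of the final partial byte; the real function's parameters and padding behaviour may differ.

```rust
/// Hypothetical signature: appends bits [start_bit, end_bit) of `data` to `out`,
/// reading and packing MSB-first and zero-padding the last byte if it is partial.
fn extract_bits(data: &[u8], start_bit: usize, end_bit: usize, out: &mut Vec<u8>) {
    assert!(start_bit <= end_bit && end_bit <= data.len() * 8, "bit range out of bounds");
    let mut acc: u8 = 0; // bits collected for the byte being built
    let mut filled: u32 = 0; // number of bits currently in `acc`
    for bit in start_bit..end_bit {
        let b = (data[bit / 8] >> (7 - bit % 8)) & 1;
        acc = (acc << 1) | b;
        filled += 1;
        if filled == 8 {
            out.push(acc);
            acc = 0;
            filled = 0;
        }
    }
    if filled > 0 {
        // Left-align the remaining bits; the low bits of the final byte are zero.
        out.push(acc << (8 - filled));
    }
}

fn main() {
    let mut out = vec![0xFF]; // existing contents are preserved
    extract_bits(&[0b1010_1100, 0b1111_0000], 2, 10, &mut out);
    println!("{:02X?}", out); // [FF, B3]
}
```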