Skip to main content

MatchFinder

Struct MatchFinder 

Source
pub struct MatchFinder { /* private fields */ }
Expand description

Optimized hash chain-based match finder.

Uses a fixed-size hash table with direct indexing for O(1) lookup. SIMD-accelerated match length comparison when available. Hash table is cache-line aligned (64 bytes) for optimal memory access.

§Novel Optimization: Generation Counter

Instead of zeroing the 256KB hash table on every reset (O(n)), we use a generation counter. Hash entries store (generation << 28) | (pos + 1). On reset, we just increment generation (O(1)). Entries from old generations are treated as empty.

§Novel Optimization: Match Prediction

When a match is found with offset O, the next occurrence of that pattern is very likely to be at position P + O (for repeating patterns like text). We track the last successful offset and check it first at the next position.

Implementations§

Source§

impl MatchFinder

Source

pub fn new(search_depth: usize) -> Self

Create a new match finder.

Source

pub fn early_exit_threshold(&self, position: usize) -> usize

Calculate early exit threshold based on position in file.

Early in the file, we want longer matches before exiting search (more exploration). Later in the file, shorter matches are acceptable for early exit (faster throughput).

The rationale:

  • Position 0-1KB: Still building hash chains, need full search (threshold=48)
  • Position 1-8KB: Chains are populated, moderate search (threshold=32)
  • Position 8-32KB: Well-populated chains, balanced (threshold=24)
  • Position 32KB+: Mature chains, early exit is safe (threshold=16)

This optimization is particularly effective for repetitive data where excellent matches are common.

Source

pub fn effective_depth(&self, input_len: usize) -> usize

Calculate effective search depth based on input size.

Larger inputs benefit from reduced search depth:

  • Reduces time spent in hash chain traversal
  • Cache pressure is higher with large data
  • Prediction often finds good matches anyway

The scaling is designed to maintain compression quality while improving throughput on large inputs.

Source

pub fn find_matches(&mut self, input: &[u8]) -> Vec<Match>

Find all matches in the input data.

Returns matches sorted by position.

Source

pub fn match_length_from(&self, input: &[u8], pos1: usize, pos2: usize) -> usize

Calculate match length starting from given offsets.

Used when we already know the first N bytes match (e.g., from prefix comparison). This avoids re-comparing bytes we’ve already verified.

Source

pub fn find_matches_speculative(&mut self, input: &[u8]) -> Vec<Match>

Find matches using speculative parallel lookahead.

This method speculatively computes hashes for multiple positions at once, exploiting instruction-level parallelism in modern CPUs. The key insight is that hash computation and match lookup are independent operations that can be pipelined.

Algorithm:

  1. Compute hashes for next LOOKAHEAD positions in parallel
  2. Look up potential matches for all positions
  3. Select the best match among candidates
  4. Skip to end of selected match

Benefits:

  • Better instruction-level parallelism (ILP)
  • May find better matches by considering multiple positions
  • Reduces branch mispredictions by batching work

Expected impact: +15-25% throughput on large data with varied content.

Trait Implementations§

Source§

impl Debug for MatchFinder

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.