Skip to main content

Prefilter

Enum Prefilter 

Source
pub enum Prefilter {
    None,
    SingleByte {
        byte: u8,
        max_offset: usize,
    },
    TwoBytes {
        byte1: u8,
        byte2: u8,
        max_offset: usize,
    },
    ThreeBytes {
        byte1: u8,
        byte2: u8,
        byte3: u8,
        max_offset: usize,
    },
    MultiBytes {
        bytes: Vec<u8>,
        max_offset: usize,
    },
    Literal {
        needle: Vec<u8>,
    },
    LiteralWithOffset {
        needle: Vec<u8>,
        max_offset: usize,
    },
    TwoByteLiteral {
        byte1: u8,
        byte2: u8,
        max_offset: usize,
    },
    FuzzyLiteral {
        first_byte: u8,
        alt_bytes: Vec<u8>,
        max_edits: usize,
    },
    PigeonholePieces {
        pieces: Vec<Vec<u8>>,
        offsets: Vec<usize>,
        max_edits: usize,
    },
}
Expand description

A prefilter that can quickly find candidate start positions.

Variants§

§

None

No prefiltering - try every position.

§

SingleByte

Search for a single byte.

Fields

§byte: u8

The byte to search for.

§max_offset: usize

Maximum distance from found byte where match could start.

§

TwoBytes

Search for two possible bytes (for case-insensitive or fuzzy).

Fields

§byte1: u8

First byte to search for.

§byte2: u8

Second byte to search for.

§max_offset: usize

Maximum distance from found byte where match could start.

§

ThreeBytes

Search for three possible bytes.

Fields

§byte1: u8

First byte to search for.

§byte2: u8

Second byte to search for.

§byte3: u8

Third byte to search for.

§max_offset: usize

Maximum distance from found byte where match could start.

§

MultiBytes

Search for 4+ possible bytes (uses linear scan).

Fields

§bytes: Vec<u8>

Collection of bytes to search for.

§max_offset: usize

Maximum distance from found byte where match could start.

§

Literal

Search for an exact literal substring.

Fields

§needle: Vec<u8>

The byte sequence to search for.

§

LiteralWithOffset

Search for a literal prefix with offset (for fuzzy matching).

Fields

§needle: Vec<u8>

The byte sequence to search for.

§max_offset: usize

Maximum distance from found literal where match could start.

§

TwoByteLiteral

Fast search for exactly 2-byte literal (uses memchr + check). Much faster than memmem for 2-byte needles.

Fields

§byte1: u8

First byte of the literal.

§byte2: u8

Second byte of the literal.

§max_offset: usize

Maximum distance from found literal where match could start.

§

FuzzyLiteral

Search for a literal with fuzzy tolerance.

Fields

§first_byte: u8

First byte of the literal (or common variant).

§alt_bytes: Vec<u8>

Alternative first bytes (for substitutions).

§max_edits: usize

Maximum edit distance.

§

PigeonholePieces

Pigeonhole-based prefilter: for k edits, at least one of (k+1) pieces must match exactly. Much more selective than single-byte prefiltering for longer patterns.

Fields

§pieces: Vec<Vec<u8>>

The pattern pieces (at least one must match exactly).

§offsets: Vec<usize>

Offset of each piece within the original pattern.

§max_edits: usize

Maximum edit distance.

Implementations§

Source§

impl Prefilter

Source

pub fn exact(literal: &str) -> Self

Create a prefilter for an exact literal.

Source

pub fn fuzzy(literal: &str, max_edits: u8) -> Self

Create a prefilter for a fuzzy literal.

Strategy: Even with fuzzy matching, certain bytes from the pattern must appear in any valid match. We search for bytes from the first positions that could possibly match.

Source

pub fn fuzzy_rare( literal: &str, max_edits: u8, text_sample: Option<&[u8]>, ) -> Self

Create a prefilter for a fuzzy literal with rarity-based byte selection.

For patterns with small alphabets (like DNA), select the rarest bytes to minimize false positives. Falls back to streaming if all bytes are common.

Source

pub fn pigeonhole(literal: &str, max_edits: u8) -> Self

Create a pigeonhole-based prefilter for fuzzy matching.

For a pattern of length m with at most k edits, we split the pattern into (k+1) non-overlapping pieces. By the pigeonhole principle, at least one piece must match exactly in any valid fuzzy match.

This is much more selective than single-byte prefiltering for longer patterns. For example, "hello" with k=1 → pieces ["hel", "lo"], both 2-3 chars long. Finding "hel" or "lo" is much rarer than finding ‘h’ or ‘e’.

Source

pub fn case_insensitive(literal: &str) -> Self

Create a prefilter for case-insensitive matching.

Source

pub fn multi_fuzzy(literals: &[(&str, u8)], case_insensitive: bool) -> Self

Create a prefilter for multiple fuzzy literals.

Collects first bytes from all patterns and uses memchr to find candidates. Each entry is (literal, max_edits).

Source

pub fn find_candidates<'a>(&'a self, text: &'a [u8]) -> CandidateIter<'a>

Find candidate positions in the text. Returns an iterator over byte positions where a match might start.

Source

pub fn is_active(&self) -> bool

Check if this prefilter does anything useful.

Source

pub fn max_offset(&self) -> usize

Get the max_offset for this prefilter (how far back a match could start).

Source

pub fn selectivity(&self) -> usize

Check if this prefilter is “selective” enough to be effective. A prefilter searching for too many different bytes will hit too many positions. Returns the number of unique bytes being searched for (lower is better).

Source

pub fn is_selective(&self) -> bool

Check if this prefilter is likely to be effective for multi-pattern search. Returns false if the prefilter searches for too many different bytes.

Trait Implementations§

Source§

impl Clone for Prefilter

Source§

fn clone(&self) -> Prefilter

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for Prefilter

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.