Expand description
Grep Lane Executor (Task 5)
Exact regex / substring search as a first-class retrieval lane, built on the trigram candidate index. The pipeline is:
regex ──► required-literal extraction ──► trigram conjunction
──► trigram posting intersection (candidate DocIds)
──► ∩ AllowedSet (filter pushdown BEFORE verification)
──► regex verification (linear-time, finite-automaton engine)
──► ranked hits OR candidate gate§Correctness over speed
Trigram pre-filtering is only ever used when the executor can prove the
extracted literals are mandatory (present in every possible match). For any
pattern it cannot prove this for — alternation, groups, character classes,
or no literal run of length ≥ 3 — it falls back to an explicit, bounded
full scan rather than risk a false negative. The full-scan path is capped by
max_scan; exceeding the cap is reported as
GrepError::DegeneratePattern instead of silently returning partial
results.
§Verification engine
Verification uses the regex crate, a finite-automaton engine with
guaranteed linear-time matching, so adversarial patterns cannot turn the
verify stage into a catastrophic-backtracking DoS.
§Two fusion modes
Grep produces a set, but RRF consumes ranked lists. Both shapes are supported:
GrepMode::Rankscores each hit by specificity-weighted, TF-saturated, length-pivoted relevance (BM25-flavored over the pattern’s literal terms) so it can plug into RRF as a third ranked lane without the short-document / common-term bias of raw match density.GrepMode::Gatereturns the matching documents as anAllowedSetto intersect into the other lanes (the “find the function that contains X” cascade), viaGrepResults::into_allowed_set.
Structs§
- Grep
Executor - The grep executor: plans and runs regex search over a
TrigramIndex. - GrepHit
- A single grep match.
- Grep
Results - The outcome of a grep search.
Enums§
- Grep
Error - Errors the grep lane can return.
- Grep
Mode - How grep results should be consumed by the fusion layer.
Constants§
- DEFAULT_
MAX_ SCAN - Default cap on documents verified by a degenerate (no-trigram) full scan.
Functions§
- required_
trigrams - Extract the mandatory trigram conjunction for
pattern, orNoneif the pattern is too complex to prove a mandatory literal (caller must full-scan).