Module strmatch

Available on crate feature strmatch only.

Regex-shaped patterns, fast-path dispatch.

Operators write a regex, the pattern language most of them already know. strmatch classifies it into one of four tiers and, at match time, dispatches to the cheapest engine that is still correct:

  • Byte (≤ 30 ns) – direct byte ops: memchr / memchr2 / memchr3 / single-byte starts_with / ends_with / ==.
  • Literal (≤ 200 ns) – single multi-byte literal: memmem::Finder / multi-byte starts_with / ends_with / ==.
  • LiteralSet (≤ 500 ns) – aho-corasick over ≥ 2 literals (optional uniform anchor checked after the AC scan). The regex engine is never invoked.
  • Regex (engine-bounded) – falls through to regex_automata::meta::Regex, which has its own internal prefilter pipeline; cost depends on pattern and haystack.

Budgets are typical for a modern x86 server on a ~200-byte haystack; see MatcherTier::typical_budget_ns and benches/strmatch.rs.
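
As a rough illustration of classifying a pattern by shape, here is a minimal sketch using only the standard library. It is not strmatch's actual classifier: the Tier enum, the META table and the classify function are hypothetical names, and the rules are deliberately conservative (no escapes, no groups, and anchored alternations go to the Regex tier even though the real LiteralSet tier accepts a uniform anchor).

```rust
/// Hypothetical mirror of the four tiers, cheapest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Byte,
    Literal,
    LiteralSet,
    Regex,
}

/// Characters this sketch treats as regex syntax. A pattern body that
/// contains any of them is sent to the Regex tier.
const META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '[', ']', '{', '}', '^', '$',
];

/// Classify a pattern by shape alone (assumed rules, not the crate's).
pub fn classify(pattern: &str) -> Tier {
    // Strip at most one leading `^` and one trailing `$`.
    let anchored_start = pattern.starts_with('^');
    let body = pattern.strip_prefix('^').unwrap_or(pattern);
    let anchored_end = body.ends_with('$');
    let body = body.strip_suffix('$').unwrap_or(body);
    let anchored = anchored_start || anchored_end;

    // Empty bodies and anything with remaining regex syntax fall through.
    if body.is_empty() || body.contains(META) {
        return Tier::Regex;
    }

    let alternatives: Vec<&str> = body.split('|').collect();
    if alternatives.iter().any(|alt| alt.is_empty()) {
        return Tier::Regex;
    }

    match alternatives.as_slice() {
        // One literal that is exactly one byte long.
        [single] if single.len() == 1 => Tier::Byte,
        // One multi-byte literal.
        [_] => Tier::Literal,
        // `^a|b` anchors only the first branch in regex syntax, so this
        // sketch refuses to treat it as a plain literal set.
        _ if anchored => Tier::Regex,
        // Two or more plain literals.
        _ => Tier::LiteralSet,
    }
}
```

Under these assumed rules, classify("^/") yields Tier::Byte, classify("AKIA") yields Tier::Literal, classify("AKIA|ghp_|sk_live_") yields Tier::LiteralSet, and classify(r"\w+@\w+") yields Tier::Regex.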

§Anti-spam discipline

When a pattern compiles to MatcherTier::Regex (the engine fall-back), strmatch emits one WARN per distinct pattern per process, capped at 10 distinct WARNs total. After the cap, further fall-through patterns log at DEBUG. A counter hyperi_strmatch_regex_fallback_total increments on every fall-through regardless of log level – operators can scrape that rather than rely on logs.
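
The logging policy above could be sketched roughly as follows, using only the standard library. FallbackLog, Level and record are hypothetical names, not part of strmatch. The sketch assumes that a repeat of an already-warned pattern logs nothing, and that patterns seen after the cap log at DEBUG on every occurrence; the text above does not specify either detail.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Log level chosen for one fall-through event (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Warn,
    Debug,
}

/// Warn-once-per-pattern bookkeeping with a cap on distinct WARNs.
pub struct FallbackLog {
    cap: usize,
    warned: Mutex<HashSet<String>>,
    total: AtomicU64,
}

impl FallbackLog {
    pub fn new(cap: usize) -> Self {
        Self {
            cap,
            warned: Mutex::new(HashSet::new()),
            total: AtomicU64::new(0),
        }
    }

    /// Record one regex fall-through and return the level to log at,
    /// or None when this pattern has already produced its WARN.
    pub fn record(&self, pattern: &str) -> Option<Level> {
        // The counter moves on every fall-through, whatever gets logged.
        self.total.fetch_add(1, Ordering::Relaxed);

        let mut warned = self.warned.lock().unwrap();
        if warned.contains(pattern) {
            return None; // assumption: repeats of a warned pattern stay silent
        }
        if warned.len() < self.cap {
            warned.insert(pattern.to_owned());
            Some(Level::Warn)
        } else {
            Some(Level::Debug)
        }
    }

    /// Value a metrics scraper would read.
    pub fn total(&self) -> u64 {
        self.total.load(Ordering::Relaxed)
    }
}
```

Keeping the counter separate from the warned set is what lets the metric stay exact while the log output stays bounded.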

§Quality gates

Use StrMatcher::builder with StrMatcherBuilder::min_tier to reject (or loudly warn about) patterns that fall below an operator-chosen tier. Useful for hot-path configs where regex fall-through is unacceptable.
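
The gate itself reduces to a rank comparison. The following is a minimal standalone sketch of that idea, not the builder's real implementation: the numeric ranks, the Warn variant, the Gate enum and check_min_tier are all assumptions made for illustration. Only Reject and the "higher rank means faster" ordering come from this page.

```rust
/// Hypothetical mirror of the four tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Byte,
    Literal,
    LiteralSet,
    Regex,
}

impl Tier {
    /// Higher rank means a faster tier. The numbers are assumed.
    pub fn rank(self) -> u8 {
        match self {
            Tier::Byte => 3,
            Tier::Literal => 2,
            Tier::LiteralSet => 1,
            Tier::Regex => 0,
        }
    }
}

/// Policy for patterns below the minimum. `Warn` is an assumed variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnBelowMin {
    Reject,
    Warn,
}

/// Outcome of the gate (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Accepted,
    AcceptedWithWarning(String),
    Rejected(String),
}

/// Compare the classified tier against the operator's minimum.
pub fn check_min_tier(actual: Tier, min: Tier, policy: OnBelowMin) -> Gate {
    if actual.rank() >= min.rank() {
        return Gate::Accepted;
    }
    let msg = format!("pattern tier {:?} is below minimum tier {:?}", actual, min);
    match policy {
        OnBelowMin::Reject => Gate::Rejected(msg),
        OnBelowMin::Warn => Gate::AcceptedWithWarning(msg),
    }
}
```

With a minimum of Tier::LiteralSet, a Literal pattern passes because it ranks higher, while a Regex pattern is either rejected or accepted with a warning depending on the policy.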

§Example

use hyperi_rustlib::strmatch::{MatcherTier, OnBelowMin, StrMatcher};

// Byte tier -- anchored single byte, dispatches to hay.first() == Some(b)
let m = StrMatcher::new(r"^/")?;
assert_eq!(m.tier(), MatcherTier::Byte);
assert!(m.is_match(b"/api/v1/orders"));

// Literal tier -- multi-byte literal, dispatches to memmem
let m = StrMatcher::new(r"AKIA")?;
assert_eq!(m.tier(), MatcherTier::Literal);
assert!(m.is_match(b"... AKIA1234 ..."));

// LiteralSet tier -- alternation, dispatches to AhoCorasick
let m = StrMatcher::new(r"AKIA|ghp_|sk_live_")?;
assert_eq!(m.tier(), MatcherTier::LiteralSet);
assert!(m.is_match(b"github token: ghp_abcdef"));

// Regex tier -- falls through to engine; refuse the build instead
let err = StrMatcher::builder()
    .min_tier(MatcherTier::LiteralSet)
    .on_below_min(OnBelowMin::Reject)
    .build(r"\w+@\w+")
    .unwrap_err();
assert!(err.to_string().contains("tier"));

Structs§

Match
Byte offsets of a match. End is exclusive: &hay[start..end] is the matched slice.
SetMatch
Like Match but also identifies which input pattern matched in a StrMatcherSet.
StrMatcher
Compiled pattern with tier-aware dispatch.
StrMatcherBuilder
Builder for StrMatcher. Carries minimum-tier policy and the case-insensitivity flag.
StrMatcherSet
Multi-pattern matcher.

Enums§

BuildError
Failure modes during construction.
MatcherTier
Which engine class a StrMatcher dispatches to. Tiers are ordered from cheapest to most expensive: Byte, Literal, LiteralSet, Regex. Byte ranks highest and Regex lowest, so a higher rank means a faster tier. Use Self::rank for min_tier comparisons.
OnBelowMin
What to do when a pattern’s classification falls below the builder’s min_tier.