seqex 0.1.0

Regex-like pattern matching for arbitrary sequences using predicate functions
Documentation
  • Coverage
  • 54.55%
    6 out of 11 items documented1 out of 1 items with examples
  • Size
  • Source code size: 112.5 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 3.04 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 16s Average build duration of successful builds.
  • all releases: 16s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • Repository
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • dawsonbooth

seqex

CI crates.io license

Regex-like pattern matching for arbitrary sequences. Instead of matching characters against character classes, match elements of any type against predicate functions.

Built on an NFA engine (Thompson's construction) — O(n × m) guaranteed, no exponential backtracking.

Install

[dependencies]
seqex = "0.1"

Quick start

use seqex::Pattern;

let is_even = |n: &i32| *n % 2 == 0;
let is_odd = |n: &i32| *n % 2 != 0;

// Build a pattern and compile it
let matcher = Pattern::new(is_even)
    .followed_by(is_odd)
    .followed_by(is_even)
    .compile();

// Find all non-overlapping matches
let results = matcher.find_all(&[2, 3, 4, 6, 7, 8, 9, 10]);
// → [{ start: 0, end: 2, data: [2, 3, 4] },
//    { start: 3, end: 5, data: [6, 7, 8] }]

API

Building patterns

Start a pattern with Pattern::new() or Pattern::any(), then chain methods to describe the shape you're looking for.

Pattern::new(fn) — start with a predicate

Pattern::new(|n: &i32| *n > 0)

Pattern::<T>::any() — start with a wildcard (matches any element)

Pattern::<String>::any()

.followed_by(fn | pattern) — append a predicate or sub-pattern

Pattern::new(is_even).followed_by(is_odd).followed_by(is_even)

// Sub-patterns work too
let prefix = Pattern::new(is_even).followed_by(is_odd);
Pattern::new(is_positive).followed_by(prefix)

.followed_by_any() — append a wildcard

Pattern::new(is_even).followed_by_any().followed_by(is_odd)

Quantifiers

Quantifiers modify the last element in the pattern. Greedy variants (_greedy) accept an explicit boolean parameter.

Method Regex equivalent Description
.one_or_more() + One or more
.zero_or_more() * Zero or more
.optional() ? Zero or one
.times(n) {n} Exactly n
.between(min, max) {min,max} Between min and max
.at_least(min) {min,} Min to unbounded
// One or more even numbers followed by an odd
Pattern::new(is_even).one_or_more().followed_by(is_odd)

// Exactly 3 positive numbers
Pattern::new(|n: &i32| *n > 0).times(3)

// Between 2 and 5 elements
Pattern::new(is_even).between(2, 5)

Greedy vs lazy

By default, quantifiers are greedy (match as many elements as possible). Use the _greedy(false) variants for lazy matching (match as few as possible).

// Greedy: consumes as many positives as possible
Pattern::new(is_positive).one_or_more().followed_by(is_positive)
// On [1, 2, 3] → matches [1, 2, 3]

// Lazy: consumes as few positives as possible
Pattern::new(is_positive).one_or_more_greedy(false).followed_by(is_positive)
// On [1, 2, 3] → matches [1, 2]

Alternation

.or(fn | pattern) — match this pattern or another

// Match a positive or negative number
Pattern::new(|n: &i32| *n > 0).or(|n: &i32| *n < 0)

// Alternation with complex sub-patterns
Pattern::new(is_even).followed_by(is_odd)
    .or(Pattern::new(is_odd).followed_by(is_even))

Pattern::one_of(alternatives) — multi-way alternation

Cleaner syntax for 3+ branches. Accepts any mix of Pat::new(fn) predicates and Pattern instances (via .into()).

use seqex::Pat;

Pattern::one_of(vec![
    Pat::new(is_even),
    Pat::new(is_odd),
    Pat::new(is_zero),
])

// With sub-patterns
Pattern::one_of(vec![
    Pattern::new(is_even).followed_by(is_odd).into(),
    Pattern::new(is_odd).followed_by(is_even).into(),
    Pattern::new(is_zero).followed_by(is_zero).into(),
])

// Composable — chain quantifiers, followed_by, etc.
Pattern::one_of(vec![Pat::new(is_even), Pat::new(is_odd)])
    .one_or_more()
    .followed_by(is_zero)

Anchors

.at_start() — anchor to the beginning of the sequence

Pattern::new(is_even)
    .at_start()
    .compile()
    .find_all(&[2, 3, 4]) // → [{ start: 0, end: 0, data: [2] }]
    .find_all(&[1, 2, 4]) // → []

.at_end() — anchor to the end of the sequence

Pattern::new(is_even)
    .at_end()
    .compile()
    .find_all(&[1, 3, 4]) // → [{ start: 2, end: 2, data: [4] }]

Compiling and matching

.compile() — compile the pattern into a Matcher

Patterns are consumed on compilation (Rust move semantics). Call .compile() to get a Matcher that can be used repeatedly against different sequences.

let matcher = Pattern::new(is_even).compile();

matcher.find_all(slice) — find all non-overlapping matches

Returns a Vec<MatchResult<T>> with start, end, and data fields.

matcher.find_all(&[1, 2, 3, 4, 5, 6])
// → [{ start: 1, end: 1, data: [2] },
//    { start: 3, end: 3, data: [4] },
//    { start: 5, end: 5, data: [6] }]

matcher.find_all_iter(iter) — find all matches from an iterator

Accepts any IntoIterator<Item = T>. Useful for collections like BTreeSet, HashMap, or custom iterators.

use std::collections::BTreeSet;

let set: BTreeSet<i32> = [3, 1, -2, 5].into_iter().collect();
matcher.find_all_iter(set.iter().copied())

matcher.find(slice) / matcher.find_iter(iter) — find the first match

Returns Option<MatchResult<T>>. The iterator variant stops consuming elements as soon as a match is found.

matcher.find(&[1, 2, 3, 4]) // → Some({ start: 1, end: 1, data: [2] })
matcher.find(&[1, 3, 5])    // → None

matcher.test(slice) / matcher.test_iter(iter) — check if any match exists

Returns a bool.

matcher.test(&[1, 2, 3]) // → true
matcher.test(&[1, 3, 5]) // → false

Streaming

For data that arrives incrementally (event streams, network packets, sensor readings), use the push-based scanner API.

matcher.scanner() — create a streaming scanner

let mut scanner = matcher.scanner();

for event in event_source {
    for m in scanner.push(event) {
        handle_match(m); // matches emitted as soon as they become definitive
    }
}

// Signal end-of-stream to flush pending matches (greedy, at_end anchors)
for m in scanner.end() {
    handle_match(m);
}

push(element) advances the NFA simulation by one element and returns any matches that have become definitive. end() signals that no more elements will arrive, resolving pending greedy matches and at_end anchors.

For greedy patterns, matches are held until the greedy quantifier can no longer extend (i.e., the simulation dies). For lazy patterns, matches emit from push() as early as possible.

Works with any type

The library is generic over T: Clone + 'static — match numbers, strings, structs, enums, or anything else.

// Strings
let matcher = Pattern::new(|s: &String| s.starts_with('a'))
    .followed_by(|s: &String| s.len() > 3)
    .compile();

let seq = vec!["apple".into(), "banana".into(), "ant".into(), "elephant".into()];
matcher.find_all(&seq)
// → [{ start: 0, end: 1, data: ["apple", "banana"] },
//    { start: 2, end: 3, data: ["ant", "elephant"] }]

// Structs
#[derive(Clone)]
struct Event { kind: String, level: i32 }

let matcher = Pattern::<Event>::new(|e| e.kind == "error")
    .one_or_more()
    .followed_by(|e: &Event| e.kind == "recovery")
    .compile();

Advanced patterns

Each predicate sees a single element in isolation. For patterns that depend on relationships between elements, there are two approaches.

Pre-processing

Transform the sequence so each element carries the context it needs. This is pure and composable.

// Detect runs of 3+ strictly increasing numbers
let nums = vec![1, 3, 5, 2, 4, 8, 12, 7];
let pairs: Vec<(i32, i32)> = nums.windows(2)
    .map(|w| (w[0], w[1]))
    .collect();

let m = Pattern::new(|p: &(i32, i32)| p.1 > p.0)
    .at_least(2)
    .compile();

m.find_all(&pairs) // finds [1,3,5] and [2,4,8,12]

Closure variables

When a later predicate needs to reference what an earlier one matched, use shared mutable state. The NFA evaluates predicates left-to-right during simulation, so the ordering is reliable.

use std::cell::RefCell;
use std::rc::Rc;

let open_name = Rc::new(RefCell::new(String::new()));
let on1 = Rc::clone(&open_name);
let on2 = Rc::clone(&open_name);

let m = Pattern::<Tag>::new(move |t| {
    if t.kind == "open" {
        *on1.borrow_mut() = t.name.clone();
        true
    } else {
        false
    }
})
.followed_by(Pattern::<Tag>::new(|t| t.kind == "text").one_or_more())
.followed_by(move |t: &Tag| t.kind == "close" && t.name == *on2.borrow())
.compile();

Tests

cargo test

How it works

  1. The fluent Pattern builder constructs an AST (abstract syntax tree) of pattern nodes
  2. .compile() converts the AST into an NFA (nondeterministic finite automaton) using Thompson's construction
  3. The matching engine simulates the NFA using the standard Thompson algorithm — tracking all active states simultaneously
  4. Each element is tested against predicate transitions on active states to advance the simulation
  5. This gives O(n × m) time complexity where n is the sequence length and m is the pattern size — no pathological backtracking cases

For slices, the engine runs the full simulation in a tight loop. For iterators and the streaming scanner, the same NFA simulation is broken into per-element steps with buffer-and-replay for find_all semantics.