# seqex
[](https://github.com/dawsonbooth/seqex-rust/actions/workflows/ci.yml)
[](https://crates.io/crates/seqex)
[](LICENSE)
Regex-like pattern matching for arbitrary sequences. Instead of matching characters against character classes, match elements of any type against predicate functions.
Built on an NFA engine (Thompson's construction) — O(n × m) guaranteed, no exponential backtracking.
## Install
```toml
[dependencies]
seqex = "0.1"
```
## Quick start
```rust
use seqex::Pattern;
let is_even = |n: &i32| *n % 2 == 0;
let is_odd = |n: &i32| *n % 2 != 0;
// Build a pattern and compile it
let matcher = Pattern::new(is_even)
.followed_by(is_odd)
.followed_by(is_even)
.compile();
// Find all non-overlapping matches
let results = matcher.find_all(&[2, 3, 4, 6, 7, 8, 9, 10]);
// → [{ start: 0, end: 2, data: [2, 3, 4] },
// { start: 3, end: 5, data: [6, 7, 8] }]
```
## API
### Building patterns
Start a pattern with `Pattern::new()` or `Pattern::any()`, then chain methods to describe the shape you're looking for.
#### `Pattern::new(fn)` — start with a predicate
```rust
```rust
Pattern::new(is_even).followed_by(is_odd).followed_by(is_even)
// Sub-patterns work too
let prefix = Pattern::new(is_even).followed_by(is_odd);
Pattern::new(is_positive).followed_by(prefix)
```
#### `.followed_by_any()` — append a wildcard
```rust
Pattern::new(is_even).followed_by_any().followed_by(is_odd)
```
### Quantifiers
Quantifiers modify the **last element** in the pattern. Greedy variants (`_greedy`) accept an explicit boolean parameter.
| `.one_or_more()` | `+` | One or more |
| `.zero_or_more()` | `*` | Zero or more |
| `.optional()` | `?` | Zero or one |
| `.times(n)` | `{n}` | Exactly n |
| `.between(min, max)` | `{min,max}` | Between min and max |
| `.at_least(min)` | `{min,}` | Min to unbounded |
```rust
// One or more even numbers followed by an odd
Pattern::new(is_even).one_or_more().followed_by(is_odd)
// Exactly 3 positive numbers
Pattern::new(is_even).between(2, 5)
```
#### Greedy vs lazy
By default, quantifiers are greedy (match as many elements as possible). Use the `_greedy(false)` variants for lazy matching (match as few as possible).
```rust
// Greedy: consumes as many positives as possible
Pattern::new(is_positive).one_or_more().followed_by(is_positive)
// On [1, 2, 3] → matches [1, 2, 3]
// Lazy: consumes as few positives as possible
Pattern::new(is_positive).one_or_more_greedy(false).followed_by(is_positive)
// On [1, 2, 3] → matches [1, 2]
```
### Alternation
#### `.or(fn | pattern)` — match this pattern or another
```rust
// Match a positive or negative number
Pattern::new(is_even).followed_by(is_odd)
.or(Pattern::new(is_odd).followed_by(is_even))
```
#### `Pattern::one_of(alternatives)` — multi-way alternation
Cleaner syntax for 3+ branches. Accepts any mix of `Pat::new(fn)` predicates and `Pattern` instances (via `.into()`).
```rust
use seqex::Pat;
Pattern::one_of(vec![
Pat::new(is_even),
Pat::new(is_odd),
Pat::new(is_zero),
])
// With sub-patterns
Pattern::one_of(vec![
Pattern::new(is_even).followed_by(is_odd).into(),
Pattern::new(is_odd).followed_by(is_even).into(),
Pattern::new(is_zero).followed_by(is_zero).into(),
])
// Composable — chain quantifiers, followed_by, etc.
Pattern::one_of(vec![Pat::new(is_even), Pat::new(is_odd)])
.one_or_more()
.followed_by(is_zero)
```
### Anchors
#### `.at_start()` — anchor to the beginning of the sequence
```rust
Pattern::new(is_even)
.at_start()
.compile()
.find_all(&[2, 3, 4]) // → [{ start: 0, end: 0, data: [2] }]
.find_all(&[1, 2, 4]) // → []
```
#### `.at_end()` — anchor to the end of the sequence
```rust
Pattern::new(is_even)
.at_end()
.compile()
.find_all(&[1, 3, 4]) // → [{ start: 2, end: 2, data: [4] }]
```
### Compiling and matching
#### `.compile()` — compile the pattern into a `Matcher`
Patterns are consumed on compilation (Rust move semantics). Call `.compile()` to get a `Matcher` that can be used repeatedly against different sequences.
```rust
let matcher = Pattern::new(is_even).compile();
```
#### `matcher.find_all(slice)` — find all non-overlapping matches
Returns a `Vec<MatchResult<T>>` with `start`, `end`, and `data` fields.
```rust
matcher.find_all(&[1, 2, 3, 4, 5, 6])
// → [{ start: 1, end: 1, data: [2] },
// { start: 3, end: 3, data: [4] },
// { start: 5, end: 5, data: [6] }]
```
#### `matcher.find_all_iter(iter)` — find all matches from an iterator
Accepts any `IntoIterator<Item = T>`. Useful for collections like `BTreeSet`, `HashMap`, or custom iterators.
```rust
use std::collections::BTreeSet;
let set: BTreeSet<i32> = [3, 1, -2, 5].into_iter().collect();
matcher.find_all_iter(set.iter().copied())
```
#### `matcher.find(slice)` / `matcher.find_iter(iter)` — find the first match
Returns `Option<MatchResult<T>>`. The iterator variant stops consuming elements as soon as a match is found.
```rust
matcher.find(&[1, 2, 3, 4]) // → Some({ start: 1, end: 1, data: [2] })
matcher.find(&[1, 3, 5]) // → None
```
#### `matcher.test(slice)` / `matcher.test_iter(iter)` — check if any match exists
Returns a `bool`.
```rust
matcher.test(&[1, 2, 3]) // → true
matcher.test(&[1, 3, 5]) // → false
```
### Streaming
For data that arrives incrementally (event streams, network packets, sensor readings), use the push-based scanner API.
#### `matcher.scanner()` — create a streaming scanner
```rust
let mut scanner = matcher.scanner();
for event in event_source {
for m in scanner.push(event) {
handle_match(m); // matches emitted as soon as they become definitive
}
}
// Signal end-of-stream to flush pending matches (greedy, at_end anchors)
for m in scanner.end() {
handle_match(m);
}
```
`push(element)` advances the NFA simulation by one element and returns any matches that have become definitive. `end()` signals that no more elements will arrive, resolving pending greedy matches and `at_end` anchors.
For greedy patterns, matches are held until the greedy quantifier can no longer extend (i.e., the simulation dies). For lazy patterns, matches emit from `push()` as early as possible.
## Works with any type
The library is generic over `T: Clone + 'static` — match numbers, strings, structs, enums, or anything else.
```rust
// Strings
.compile();
let seq = vec!["apple".into(), "banana".into(), "ant".into(), "elephant".into()];
matcher.find_all(&seq)
// → [{ start: 0, end: 1, data: ["apple", "banana"] },
// { start: 2, end: 3, data: ["ant", "elephant"] }]
// Structs
#[derive(Clone)]
struct Event { kind: String, level: i32 }
.followed_by(|e: &Event| e.kind == "recovery")
.compile();
```
## Advanced patterns
Each predicate sees a single element in isolation. For patterns that depend on relationships between elements, there are two approaches.
### Pre-processing
Transform the sequence so each element carries the context it needs. This is pure and composable.
```rust
// Detect runs of 3+ strictly increasing numbers
let nums = vec![1, 3, 5, 2, 4, 8, 12, 7];
let pairs: Vec<(i32, i32)> = nums.windows(2)
.map(|w| (w[0], w[1]))
.collect();
.compile();
m.find_all(&pairs) // finds [1,3,5] and [2,4,8,12]
```
### Closure variables
When a later predicate needs to reference what an earlier one matched, use shared mutable state. The NFA evaluates predicates left-to-right during simulation, so the ordering is reliable.
```rust
use std::cell::RefCell;
use std::rc::Rc;
let open_name = Rc::new(RefCell::new(String::new()));
let on1 = Rc::clone(&open_name);
let on2 = Rc::clone(&open_name);
*on1.borrow_mut() = t.name.clone();
true
} else {
false
}
})
.followed_by(Pattern::<Tag>::new(|t| t.kind == "text").one_or_more())
.followed_by(move |t: &Tag| t.kind == "close" && t.name == *on2.borrow())
.compile();
```
## Tests
```bash
cargo test
```
## How it works
1. The fluent `Pattern` builder constructs an AST (abstract syntax tree) of pattern nodes
2. `.compile()` converts the AST into an NFA (nondeterministic finite automaton) using Thompson's construction
3. The matching engine simulates the NFA using the standard Thompson algorithm — tracking all active states simultaneously
4. Each element is tested against predicate transitions on active states to advance the simulation
5. This gives O(n × m) time complexity where n is the sequence length and m is the pattern size — no pathological backtracking cases
For slices, the engine runs the full simulation in a tight loop. For iterators and the streaming scanner, the same NFA simulation is broken into per-element steps with buffer-and-replay for `find_all` semantics.