seqex 0.1.0

Regex-like pattern matching for arbitrary sequences using predicate functions
Documentation
# seqex

[![CI](https://github.com/dawsonbooth/seqex-rust/actions/workflows/ci.yml/badge.svg)](https://github.com/dawsonbooth/seqex-rust/actions/workflows/ci.yml)
[![crates.io](https://img.shields.io/crates/v/seqex)](https://crates.io/crates/seqex)
[![license](https://img.shields.io/crates/l/seqex)](LICENSE)

Regex-like pattern matching for arbitrary sequences. Instead of matching characters against character classes, match elements of any type against predicate functions.

Built on an NFA engine (Thompson's construction) — O(n × m) guaranteed, no exponential backtracking.

## Install

```toml
[dependencies]
seqex = "0.1"
```

## Quick start

```rust
use seqex::Pattern;

let is_even = |n: &i32| *n % 2 == 0;
let is_odd = |n: &i32| *n % 2 != 0;

// Build a pattern and compile it
let matcher = Pattern::new(is_even)
    .followed_by(is_odd)
    .followed_by(is_even)
    .compile();

// Find all non-overlapping matches
let results = matcher.find_all(&[2, 3, 4, 6, 7, 8, 9, 10]);
// → [{ start: 0, end: 2, data: [2, 3, 4] },
//    { start: 3, end: 5, data: [6, 7, 8] }]
```

## API

### Building patterns

Start a pattern with `Pattern::new()` or `Pattern::any()`, then chain methods to describe the shape you're looking for.

#### `Pattern::new(fn)` — start with a predicate

```rust
Pattern::new(|n: &i32| *n > 0)
```

#### `Pattern::<T>::any()` — start with a wildcard (matches any element)

```rust
Pattern::<String>::any()
```

#### `.followed_by(fn | pattern)` — append a predicate or sub-pattern

```rust
Pattern::new(is_even).followed_by(is_odd).followed_by(is_even)

// Sub-patterns work too
let prefix = Pattern::new(is_even).followed_by(is_odd);
Pattern::new(is_positive).followed_by(prefix)
```

#### `.followed_by_any()` — append a wildcard

```rust
Pattern::new(is_even).followed_by_any().followed_by(is_odd)
```

### Quantifiers

Quantifiers modify the **last element** in the pattern. Greedy variants (`_greedy`) accept an explicit boolean parameter.

| Method                        | Regex equivalent | Description         |
| ----------------------------- | ---------------- | ------------------- |
| `.one_or_more()`              | `+`              | One or more         |
| `.zero_or_more()`             | `*`              | Zero or more        |
| `.optional()`                 | `?`              | Zero or one         |
| `.times(n)`                   | `{n}`            | Exactly n           |
| `.between(min, max)`          | `{min,max}`      | Between min and max |
| `.at_least(min)`              | `{min,}`         | Min to unbounded    |

```rust
// One or more even numbers followed by an odd
Pattern::new(is_even).one_or_more().followed_by(is_odd)

// Exactly 3 positive numbers
Pattern::new(|n: &i32| *n > 0).times(3)

// Between 2 and 5 elements
Pattern::new(is_even).between(2, 5)
```

#### Greedy vs lazy

By default, quantifiers are greedy (match as many elements as possible). Use the `_greedy(false)` variants for lazy matching (match as few as possible).

```rust
// Greedy: consumes as many positives as possible
Pattern::new(is_positive).one_or_more().followed_by(is_positive)
// On [1, 2, 3] → matches [1, 2, 3]

// Lazy: consumes as few positives as possible
Pattern::new(is_positive).one_or_more_greedy(false).followed_by(is_positive)
// On [1, 2, 3] → matches [1, 2]
```

### Alternation

#### `.or(fn | pattern)` — match this pattern or another

```rust
// Match a positive or negative number
Pattern::new(|n: &i32| *n > 0).or(|n: &i32| *n < 0)

// Alternation with complex sub-patterns
Pattern::new(is_even).followed_by(is_odd)
    .or(Pattern::new(is_odd).followed_by(is_even))
```

#### `Pattern::one_of(alternatives)` — multi-way alternation

Cleaner syntax for 3+ branches. Accepts any mix of `Pat::new(fn)` predicates and `Pattern` instances (via `.into()`).

```rust
use seqex::Pat;

Pattern::one_of(vec![
    Pat::new(is_even),
    Pat::new(is_odd),
    Pat::new(is_zero),
])

// With sub-patterns
Pattern::one_of(vec![
    Pattern::new(is_even).followed_by(is_odd).into(),
    Pattern::new(is_odd).followed_by(is_even).into(),
    Pattern::new(is_zero).followed_by(is_zero).into(),
])

// Composable — chain quantifiers, followed_by, etc.
Pattern::one_of(vec![Pat::new(is_even), Pat::new(is_odd)])
    .one_or_more()
    .followed_by(is_zero)
```

### Anchors

#### `.at_start()` — anchor to the beginning of the sequence

```rust
Pattern::new(is_even)
    .at_start()
    .compile()
    .find_all(&[2, 3, 4]) // → [{ start: 0, end: 0, data: [2] }]
    .find_all(&[1, 2, 4]) // → []
```

#### `.at_end()` — anchor to the end of the sequence

```rust
Pattern::new(is_even)
    .at_end()
    .compile()
    .find_all(&[1, 3, 4]) // → [{ start: 2, end: 2, data: [4] }]
```

### Compiling and matching

#### `.compile()` — compile the pattern into a `Matcher`

Patterns are consumed on compilation (Rust move semantics). Call `.compile()` to get a `Matcher` that can be used repeatedly against different sequences.

```rust
let matcher = Pattern::new(is_even).compile();
```

#### `matcher.find_all(slice)` — find all non-overlapping matches

Returns a `Vec<MatchResult<T>>` with `start`, `end`, and `data` fields.

```rust
matcher.find_all(&[1, 2, 3, 4, 5, 6])
// → [{ start: 1, end: 1, data: [2] },
//    { start: 3, end: 3, data: [4] },
//    { start: 5, end: 5, data: [6] }]
```

#### `matcher.find_all_iter(iter)` — find all matches from an iterator

Accepts any `IntoIterator<Item = T>`. Useful for collections like `BTreeSet`, `HashMap`, or custom iterators.

```rust
use std::collections::BTreeSet;

let set: BTreeSet<i32> = [3, 1, -2, 5].into_iter().collect();
matcher.find_all_iter(set.iter().copied())
```

#### `matcher.find(slice)` / `matcher.find_iter(iter)` — find the first match

Returns `Option<MatchResult<T>>`. The iterator variant stops consuming elements as soon as a match is found.

```rust
matcher.find(&[1, 2, 3, 4]) // → Some({ start: 1, end: 1, data: [2] })
matcher.find(&[1, 3, 5])    // → None
```

#### `matcher.test(slice)` / `matcher.test_iter(iter)` — check if any match exists

Returns a `bool`.

```rust
matcher.test(&[1, 2, 3]) // → true
matcher.test(&[1, 3, 5]) // → false
```

### Streaming

For data that arrives incrementally (event streams, network packets, sensor readings), use the push-based scanner API.

#### `matcher.scanner()` — create a streaming scanner

```rust
let mut scanner = matcher.scanner();

for event in event_source {
    for m in scanner.push(event) {
        handle_match(m); // matches emitted as soon as they become definitive
    }
}

// Signal end-of-stream to flush pending matches (greedy, at_end anchors)
for m in scanner.end() {
    handle_match(m);
}
```

`push(element)` advances the NFA simulation by one element and returns any matches that have become definitive. `end()` signals that no more elements will arrive, resolving pending greedy matches and `at_end` anchors.

For greedy patterns, matches are held until the greedy quantifier can no longer extend (i.e., the simulation dies). For lazy patterns, matches emit from `push()` as early as possible.

## Works with any type

The library is generic over `T: Clone + 'static` — match numbers, strings, structs, enums, or anything else.

```rust
// Strings
let matcher = Pattern::new(|s: &String| s.starts_with('a'))
    .followed_by(|s: &String| s.len() > 3)
    .compile();

let seq = vec!["apple".into(), "banana".into(), "ant".into(), "elephant".into()];
matcher.find_all(&seq)
// → [{ start: 0, end: 1, data: ["apple", "banana"] },
//    { start: 2, end: 3, data: ["ant", "elephant"] }]

// Structs
#[derive(Clone)]
struct Event { kind: String, level: i32 }

let matcher = Pattern::<Event>::new(|e| e.kind == "error")
    .one_or_more()
    .followed_by(|e: &Event| e.kind == "recovery")
    .compile();
```

## Advanced patterns

Each predicate sees a single element in isolation. For patterns that depend on relationships between elements, there are two approaches.

### Pre-processing

Transform the sequence so each element carries the context it needs. This is pure and composable.

```rust
// Detect runs of 3+ strictly increasing numbers
let nums = vec![1, 3, 5, 2, 4, 8, 12, 7];
let pairs: Vec<(i32, i32)> = nums.windows(2)
    .map(|w| (w[0], w[1]))
    .collect();

let m = Pattern::new(|p: &(i32, i32)| p.1 > p.0)
    .at_least(2)
    .compile();

m.find_all(&pairs) // finds [1,3,5] and [2,4,8,12]
```

### Closure variables

When a later predicate needs to reference what an earlier one matched, use shared mutable state. The NFA evaluates predicates left-to-right during simulation, so the ordering is reliable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

let open_name = Rc::new(RefCell::new(String::new()));
let on1 = Rc::clone(&open_name);
let on2 = Rc::clone(&open_name);

let m = Pattern::<Tag>::new(move |t| {
    if t.kind == "open" {
        *on1.borrow_mut() = t.name.clone();
        true
    } else {
        false
    }
})
.followed_by(Pattern::<Tag>::new(|t| t.kind == "text").one_or_more())
.followed_by(move |t: &Tag| t.kind == "close" && t.name == *on2.borrow())
.compile();
```

## Tests

```bash
cargo test
```

## How it works

1. The fluent `Pattern` builder constructs an AST (abstract syntax tree) of pattern nodes
2. `.compile()` converts the AST into an NFA (nondeterministic finite automaton) using Thompson's construction
3. The matching engine simulates the NFA using the standard Thompson algorithm — tracking all active states simultaneously
4. Each element is tested against predicate transitions on active states to advance the simulation
5. This gives O(n × m) time complexity where n is the sequence length and m is the pattern size — no pathological backtracking cases

For slices, the engine runs the full simulation in a tight loop. For iterators and the streaming scanner, the same NFA simulation is broken into per-element steps with buffer-and-replay for `find_all` semantics.