rexile 0.4.9

A blazing-fast regex engine with 10-100x faster compilation and competitive matching performance - now with range quantifiers {n,m}, case-insensitive (?i), and full production-ready features
# ReXile 🦎

[![Crates.io](https://img.shields.io/crates/v/rexile.svg)](https://crates.io/crates/rexile)
[![Documentation](https://docs.rs/rexile/badge.svg)](https://docs.rs/rexile)
[![License: MIT OR Apache-2.0](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](LICENSE)

**A blazing-fast regex engine with 10-100x faster compilation speed**

ReXile is a **lightweight regex alternative** that achieves **exceptional compilation speed** while maintaining competitive matching performance:

- ⚡ **10-100x faster compilation** - Load patterns instantly
- 🚀 **Competitive matching** - 1.4-1.9x faster on simple patterns
- 🎯 **Dot wildcard support** - Full `.`, `.*`, `.+` implementation with backtracking
- 📦 **Only 2 dependencies** - `memchr` and `aho-corasick` for SIMD primitives
- 🧠 **Smart backtracking** - Handles complex patterns with quantifiers
- 🔧 **Perfect for parsers** - Ideal for GRL, DSL, and rule engines

**Key Features:**
- ✅ Literal searches with SIMD acceleration
- ✅ Multi-pattern matching (alternations)
- ✅ Character classes with negation
- ✅ Quantifiers (`*`, `+`, `?`, `{n}`, `{n,m}`)
- ✅ **Range quantifiers** (`{n}`, `{n,}`, `{n,m}`)
- ✅ **Non-greedy quantifiers** (`*?`, `+?`, `??`)
- ✅ **Case-insensitive flag** (`(?i)`)
- ✅ **Dot wildcard** (`.`, `.*`, `.+`) with backtracking
- ✅ **DOTALL mode** (`(?s)`) - Dot matches newlines
- ✅ **Non-capturing groups** (`(?:...)`) with alternations
- ✅ **Hybrid DFA/NFA engine** - Smart pattern routing - NEW in v0.4.9
- ✅ Escape sequences (`\d`, `\w`, `\s`, etc.)
- ✅ Sequences and groups
- ✅ Word boundaries (`\b`, `\B`)
- ✅ Anchoring (`^`, `$`)
- ✅ **Capturing groups** - Auto-detection and extraction

## 🎯 Purpose

ReXile is a **high-performance regex engine** optimized for **fast compilation**:

- 🚀 **Lightning-fast compilation** - 10-100x faster than `regex` crate
- ⚡ **Competitive matching** - Faster on simple patterns, acceptable on complex
- 🎯 **Ideal for parsers** - GRL, DSL, rule engines with dynamic patterns
- 📦 **Minimal dependencies** - Only `memchr` + `aho-corasick` for SIMD primitives
- **Memory efficient** - 15x less compilation memory
- 🔧 **Full control** - Custom optimizations for specific use cases

### Performance Highlights

**Compilation Speed** (vs regex crate):
- Pattern `[a-zA-Z_]\w*`: **104.7x faster** 🚀
- Pattern `\d+`: **46.5x faster** 🚀
- Pattern `(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+)`: **40.7x faster** 🚀
- Pattern `.*test.*`: **15.3x faster**
- **Average: 10-100x faster compilation**

**Matching Speed**:
- Simple patterns (`\d+`, `\w+`): **1.4-1.9x faster** ✅
- Complex patterns with backtracking: 2-10x slower (acceptable for non-hot-path)
- **Perfect trade-off for parsers and rule engines**

**Use Case Example** (Load 1000 GRL rules):
- regex crate: ~2 seconds compilation
- rexile: ~0.02 seconds (**100x faster startup!**)

**Memory Comparison**:
- Compilation: **15x less memory** (128 KB vs 1920 KB)
- Peak memory: **5x less** in stress tests (0.12 MB vs 0.62 MB)
- Search operations: **Equal memory efficiency**

**When to Use ReXile:**
- ✅ Parsers & lexers (fast token matching + instant startup)
- ✅ Rule engines with dynamic patterns (100x faster rule loading)
- ✅ DSL compilers (GRL, business rules)
- ✅ Applications with many patterns (instant initialization)
- ✅ Memory-constrained environments (15x less memory)
- ✅ Non-hot-path matching (acceptable trade-off for 100x faster compilation)

## 🚀 Quick Start

```rust
use rexile::Pattern;

// Literal matching with SIMD acceleration
let pattern = Pattern::new("hello").unwrap();
assert!(pattern.is_match("hello world"));
assert_eq!(pattern.find("say hello"), Some((4, 9)));

// Multi-pattern matching (aho-corasick fast path)
let multi = Pattern::new("foo|bar|baz").unwrap();
assert!(multi.is_match("the bar is open"));

// Dot wildcard matching (with backtracking)
let dot = Pattern::new("a.c").unwrap();
assert!(dot.is_match("abc"));  // . matches 'b'
assert!(dot.is_match("a_c"));  // . matches '_'

// Greedy quantifiers with dot
let greedy = Pattern::new("a.*c").unwrap();
assert!(greedy.is_match("abc"));       // .* matches 'b'
assert!(greedy.is_match("a12345c"));   // .* matches '12345'

let plus = Pattern::new("a.+c").unwrap();
assert!(plus.is_match("abc"));         // .+ matches 'b' (requires at least one char)
assert!(!plus.is_match("ac"));         // .+ needs at least 1 character

// Non-greedy quantifiers (NEW in v0.2.1)
let lazy = Pattern::new(r"start\{.*?\}").unwrap();
assert_eq!(lazy.find("start{abc}end{xyz}"), Some((0, 10))); // Matches "start{abc}", not greedy

// DOTALL mode - dot matches newlines (NEW in v0.2.1)
let dotall = Pattern::new(r"(?s)rule\s+.*?\}").unwrap();
let multiline = "rule test {\n  content\n}";
assert!(dotall.is_match(multiline));    // (?s) makes .* match across newlines

// Non-capturing groups with alternation (NEW in v0.2.1)
let group = Pattern::new(r#"(?:"test"|foo)"#).unwrap();
assert!(group.is_match("\"test\""));    // Matches quoted "test"
assert!(group.is_match("foo"));         // Or matches foo

// Digit matching (DigitRun fast path - 1.4-1.9x faster than regex!)
let digits = Pattern::new("\\d+").unwrap();
let matches = digits.find_all("Order #12345 costs $67.89");
// Returns: [(7, 12), (20, 22), (23, 25)]

// Identifier matching (IdentifierRun fast path)
let ident = Pattern::new("[a-zA-Z_]\\w*").unwrap();
assert!(ident.is_match("variable_name_123"));

// Quoted strings (QuotedString fast path - 1.4-1.9x faster!)
let quoted = Pattern::new("\"[^\"]+\"").unwrap();
assert!(quoted.is_match("say \"hello world\""));

// Word boundaries
let word = Pattern::new("\\btest\\b").unwrap();
assert!(word.is_match("this is a test"));
assert!(!word.is_match("testing"));

// Range quantifiers (NEW in v0.4.7)
let ip = Pattern::new(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}").unwrap();
assert!(ip.is_match("192.168.1.1"));       // Matches IP addresses

let year = Pattern::new(r"\b\d{4}\b").unwrap();
assert_eq!(year.find("Year: 2024!"), Some((6, 10))); // Matches exactly 4 digits

// Case-insensitive matching (NEW in v0.4.7)
let method = Pattern::new(r"(?i)(GET|POST)").unwrap();
assert!(method.is_match("GET /api"));      // Matches GET
assert!(method.is_match("get /api"));      // Also matches lowercase
assert!(method.is_match("Post /data"));    // Also matches Post

// Anchors
let exact = Pattern::new("^hello$").unwrap();
assert!(exact.is_match("hello"));
assert!(!exact.is_match("hello world"));
```

### Cached API (Recommended for Hot Paths)

For patterns used repeatedly in hot loops:

```rust
use rexile;

// Automatically cached - compile once, reuse forever
assert!(rexile::is_match("test", "this is a test").unwrap());
assert_eq!(rexile::find("world", "hello world").unwrap(), Some((6, 11)));

// Perfect for parsers and lexers
let log_lines = ["INFO: started", "ERROR: disk full"]; // sample input
for line in log_lines {
    if rexile::is_match("ERROR", line).unwrap() {
        // handle error
    }
}
```
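
The snippet below is a minimal sketch of the compile-once idea behind a cached API, using only the standard library. It is not ReXile's actual implementation: `Compiled`, `cached`, and `cached_is_match` are hypothetical names, and the stand-in "compilation" only stores a literal so the example stays self-contained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled pattern (literal search only).
pub struct Compiled {
    literal: String,
}

impl Compiled {
    /// Pretend compilation: a real engine would parse and optimize here.
    fn compile(pattern: &str) -> Self {
        Compiled { literal: pattern.to_string() }
    }

    pub fn is_match(&self, text: &str) -> bool {
        text.contains(&self.literal)
    }
}

/// Process-wide cache keyed by the pattern's source text.
fn cache() -> &'static Mutex<HashMap<String, Arc<Compiled>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<Compiled>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached pattern, compiling it only on first use.
pub fn cached(pattern: &str) -> Arc<Compiled> {
    let mut map = cache().lock().unwrap();
    map.entry(pattern.to_string())
        .or_insert_with(|| Arc::new(Compiled::compile(pattern)))
        .clone()
}

/// Convenience wrapper: look up (or compile) the pattern, then match.
pub fn cached_is_match(pattern: &str, text: &str) -> bool {
    cached(pattern).is_match(text)
}
```

Calling `cached("ERROR")` twice returns the same shared `Arc`, so the compile step runs once per distinct pattern string.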

## ✨ Supported Features

### Fast Path Optimizations (10 Types)

ReXile uses **JIT-style specialized implementations** for common patterns:

| Fast Path | Pattern Example | Performance vs regex |
|-----------|----------------|---------------------|
| **Literal** | `"hello"` | Competitive (SIMD) |
| **LiteralPlusWhitespace** | `"rule "` | Competitive |
| **DigitRun** | `\d+` | **1.4-1.9x faster** ✨ |
| **IdentifierRun** | `[a-zA-Z_]\w*` | **104.7x faster compilation** |
| **QuotedString** | `"[^"]+"` | **1.4-1.9x faster** ✨ |
| **WordRun** | `\w+` | Competitive |
| **DotWildcard** | `.`, `.*`, `.+` | With backtracking |
| **Alternation** | `foo\|bar\|baz` | 2x slower (acceptable) |
| **LiteralWhitespaceQuoted** | Complex | Competitive |
| **LiteralWhitespaceDigits** | Complex | Competitive |
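
To illustrate what a specialized fast path can look like, here is a rough sketch of a `\d+` scanner that skips the general regex machinery entirely. It is an assumption-laden illustration, not ReXile's code: `digit_runs` is a hypothetical name, and it uses a plain byte loop rather than the SIMD scanning the real DigitRun path is described as using.

```rust
/// Returns the (start, end) byte offsets of every maximal run of ASCII digits.
/// This mirrors what `find_all` would report for the pattern `\d+`.
pub fn digit_runs(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_digit() {
            // Found the start of a run: extend it as far as digits continue.
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            runs.push((start, i));
        } else {
            i += 1;
        }
    }
    runs
}
```

For the Quick Start input `"Order #12345 costs $67.89"` this yields `[(7, 12), (20, 22), (23, 25)]`, the same offsets shown for `\d+` there.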

### Regex Features

| Feature | Example | Status |
|---------|---------|--------|
| Literal strings | `hello`, `world` | ✅ Supported |
| Alternation | `foo\|bar\|baz` | ✅ Supported (aho-corasick) |
| Start anchor | `^start` | ✅ Supported |
| End anchor | `end$` | ✅ Supported |
| Exact match | `^exact$` | ✅ Supported |
| Character classes | `[a-z]`, `[0-9]`, `[^abc]` | ✅ Supported |
| Quantifiers | `*`, `+`, `?` | ✅ Supported |
| **Non-greedy quantifiers** | `.*?`, `+?`, `??` | ✅ **Supported (v0.2.1)** |
| **Dot wildcard** | `.`, `.*`, `.+` | ✅ **Supported (v0.2.0)** |
| **DOTALL mode** | `(?s)` - dot matches newlines | ✅ **Supported (v0.2.1)** |
| Escape sequences | `\d`, `\w`, `\s`, `\.`, `\n`, `\t` | ✅ Supported |
| Sequences | `ab+c*`, `\d+\w*` | ✅ Supported |
| **Non-capturing groups** | `(?:abc\|def)` | ✅ **Supported (v0.2.1)** |
| **Capturing groups** | Extract `(group)` | ✅ **Supported (v0.2.0)** |
| Word boundaries | `\b`, `\B` | ✅ Supported |
| **Range quantifiers** | `{n}`, `{n,}`, `{n,m}` | ✅ **Supported (v0.4.7)** |
| **Case-insensitive flag** | `(?i)` | ✅ **Supported (v0.4.7)** |
| Lookahead/lookbehind | `(?=...)`, `(?<=...)` | 🚧 Planned |
| Backreferences | `\1`, `\2` | 🚧 Planned |

## 📊 Performance Benchmarks

### Compilation Speed (Primary Advantage)

**Pattern Compilation Benchmark** (vs regex crate):

| Pattern | rexile | regex | Speedup |
|---------|--------|-------|---------|
| `[a-zA-Z_]\w*` | 95.2 ns | 9.97 µs | **104.7x faster** 🚀 |
| `\d+` | 86.7 ns | 4.03 µs | **46.5x faster** 🚀 |
| `(\w+)\s*(>=\|<=\|==\|!=\|>\|<)\s*(.+)` | 471 ns | 19.2 µs | **40.7x faster** 🚀 |
| `.*test.*` | 148 ns | 2.27 µs | **15.3x faster** 🚀 |

**Average: 10-100x faster compilation** - Perfect for dynamic patterns!
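
The speedup column is simply the regex time divided by the rexile time, with both converted to the same unit. A small helper (the name `speedup` is made up for this illustration) makes the arithmetic explicit:

```rust
/// Speedup factor: how many times faster the candidate is than the baseline.
/// Both arguments must be in the same unit (nanoseconds here).
pub fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}
```

For the first row, 9.97 µs is 9970 ns, so `speedup(9970.0, 95.2)` is roughly 104.7. The second row works the same way: `speedup(4030.0, 86.7)` is roughly 46.5.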

### Matching Speed

**Simple Patterns** (Fast paths):
- Pattern `\d+` on "12345": **1.4-1.9x faster** ✅
- Pattern `\w+` on "variable": **1.4-1.9x faster** ✅
- Pattern `"[^"]+"` on quoted strings: **Competitive** ✅

**Complex Patterns** (Backtracking):
- Pattern `a.+c` on "abc": **2-5x slower** (acceptable)
- Pattern `.*test.*` on long strings: **2-10x slower** (acceptable)
- **Trade-off**: 100x faster compilation vs slightly slower complex matching
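
To show where the extra cost of backtracking comes from, the sketch below implements a tiny greedy backtracking matcher for literals, `.`, `*`, and `+`. It is a simplified illustration under stated assumptions and not ReXile's engine: all names are hypothetical, it works on bytes, and `.` accepts any byte including newlines.

```rust
/// True if pattern byte `c` accepts input byte `b` ('.' accepts any byte).
fn accepts(c: u8, b: u8) -> bool {
    c == b'.' || c == b
}

/// Tries to match `re` at the very start of `text`.
fn match_here(re: &[u8], text: &[u8]) -> bool {
    match re {
        [] => true,
        [c, b'*', rest @ ..] => match_repeat(*c, rest, text, 0),
        [c, b'+', rest @ ..] => match_repeat(*c, rest, text, 1),
        [c, rest @ ..] => match text {
            [b, tail @ ..] if accepts(*c, *b) => match_here(rest, tail),
            _ => false,
        },
    }
}

/// Greedy repetition: consume as many bytes as possible, then give them
/// back one at a time (backtrack) until the rest of the pattern matches.
fn match_repeat(c: u8, rest: &[u8], text: &[u8], min: usize) -> bool {
    let mut n = text.iter().take_while(|&&b| accepts(c, b)).count();
    loop {
        if n >= min && match_here(rest, &text[n..]) {
            return true;
        }
        if n <= min {
            return false;
        }
        n -= 1; // back off by one byte and retry
    }
}

/// Unanchored search: try every start offset in turn.
pub fn backtrack_is_match(pattern: &str, text: &str) -> bool {
    let (re, text) = (pattern.as_bytes(), text.as_bytes());
    (0..=text.len()).any(|start| match_here(re, &text[start..]))
}
```

Each retry in `match_repeat` re-runs the remainder of the pattern, which is why patterns such as `.*test.*` get slower on long inputs while single-pass fast paths do not.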

### Use Case Performance

**Loading 1000 GRL Rules:**
- regex crate: ~2 seconds (2ms per pattern)
- rexile: ~0.02 seconds (20µs per pattern)
- **Result: 100x faster startup!** Perfect for parsers and rule engines.

### Memory Comparison

**Test 1: Pattern Compilation** (10 patterns):
- regex: 1920 KB in 7.89ms
- ReXile: 128 KB in 370µs
- **Result: 15x less memory, 21x faster** ✨

**Test 2: Search Operations** (5 patterns × 139KB corpus):
- Both: 0 bytes memory delta
- **Result: Equal efficiency** ✅

**Test 3: Stress Test** (50 patterns × 500KB corpus):
- regex: 0.62 MB peak in 46ms
- ReXile: 0.12 MB peak in 27ms
- **Result: 5x less peak memory, 1.7x faster** ✨

### When ReXile Wins

- ✅ **Simple patterns** (`\d+`, `\w+`) - 1.4-1.9x faster matching
- ✅ **Fast compilation** - 10-100x faster pattern compilation (huge win!)
- ✅ **Identifiers** (`[a-zA-Z_]\w*`) - 104.7x faster compilation
- ✅ **Memory efficiency** - 15x less for compilation, 5x less peak
- ✅ **Instant startup** - Load 1000 patterns in 0.02s vs 2s (100x faster)
- ✅ **Dot wildcards** - Full `.`, `.*`, `.+` support with backtracking

### When regex Wins

- ⚠️ **Complex patterns with backtracking** - ReXile 2-10x slower (acceptable trade-off)
- ⚠️ **Alternations** (`when|then`) - ReXile 2x slower
- ⚠️ **Hot-path matching** - For performance-critical matching, regex may be better

### Architecture

```
Pattern → Parser → AST → Fast Path Detection → Specialized Matcher
                                                        ↓
                                     DigitRun (memchr SIMD scanning)
                                     IdentifierRun (direct byte scanning)
                                     QuotedString (memchr + validation)
                                     Alternation (aho-corasick automaton)
                                     Literal (memchr SIMD)
                                     ... 5 more fast paths
```
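
The routing step in the diagram can be sketched as a function that inspects a pattern and picks a matcher. The code below is a deliberately simplified illustration: it classifies the pattern's source text rather than an AST, covers only a handful of the fast paths, and the `FastPath` enum and `detect_fast_path` function are hypothetical names rather than ReXile's internals.

```rust
/// Characters that carry special meaning in a regex pattern.
const META: &[char] = &[
    '\\', '.', '*', '+', '?', '(', ')', '[', ']', '{', '}', '^', '$', '|',
];

#[derive(Debug, PartialEq)]
pub enum FastPath {
    Literal,
    DigitRun,
    WordRun,
    IdentifierRun,
    Alternation,
    /// No specialized matcher applies; fall back to the general engine.
    General,
}

/// True for a non-empty string with no regex metacharacters.
fn is_plain_literal(s: &str) -> bool {
    !s.is_empty() && !s.contains(META)
}

/// Picks a specialized matcher for well-known pattern shapes.
pub fn detect_fast_path(pattern: &str) -> FastPath {
    match pattern {
        r"\d+" => FastPath::DigitRun,
        r"\w+" => FastPath::WordRun,
        r"[a-zA-Z_]\w*" => FastPath::IdentifierRun,
        p if is_plain_literal(p) => FastPath::Literal,
        // Only alternations of plain literals qualify, e.g. `foo|bar|baz`.
        p if p.contains('|') && p.split('|').all(is_plain_literal) => FastPath::Alternation,
        _ => FastPath::General,
    }
}
```

With this sketch, `detect_fast_path("foo|bar|baz")` selects `FastPath::Alternation`, while a pattern such as `a.*c` falls through to `FastPath::General`.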

**Run benchmarks yourself:**
```bash
cargo run --release --example per_file_grl_benchmark
cargo run --release --example memory_comparison
```

## 📦 Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
rexile = "0.4"
```

## 🎓 Examples

### Literal Search

```rust
use rexile::Pattern;

let p = Pattern::new("needle").unwrap();
assert!(p.is_match("needle in a haystack"));
assert_eq!(p.find("where is the needle?"), Some((13, 19)));

// Find all occurrences
let matches = p.find_all("needle and needle");
assert_eq!(matches, vec![(0, 6), (11, 17)]);
```

### Multi-Pattern (Alternation)

```rust
use rexile::Pattern;

// Fast multi-pattern search using aho-corasick
let keywords = Pattern::new("import|export|function|class").unwrap();
assert!(keywords.is_match("export default function"));
```

### Anchored Patterns

```rust
use rexile::Pattern;

// Must start with pattern
let starts = Pattern::new("^Hello").unwrap();
assert!(starts.is_match("Hello World"));
assert!(!starts.is_match("Say Hello"));

// Must end with pattern
let ends = Pattern::new("World$").unwrap();
assert!(ends.is_match("Hello World"));
assert!(!ends.is_match("World Peace"));

// Exact match
let exact = Pattern::new("^exact$").unwrap();
assert!(exact.is_match("exact"));
assert!(!exact.is_match("not exact"));
```

### Cached API (Best for Repeated Patterns)

```rust
// First call compiles and caches
rexile::is_match("keyword", "find keyword here").unwrap();

// Subsequent calls reuse cached pattern (zero compile cost)
rexile::is_match("keyword", "another keyword").unwrap();
rexile::is_match("keyword", "more keyword text").unwrap();
```

**📚 More examples:** See [examples/](examples/) directory for:
- [`basic_usage.rs`](examples/basic_usage.rs) - Core API walkthrough
- [`log_processing.rs`](examples/log_processing.rs) - Log analysis patterns
- [`performance.rs`](examples/performance.rs) - Performance comparison

Run examples with:
```bash
cargo run --example basic_usage
cargo run --example log_processing
```

## 🔧 Use Cases

ReXile is production-ready for:

### ✅ Ideal Use Cases
- **Parsers and lexers** - 21x faster pattern compilation, competitive matching
- **Rule engines** - Simple pattern matching in business rules (original use case!)
- **Log processing** - Fast keyword and pattern extraction
- **Dynamic patterns** - Applications that compile patterns at runtime
- **Memory-constrained environments** - 15x less compilation memory
- **Low-latency applications** - Predictable performance, no JIT warmup

### 🎯 Perfect Patterns for ReXile
- **Fast compilation**: All patterns compile 10-100x faster
- **Simple matching**: `\d+`, `\w+` (1.4-1.9x faster matching)
- **Identifiers**: `[a-zA-Z_]\w*` (104.7x faster compilation!)
- **Dot wildcards**: `.`, `.*`, `.+` with proper backtracking
- **Keyword search**: `rule\s+`, `function\s+`
- **Many patterns**: Load 1000 patterns instantly (100x faster startup)

### ⚠️ Consider regex crate for
- Complex alternations (ReXile 2x slower)
- Very sparse patterns (ReXile up to 1.44x slower)
- Unicode properties (`\p{L}` - not yet supported)
- Advanced features (lookahead, backreferences - not yet supported)

## 🤝 Contributing

Contributions welcome! ReXile is actively maintained and evolving.

**Current focus:**
- ✅ Core regex features complete
- ✅ **Dot wildcard** (`.`, `.*`, `.+`) with backtracking - **v0.2.0**
- ✅ **Capturing groups** - Auto-detection and extraction - **v0.2.0**
- ✅ **Non-greedy quantifiers** (`.*?`, `+?`, `??`) - **v0.2.1**
- ✅ **DOTALL mode** (`(?s)`) for multiline matching - **v0.2.1**
- ✅ **Non-capturing groups** (`(?:...)`) with alternations - **v0.2.1**
- ✅ **Range quantifiers** (`{n}`, `{n,}`, `{n,m}`) and **case-insensitive flag** (`(?i)`) - **v0.4.7**
- ✅ 10-100x faster compilation
- 🔄 Advanced features: lookahead, backreferences, Unicode support

**How to contribute:**
1. Check [issues](https://github.com/KSD-CO/rexile/issues) for open tasks
2. Run tests: `cargo test`
3. Run benchmarks: `cargo run --release --example per_file_grl_benchmark`
4. Submit PR with benchmarks showing performance impact

**Priority areas:**
- 📋 More fast path patterns
- 📋 Unicode support
- 📋 Documentation improvements

## 📜 License

Licensed under either of:

- MIT License ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)

at your option.

## 🙏 Credits

Built on top of:
- [`memchr`](https://docs.rs/memchr) by Andrew Gallant - SIMD-accelerated substring search
- [`aho-corasick`](https://docs.rs/aho-corasick) by Andrew Gallant - Multi-pattern matching automaton

Developed for the [rust-rule-engine](https://github.com/KSD-CO/rust-rule-engine) project, providing fast pattern matching for GRL (Grule Rule Language) parsing and business rule evaluation.

**Performance Philosophy:**
ReXile achieves competitive performance through **intelligent specialization** rather than complex JIT compilation:
- 10 hand-optimized fast paths for common patterns
- SIMD acceleration via memchr
- Pre-built automatons for alternations
- Zero-copy iterator design
- Minimal metadata overhead

---

**Status:** ✅ Production Ready (v0.4.9)

- ✅ **Compilation Speed:** 10-100x faster than regex crate
- ✅ **Matching Speed:** 1.4-1.9x faster on simple patterns
- ✅ **Memory:** 15x less compilation, 5x less peak
- ✅ **Features:** Core regex + dot wildcard + capturing groups + non-greedy + DOTALL + non-capturing groups
- ✅ **Testing:** 84 unit tests + 13 group integration tests passing
- ✅ **Real-world validated:** GRL parsing, rule engines, DSL compilers