# ReXile
[crates.io](https://crates.io/crates/rexile) · [docs.rs](https://docs.rs/rexile) · [License](LICENSE)
**A blazing-fast regex engine with 10-100x faster compilation speed**
ReXile is a **lightweight regex alternative** that achieves **exceptional compilation speed** while maintaining competitive matching performance:
- ⚡ **10-100x faster compilation** - Load patterns instantly
- **Competitive matching** - 1.4-1.9x faster on simple patterns
- 🎯 **Dot wildcard support** - Full `.`, `.*`, `.+` implementation with backtracking
- 📦 **Only 2 dependencies** - `memchr` and `aho-corasick` for SIMD primitives
- **Smart backtracking** - Handles complex patterns with quantifiers
- **Perfect for parsers** - Ideal for GRL, DSL, and rule engines
**Key Features:**
- ✅ Literal searches with SIMD acceleration
- ✅ Multi-pattern matching (alternations)
- ✅ Character classes with negation
- ✅ Quantifiers (`*`, `+`, `?`)
- ✅ **Range quantifiers** (`{n}`, `{n,}`, `{n,m}`)
- ✅ **Non-greedy quantifiers** (`*?`, `+?`, `??`)
- ✅ **Case-insensitive flag** (`(?i)`)
- ✅ **Dot wildcard** (`.`, `.*`, `.+`) with backtracking
- ✅ **DOTALL mode** (`(?s)`) - Dot matches newlines
- ✅ **Non-capturing groups** (`(?:...)`) with alternations
- ✅ **Hybrid DFA/NFA engine** - Smart pattern routing - NEW in v0.4.9
- ✅ Escape sequences (`\d`, `\w`, `\s`, etc.)
- ✅ Sequences and groups
- ✅ Word boundaries (`\b`, `\B`)
- ✅ Anchoring (`^`, `$`)
- ✅ **Capturing groups** - Auto-detection and extraction
## 🎯 Purpose
ReXile is a **high-performance regex engine** optimized for **fast compilation**:
- **Lightning-fast compilation** - 10-100x faster than the `regex` crate
- ⚡ **Competitive matching** - Faster on simple patterns, acceptable on complex ones
- 🎯 **Ideal for parsers** - GRL, DSL, rule engines with dynamic patterns
- 📦 **Minimal dependencies** - Only `memchr` + `aho-corasick` for SIMD primitives
- **Memory efficient** - 15x less compilation memory
- **Full control** - Custom optimizations for specific use cases
### Performance Highlights
**Compilation Speed** (vs regex crate):
- Pattern `[a-zA-Z_]\w*`: **104.7x faster**
- Pattern `\d+`: **46.5x faster**
- Pattern `(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+)`: **40.7x faster**
- Pattern `.*test.*`: **15.3x faster**
- **Typical range: 10-100x faster compilation**

**Matching Speed**:
- Simple patterns (`\d+`, `\w+`): **1.4-1.9x faster**
- Complex patterns with backtracking: 2-10x slower (acceptable for non-hot-path)
- **Perfect trade-off for parsers and rule engines**
**Use Case Example** (Load 1000 GRL rules):
- regex crate: ~2 seconds compilation
- rexile: ~0.02 seconds (**100x faster startup!**)
**Memory Comparison**:
- Compilation: **15x less memory** (128 KB vs 1920 KB)
- Peak memory: **5x less** in stress tests (0.12 MB vs 0.62 MB)
- Search operations: **Equal memory efficiency**
**When to Use ReXile:**
- ✅ Parsers & lexers (fast token matching + instant startup)
- ✅ Rule engines with dynamic patterns (100x faster rule loading)
- ✅ DSL compilers (GRL, business rules)
- ✅ Applications with many patterns (instant initialization)
- ✅ Memory-constrained environments (15x less memory)
- ✅ Non-hot-path matching (acceptable trade-off for 100x faster compilation)
## Quick Start
```rust
use rexile::Pattern;
// Literal matching with SIMD acceleration
let pattern = Pattern::new("hello").unwrap();
assert!(pattern.is_match("hello world"));
assert_eq!(pattern.find("say hello"), Some((4, 9)));
// Multi-pattern matching (aho-corasick fast path)
let multi = Pattern::new("foo|bar|baz").unwrap();
assert!(multi.is_match("I prefer bar"));
// Dot wildcard matching (with backtracking)
let dot = Pattern::new("a.c").unwrap();
assert!(dot.is_match("abc")); // . matches 'b'
assert!(dot.is_match("a_c")); // . matches '_'
// Greedy quantifiers with dot
let greedy = Pattern::new("a.*c").unwrap();
assert!(greedy.is_match("abc")); // .* matches 'b'
assert!(greedy.is_match("a12345c")); // .* matches '12345'
let plus = Pattern::new("a.+c").unwrap();
assert!(plus.is_match("abc")); // .+ matches 'b' (requires at least one char)
assert!(!plus.is_match("ac")); // .+ needs at least 1 character
// Non-greedy quantifiers (NEW in v0.2.1)
let lazy = Pattern::new(r"start\{.*?\}").unwrap();
assert_eq!(lazy.find("start{abc}end{xyz}"), Some((0, 10))); // Matches "start{abc}", not greedy
// DOTALL mode - dot matches newlines (NEW in v0.2.1)
let dotall = Pattern::new(r"(?s)rule\s+.*?\}").unwrap();
let multiline = "rule test {\n content\n}";
assert!(dotall.is_match(multiline)); // (?s) makes .* match across newlines
// Non-capturing groups with alternation (NEW in v0.2.1)
let group = Pattern::new("(?:bar|foo)").unwrap();
assert!(group.is_match("bar")); // Matches bar
assert!(group.is_match("foo")); // Or matches foo
// Digit matching (DigitRun fast path - 1.4-1.9x faster than regex!)
let digits = Pattern::new("\\d+").unwrap();
let matches = digits.find_all("Order #12345 costs $67.89");
assert_eq!(matches, vec![(7, 12), (20, 22), (23, 25)]);
// Identifier matching (IdentifierRun fast path)
let ident = Pattern::new("[a-zA-Z_]\\w*").unwrap();
assert!(ident.is_match("variable_name_123"));
// Quoted strings (QuotedString fast path - 1.4-1.9x faster!)
let quoted = Pattern::new("\"[^\"]+\"").unwrap();
assert!(quoted.is_match("say \"hello world\""));
// Word boundaries
let word = Pattern::new("\\btest\\b").unwrap();
assert!(word.is_match("this is a test"));
assert!(!word.is_match("testing"));
// Range quantifiers (NEW in v0.4.7)
let ip = Pattern::new(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}").unwrap();
assert!(ip.is_match("192.168.1.1")); // Matches IP addresses
let year = Pattern::new(r"\b\d{4}\b").unwrap();
assert_eq!(year.find("Year: 2024!"), Some((6, 10))); // Matches exactly 4 digits
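// Case-insensitive matching (NEW in v0.4.7)
let method = Pattern::new("(?i)(?:GET|POST)").unwrap();
assert!(method.is_match("GET /api"));
assert!(method.is_match("get /api")); // Also matches lowercase
assert!(method.is_match("Post /data")); // Also matches mixed case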
// Anchors
let exact = Pattern::new("^hello$").unwrap();
assert!(exact.is_match("hello"));
assert!(!exact.is_match("hello world"));
```
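
Word boundaries are zero-width checks on the bytes on either side of a position. The following sketch is not ReXile's implementation. It assumes ASCII `\w` (letters, digits, `_`) and a non-empty literal word. The helper names `is_boundary` and `find_whole_word` are hypothetical. It reproduces the `\btest\b` results from the example above.

```rust
/// ASCII `\w` (assumption): letters, digits, underscore.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `\b` holds at byte offset `i` when exactly one side of `i` is a word byte.
fn is_boundary(bytes: &[u8], i: usize) -> bool {
    let before = i > 0 && is_word_byte(bytes[i - 1]);
    let after = i < bytes.len() && is_word_byte(bytes[i]);
    before != after
}

/// Hypothetical sketch of `\bWORD\b` for a literal WORD: the leftmost
/// occurrence that has a word boundary at both edges.
fn find_whole_word(text: &str, word: &str) -> Option<(usize, usize)> {
    if word.is_empty() {
        return None; // this sketch only handles non-empty words
    }
    let bytes = text.as_bytes();
    let mut from = 0;
    while let Some(off) = text[from..].find(word) {
        let (start, end) = (from + off, from + off + word.len());
        if is_boundary(bytes, start) && is_boundary(bytes, end) {
            return Some((start, end));
        }
        // Advance one character so overlapping candidates are still checked.
        from = start + text[start..].chars().next().map_or(1, char::len_utf8);
    }
    None
}
```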
### Cached API (Recommended for Hot Paths)
For patterns used repeatedly in hot loops:
```rust
// Automatically cached: compile once, reuse on every later call
assert!(rexile::is_match("test", "this is a test").unwrap());
assert_eq!(rexile::find("world", "hello world").unwrap(), Some((6, 11)));
// Perfect for parsers and lexers
let log_lines = ["INFO service started", "ERROR disk full"];
for line in log_lines {
    if rexile::is_match("ERROR", line).unwrap() {
        // handle error
    }
}
```
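
The cached API implies a process-wide map from pattern source to compiled pattern. The following is a minimal sketch of that idea. It assumes a `Mutex<HashMap>` behind a `std::sync::OnceLock`. The `Compiled` type and all function names are hypothetical stand-ins; the stand-in only performs literal substring search. ReXile's internal cache may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled pattern (literal substring search only).
struct Compiled {
    literal: String,
}

impl Compiled {
    fn compile(pattern: &str) -> Self {
        // A real engine would parse the pattern and pick a fast path here.
        Compiled { literal: pattern.to_string() }
    }

    fn is_match(&self, text: &str) -> bool {
        text.contains(self.literal.as_str())
    }
}

type Cache = Mutex<HashMap<String, Arc<Compiled>>>;

/// Process-wide cache keyed by pattern source, created on first use.
fn cache() -> &'static Cache {
    static CACHE: OnceLock<Cache> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Compile on first use; later calls with the same pattern reuse the cached value.
fn cached_is_match(pattern: &str, text: &str) -> bool {
    let compiled = {
        let mut map = cache().lock().unwrap();
        let entry = map
            .entry(pattern.to_string())
            .or_insert_with(|| Arc::new(Compiled::compile(pattern)));
        Arc::clone(entry)
    }; // the lock is released here, so matching does not block other threads
    compiled.is_match(text)
}

/// Number of distinct patterns compiled so far.
fn cached_pattern_count() -> usize {
    cache().lock().unwrap().len()
}
```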
## ✨ Supported Features
### Fast Path Optimizations (10 Types)
ReXile uses **JIT-style specialized implementations** for common patterns:

| Fast Path | Example | Performance vs `regex` |
|---|---|---|
| **Literal** | `"hello"` | Competitive (SIMD) |
| **LiteralPlusWhitespace** | `"rule "` | Competitive |
| **DigitRun** | `\d+` | **1.4-1.9x faster** ✨ |
| **IdentifierRun** | `[a-zA-Z_]\w*` | **104.7x faster compilation** |
| **QuotedString** | `"[^"]+"` | **1.4-1.9x faster** ✨ |
| **WordRun** | `\w+` | Competitive |
| **DotWildcard** | `.`, `.*`, `.+` | With backtracking |
| **Alternation** | `foo\|bar\|baz` | 2x slower (acceptable) |
| **LiteralWhitespaceQuoted** | Complex | Competitive |
| **LiteralWhitespaceDigits** | Complex | Competitive |
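
The `DigitRun` and `IdentifierRun` rows replace general regex matching with a direct scan over bytes. The following is a minimal sketch of that idea. It assumes ASCII-only classes; Unicode `\d` and `\w` are not handled. It also uses plain loops instead of the memchr-based scanning the crate describes. The function names are hypothetical. The test reuses the `find_all` example from Quick Start.

```rust
/// Hypothetical `\d+` fast path: (start, end) byte spans of every maximal ASCII digit run.
fn digit_runs(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_digit() {
            let start = i;
            // Consume the whole run in one pass; no backtracking is ever needed.
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            spans.push((start, i));
        } else {
            i += 1;
        }
    }
    spans
}

/// Hypothetical `[a-zA-Z_]\w*` fast path (ASCII only): span of the first identifier.
fn find_identifier(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let start = bytes.iter().position(|&b| b.is_ascii_alphabetic() || b == b'_')?;
    let len = bytes[start..]
        .iter()
        .take_while(|&&b| b.is_ascii_alphanumeric() || b == b'_')
        .count();
    Some((start, start + len))
}
```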
### Regex Features

| Feature | Example | Status |
|---|---|---|
| Literal strings | `hello`, `world` | ✅ Supported |
| Alternation | `foo\|bar\|baz` | ✅ Supported (aho-corasick) |
| Start anchor | `^start` | ✅ Supported |
| End anchor | `end$` | ✅ Supported |
| Exact match | `^exact$` | ✅ Supported |
| Character classes | `[a-z]`, `[0-9]`, `[^abc]` | ✅ Supported |
| Quantifiers | `*`, `+`, `?` | ✅ Supported |
| **Non-greedy quantifiers** | `.*?`, `+?`, `??` | ✅ **Supported (v0.2.1)** |
| **Dot wildcard** | `.`, `.*`, `.+` | ✅ **Supported (v0.2.0)** |
| **DOTALL mode** | `(?s)` - dot matches newlines | ✅ **Supported (v0.2.1)** |
| Escape sequences | `\d`, `\w`, `\s`, `\.`, `\n`, `\t` | ✅ Supported |
| Sequences | `ab+c*`, `\d+\w*` | ✅ Supported |
| **Non-capturing groups** | `(?:abc\|def)` | ✅ **Supported (v0.2.1)** |
| **Capturing groups** | Extract `(group)` | ✅ **Supported (v0.2.0)** |
| Word boundaries | `\b`, `\B` | ✅ Supported |
| **Bounded quantifiers** | `{n}`, `{n,}`, `{n,m}` | ✅ **Supported (v0.4.7)** |
| **Case-insensitive flag** | `(?i)` | ✅ **Supported (v0.4.7)** |
| Lookahead/lookbehind | `(?=...)`, `(?<=...)` | Planned |
| Backreferences | `\1`, `\2` | Planned |
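
Non-greedy quantifiers and DOTALL interact in patterns such as `(?s)rule\s+.*?\}`. The sketch below narrows this to `OPEN.*?CLOSE` with literal, non-empty delimiters; `\s+` and other metacharacters are deliberately ignored, which is an assumption. The helper `find_lazy` is hypothetical. The lazy gap ends at the first CLOSE after OPEN, and without DOTALL that gap may not cross a newline.

```rust
/// Hypothetical sketch of `OPEN.*?CLOSE` with literal OPEN and CLOSE.
/// `dotall = true` mimics `(?s)`: the lazy gap may then span newlines.
fn find_lazy(text: &str, open: &str, close: &str, dotall: bool) -> Option<(usize, usize)> {
    if open.is_empty() {
        return None; // assumption: OPEN is non-empty
    }
    let mut from = 0;
    while let Some(off) = text[from..].find(open) {
        let start = from + off;
        let body_start = start + open.len();
        // Lazy `.*?`: stop at the FIRST closing delimiter, not the last one.
        if let Some(rel) = text[body_start..].find(close) {
            let body = &text[body_start..body_start + rel];
            if dotall || !body.contains('\n') {
                return Some((start, body_start + rel + close.len()));
            }
        }
        // No match from this OPEN; resume one character later.
        from = start + text[start..].chars().next().map_or(1, char::len_utf8);
    }
    None
}
```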
## Performance Benchmarks
### Compilation Speed (Primary Advantage)
**Pattern Compilation Benchmark** (vs regex crate):

| Pattern | ReXile | regex | Speedup |
|---|---|---|---|
| `[a-zA-Z_]\w*` | 95.2 ns | 9.97 µs | **104.7x faster** |
| `\d+` | 86.7 ns | 4.03 µs | **46.5x faster** |
| `(\w+)\s*(>=\|<=\|==\|!=\|>\|<)\s*(.+)` | 471 ns | 19.2 µs | **40.7x faster** |
| `.*test.*` | 148 ns | 2.27 µs | **15.3x faster** |

**Typical range: 10-100x faster compilation**, which suits applications that build patterns at runtime.
### Matching Speed
**Simple Patterns** (Fast paths):
- Pattern `\d+` on "12345": **1.4-1.9x faster**
- Pattern `\w+` on "variable": **1.4-1.9x faster**
- Pattern `"[^"]+"` on quoted strings: **Competitive**
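
A `"[^"]+"` fast path only has to find an opening quote and the next quote after it, rejecting the empty `""` case. The sketch below follows the same assumptions as the other sketches here. The function name is hypothetical, and it uses plain iterator search where the crate describes memchr plus validation. `memchr::memchr(b'"', haystack)` would be the natural drop-in for the `position` calls.

```rust
/// Hypothetical `"[^"]+"` fast path: leftmost span of a non-empty double-quoted string.
fn find_quoted(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut from = 0;
    while let Some(off) = bytes[from..].iter().position(|&b| b == b'"') {
        let open = from + off;
        // The closing quote is the next `"`; if none exists, no later
        // opening quote can be closed either, so give up.
        let close = open + 1 + bytes[open + 1..].iter().position(|&b| b == b'"')?;
        if close > open + 1 {
            return Some((open, close + 1)); // span includes both quotes
        }
        // `""` cannot match `[^"]+`; the second quote may open a new string.
        from = close;
    }
    None
}
```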
**Complex Patterns** (Backtracking):
- Pattern `a.+c` on "abc": **2-5x slower** (acceptable)
- Pattern `.*test.*` on long strings: **2-10x slower** (acceptable)
- **Trade-off**: 100x faster compilation vs slightly slower complex matching
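
A tiny matcher makes the cost of backtracking easy to see. This is not ReXile's engine (which uses a hybrid DFA/NFA since v0.4.9). It is a recursive matcher in the style of Rob Pike's classic one, extended here with `+`. It supports literal bytes, `.` (not matching `\n`), postfix `*`/`+`, and `^`/`$`. Greedy repetition consumes as much as it can and then retries the rest of the pattern at every shorter length, so a pattern like `.*test.*` can re-scan the input many times.

```rust
/// True if pattern byte `c` accepts text byte `t`; `.` matches anything except `\n`.
fn byte_matches(c: u8, t: u8) -> bool {
    if c == b'.' { t != b'\n' } else { c == t }
}

/// Minimal backtracking matcher (ASCII patterns only): literals, `.`,
/// postfix `*` and `+`, plus `^` and `$` anchors.
fn backtrack_is_match(re: &str, text: &str) -> bool {
    let (re, text) = (re.as_bytes(), text.as_bytes());
    if let Some((&b'^', rest)) = re.split_first() {
        return match_here(rest, text);
    }
    // Unanchored: try each start position, including the empty suffix.
    (0..=text.len()).any(|i| match_here(re, &text[i..]))
}

fn match_here(re: &[u8], text: &[u8]) -> bool {
    match re {
        [] => true,
        [b'$'] => text.is_empty(),
        [c, b'*', rest @ ..] => match_repeat(*c, 0, rest, text),
        [c, b'+', rest @ ..] => match_repeat(*c, 1, rest, text),
        [c, rest @ ..] => match text.split_first() {
            Some((&t, tail)) if byte_matches(*c, t) => match_here(rest, tail),
            _ => false,
        },
    }
}

/// Greedy repetition: take as many matching bytes as possible, then give
/// them back one at a time until the rest of the pattern matches.
fn match_repeat(c: u8, min: usize, rest: &[u8], text: &[u8]) -> bool {
    let max = text.iter().take_while(|&&t| byte_matches(c, t)).count();
    (min..=max).rev().any(|n| match_here(rest, &text[n..]))
}
```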
### Use Case Performance
**Loading 1000 GRL Rules:**
- regex crate: ~2 seconds (2ms per pattern)
- rexile: ~0.02 seconds (20µs per pattern)
- **Result: 100x faster startup!** Perfect for parsers and rule engines.
### Memory Comparison
**Test 1: Pattern Compilation** (10 patterns):
- regex: 1920 KB in 7.89ms
- ReXile: 128 KB in 370µs
- **Result: 15x less memory, 21x faster** ✨

**Test 2: Search Operations** (5 patterns × 139KB corpus):
- Both: 0 bytes memory delta
- **Result: Equal efficiency**

**Test 3: Stress Test** (50 patterns × 500KB corpus):
- regex: 0.62 MB peak in 46ms
- ReXile: 0.12 MB peak in 27ms
- **Result: 5x less peak memory, 1.7x faster** ✨
### When ReXile Wins
- ✅ **Simple patterns** (`\d+`, `\w+`) - 1.4-1.9x faster matching
- ✅ **Fast compilation** - 10-100x faster pattern compilation (huge win!)
- ✅ **Identifiers** (`[a-zA-Z_]\w*`) - 104.7x faster compilation
- ✅ **Memory efficiency** - 15x less for compilation, 5x less peak
- ✅ **Instant startup** - Load 1000 patterns in 0.02s vs 2s (100x faster)
- ✅ **Dot wildcards** - Full `.`, `.*`, `.+` support with backtracking
### When regex Wins
- ⚠️ **Complex patterns with backtracking** - ReXile 2-10x slower (acceptable trade-off)
- ⚠️ **Alternations** (`when|then`) - ReXile 2x slower
- ⚠️ **Hot-path matching** - For performance-critical matching, regex may be better
### Architecture
```
Pattern → Parser → AST → Fast Path Detection → Specialized Matcher
                                                        ↓
                          DigitRun       (memchr SIMD scanning)
                          IdentifierRun  (direct byte scanning)
                          QuotedString   (memchr + validation)
                          Alternation    (aho-corasick automaton)
                          Literal        (memchr SIMD)
                          ... 5 more fast paths
```
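
The "Fast Path Detection" stage can be pictured as recognizing a pattern's shape before falling back to a general engine. The following is a minimal sketch. It assumes exact-string recognition and uses a hypothetical `FastPath` enum covering only a few of the ten fast paths. ReXile presumably detects them from the AST rather than from raw pattern text.

```rust
/// Hypothetical subset of fast-path kinds; names mirror some rows of the table above.
#[derive(Debug, PartialEq)]
enum FastPath {
    Literal(String),
    DigitRun,
    WordRun,
    IdentifierRun,
    QuotedString,
    Alternation(Vec<String>),
    /// Anything unrecognized falls back to the general engine.
    General,
}

/// True if `s` is non-empty and contains no regex metacharacters.
fn is_plain_literal(s: &str) -> bool {
    !s.is_empty() && !s.chars().any(|c| "\\.^$|?*+()[]{}".contains(c))
}

/// Route a pattern to a specialized matcher by recognizing its exact shape.
fn classify(pattern: &str) -> FastPath {
    match pattern {
        r"\d+" => FastPath::DigitRun,
        r"\w+" => FastPath::WordRun,
        r"[a-zA-Z_]\w*" => FastPath::IdentifierRun,
        r#""[^"]+""# => FastPath::QuotedString,
        p if is_plain_literal(p) => FastPath::Literal(p.to_string()),
        // `foo|bar|baz` where every branch is a plain literal.
        p if p.split('|').all(is_plain_literal) => {
            FastPath::Alternation(p.split('|').map(String::from).collect())
        }
        _ => FastPath::General,
    }
}
```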
**Run benchmarks yourself:**
```bash
cargo run --release --example per_file_grl_benchmark
cargo run --release --example memory_comparison
```
## 📦 Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
rexile = "0.4"
```
## Examples
### Literal Search
```rust
use rexile::Pattern;
let p = Pattern::new("needle").unwrap();
assert!(p.is_match("needle in a haystack"));
assert_eq!(p.find("where is the needle?"), Some((13, 19)));
// Find all occurrences
let matches = p.find_all("needle and needle");
assert_eq!(matches, vec![(0, 6), (11, 17)]);
```
### Multi-Pattern (Alternation)
```rust
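use rexile::Pattern;
// Fast multi-pattern search using aho-corasick
let keywords = Pattern::new("when|then|rule").unwrap();
assert!(keywords.is_match("rule Discount"));
assert_eq!(keywords.find("if x then y"), Some((5, 9)));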
```
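
Alternation over plain literals follows leftmost-first semantics. The earliest match position wins, and when two alternatives match at the same position, the one listed first wins. The sketch below uses a hypothetical `find_any` that rescans at every position, which is quadratic in the worst case. An Aho-Corasick automaton, such as the `aho-corasick` crate configured with `MatchKind::LeftmostFirst`, reports the same matches in a single pass.

```rust
/// Hypothetical naive leftmost-first search for any of several literals.
/// Empty literals are ignored in this sketch.
fn find_any(text: &str, literals: &[&str]) -> Option<(usize, usize)> {
    (0..=text.len())
        .filter(|&i| text.is_char_boundary(i))
        .find_map(|i| {
            // At the earliest position, the first listed literal that matches wins.
            literals
                .iter()
                .find(|lit| !lit.is_empty() && text[i..].starts_with(**lit))
                .map(|lit| (i, i + lit.len()))
        })
}
```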
### Anchored Patterns
```rust
use rexile::Pattern;
// Must start with pattern
let starts = Pattern::new("^Hello").unwrap();
assert!(starts.is_match("Hello World"));
assert!(!starts.is_match("Say Hello"));
// Must end with pattern
let ends = Pattern::new("World$").unwrap();
assert!(ends.is_match("Hello World"));
assert!(!ends.is_match("World Peace"));
// Exact match
let exact = Pattern::new("^exact$").unwrap();
assert!(exact.is_match("exact"));
assert!(!exact.is_match("not exact"));
```
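
When the body of an anchored pattern is a plain literal, matching reduces to standard string checks. The following sketch assumes the pattern contains no metacharacters besides an optional leading `^` and trailing `$`; escapes such as `\$` are not handled. `anchored_literal_match` is a hypothetical name. The test reuses the examples above.

```rust
/// Hypothetical sketch: `^lit`, `lit$`, `^lit$`, or bare `lit` as plain string checks.
fn anchored_literal_match(pattern: &str, text: &str) -> bool {
    let start = pattern.starts_with('^');
    // Guard so that a lone "^" is not also treated as ending with `$`.
    let end = pattern.ends_with('$') && pattern.len() > usize::from(start);
    let lit = &pattern[usize::from(start)..pattern.len() - usize::from(end)];
    match (start, end) {
        (true, true) => text == lit,
        (true, false) => text.starts_with(lit),
        (false, true) => text.ends_with(lit),
        (false, false) => text.contains(lit),
    }
}
```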
### Cached API (Best for Repeated Patterns)
```rust
// First call compiles and caches
assert!(rexile::is_match("keyword", "find keyword here").unwrap());
// Subsequent calls reuse the cached pattern (zero compile cost)
assert!(rexile::is_match("keyword", "another keyword").unwrap());
assert!(rexile::is_match("keyword", "more keyword text").unwrap());
```
**More examples:** See the [examples/](examples/) directory for:
- [`basic_usage.rs`](examples/basic_usage.rs) - Core API walkthrough
- [`log_processing.rs`](examples/log_processing.rs) - Log analysis patterns
- [`performance.rs`](examples/performance.rs) - Performance comparison
Run examples with:
```bash
cargo run --example basic_usage
cargo run --example log_processing
```
## Use Cases
ReXile is production-ready for:
### ✅ Ideal Use Cases
- **Parsers and lexers** - 21x faster pattern compilation, competitive matching
- **Rule engines** - Simple pattern matching in business rules (original use case!)
- **Log processing** - Fast keyword and pattern extraction
- **Dynamic patterns** - Applications that compile patterns at runtime
- **Memory-constrained environments** - 15x less compilation memory
- **Low-latency applications** - Predictable performance, no JIT warmup
### 🎯 Perfect Patterns for ReXile
- **Fast compilation**: All patterns compile 10-100x faster
- **Simple matching**: `\d+`, `\w+` (1.4-1.9x faster matching)
- **Identifiers**: `[a-zA-Z_]\w*` (104.7x faster compilation!)
- **Dot wildcards**: `.`, `.*`, `.+` with proper backtracking
- **Keyword search**: `rule\s+`, `function\s+`
- **Many patterns**: Load 1000 patterns instantly (100x faster startup)
### ⚠️ Consider the regex crate for
- Complex alternations (ReXile 2x slower)
- Very sparse patterns (ReXile up to 1.44x slower)
- Unicode properties (`\p{L}` - not yet supported)
- Advanced features (lookahead, backreferences - not yet supported)
## Contributing
Contributions welcome! ReXile is actively maintained and evolving.
**Current focus:**
- ✅ Core regex features complete
- ✅ **Dot wildcard** (`.`, `.*`, `.+`) with backtracking - **v0.2.0**
- ✅ **Capturing groups** - Auto-detection and extraction - **v0.2.0**
- ✅ **Non-greedy quantifiers** (`.*?`, `+?`, `??`) - **v0.2.1**
- ✅ **DOTALL mode** (`(?s)`) for multiline matching - **v0.2.1**
- ✅ **Non-capturing groups** (`(?:...)`) with alternations - **v0.2.1**
- ✅ **Range quantifiers** (`{n}`, `{n,}`, `{n,m}`) and **case-insensitive flag** (`(?i)`) - **v0.4.7**
- ✅ **Hybrid DFA/NFA engine** - **v0.4.9**
- ✅ 10-100x faster compilation
- Next: advanced features such as lookahead and Unicode support
**How to contribute:**
1. Check [issues](https://github.com/KSD-CO/rexile/issues) for open tasks
2. Run tests: `cargo test`
3. Run benchmarks: `cargo run --release --example per_file_grl_benchmark`
4. Submit PR with benchmarks showing performance impact
**Priority areas:**
- Lookahead/lookbehind and backreferences
- More fast path patterns
- Unicode support
- Documentation improvements
## License
Licensed under either of:
- MIT License ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
at your option.
## Credits
Built on top of:
- [`memchr`](https://docs.rs/memchr) by Andrew Gallant - SIMD-accelerated substring search
- [`aho-corasick`](https://docs.rs/aho-corasick) by Andrew Gallant - Multi-pattern matching automaton
Developed for the [rust-rule-engine](https://github.com/KSD-CO/rust-rule-engine) project, providing fast pattern matching for GRL (Grule Rule Language) parsing and business rule evaluation.
**Performance Philosophy:**
ReXile achieves competitive performance through **intelligent specialization** rather than complex JIT compilation:
- 10 hand-optimized fast paths for common patterns
- SIMD acceleration via memchr
- Pre-built automatons for alternations
- Zero-copy iterator design
- Minimal metadata overhead
---
**Status:** ✅ Production Ready (v0.2.1)
- ✅ **Compilation Speed:** 10-100x faster than regex crate
- ✅ **Matching Speed:** 1.4-1.9x faster on simple patterns
- ✅ **Memory:** 15x less during compilation, 5x less at peak
- ✅ **Features:** Core regex + dot wildcard + capturing groups + non-greedy + DOTALL + non-capturing groups
- ✅ **Testing:** 84 unit tests + 13 group integration tests passing
- ✅ **Real-world validated:** GRL parsing, rule engines, DSL compilers