ReXile
A blazing-fast regex engine with JIT-style optimizations and minimal dependencies
ReXile is a regex alternative that does not depend on the `regex` crate and achieves competitive performance through intelligent fast paths:
- Performance-competitive with the `regex` crate - Within 3% on real-world workloads
- 15x less memory for pattern compilation - Minimal metadata overhead
- 21x faster pattern compilation - Critical for dynamic patterns
- Only 2 dependencies - `memchr` and `aho-corasick` for SIMD primitives
- 10 specialized fast paths - JIT-style optimizations without JIT complexity
- Full control - Custom optimizations for parsers, lexers, and rule engines
Key Features:
- Literal searches with SIMD acceleration
- Multi-pattern matching (alternations)
- Character classes with negation
- Quantifiers (`*`, `+`, `?`)
- Escape sequences (`\d`, `\w`, `\s`, etc.)
- Sequences and groups
- Word boundaries (`\b`, `\B`)
- Anchoring (`^`, `$`)
Purpose
ReXile is a production-ready regex engine built from scratch for maximum performance and minimal overhead:
- Competitive performance - 1.03x aggregate ratio vs the `regex` crate on real workloads
- JIT-style optimizations - 10 specialized fast paths for common patterns
- Minimal dependencies - Only `memchr` + `aho-corasick` for SIMD primitives
- Lightning-fast compilation - 21x faster than the `regex` crate
- Memory efficient - 15x less compilation memory, 5x less peak memory
- Full control - Custom optimizations for specific use cases
Performance Highlights
Real-World GRL Benchmark (6 patterns × 41 files):
- Pattern `\d+`: 3.57x faster than regex (41/41 wins)
- Pattern `"[^"]+"`: 2.44x faster than regex (41/41 wins)
- Pattern `rule\s+`: 1.05x faster than regex
- Aggregate: 1.03x (within 3% of regex - competitive!)
Memory Comparison:
- Compilation: 15x less memory (128 KB vs 1920 KB)
- Compilation time: 21x faster (370µs vs 7.89ms)
- Peak memory: 5x less in stress tests (0.12 MB vs 0.62 MB)
- Search operations: Equal memory efficiency
When to Use ReXile:
- Parsers & lexers (fast token matching)
- Rule engines (business logic pattern matching)
- Log processing (keyword search)
- Dynamic patterns (21x faster compilation)
- Memory-constrained environments (15x less memory)
- Low-latency applications (competitive performance)
Quick Start
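use rexile::Pattern;
// Literal matching with SIMD acceleration
let pattern = Pattern::new("hello").unwrap();
assert!(pattern.is_match("hello world"));
assert_eq!(pattern.find_all("say hello"), vec![(4, 9)]);
// Multi-pattern matching (aho-corasick fast path)
let multi = Pattern::new("foo|bar|baz").unwrap();
assert!(multi.is_match("the bar is open"));
// Digit matching (DigitRun fast path - 3.57x faster than regex!)
let digits = Pattern::new(r"\d+").unwrap();
let matches = digits.find_all("Order #12345 costs $67.89");
// Returns: [(7, 12), (20, 22), (23, 25)]
// Identifier matching (IdentifierRun fast path)
let ident = Pattern::new(r"[a-zA-Z_]\w*").unwrap();
assert!(ident.is_match("let my_var = 1"));
// Quoted strings (QuotedString fast path - 2.44x faster!)
let quoted = Pattern::new(r#""[^"]+""#).unwrap();
assert!(quoted.is_match(r#"say "hello world""#));
// Word boundaries
let word = Pattern::new(r"\bthe\b").unwrap();
assert!(word.is_match("the cat"));
assert!(!word.is_match("other"));
// Anchors
let exact = Pattern::new("^hello$").unwrap();
assert!(exact.is_match("hello"));
assert!(!exact.is_match("hello world"));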
Cached API (Recommended for Hot Paths)
For patterns used repeatedly in hot loops:
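use rexile;
// Automatically cached - compile once, reuse forever
assert!(rexile::is_match("test", "this is a test").unwrap());
assert_eq!(rexile::is_match("xyz", "hello world").unwrap(), false);
// Perfect for parsers and lexers
for line in log_lines {
    if rexile::is_match("ERROR", line).unwrap() {
        // handle the matching line
    }
}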
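The "compile once, reuse forever" behavior can be illustrated with a small process-wide cache. This is a minimal sketch using only the standard library, not ReXile's actual implementation: `Compiled`, `compile`, and `get_or_compile` are hypothetical names, and the compile step is a placeholder for real pattern compilation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled pattern (not ReXile's real type).
#[derive(Debug)]
pub struct Compiled {
    pub source: String,
}

/// Hypothetical compile step; a real engine would parse and optimize here.
fn compile(source: &str) -> Compiled {
    Compiled { source: source.to_string() }
}

/// Process-wide cache keyed by the pattern's source text.
fn cache() -> &'static Mutex<HashMap<String, Arc<Compiled>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<Compiled>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached pattern, compiling it only on first use.
pub fn get_or_compile(source: &str) -> Arc<Compiled> {
    let mut map = cache().lock().unwrap();
    if let Some(hit) = map.get(source) {
        return Arc::clone(hit);
    }
    let compiled = Arc::new(compile(source));
    map.insert(source.to_string(), Arc::clone(&compiled));
    compiled
}
```

Calling `get_or_compile(r"\d+")` twice returns two handles to the same allocation, so only the first call pays the compile cost. A single `Mutex` keeps the sketch short; a read-heavy workload might prefer an `RwLock`.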
Supported Features
Fast Path Optimizations (10 Types)
ReXile uses JIT-style specialized implementations for common patterns:
| Fast Path | Pattern Example | Performance vs regex |
|---|---|---|
| Literal | `"hello"` | Competitive (SIMD) |
| LiteralPlusWhitespace | `"rule "` | Competitive |
| DigitRun | `\d+` | 3.57x faster |
| IdentifierRun | `[a-zA-Z_]\w*` | 2520x faster (vs general matcher) |
| QuotedString | `"[^"]+"` | 2.44x faster |
| WordRun | `\w+` | Competitive |
| Alternation | `foo\|bar\|baz` | 2x slower (acceptable) |
| LiteralWhitespaceQuoted | Complex | Competitive |
| LiteralWhitespaceDigits | Complex | Competitive |
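To show what a specialized fast path looks like in principle, here is a minimal sketch of a `\d+` scanner that bypasses a general matcher entirely. It is an illustration only: `find_digit_runs` is a hypothetical name, it uses a plain byte loop rather than the SIMD scanning ReXile describes, and it assumes `\d` means ASCII digits.

```rust
/// Finds every maximal run of ASCII digits in `text`.
/// Returns (start, end) byte offsets with an exclusive end.
pub fn find_digit_runs(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_digit() {
            // Start of a run: consume digits greedily.
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            runs.push((start, i));
        } else {
            i += 1;
        }
    }
    runs
}
```

The loop does no backtracking and keeps no state beyond the current index, which is the property that lets a specialized path like this outperform a general-purpose engine on the same pattern.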
Regex Features
| Feature | Example | Status |
|---|---|---|
| Literal strings | `hello`, `world` | Supported |
| Alternation | `foo\|bar\|baz` | Supported (aho-corasick) |
| Start anchor | `^start` | Supported |
| End anchor | `end$` | Supported |
| Exact match | `^exact$` | Supported |
| Character classes | `[a-z]`, `[0-9]`, `[^abc]` | Supported |
| Quantifiers | `*`, `+`, `?` | Supported |
| Escape sequences | `\d`, `\w`, `\s`, `\.`, `\n`, `\t` | Supported |
| Sequences | `ab+c*`, `\d+\w*` | Supported |
| Groups | `(abc)`, `(?:...)` | Supported |
| Word boundaries | `\b`, `\B` | Supported |
| Bounded quantifiers | `{n}`, `{n,m}` | Planned |
| Capturing groups | Extract `(group)` | Planned |
| Lookahead/lookbehind | `(?=...)`, `(?<=...)` | Planned |
| Backreferences | `\1`, `\2` | Planned |
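Of the features listed, word boundaries are the least obvious: `\b` matches a position, not a character. The sketch below shows the usual rule, assuming ASCII word characters (`[A-Za-z0-9_]`). The function names are hypothetical and say nothing about how ReXile implements the check internally.

```rust
/// ASCII word character, i.e. what `\w` matches without Unicode support.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `\b`: true when exactly one side of `pos` is a word character.
/// Positions outside the text count as non-word.
pub fn is_word_boundary(text: &[u8], pos: usize) -> bool {
    let before = pos
        .checked_sub(1)
        .and_then(|i| text.get(i))
        .map_or(false, |&b| is_word_byte(b));
    let after = text.get(pos).map_or(false, |&b| is_word_byte(b));
    before != after
}

/// `\B`: the negation of `\b`.
pub fn is_not_word_boundary(text: &[u8], pos: usize) -> bool {
    !is_word_boundary(text, pos)
}
```

For the text `the cat`, positions 0, 3, 4, and 7 are boundaries, while position 1 (between `t` and `h`) is not.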
Performance Benchmarks
Real-World GRL Benchmark
Testing 6 realistic patterns across 41 GRL files (total ~139KB):
| Pattern | Description | Ratio vs regex (lower is better) | Result |
|---|---|---|---|
| `\d+` | Digit sequences | 0.28x | 3.57x faster |
| `"[^"]+"` | Quoted strings | 0.41x | 2.44x faster |
| `rule\s+` | Rule keyword | 0.95x | 5% faster |
| `salience\s+\d+` | Salience declarations | 1.10x | Competitive |
| `query\s+` | Query keyword (sparse) | 1.44x | Expected loss |
| `when\|then` | Alternation | 1.99x | 2x slower (acceptable) |
| AGGREGATE | All patterns | 1.03x | Within 3% of regex! |
Perfect Performance (82/82 wins):
- Digit patterns: 41/41 wins (3.57x faster)
- Quoted strings: 41/41 wins (2.44x faster)
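The per-pattern ratios (0.28x, 0.41x, and so on) and the "faster" figures are two views of the same number. The sketch below assumes each ratio is ReXile time divided by regex time, so the speedup is its reciprocal. It also shows that the 1.03x aggregate is consistent with the arithmetic mean of the six ratios; the source does not state how the aggregate was computed, so treat that as an observation, not the benchmark's definition.

```rust
/// Per-pattern ratios from the GRL benchmark table, in table order.
pub const GRL_RATIOS: [f64; 6] = [0.28, 0.41, 0.95, 1.10, 1.44, 1.99];

/// Converts a time ratio (assumed ReXile time / regex time) into a speedup factor.
pub fn speedup(ratio: f64) -> f64 {
    1.0 / ratio
}

/// Arithmetic mean of a set of ratios.
pub fn mean_ratio(ratios: &[f64]) -> f64 {
    ratios.iter().sum::<f64>() / ratios.len() as f64
}
```

With these definitions, `speedup(0.28)` is about 3.57 and `speedup(0.41)` is about 2.44, matching the table, and `mean_ratio(&GRL_RATIOS)` is about 1.028, which rounds to the reported 1.03x.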
Memory Comparison
Test 1: Pattern Compilation (10 patterns):
- regex: 1920 KB in 7.89ms
- ReXile: 128 KB in 370µs
- Result: 15x less memory, 21x faster
Test 2: Search Operations (5 patterns × 139KB corpus):
- Both: 0 bytes memory delta
- Result: Equal efficiency
Test 3: Stress Test (50 patterns × 500KB corpus):
- regex: 0.62 MB peak in 46ms
- ReXile: 0.12 MB peak in 27ms
- Result: 5x less peak memory, 1.7x faster
When ReXile Wins
- Digit sequences (`\d+`) - 3.57x faster
- Quoted strings (`"[^"]+"`) - 2.44x faster
- Word runs (`\w+`) - Competitive
- Identifiers (`[a-zA-Z_]\w*`) - 2520x faster than general matcher
- Pattern compilation - 21x faster
- Memory usage - 15x less for compilation, 5x less peak
When regex Wins
- Alternations (`when|then`) - ReXile 2x slower (trade-off for simplicity)
- Sparse matches (`query\s+`) - ReXile 1.44x slower (expected)
Architecture
Pattern → Parser → AST → Fast Path Detection → Specialized Matcher
↓
DigitRun (memchr SIMD scanning)
IdentifierRun (direct byte scanning)
QuotedString (memchr + validation)
Alternation (aho-corasick automaton)
Literal (memchr SIMD)
... 5 more fast paths
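The "Fast Path Detection" stage in the pipeline above can be pictured as a match over the parsed AST that picks a specialized matcher when the shape is recognized and falls back to a general one otherwise. The sketch below is an assumption-laden illustration: the `Ast` and `FastPath` types and `detect_fast_path` are hypothetical and cover only a few of the fast paths named in this README.

```rust
/// Hypothetical, heavily simplified regex AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    Literal(String),
    Digit,                 // \d
    Word,                  // \w
    OneOrMore(Box<Ast>),   // x+
    Alternation(Vec<Ast>), // a|b|c
    Concat(Vec<Ast>),      // ab
}

/// Hypothetical set of matchers the detector can choose from.
#[derive(Debug, PartialEq)]
pub enum FastPath {
    Literal(String),
    DigitRun,
    WordRun,
    Alternation(Vec<String>),
    General,
}

/// Picks a specialized matcher when the AST has a recognized shape.
pub fn detect_fast_path(ast: &Ast) -> FastPath {
    match ast {
        Ast::Literal(s) => FastPath::Literal(s.clone()),
        Ast::OneOrMore(inner) => match inner.as_ref() {
            Ast::Digit => FastPath::DigitRun,
            Ast::Word => FastPath::WordRun,
            _ => FastPath::General,
        },
        Ast::Alternation(branches) => {
            // Only an alternation of plain literals qualifies.
            let literals: Option<Vec<String>> = branches
                .iter()
                .map(|b| match b {
                    Ast::Literal(s) => Some(s.clone()),
                    _ => None,
                })
                .collect();
            literals.map_or(FastPath::General, FastPath::Alternation)
        }
        _ => FastPath::General,
    }
}
```

Detection runs once at compile time, so its cost is paid per pattern and never per search.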
Run benchmarks yourself: `cargo run --release --example per_file_grl_benchmark`
Installation
Add to your Cargo.toml:
[dependencies]
rexile = "0.1"
Examples
Literal Search
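let p = Pattern::new("needle").unwrap();
assert!(p.is_match("needle in a haystack"));
assert_eq!(p.find_all("needle in a haystack"), vec![(0, 6)]);
// Find all occurrences
let matches = p.find_all("needle and needle");
assert_eq!(matches, vec![(0, 6), (11, 17)]);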
Multi-Pattern (Alternation)
// Fast multi-pattern search using aho-corasick
let keywords = Pattern::new("when|then").unwrap();
assert!(keywords.is_match("when x > 5"));
Anchored Patterns
// Must start with pattern
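let starts = Pattern::new("^start").unwrap();
assert!(starts.is_match("start here"));
assert!(!starts.is_match("do not start"));
// Must end with pattern
let ends = Pattern::new("end$").unwrap();
assert!(ends.is_match("the end"));
assert!(!ends.is_match("end it"));
// Exact match
let exact = Pattern::new("^exact$").unwrap();
assert!(exact.is_match("exact"));
assert!(!exact.is_match("not exact"));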
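For a purely literal pattern, the three anchored cases above reduce to ordinary string operations. The sketch below illustrates that reduction with a hypothetical helper, `match_anchored_literal`. It assumes `^` and `$` mean start and end of the whole text (no multi-line mode) and is not taken from ReXile's source.

```rust
/// Matches a literal with optional `^` / `$` anchors against `text`.
/// Assumes anchors refer to the start and end of the entire input.
pub fn match_anchored_literal(
    literal: &str,
    anchored_start: bool,
    anchored_end: bool,
    text: &str,
) -> bool {
    match (anchored_start, anchored_end) {
        (true, true) => text == literal,             // ^literal$
        (true, false) => text.starts_with(literal),  // ^literal
        (false, true) => text.ends_with(literal),    // literal$
        (false, false) => text.contains(literal),    // literal
    }
}
```

An anchored literal therefore never needs to scan: `^start` inspects only the first five bytes, and `^exact$` is a single equality check.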
Cached API (Best for Repeated Patterns)
// First call compiles and caches
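rexile::is_match("test", "this is a test").unwrap();
// Subsequent calls reuse cached pattern (zero compile cost)
rexile::is_match("test", "another test").unwrap();
rexile::is_match("test", "more tests").unwrap();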
More examples: See the examples/ directory for:
- `basic_usage.rs` - Core API walkthrough
- `log_processing.rs` - Log analysis patterns
- `performance.rs` - Performance comparison
Run examples with: `cargo run --example basic_usage`
Use Cases
ReXile is production-ready for:
Ideal Use Cases
- Parsers and lexers - 21x faster pattern compilation, competitive matching
- Rule engines - Simple pattern matching in business rules (original use case!)
- Log processing - Fast keyword and pattern extraction
- Dynamic patterns - Applications that compile patterns at runtime
- Memory-constrained environments - 15x less compilation memory
- Low-latency applications - Predictable performance, no JIT warmup
Perfect Patterns for ReXile
- Digit extraction: `\d+` (3.57x faster!)
- Quoted strings: `"[^"]+"` (2.44x faster!)
- Identifiers: `[a-zA-Z_]\w*` (2520x faster than general matcher!)
- Word runs: `\w+`
- Keyword search: `rule\s+`, `function\s+`
Consider the regex crate for
- Complex alternations (ReXile 2x slower)
- Very sparse patterns (ReXile up to 1.44x slower)
- Unicode properties (`\p{L}` - not yet supported)
- Advanced features (lookahead, backreferences - not yet supported)
Contributing
Contributions welcome! ReXile is actively maintained and evolving.
Current focus:
- Core regex features complete
- 10 fast path optimizations implemented
- Production-ready performance (1.03x aggregate vs regex)
- In progress: bounded quantifiers `{n,m}`, capturing groups, lookahead
How to contribute:
- Check issues for open tasks
- Run tests: `cargo test`
- Run benchmarks: `cargo run --release --example per_file_grl_benchmark`
- Submit a PR with benchmarks showing performance impact
Priority areas:
- Bounded quantifiers (`{n}`, `{n,m}`)
- Capturing group extraction
- More fast path patterns
- Unicode support
- Documentation improvements
License
Licensed under either of:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option.
Credits
Built on top of:
- `memchr` by Andrew Gallant - SIMD-accelerated substring search
- `aho-corasick` by Andrew Gallant - Multi-pattern matching automaton
Developed for the rust-rule-engine project, providing fast pattern matching for GRL (Grule Rule Language) parsing and business rule evaluation.
Performance Philosophy: ReXile achieves competitive performance through intelligent specialization rather than complex JIT compilation:
- 10 hand-optimized fast paths for common patterns
- SIMD acceleration via memchr
- Pre-built automatons for alternations
- Zero-copy iterator design
- Minimal metadata overhead
Status: Production Ready (v0.1.0)
- Performance: 1.03x aggregate vs regex (within 3%)
- Memory: 15x less compilation, 5x less peak
- Features: All core regex features working
- Testing: 77 unit tests passing, comprehensive benchmarks
- Real-world validated: GRL parsing, rule engines, log processing