ReXile
A blazing-fast regex engine with 10-100x faster compilation speed
ReXile is a lightweight regex alternative that achieves exceptional compilation speed while maintaining competitive matching performance:
- 19x faster compilation - Load patterns instantly
- Competitive matching - 2-3x faster on simple patterns, 1.3x slower overall
- Dot wildcard support - Full `.`, `.*`, `.+` implementation with backtracking
- Only 2 dependencies - `memchr` and `aho-corasick` for SIMD primitives
- Smart backtracking - Handles complex patterns with quantifiers
- Perfect for parsers - Ideal for GRL, DSL, and rule engines
Key Features:
- Literal searches with SIMD acceleration
- Multi-pattern matching (alternations)
- Character classes with negation
- Quantifiers (`*`, `+`, `?`, `{n}`, `{n,m}`) - FIXED in v0.5.0
- Range quantifiers (`{n}`, `{n,}`, `{n,m}`) - Bug fixed in v0.5.0
- Non-greedy quantifiers (`*?`, `+?`, `??`)
- Case-insensitive flag (`(?i)`)
- Dot wildcard (`.`, `.*`, `.+`) with backtracking
- DOTALL mode (`(?s)`) - Dot matches newlines
- Non-capturing groups (`(?:...)`) with alternations
- Hybrid DFA/NFA engine - Smart pattern routing
- Escape sequences (`\d`, `\w`, `\s`, etc.)
- Sequences and groups
- Word boundaries (`\b`, `\B`)
- Anchoring (`^`, `$`)
- Capturing groups - Auto-detection and extraction
- Lookahead/lookbehind - `(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)` with combined patterns
- Backreferences - `\1`, `\2`, etc.
- Text replacement - `replace()`, `replace_all()` with capture support
- Text splitting - `split()` iterator
- 50%+ faster pattern matching - Optimized in v0.5.1
- Bounded quantifier fast paths - `\d{4}`, `\w{2,}` now 2x faster than regex - v0.5.4
- Case-insensitive optimization - Zero-alloc ASCII fast path - v0.5.4
- Backreference fix - `\1`, `\2` now working correctly - v0.5.4
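The zero-alloc ASCII fast path for `(?i)` is not described in detail in this README. As a rough illustration only, a case-insensitive literal search can avoid building a lowercased copy of the haystack by comparing byte windows with the standard library's `eq_ignore_ascii_case`. The function name `find_ascii_ci` is hypothetical and not part of ReXile's API.

```rust
/// Hypothetical sketch: find the first ASCII case-insensitive occurrence of
/// `needle` in `haystack` without allocating. Returns (start, end) byte offsets.
pub fn find_ascii_ci(haystack: &str, needle: &str) -> Option<(usize, usize)> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return Some((0, 0));
    }
    if n.len() > h.len() {
        return None;
    }
    // Slide a needle-sized window over the haystack and fold ASCII case
    // during the comparison instead of lowercasing either string up front.
    (0..=h.len() - n.len())
        .find(|&i| h[i..i + n.len()].eq_ignore_ascii_case(n))
        .map(|i| (i, i + n.len()))
}
```

Only ASCII letters are folded here; non-ASCII bytes must match exactly, which is what keeps the check allocation-free.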
Purpose
ReXile is a high-performance regex engine optimized for fast compilation:
- Lightning-fast compilation - 10-100x faster than the `regex` crate
- Competitive matching - Faster on simple patterns, acceptable on complex ones
- Ideal for parsers - GRL, DSL, rule engines with dynamic patterns
- Minimal dependencies - Only `memchr` + `aho-corasick` for SIMD primitives
- Memory efficient - 15x less compilation memory
- Full control - Custom optimizations for specific use cases
Performance Highlights
Compilation Speed (vs regex crate):
- Pattern `[a-zA-Z_]\w*`: 104.7x faster
- Pattern `\d+`: 46.5x faster
- Pattern `(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+)`: 40.7x faster
- Pattern `.*test.*`: 15.3x faster
- Overall: 10-100x faster compilation
Matching Speed (v0.5.4):
- Simple patterns (`\d+`, `\w+`, `[a-zA-Z_]+`): 2-3x faster
- Bounded quantifiers (`\d{4}`, `\w{2,}`): 2x faster
- Case-insensitive (`(?i)error`): 1.8x slower (improved from 5x)
- Complex patterns with backtracking: 2x slower (acceptable for non-hot-path code)
- Overall: 1.3x slower matching, 19x faster compilation
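To make the bounded-quantifier numbers concrete: a pattern such as `\d{4}` can be matched with one linear scan that counts consecutive digits, with no general backtracking. The sketch below is hypothetical (`find_digits_exact` is not ReXile's implementation), assumes ASCII-only `\d`, and follows leftmost-first semantics, under which `\d{4}` on `"12345"` matches `"1234"`.

```rust
/// Hypothetical sketch of a `\d{n}` fast path (ASCII digits only).
/// Returns the byte span of the leftmost run of `n` consecutive digits.
pub fn find_digits_exact(haystack: &str, n: usize) -> Option<(usize, usize)> {
    if n == 0 {
        return Some((0, 0)); // `\d{0}` matches the empty string at 0
    }
    let mut run = 0; // length of the current digit run
    for (i, b) in haystack.as_bytes().iter().enumerate() {
        if b.is_ascii_digit() {
            run += 1;
            if run == n {
                // The run began n - 1 bytes before the current byte.
                return Some((i + 1 - n, i + 1));
            }
        } else {
            run = 0;
        }
    }
    None
}
```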
Use Case Example (Load 1000 GRL rules):
- regex crate: ~2 seconds compilation
- rexile: ~0.02 seconds (100x faster startup!)
Memory Comparison:
- Compilation: 15x less memory (128 KB vs 1920 KB)
- Peak memory: 5x less in stress tests (0.12 MB vs 0.62 MB)
- Search operations: Equal memory efficiency
When to Use ReXile:
- Parsers & lexers (fast token matching + instant startup)
- Rule engines with dynamic patterns (100x faster rule loading)
- DSL compilers (GRL, business rules)
- Applications with many patterns (instant initialization)
- Memory-constrained environments (15x less memory)
- Non-hot-path matching (acceptable trade-off for 100x faster compilation)
Quick Start
use rexile::Pattern;
// Literal matching with SIMD acceleration
let pattern = new.unwrap;
assert!;
assert_eq!;
// Multi-pattern matching (aho-corasick fast path)
let multi = new.unwrap;
assert!;
// Dot wildcard matching (with backtracking)
let dot = new.unwrap;
assert!; // . matches 'b'
assert!; // . matches '_'
// Greedy quantifiers with dot
let greedy = new.unwrap;
assert!; // .* matches 'b'
assert!; // .* matches '12345'
let plus = new.unwrap;
assert!; // .+ matches 'b' (requires at least one char)
assert!; // .+ needs at least 1 character
// Non-greedy quantifiers (NEW in v0.2.1)
let lazy = new.unwrap;
assert_eq!; // Matches "start{abc}", not greedy
// DOTALL mode - dot matches newlines (NEW in v0.2.1)
let dotall = new.unwrap;
let multiline = "rule test {\n content\n}";
assert!; // (?s) makes .* match across newlines
// Non-capturing groups with alternation (NEW in v0.2.1)
let group = new.unwrap;
assert!; // Matches quoted "test"
assert!; // Or matches foo
// Digit matching (DigitRun fast path - 1.4-1.9x faster than regex!)
let digits = new.unwrap;
let matches = digits.find_all;
// Returns: [(7, 12), (20, 22), (23, 25)]
// Identifier matching (IdentifierRun fast path)
let ident = new.unwrap;
assert!;
// Quoted strings (QuotedString fast path - 1.4-1.9x faster!)
let quoted = new.unwrap;
assert!;
// Word boundaries
let word = new.unwrap;
assert!;
assert!;
// Range quantifiers (NEW in v0.4.7)
let ip = new.unwrap;
assert!; // Matches IP addresses
let year = new.unwrap;
assert_eq!; // Matches exactly 4 digits
// Case-insensitive matching (NEW in v0.4.7)
let method = new.unwrap;
assert!; // Matches GET
assert!; // Also matches lowercase
assert!; // Also matches Post
// Lookahead - match prefix only if followed by pattern (NEW in v0.4.9)
let lookahead = new.unwrap;
assert!; // Matches 'foo' followed by 'bar'
assert!; // Doesn't match - not followed by 'bar'
// Negative lookahead (NEW in v0.4.9)
let neg_lookahead = new.unwrap;
assert!; // Matches 'foo' NOT followed by 'bar'
assert!;// Doesn't match - followed by 'bar'
// Lookbehind - match suffix only if preceded by pattern (NEW in v0.4.9)
let lookbehind = new.unwrap;
assert!; // Matches 'bar' preceded by 'foo'
assert!; // Doesn't match - not preceded by 'foo'
// Backreferences - match repeated patterns (NEW in v0.4.8)
let backref = new.unwrap;
assert!; // Matches repeated word
assert!;// Doesn't match - different words
// Text replacement (NEW in v0.5.0)
let pattern = new.unwrap;
assert_eq!;
assert_eq!;
// Replacement with capture groups (NEW in v0.5.0)
let swap = new.unwrap;
assert_eq!;
let fmt = new.unwrap;
assert_eq!;
// Text splitting (NEW in v0.5.0)
let split = new.unwrap;
let parts: = split.split.collect;
assert_eq!;
// Anchors
let exact = new.unwrap;
assert!;
assert!;
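The replacement examples above substitute capture groups into a template. ReXile's exact template syntax is not spelled out here; assuming `$1`-style group references like those used by the `regex` crate, the expansion step could be sketched as below. `expand_template` is hypothetical, and the caller is assumed to have already found the match and its groups.

```rust
/// Hypothetical sketch: expand `$N` references in a replacement template
/// using already-extracted capture groups (`caps[0]` is the whole match).
/// `$$` yields a literal `$`; missing or unmatched groups expand to "".
pub fn expand_template(template: &str, caps: &[Option<&str>]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        if chars.peek() == Some(&'$') {
            chars.next();
            out.push('$'); // escaped dollar sign
            continue;
        }
        // Collect the group number that follows '$'.
        let mut num = String::new();
        while let Some(&d) = chars.peek() {
            if !d.is_ascii_digit() {
                break;
            }
            num.push(d);
            chars.next();
        }
        if num.is_empty() {
            out.push('$'); // a lone '$' is kept literally
        } else if let Ok(idx) = num.parse::<usize>() {
            if let Some(Some(text)) = caps.get(idx) {
                out.push_str(text);
            }
        }
    }
    out
}
```

Keeping expansion separate from matching means `replace()` and `replace_all()` can share it: each match only has to supply its group spans.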
Cached API (Recommended for Hot Paths)
For patterns used repeatedly in hot loops:
use rexile;
// Automatically cached - compile once, reuse forever
assert!;
assert_eq!;
// Perfect for parsers and lexers
for line in log_lines
Supported Features
Fast Path Optimizations (10 Types)
ReXile uses JIT-style specialized implementations for common patterns:
| Fast Path | Pattern Example | Performance vs regex |
|---|---|---|
| Literal | `"hello"` | Competitive (SIMD) |
| LiteralPlusWhitespace | `"rule "` | Competitive |
| DigitRun | `\d+` | 1.4-1.9x faster |
| IdentifierRun | `[a-zA-Z_]\w*` | 104.7x faster compilation |
| QuotedString | `"[^"]+"` | 1.4-1.9x faster |
| WordRun | `\w+` | Competitive |
| DotWildcard | `.`, `.*`, `.+` | With backtracking |
| Alternation | `foo\|bar\|baz` | 2x slower (acceptable) |
| LiteralWhitespaceQuoted | Complex | Competitive |
| LiteralWhitespaceDigits | Complex | Competitive |
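The DigitRun and IdentifierRun rows describe specialized scanners rather than a general automaton. Below is a minimal std-only sketch of what such scanners can look like, assuming ASCII-only `\d` and `\w` (ReXile's handling of Unicode is not described here). The function names are hypothetical; the `(start, end)` byte spans mirror the shape of the `find_all` result in Quick Start.

```rust
/// Hypothetical DigitRun sketch: all maximal runs of ASCII digits (`\d+`).
pub fn find_digit_runs(haystack: &str) -> Vec<(usize, usize)> {
    find_runs(haystack.as_bytes(), |b| b.is_ascii_digit(), |b| b.is_ascii_digit())
}

/// Hypothetical IdentifierRun sketch: `[a-zA-Z_]\w*` with ASCII `\w`.
pub fn find_identifiers(haystack: &str) -> Vec<(usize, usize)> {
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    find_runs(haystack.as_bytes(), |b| b.is_ascii_alphabetic() || b == b'_', is_word)
}

/// Single pass: a run starts on a byte accepted by `first` and extends while
/// `rest` accepts the following bytes. Runs never overlap.
fn find_runs(
    bytes: &[u8],
    first: impl Fn(u8) -> bool,
    rest: impl Fn(u8) -> bool,
) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if first(bytes[i]) {
            let start = i;
            i += 1;
            while i < bytes.len() && rest(bytes[i]) {
                i += 1;
            }
            spans.push((start, i));
        } else {
            i += 1;
        }
    }
    spans
}
```

Because each byte is examined once and no state is kept beyond the current run, there is nothing to backtrack and almost nothing to "compile".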
Regex Features
| Feature | Example | Status |
|---|---|---|
| Literal strings | `hello`, `world` | Supported |
| Alternation | `foo\|bar\|baz` | Supported (aho-corasick) |
| Start anchor | `^start` | Supported |
| End anchor | `end$` | Supported |
| Exact match | `^exact$` | Supported |
| Character classes | `[a-z]`, `[0-9]`, `[^abc]` | Supported |
| Quantifiers | `*`, `+`, `?` | Supported |
| Non-greedy quantifiers | `.*?`, `+?`, `??` | Supported (v0.2.1) |
| Dot wildcard | `.`, `.*`, `.+` | Supported (v0.2.0) |
| DOTALL mode | `(?s)` - dot matches newlines | Supported (v0.2.1) |
| Escape sequences | `\d`, `\w`, `\s`, `\.`, `\n`, `\t` | Supported |
| Sequences | `ab+c*`, `\d+\w*` | Supported |
| Non-capturing groups | `(?:abc\|def)` | Supported (v0.2.1) |
| Capturing groups | Extract `(group)` | Supported (v0.2.0) |
| Word boundaries | `\b`, `\B` | Supported |
| Range quantifiers | `{n}`, `{n,}`, `{n,m}` | Supported (v0.4.7), fixed in v0.5.0 |
| Lookahead/lookbehind | `(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)` | Supported (v0.4.9) |
| Backreferences | `\1`, `\2`, etc. | Supported (v0.4.8) |
| Text replacement | `replace()`, `replace_all()` | Supported (new in v0.5.0) |
| Text splitting | `split()` | Supported (new in v0.5.0) |
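Word boundaries and lookahead are zero-width: they test the text around a position without consuming it. The sketch below shows that idea for `\b` (ASCII word characters assumed) and for a literal lookahead such as `foo(?=bar)`. It is a hypothetical illustration, not ReXile's engine; inverting the `starts_with` test would give the negative form `foo(?!bar)`.

```rust
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `\b` at byte offset `pos`: exactly one side of `pos` is a word byte.
pub fn is_word_boundary(bytes: &[u8], pos: usize) -> bool {
    let before = pos > 0 && is_word_byte(bytes[pos - 1]);
    let after = pos < bytes.len() && is_word_byte(bytes[pos]);
    before != after
}

/// Sketch of `lit(?=ahead)`: the first `lit` immediately followed by `ahead`.
/// The lookahead text is checked but excluded from the returned span.
pub fn find_with_lookahead(haystack: &str, lit: &str, ahead: &str) -> Option<(usize, usize)> {
    let mut from = 0;
    while from <= haystack.len() {
        let start = from + haystack[from..].find(lit)?;
        let end = start + lit.len();
        if haystack[end..].starts_with(ahead) {
            return Some((start, end));
        }
        // Retry after the first char of this candidate (stays on a UTF-8 boundary).
        from = start + haystack[start..].chars().next().map_or(1, char::len_utf8);
    }
    None
}
```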
Performance Benchmarks
Compilation Speed (Primary Advantage)
Pattern Compilation Benchmark (vs regex crate):
| Pattern | rexile | regex | Speedup |
|---|---|---|---|
| `[a-zA-Z_]\w*` | 95.2 ns | 9.97 µs | 104.7x faster |
| `\d+` | 86.7 ns | 4.03 µs | 46.5x faster |
| `(\w+)\s*(>=\|<=\|==\|!=\|>\|<)\s*(.+)` | 471 ns | 19.2 µs | 40.7x faster |
| `.*test.*` | 148 ns | 2.27 µs | 15.3x faster |
Overall: 10-100x faster compilation - Perfect for dynamic patterns!
Matching Speed
Simple Patterns (Fast paths):
- Pattern `\d+` on "12345": 1.4-1.9x faster
- Pattern `\w+` on "variable": 1.4-1.9x faster
- Pattern `"[^"]+"` on quoted strings: Competitive
Complex Patterns (Backtracking):
- Pattern `a.+c` on "abc": 2-5x slower (acceptable)
- Pattern `.*test.*` on long strings: 2-10x slower (acceptable)
- Trade-off: up to 100x faster compilation in exchange for slower matching on complex patterns
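The backtracking cost comes from `.*` and `.+` having to try several match lengths before the rest of the pattern succeeds. Below is a minimal recursive backtracking sketch for a tiny pattern language: literal characters, `.`, `.*`, `.+`, and a trailing `?` for the non-greedy forms. It is an illustration under stated assumptions, not ReXile's engine: `.` excludes `\n` (as without `(?s)`), there is no escaping, and spans are character indices rather than byte offsets.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Lit(char),
    Any,                 // `.`
    Star { lazy: bool }, // `.*` or `.*?`
    Plus { lazy: bool }, // `.+` or `.+?`
}

fn parse(pattern: &str) -> Vec<Tok> {
    let cs: Vec<char> = pattern.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < cs.len() {
        let next = cs.get(i + 1).copied();
        if cs[i] == '.' && (next == Some('*') || next == Some('+')) {
            let lazy = cs.get(i + 2) == Some(&'?');
            toks.push(if next == Some('*') { Tok::Star { lazy } } else { Tok::Plus { lazy } });
            i += if lazy { 3 } else { 2 };
        } else {
            toks.push(if cs[i] == '.' { Tok::Any } else { Tok::Lit(cs[i]) });
            i += 1;
        }
    }
    toks
}

/// Try to match `toks` at char index `pos`; returns the end index on success.
fn match_at(toks: &[Tok], text: &[char], pos: usize) -> Option<usize> {
    // `.` matches any char except '\n' (no DOTALL).
    let dot = |p: usize| p < text.len() && text[p] != '\n';
    let Some((&tok, rest)) = toks.split_first() else {
        return Some(pos); // every token matched
    };
    match tok {
        Tok::Lit(c) if text.get(pos) == Some(&c) => match_at(rest, text, pos + 1),
        Tok::Any if dot(pos) => match_at(rest, text, pos + 1),
        Tok::Lit(_) | Tok::Any => None,
        Tok::Star { lazy } | Tok::Plus { lazy } => {
            let min = if matches!(tok, Tok::Plus { .. }) { 1 } else { 0 };
            let mut max = 0;
            while dot(pos + max) {
                max += 1;
            }
            if max < min {
                return None;
            }
            // Greedy tries the longest run first; lazy tries the shortest first.
            // Each failed attempt backtracks to the next candidate length.
            for i in 0..=(max - min) {
                let n = if lazy { min + i } else { max - i };
                if let Some(end) = match_at(rest, text, pos + n) {
                    return Some(end);
                }
            }
            None
        }
    }
}

/// Leftmost match, returned as a (start, end) span of char indices.
pub fn find_wildcard(pattern: &str, text: &str) -> Option<(usize, usize)> {
    let toks = parse(pattern);
    let chars: Vec<char> = text.chars().collect();
    (0..=chars.len()).find_map(|start| match_at(&toks, &chars, start).map(|end| (start, end)))
}
```

Every start position can retry each wildcard at many lengths, so the work grows with input length; this is the kind of cost that makes `.*test.*` on long strings slower than a fast-path scan.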
Use Case Performance
Loading 1000 GRL Rules:
- regex crate: ~2 seconds (2ms per pattern)
- rexile: ~0.02 seconds (20µs per pattern)
- Result: 100x faster startup! Perfect for parsers and rule engines.
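As a quick consistency check, the startup figures are the per-pattern costs above multiplied out over 1000 rules:

```math
\begin{aligned}
t_{\text{regex}} &= 1000 \times 2\,\text{ms} = 2\,\text{s} \\
t_{\text{rexile}} &= 1000 \times 20\,\mu\text{s} = 20\,\text{ms} = 0.02\,\text{s} \\
\frac{t_{\text{regex}}}{t_{\text{rexile}}} &= \frac{2\,\text{s}}{0.02\,\text{s}} = 100
\end{aligned}
```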
Memory Comparison
Test 1: Pattern Compilation (10 patterns):
- regex: 1920 KB in 7.89ms
- ReXile: 128 KB in 370ยตs
- Result: 15x less memory, 21x faster
Test 2: Search Operations (5 patterns × 139KB corpus):
- Both: 0 bytes memory delta
- Result: Equal efficiency
Test 3: Stress Test (50 patterns × 500KB corpus):
- regex: 0.62 MB peak in 46ms
- ReXile: 0.12 MB peak in 27ms
- Result: 5x less peak memory, 1.7x faster
Detailed Matching Benchmark (v0.5.4)
| Pattern | rexile | regex | Ratio (rexile/regex) | Winner |
|---|---|---|---|---|
| `\d+` | 6 ns | 11 ns | 0.56x | rexile |
| `\w+@\w+` | 10 ns | 40 ns | 0.25x | rexile |
| `[a-zA-Z_]+` | 4 ns | 11 ns | 0.36x | rexile |
| `\d{4}` | 6 ns | 12 ns | 0.53x | rexile |
| `\w{2,}` | 5 ns | 10 ns | 0.44x | rexile |
| `\d+\.\d+` | 21 ns | 32 ns | 0.67x | rexile |
| `ERROR` (literal) | 7 ns | 9 ns | 0.82x | rexile |
| `(?i)error` | 63 ns | 34 ns | 1.84x | regex |
| `(\w+)@(\w+)` | 86 ns | 41 ns | 2.11x | regex |
| `\w+\s+\d+` | 135 ns | 63 ns | 2.10x | regex |
A ratio below 1 means ReXile is faster. Wins: 11/22 test cases (the table shows a subset) | Overall: 1.30x slower matching | Compilation: 19x faster
When ReXile Wins
- Simple patterns (`\d+`, `\w+`, `[a-zA-Z_]+`) - 2-3x faster matching
- Bounded quantifiers (`\d{4}`, `\w{2,}`) - 2x faster matching
- DFA patterns (`\d+\.\d+`) - 1.5x faster matching
- Fast compilation - 19x faster pattern compilation
- Memory efficiency - 15x less for compilation, 5x less peak
- Instant startup - Load 1000 patterns in 0.02s vs 2s
- Lookaround & backreferences - Not supported by the regex crate
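Backreferences need backtracking because the group's text is chosen first and only then compared against later input. The hand-written matcher below handles just the single pattern `(\w+)\s+\1` (ASCII `\w`; whitespace per `u8::is_ascii_whitespace`) to show that mechanism. `find_repeated_word` is hypothetical and is not how ReXile implements backreferences in general.

```rust
/// Hypothetical matcher for the single pattern `(\w+)\s+\1`.
/// Returns (start, end, group) of the leftmost match, trying longer groups
/// first as a greedy `\w+` would.
pub fn find_repeated_word(text: &str) -> Option<(usize, usize, &str)> {
    let b = text.as_bytes();
    let is_word = |c: u8| c.is_ascii_alphanumeric() || c == b'_';
    for start in 0..b.len() {
        // The maximal word run from `start` bounds the possible group lengths.
        let mut word_end = start;
        while word_end < b.len() && is_word(b[word_end]) {
            word_end += 1;
        }
        // Greedy `\w+`: longest group first, then backtrack to shorter ones.
        for group_end in (start + 1..=word_end).rev() {
            let mut p = group_end;
            while p < b.len() && b[p].is_ascii_whitespace() {
                p += 1;
            }
            if p == group_end {
                continue; // `\s+` needs at least one whitespace byte
            }
            let group = &b[start..group_end];
            if b[p..].starts_with(group) {
                return Some((start, p + group.len(), &text[start..group_end]));
            }
        }
    }
    None
}
```

Like the real pattern (which has no `\b`), this matches `is is` inside `this is`; adding word boundaries would rule that out.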
When regex Wins
- Case-insensitive (`(?i)`) - ReXile ~2x slower
- Complex sequences (`\w+\s+\d+`) - ReXile ~2x slower
- Capture groups - ReXile ~2x slower
- Overlap patterns (`[a-z]+.+[0-9]+`) - ReXile ~2x slower
Architecture
Pattern → Parser → AST → Fast Path Detection → Specialized Matcher
The specialized matcher is one of:
- DigitRun (memchr SIMD scanning)
- IdentifierRun (direct byte scanning)
- QuotedString (memchr + validation)
- Alternation (aho-corasick automaton)
- Literal (memchr SIMD)
- ... 5 more fast paths
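The fast-path detection step can be pictured as classifying the pattern once at compile time and storing the result, so that matching dispatches on an enum instead of walking a general automaton. The sketch below is hypothetical: the enum mirrors a few of the fast-path names listed above, but the detection rules (exact string comparisons and a "no metacharacters" check) are simplifications chosen for illustration.

```rust
/// Hypothetical subset of the fast paths named above.
#[derive(Debug, PartialEq)]
pub enum FastPath {
    Literal(String),
    DigitRun,                 // \d+
    IdentifierRun,            // [a-zA-Z_]\w*
    WordRun,                  // \w+
    Alternation(Vec<String>), // foo|bar|baz with purely literal branches
    General,                  // fall back to the backtracking engine
}

const META: &[char] = &['\\', '.', '*', '+', '?', '(', ')', '[', ']', '{', '}', '^', '$', '|'];

/// Classify a pattern string once, at compile time.
pub fn detect_fast_path(pattern: &str) -> FastPath {
    match pattern {
        r"\d+" => return FastPath::DigitRun,
        r"[a-zA-Z_]\w*" => return FastPath::IdentifierRun,
        r"\w+" => return FastPath::WordRun,
        _ => {}
    }
    let is_literal = |s: &str| !s.is_empty() && !s.contains(META);
    if is_literal(pattern) {
        return FastPath::Literal(pattern.to_string());
    }
    let branches: Vec<&str> = pattern.split('|').collect();
    if branches.len() > 1 && branches.iter().all(|b| is_literal(*b)) {
        return FastPath::Alternation(branches.into_iter().map(String::from).collect());
    }
    FastPath::General
}
```

Because detection is a handful of string checks rather than automaton construction, it suggests why compiling a fast-path pattern can be so much cheaper than building a full regex program.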
Run benchmarks yourself:
Installation
Add to your Cargo.toml:
[dependencies]
rexile = "0.5"
Examples
Literal Search
let p = new.unwrap;
assert!;
assert_eq!;
// Find all occurrences
let matches = p.find_all;
assert_eq!;
Multi-Pattern (Alternation)
// Fast multi-pattern search using aho-corasick
let keywords = new.unwrap;
assert!;
Anchored Patterns
// Must start with pattern
let starts = new.unwrap;
assert!;
assert!;
// Must end with pattern
let ends = new.unwrap;
assert!;
assert!;
// Exact match
let exact = new.unwrap;
assert!;
assert!;
Cached API (Best for Repeated Patterns)
// First call compiles and caches
is_match.unwrap;
// Subsequent calls reuse cached pattern (zero compile cost)
is_match.unwrap;
is_match.unwrap;
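The cached API's "compile once, reuse" behavior can be approximated with a process-wide map from pattern string to compiled value. This is a hypothetical sketch using only `std` (`OnceLock`, `Mutex`, `HashMap`, `Arc`); `Compiled` and `compile` stand in for a real compiled pattern, and ReXile's actual cache (eviction policy, locking, key type) is not described in this document.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Stand-in for a compiled pattern (hypothetical).
#[derive(Debug)]
pub struct Compiled {
    pub source: String,
}

fn compile(pattern: &str) -> Compiled {
    // A real engine would parse the pattern and select a fast path here.
    Compiled { source: pattern.to_string() }
}

/// Return the cached compiled pattern, compiling it only on first use.
pub fn get_or_compile(pattern: &str) -> Arc<Compiled> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<Compiled>>>> = OnceLock::new();
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    let entry = map
        .entry(pattern.to_string())
        .or_insert_with(|| Arc::new(compile(pattern)));
    Arc::clone(entry)
}
```

Handing out `Arc` clones lets callers keep using a pattern after the lock is released. A single global `Mutex` serializes lookups, so a read-mostly workload might prefer an `RwLock` or a thread-local cache.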
More examples: See the `examples/` directory for:
- `basic_usage.rs` - Core API walkthrough
- `log_processing.rs` - Log analysis patterns
- `performance.rs` - Performance comparison
Run examples with:
Use Cases
ReXile is production-ready for:
Ideal Use Cases
- Parsers and lexers - 21x faster pattern compilation, competitive matching
- Rule engines - Simple pattern matching in business rules (original use case!)
- Log processing - Fast keyword and pattern extraction
- Dynamic patterns - Applications that compile patterns at runtime
- Memory-constrained environments - 15x less compilation memory
- Low-latency applications - Predictable performance, no JIT warmup
Perfect Patterns for ReXile
- Fast compilation: Patterns compile 10-100x faster
- Simple matching: `\d+`, `\w+` (1.4-1.9x faster matching)
- Identifiers: `[a-zA-Z_]\w*` (104.7x faster compilation!)
- Dot wildcards: `.`, `.*`, `.+` with proper backtracking
- Keyword search: `rule\s+`, `function\s+`
- Many patterns: Load 1000 patterns instantly (100x faster startup)
Consider the regex crate for:
- Case-insensitive matching (ReXile ~2x slower)
- Complex sequence patterns (ReXile ~2x slower)
- Unicode properties (`\p{L}` - not yet supported)
Contributing
Contributions welcome! ReXile is actively maintained and evolving.
Current focus:
- Core regex features complete
- Dot wildcard (`.`, `.*`, `.+`) with backtracking - v0.2.0
- Capturing groups - Auto-detection and extraction - v0.2.0
- Non-greedy quantifiers (`.*?`, `+?`, `??`) - v0.2.1
- DOTALL mode (`(?s)`) for multiline matching - v0.2.1
- Non-capturing groups (`(?:...)`) with alternations - v0.2.1
- Bounded quantifiers (`{n}`, `{n,}`, `{n,m}`) - v0.4.7
- Full lookaround support (`(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`) with combined patterns - v0.4.10
- Backreferences (`\1`, `\2`, etc.) - v0.4.8 (fixed in v0.5.4)
- Bounded quantifier fast paths - v0.5.4
- Case-insensitive zero-alloc fast path - v0.5.4
- 19x faster compilation
- Planned: Advanced features (Unicode support, more optimizations)
How to contribute:
- Check issues for open tasks
- Run tests: `cargo test`
- Run benchmarks: `cargo run --release --example per_file_grl_benchmark`
- Submit a PR with benchmarks showing the performance impact
Priority areas:
- Unicode support (`\p{L}`, `\p{N}`, etc.)
- More fast path patterns
- Named capture groups (`(?P<name>...)`)
- Documentation improvements
License
Licensed under either of:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option.
Credits
Built on top of:
- `memchr` by Andrew Gallant - SIMD-accelerated substring search
- `aho-corasick` by Andrew Gallant - Multi-pattern matching automaton
Developed for the rust-rule-engine project, providing fast pattern matching for GRL (Grule Rule Language) parsing and business rule evaluation.
Performance Philosophy: ReXile achieves competitive performance through intelligent specialization rather than complex JIT compilation:
- 10 hand-optimized fast paths for common patterns
- SIMD acceleration via memchr
- Pre-built automatons for alternations
- Zero-copy iterator design
- Minimal metadata overhead
Status: Production Ready (v0.5.4)
- Compilation Speed: 19x faster than the regex crate
- Matching Speed: 2-3x faster on simple patterns, 1.3x slower overall
- Memory: 15x less for compilation, 5x less peak
- Features: Core regex + dot wildcard + capturing groups + non-greedy + DOTALL + non-capturing groups + bounded quantifiers + full lookaround support + backreferences + replace + split
- Testing: 168 tests passing
- Real-world validated: GRL parsing, rule engines, DSL compilers