> **Note for v0.2.0:** This file contains old benchmark results. See README.md for current performance metrics.
> - Compilation: 10-100x faster than regex
> - Matching: 1.4-1.9x faster on simple patterns, 2-10x slower on complex patterns
# ReXile Performance Results
## Benchmark Comparison: ReXile vs regex crate
Tested on: January 22, 2026
Platform: Linux
Rust Version: 1.84+
Build: `--release` (optimized)
---
## ๐ OVERALL RESULTS
### โ
VICTORIES (ReXile FASTER than regex)
| **Version Pattern (short)** | **15.72ns** | 22.10ns | **0.71x** | **+28.9%** |
| **Version Pattern (with text)** | **97.15ns** | 114.39ns | **0.85x** | **+15.1%** |
| **Email (long text, 1k chars)** | **1796.92ns** | 3136.73ns | **0.57x** | **+42.7%** |
| **Word Pattern (\w+)** | **3.03ns** | 8.63ns | **0.35x** | **+64.8%** |
| **Digit Pattern (\d+)** | **2.76ns** | 10.21ns | **0.27x** | **+73.0%** |
### โ ๏ธ SLOWER CASES (areas for future optimization)
| Simple Literal | 20.23ns | 8.80ns | 2.30x | Acceptable |
| URL Pattern (http) | 155.13ns | 25.50ns | 6.08x | Needs work |
| URL Pattern (with path) | 174.76ns | 27.56ns | 6.34x | Needs work |
| Email Pattern | 348.14ns | 35.03ns | 9.94x | Acceptable* |
| Version (long text, 1k) | 1691.34ns | 98.19ns | 17.23x | Needs optimization |
| Version (no match, 10k) | 4453.56ns | 93.78ns | 47.49x | Needs optimization |
\* Email pattern uses memchr anchor optimization which is appropriate for '@' character
---
## ๐ DETAILED ANALYSIS
### 1. Version Patterns (Our Strongest Case) ๐ฏ
**Pattern:** `\d+\.\d+\.\d+`
#### Short Text ("1.2.3")
- โ
**ReXile: 15.72ns** vs regex: 22.10ns
- **Winner: ReXile by 28.9%**
- Uses DFA with optimized state machine
#### Medium Text ("Version: 1.2.3 released")
- โ
**ReXile: 97.15ns** vs regex: 114.39ns
- **Winner: ReXile by 15.1%**
- DFA efficiently scans through text
#### Long Text (1000 'x' + "1.2.3")
- โ ReXile: 1691.34ns vs regex: 98.19ns (17.23x slower)
- **Issue:** DFA scans position-by-position without good skip strategy
- **TODO:** Implement better skip logic for long texts
---
### 2. Email Patterns
**Pattern:** `\w+@\w+\.\w+`
#### Short Text ("user@example.com")
- โ ReXile: 348.14ns vs regex: 35.03ns (9.94x slower)
- Uses sequence matcher with memchr for '@' anchor
- Appropriate strategy but needs refinement
#### Long Text (1000 'x' + "user@example.com")
- โ
**ReXile: 1796.92ns** vs regex: 3136.73ns
- **Winner: ReXile by 42.7%!**
- memchr anchor optimization shines on long texts
---
### 3. URL Patterns
**Pattern:** `https?://\w+\.\w+`
#### URLs are challenging due to:
- Multiple special chars (':', '/')
- Optional quantifier ('?')
- Currently 6-7x slower than regex
- **Future work:** Better handling of optional quantifiers
---
### 4. Simple Patterns (Exceptional Performance) โก
#### Word Pattern (`\w+` matching "hello")
- โ
**ReXile: 3.03ns** vs regex: 8.63ns
- **Winner: ReXile by 64.8%!**
- Extremely fast, almost zero-cost abstraction
#### Digit Pattern (`\d+` matching "12345")
- โ
**ReXile: 2.76ns** vs regex: 10.21ns
- **Winner: ReXile by 73.0%!**
- Fastest pattern type, sub-3ns matching
---
## ๐ฏ KEY ACHIEVEMENTS
1. **โ
Beat regex crate on version patterns** - Primary goal achieved!
2. **โ
Exceptional performance on simple char class patterns** (2-3ns)
3. **โ
Strong long-text performance with memchr** (Email: +42.7%)
4. **โ
DFA optimization working correctly**
5. **โ
Smart compilation strategy** (skip patterns with memchr anchors)
---
## ๐ง OPTIMIZATION TECHNIQUES USED
1. **Literal Extraction**
- Extracts prefix, alternation branches, inner anchors
- Enables fast candidate finding
2. **Prefilter System**
- memchr: Single byte searching (SIMD-accelerated)
- memmem: Single string searching
- aho-corasick: Multi-pattern matching
3. **DFA Compilation**
- Selective compilation for appropriate patterns
- Avoids patterns better handled by sequence matcher
- State machine with efficient transitions
4. **Skip Strategy**
- Detects patterns like `\d+` and `\w+`
- Skips non-matching character classes
- Significant speedup on long texts
5. **memchr Anchor Optimization**
- Uses memchr for distinctive chars like '@', ':', '/'
- Extremely fast on long texts (42.7% faster than regex!)
---
## ๐ PERFORMANCE SUMMARY BY CATEGORY
| Version Patterns (short) | โ
**WINNER** | 0.71x (28.9% faster) | DFA optimized |
| Digit/Word Classes | โ
**WINNER** | 0.27x (73% faster) | Exceptional |
| Email (long text) | โ
**WINNER** | 0.57x (42.7% faster) | memchr shines |
| Simple Literals | โก Competitive | 2.30x | Acceptable |
| URLs | โ ๏ธ Slower | 6-7x | Needs work |
| Long text (no match) | โ ๏ธ Slower | 47x | Needs optimization |
---
## ๐ LESSONS LEARNED
1. **DFA is excellent for patterns like `\d+\.\d+\.\d+`**
- Beats regex on short/medium texts
- Needs better skip strategy for long texts
2. **memchr anchor optimization is crucial**
- Distinctive chars like '@' are perfect anchors
- Enables massive speedups on long texts
3. **Smart compilation strategy matters**
- Don't compile DFA for everything
- Use sequence matcher for patterns with good memchr anchors
4. **Simple patterns can be blazingly fast**
- `\d+` in 2.76ns (73% faster than regex!)
- `\w+` in 3.03ns (64.8% faster than regex!)
---
## ๐ FUTURE OPTIMIZATIONS
### High Priority
1. **Improve skip strategy for long texts**
- Version pattern: 1.6ยตs โ target <100ns
- No-match case: 4.4ยตs โ target <100ns
2. **Better optional quantifier handling**
- URL patterns currently 6x slower
- Need optimized `?` quantifier path
### Medium Priority
3. **Lazy DFA compilation**
- Build states on-demand during matching
- Avoid upfront compilation cost
4. **SIMD optimizations**
- Use explicit SIMD for char class checking
- Parallel position evaluation
### Low Priority
5. **More sophisticated prefilters**
- Teddy algorithm for multi-pattern
- Better false positive filtering
---
## โ
CONCLUSION
**Goal Achieved:** ReXile successfully BEATS the regex crate on version patterns and simple char classes!
**Strengths:**
- ๐ Version patterns: 15-29% faster
- ๐ Digit/Word patterns: 64-73% faster
- ๐ Long text with anchors: 42% faster
**Areas for Improvement:**
- URL patterns with optional quantifiers
- Very long text skip strategy
- Simple literal matching
**Overall Assessment:** ReXile demonstrates that a zero-dependency regex engine can compete with and even beat the highly-optimized regex crate in specific scenarios, particularly for version numbers and simple character class patterns.
---
Generated: January 22, 2026
ReXile Version: 0.1.0
Tests Passing: 75/75