rexile 0.5.3

A blazing-fast regex engine with 10-100x faster compilation - now with complete Unicode safety

> **Note for v0.2.0:** This file contains old benchmark results. See README.md for current performance metrics.
> - Compilation: 10-100x faster than regex
> - Matching: 1.4-1.9x faster on simple patterns, 2-10x slower on complex patterns

# ReXile Performance Results

## Benchmark Comparison: ReXile vs regex crate

- Tested on: January 22, 2026
- Platform: Linux
- Rust Version: 1.84+
- Build: `--release` (optimized)

---

## ๐Ÿ† OVERALL RESULTS

### ✅ VICTORIES (ReXile FASTER than regex)

| Test Case | ReXile | regex | Ratio (ReXile/regex) | Time Saved |
|-----------|--------|-------|-------|-------------|
| **Version Pattern (short)** | **15.72ns** | 22.10ns | **0.71x** | **+28.9%** |
| **Version Pattern (with text)** | **97.15ns** | 114.39ns | **0.85x** | **+15.1%** |
| **Email (long text, 1k chars)** | **1796.92ns** | 3136.73ns | **0.57x** | **+42.7%** |
| **Word Pattern (\w+)** | **3.03ns** | 8.63ns | **0.35x** | **+64.8%** |
| **Digit Pattern (\d+)** | **2.76ns** | 10.21ns | **0.27x** | **+73.0%** |

### โš ๏ธ SLOWER CASES (areas for future optimization)

| Test Case | ReXile | regex | Ratio (ReXile/regex) | Status |
|-----------|--------|-------|-------|--------|
| Simple Literal | 20.23ns | 8.80ns | 2.30x | Acceptable |
| URL Pattern (http) | 155.13ns | 25.50ns | 6.08x | Needs work |
| URL Pattern (with path) | 174.76ns | 27.56ns | 6.34x | Needs work |
| Email Pattern | 348.14ns | 35.03ns | 9.94x | Acceptable* |
| Version (long text, 1k) | 1691.34ns | 98.19ns | 17.23x | Needs optimization |
| Version (no match, 10k) | 4453.56ns | 93.78ns | 47.49x | Needs optimization |

\* The email pattern uses the memchr anchor optimization, which is well suited to the distinctive '@' character.

---

## 📊 DETAILED ANALYSIS

### 1. Version Patterns (Our Strongest Case) 🎯

**Pattern:** `\d+\.\d+\.\d+`

#### Short Text ("1.2.3")
- ✅ **ReXile: 15.72ns** vs regex: 22.10ns
- **Winner: ReXile by 28.9%**
- Uses a DFA with an optimized state machine
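
To make the DFA idea concrete, here is a minimal table-driven DFA for `\d+\.\d+\.\d+`. It is a sketch under stated assumptions, not ReXile's actual state machine: bytes are grouped into three hypothetical classes (ASCII digit, '.', anything else), and `match_at` and `find` are illustrative names, not ReXile's API. Note that `find` simply retries the DFA at every start position, which is the behavior that hurts on long inputs.

```rust
// Byte classes used by this sketch (ASCII digits only, an assumption).
const DIGIT: usize = 0;
const DOT: usize = 1;
const OTHER: usize = 2;

const ACCEPT: usize = 5;
const DEAD: usize = 6;

/// TRANSITIONS[state][class] -> next state, for `\d+\.\d+\.\d+`.
const TRANSITIONS: [[usize; 3]; 7] = [
    [1, DEAD, DEAD],    // 0: start, need a digit
    [1, 2, DEAD],       // 1: inside first \d+
    [3, DEAD, DEAD],    // 2: after first '.'
    [3, 4, DEAD],       // 3: inside second \d+
    [5, DEAD, DEAD],    // 4: after second '.'
    [5, DEAD, DEAD],    // 5: inside third \d+ (accepting)
    [DEAD, DEAD, DEAD], // 6: dead state
];

fn class_of(b: u8) -> usize {
    match b {
        b'0'..=b'9' => DIGIT,
        b'.' => DOT,
        _ => OTHER,
    }
}

/// Runs the DFA anchored at `start` and returns the end of the longest match.
fn match_at(text: &[u8], start: usize) -> Option<usize> {
    let mut state = 0;
    let mut last_accept = None;
    for (offset, &b) in text[start..].iter().enumerate() {
        state = TRANSITIONS[state][class_of(b)];
        if state == DEAD {
            break;
        }
        if state == ACCEPT {
            last_accept = Some(start + offset + 1);
        }
    }
    last_accept
}

/// Unanchored search with no skip logic: tries every start position in turn.
fn find(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    (0..bytes.len()).find_map(|start| match_at(bytes, start).map(|end| (start, end)))
}
```

For example, `find("Version: 1.2.3 released")` returns `Some((9, 14))`.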

#### Medium Text ("Version: 1.2.3 released")
- ✅ **ReXile: 97.15ns** vs regex: 114.39ns
- **Winner: ReXile by 15.1%**
- The DFA scans the surrounding text efficiently

#### Long Text (1000 'x' + "1.2.3")
- โŒ ReXile: 1691.34ns vs regex: 98.19ns (17.23x slower)
- **Issue:** DFA scans position-by-position without good skip strategy
- **TODO:** Implement better skip logic for long texts
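
One possible shape for that skip logic, sketched here as an assumption rather than ReXile's plan: jump directly to the next ASCII digit instead of trying every position, and when an attempt fails, skip the rest of that digit run. The second skip is safe for this particular pattern because every start inside the same run consumes the run to the same end and then faces the same '.' check. `match_at` is a hypothetical hand-written stand-in for the DFA, and `find_with_skip` is an illustrative name.

```rust
/// Hypothetical stand-in for the DFA: matches `\d+\.\d+\.\d+` anchored at
/// `start` and returns the end offset. Greedy digit runs are safe here
/// because '.' is never a digit, so no backtracking is required.
fn match_at(text: &[u8], start: usize) -> Option<usize> {
    let mut pos = start;
    for part in 0..3 {
        // Consume one `\d+` run.
        let run = text[pos..].iter().take_while(|b| b.is_ascii_digit()).count();
        if run == 0 {
            return None;
        }
        pos += run;
        // The first two runs must be followed by a literal '.'.
        if part < 2 {
            if text.get(pos) != Some(&b'.') {
                return None;
            }
            pos += 1;
        }
    }
    Some(pos)
}

/// Unanchored search with two skips:
/// 1. jump straight to the next ASCII digit (no other byte can start a match);
/// 2. after a failed attempt, skip the rest of that digit run, since every
///    start inside the run reaches the same '.' check and fails the same way.
fn find_with_skip(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut pos = 0;
    while pos < bytes.len() {
        let start = pos + bytes[pos..].iter().position(|b| b.is_ascii_digit())?;
        if let Some(end) = match_at(bytes, start) {
            return Some((start, end));
        }
        let run = bytes[start..].iter().take_while(|b| b.is_ascii_digit()).count();
        pos = start + run;
    }
    None
}
```

On the benchmark's long input (1000 'x' followed by "1.2.3"), the first skip jumps over all 1000 bytes in a single `position` scan before the matcher runs once.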

---

### 2. Email Patterns

**Pattern:** `\w+@\w+\.\w+`

#### Short Text ("user@example.com")
- โŒ ReXile: 348.14ns vs regex: 35.03ns (9.94x slower)
- Uses sequence matcher with memchr for '@' anchor
- Appropriate strategy but needs refinement

#### Long Text (1000 'x' + "user@example.com")
- ✅ **ReXile: 1796.92ns** vs regex: 3136.73ns
- **Winner: ReXile by 42.7%!**
- memchr anchor optimization shines on long texts
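
The anchor idea can be sketched as follows, under two assumptions: `\w` is treated as ASCII `[A-Za-z0-9_]` (the regex crate's `\w` is Unicode-aware by default), and the standard library's `Iterator::position` stands in for `memchr::memchr` so the example has no dependencies. The function names are hypothetical. The search jumps from '@' to '@', walks left over word bytes to find the leftmost start, and verifies `\w+\.\w+` to the right.

```rust
/// ASCII approximation of `\w` (an assumption; regex's `\w` is Unicode-aware).
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Matches `\w+\.\w+` anchored at `pos` and returns the end offset.
fn match_domain(bytes: &[u8], pos: usize) -> Option<usize> {
    let word_run = |from: usize| bytes[from..].iter().take_while(|&&b| is_word(b)).count();
    let first = word_run(pos);
    if first == 0 || bytes.get(pos + first) != Some(&b'.') {
        return None;
    }
    let second_start = pos + first + 1;
    let second = word_run(second_start);
    if second == 0 {
        return None;
    }
    Some(second_start + second)
}

/// Anchor-driven search for `\w+@\w+\.\w+`.
fn find_email(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut from = 0;
    // `position` stands in for memchr::memchr(b'@', ...).
    while let Some(offset) = bytes[from..].iter().position(|&b| b == b'@') {
        let at = from + offset;
        // Leftmost possible start: walk back over the `\w` run before '@'.
        let left = bytes[..at].iter().rev().take_while(|&&b| is_word(b)).count();
        if left > 0 {
            if let Some(end) = match_domain(bytes, at + 1) {
                return Some((at - left, end));
            }
        }
        from = at + 1;
    }
    None
}
```

With the benchmark's long input (1000 'x' followed by `user@example.com`), the match begins at offset 0, because the 'x' run is itself made of `\w` characters.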

---

### 3. URL Patterns

**Pattern:** `https?://\w+\.\w+`

#### Why URLs are challenging
- Multiple special characters (':', '/')
- An optional quantifier ('?')
- **Result:** currently 6-7x slower than regex
- **Future work:** Better handling of optional quantifiers
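
A sketch of one way to handle the optional `s` cheaply, assuming an approach ReXile may not use: find candidates with a substring search for the literal prefix `http` (`str::find` stands in for memmem here), then treat `s?` as a single optional byte. No backtracking is needed because the byte that must follow (':') can never be 's'. `\w` is again assumed to be ASCII, and all names are hypothetical.

```rust
/// ASCII approximation of `\w` (an assumption).
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Matches `\w+\.\w+` anchored at `pos` and returns the end offset.
fn match_domain(bytes: &[u8], pos: usize) -> Option<usize> {
    let word_run = |from: usize| bytes[from..].iter().take_while(|&&b| is_word(b)).count();
    let first = word_run(pos);
    if first == 0 || bytes.get(pos + first) != Some(&b'.') {
        return None;
    }
    let second_start = pos + first + 1;
    let second = word_run(second_start);
    if second == 0 {
        return None;
    }
    Some(second_start + second)
}

/// Search for `https?://\w+\.\w+`.
fn find_url(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut from = 0;
    // `str::find` stands in for a memmem search on the literal prefix "http".
    while let Some(offset) = text[from..].find("http") {
        let start = from + offset;
        let mut pos = start + 4;
        // `s?`: take the optional byte if present. No backtracking is needed
        // because the next required byte, ':', can never be 's'.
        if bytes.get(pos) == Some(&b's') {
            pos += 1;
        }
        if bytes[pos..].starts_with(b"://") {
            if let Some(end) = match_domain(bytes, pos + 3) {
                return Some((start, end));
            }
        }
        from = start + 1;
    }
    None
}
```

Resolving the optional byte with a single lookahead keeps the `?` on a straight-line path instead of forking the match into two branches.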

---

### 4. Simple Patterns (Exceptional Performance) ⚡

#### Word Pattern (`\w+` matching "hello")
- ✅ **ReXile: 3.03ns** vs regex: 8.63ns
- **Winner: ReXile by 64.8%!**
- Extremely fast, almost zero-cost abstraction

#### Digit Pattern (`\d+` matching "12345")
- ✅ **ReXile: 2.76ns** vs regex: 10.21ns
- **Winner: ReXile by 73.0%!**
- Fastest pattern type, sub-3ns matching
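
Single-class patterns like these leave very little work per match. A minimal sketch of a dedicated fast path, assuming ASCII classes are acceptable (the regex crate's `\d` and `\w` are Unicode-aware by default), skips bytes outside the class and then extends the run greedily. `find_class_run`, `is_digit`, and `is_word` are hypothetical names; this illustrates the kind of skip strategy listed under the optimization techniques, not ReXile's code.

```rust
/// ASCII `\d` (an assumption of this sketch).
fn is_digit(b: u8) -> bool {
    b.is_ascii_digit()
}

/// ASCII `\w` (an assumption of this sketch).
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Fast path for a lone `class+` pattern: skip bytes outside the class,
/// then extend the run greedily. Returns the (start, end) byte offsets.
fn find_class_run(text: &str, in_class: fn(u8) -> bool) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let start = bytes.iter().position(|&b| in_class(b))?;
    let len = bytes[start..].iter().take_while(|&&b| in_class(b)).count();
    Some((start, start + len))
}
```

For example, `find_class_run("abc 42!", is_digit)` returns `Some((4, 6))`.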

---

## 🎯 KEY ACHIEVEMENTS

1. **✅ Beat the regex crate on version patterns** - Primary goal achieved!
2. **✅ Exceptional performance on simple char class patterns** (2-3ns)
3. **✅ Strong long-text performance with memchr** (Email: +42.7%)
4. **✅ DFA optimization working correctly**
5. **✅ Smart compilation strategy** (no DFA for patterns that have good memchr anchors)

---

## 🔧 OPTIMIZATION TECHNIQUES USED

1. **Literal Extraction**
   - Extracts literal prefixes, alternation branches, and inner anchors
   - Enables fast candidate finding

2. **Prefilter System**
   - memchr: Single byte searching (SIMD-accelerated)
   - memmem: Single string searching
   - aho-corasick: Multi-pattern matching

3. **DFA Compilation**
   - Selective compilation for appropriate patterns
   - Skips patterns that the sequence matcher handles better
   - State machine with efficient transitions

4. **Skip Strategy**
   - Detects patterns like `\d+` and `\w+`
   - Skips over characters outside the pattern's character class
   - Significant speedup on long texts

5. **memchr Anchor Optimization**
   - Uses memchr for distinctive chars like '@', ':', '/'
   - Extremely fast on long texts (42.7% faster than regex!)
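
As an illustration of the first technique, here is a deliberately simplified literal-prefix extractor. It is a hypothetical sketch, not ReXile's parser: it works on the pattern string directly, returns an empty prefix whenever the pattern contains `|`, and stops at the first class, group, wildcard, anchor, or optional element. A character followed by `+` is still guaranteed to appear once, so it is kept but ends the prefix.

```rust
/// Returns a literal prefix that every match must start with
/// (simplified sketch; conservative rather than complete).
fn literal_prefix(pattern: &str) -> String {
    // Alternation means no single prefix is guaranteed; be conservative.
    if pattern.contains('|') {
        return String::new();
    }
    let mut prefix = String::new();
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        let literal = match c {
            '\\' => match chars.next() {
                // Escaped punctuation such as `\.` is a literal character.
                Some(e) if e.is_ascii_punctuation() => e,
                // `\d`, `\w`, `\s`, ... are classes: the prefix ends here.
                _ => break,
            },
            // Wildcards, classes, groups, anchors, and quantifiers end the prefix.
            '.' | '[' | '(' | '^' | '$' | '*' | '+' | '?' | '{' => break,
            other => other,
        };
        match chars.peek().copied() {
            // Optional or counted repetition: treat the char as possibly absent.
            Some('?') | Some('*') | Some('{') => break,
            // `+`: present at least once, but what follows is not fixed.
            Some('+') => {
                prefix.push(literal);
                break;
            }
            _ => prefix.push(literal),
        }
    }
    prefix
}
```

For the URL pattern `https?://\w+\.\w+` this yields `"http"`, which a substring prefilter can then search for.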

---

## 📈 PERFORMANCE SUMMARY BY CATEGORY

| Category | Status | Best Result | Notes |
|----------|--------|-------------|-------|
| Version Patterns (short) | ✅ **WINNER** | 0.71x (28.9% faster) | DFA optimized |
| Digit/Word Classes | ✅ **WINNER** | 0.27x (73% faster) | Exceptional |
| Email (long text) | ✅ **WINNER** | 0.57x (42.7% faster) | memchr shines |
| Simple Literals | ⚡ Competitive | 2.30x | Acceptable |
| URLs | ⚠️ Slower | 6-7x | Needs work |
| Long text (no match) | ⚠️ Slower | 47x | Needs optimization |

---

## 🎓 LESSONS LEARNED

1. **DFA is excellent for patterns like `\d+\.\d+\.\d+`**
   - Beats regex on short/medium texts
   - Needs better skip strategy for long texts

2. **memchr anchor optimization is crucial**
   - Distinctive chars like '@' are perfect anchors
   - Enables massive speedups on long texts

3. **Smart compilation strategy matters**
   - Don't compile DFA for everything
   - Use sequence matcher for patterns with good memchr anchors

4. **Simple patterns can be blazingly fast**
   - `\d+` in 2.76ns (73% faster than regex!)
   - `\w+` in 3.03ns (64.8% faster than regex!)
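
The compilation-strategy lesson can be sketched as a tiny planner. This is an assumption about how such a choice might look, not ReXile's logic: it scans the pattern for a required literal byte from the set '@', ':', '/' (the anchors named under the memchr anchor optimization), ignoring escapes, bracketed classes, and bytes made optional by a following `?`, `*`, or `{`, and falls back to a DFA otherwise. It does not understand groups or alternation, so it conservatively picks the DFA whenever either appears. `Strategy` and `choose_strategy` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Sequence matcher driven by a memchr-style search for this byte.
    MemchrAnchor(u8),
    /// Compile a DFA for the whole pattern.
    Dfa,
}

fn choose_strategy(pattern: &str) -> Strategy {
    // Bytes assumed to be rare in typical text, hence good anchors.
    const ANCHORS: &[u8] = b"@:/";
    // Groups and alternation can make a byte optional; stay conservative.
    if pattern.contains('|') || pattern.contains('(') {
        return Strategy::Dfa;
    }
    let bytes = pattern.as_bytes();
    let mut in_class = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Skip escapes such as `\d` or `\.` together with their next byte.
            b'\\' => {
                i += 2;
                continue;
            }
            b'[' => in_class = true,
            b']' => in_class = false,
            b if !in_class && ANCHORS.contains(&b) => {
                // A following quantifier makes the anchor optional.
                let optional = matches!(bytes.get(i + 1).copied(), Some(b'?' | b'*' | b'{'));
                if !optional {
                    return Strategy::MemchrAnchor(b);
                }
            }
            _ => {}
        }
        i += 1;
    }
    Strategy::Dfa
}
```

Under these assumptions, the email pattern is routed to the '@' anchor while the version pattern, which has no distinctive byte, gets a DFA.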

---

## 🚀 FUTURE OPTIMIZATIONS

### High Priority
1. **Improve skip strategy for long texts**
   - Version pattern: 1.6ยตs โ†’ target <100ns
   - No-match case: 4.4ยตs โ†’ target <100ns

2. **Better optional quantifier handling**
   - URL patterns currently 6x slower
   - Needs an optimized path for the `?` quantifier

### Medium Priority
3. **Lazy DFA compilation**
   - Build states on-demand during matching
   - Avoid upfront compilation cost

4. **SIMD optimizations**
   - Use explicit SIMD for char class checking
   - Parallel position evaluation
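
A lazy DFA builds each DFA state (a set of NFA states) only when the search first reaches it, caching transitions as it goes. The sketch below applies this to an unanchored NFA for `\d+\.\d+\.\d+`, with NFA states packed into a `u8` bitmask and a `HashMap` as the transition cache. The NFA encoding, the byte classes, and the `LazyDfa` type are hypothetical illustrations, not ReXile's design.

```rust
use std::collections::HashMap;

/// NFA state 5 (inside the third `\d+`) is accepting.
const ACCEPT_BIT: u8 = 1 << 5;

/// Byte classes: 0 = ASCII digit, 1 = '.', 2 = anything else.
fn class_of(b: u8) -> usize {
    match b {
        b'0'..=b'9' => 0,
        b'.' => 1,
        _ => 2,
    }
}

/// NFA for unanchored `\d+\.\d+\.\d+`; returns a bitmask of next NFA states.
/// State 0 loops on every byte, which is what makes the search unanchored.
fn nfa_step(state: usize, class: usize) -> u8 {
    match (state, class) {
        (0, 0) => 1 << 0 | 1 << 1, // stay in start, or begin the first run
        (0, _) => 1 << 0,
        (1, 0) => 1 << 1,
        (1, 1) => 1 << 2,
        (2, 0) => 1 << 3,
        (3, 0) => 1 << 3,
        (3, 1) => 1 << 4,
        (4, 0) => 1 << 5,
        (5, 0) => 1 << 5,
        _ => 0,
    }
}

struct LazyDfa {
    /// (DFA state as NFA bitmask, byte class) -> next DFA state.
    cache: HashMap<(u8, usize), u8>,
}

impl LazyDfa {
    fn new() -> Self {
        LazyDfa { cache: HashMap::new() }
    }

    /// DFA transition: built by subset construction on first use, then cached.
    fn step(&mut self, set: u8, class: usize) -> u8 {
        *self.cache.entry((set, class)).or_insert_with(|| {
            (0..6usize)
                .filter(|&s| set & (1u8 << s) != 0)
                .fold(0u8, |acc, s| acc | nfa_step(s, class))
        })
    }

    /// Reports whether the pattern occurs anywhere in `text`.
    fn is_match(&mut self, text: &str) -> bool {
        let mut set = 1u8; // DFA start state = {NFA state 0}
        for &b in text.as_bytes() {
            set = self.step(set, class_of(b));
            if set & ACCEPT_BIT != 0 {
                return true;
            }
        }
        false
    }
}
```

On an input of 10,000 'x' bytes the cache ends up holding a single transition: only the parts of the automaton that the input actually exercises are ever built, which is how lazy construction avoids the upfront compilation cost.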

### Low Priority
5. **More sophisticated prefilters**
   - Teddy algorithm for multi-pattern
   - Better false positive filtering

---

## ✅ CONCLUSION

**Goal Achieved:** ReXile successfully BEATS the regex crate on version patterns and simple char classes!

**Strengths:**
- ๐Ÿ† Version patterns: 15-29% faster
- ๐Ÿ† Digit/Word patterns: 64-73% faster  
- ๐Ÿ† Long text with anchors: 42% faster

**Areas for Improvement:**
- URL patterns with optional quantifiers
- Very long text skip strategy
- Simple literal matching

**Overall Assessment:** ReXile demonstrates that a zero-dependency regex engine can compete with and even beat the highly-optimized regex crate in specific scenarios, particularly for version numbers and simple character class patterns.

---

- Generated: January 22, 2026
- ReXile Version: 0.1.0
- Tests Passing: 75/75