rexile 0.1.1

A blazing-fast regex engine with JIT-style optimizations - competitive performance (1.03x vs regex) with 15x less memory

ReXile 🦎

License: MIT OR Apache-2.0

A blazing-fast regex engine with JIT-style optimizations and minimal dependencies

ReXile is a regex alternative that does not depend on the regex crate and achieves competitive performance through intelligent fast paths:

  • ⚡ Performance-competitive with regex crate - Within 3% on real-world workloads
  • 🧠 15x less memory for pattern compilation - Minimal metadata overhead
  • 🚀 21x faster pattern compilation - Critical for dynamic patterns
  • 📦 Only 2 dependencies - memchr and aho-corasick for SIMD primitives
  • 🎯 10 specialized fast paths - JIT-style optimizations without JIT complexity
  • 🔧 Full control - Custom optimizations for parsers, lexers, and rule engines

Key Features:

  • ✅ Literal searches with SIMD acceleration
  • ✅ Multi-pattern matching (alternations)
  • ✅ Character classes with negation
  • ✅ Quantifiers (*, +, ?)
  • ✅ Escape sequences (\d, \w, \s, etc.)
  • ✅ Sequences and groups
  • ✅ Word boundaries (\b, \B)
  • ✅ Anchoring (^, $)

🎯 Purpose

ReXile is a production-ready regex engine built from scratch for maximum performance and minimal overhead:

  • 🎯 Competitive performance - 1.03x aggregate ratio vs regex crate on real workloads
  • ⚡ JIT-style optimizations - 10 specialized fast paths for common patterns
  • 📦 Minimal dependencies - Only memchr + aho-corasick for SIMD primitives
  • 🚀 Lightning-fast compilation - 21x faster than regex crate
  • 💾 Memory efficient - 15x less compilation memory, 5x less peak memory
  • 🔧 Full control - Custom optimizations for specific use cases

Performance Highlights

Real-World GRL Benchmark (6 patterns × 41 files):

  • Pattern \d+: 3.57x faster than regex (41/41 wins)
  • Pattern "[^"]+": 2.44x faster than regex (41/41 wins)
  • Pattern rule\s+: 1.05x faster than regex
  • Aggregate: 1.03x the regex crate's time (within 3% of regex - competitive!)

Memory Comparison:

  • Compilation: 15x less memory (128 KB vs 1920 KB)
  • Compilation time: 21x faster (370µs vs 7.89ms)
  • Peak memory: 5x less in stress tests (0.12 MB vs 0.62 MB)
  • Search operations: Equal memory efficiency

When to Use ReXile:

  • ✅ Parsers & lexers (fast token matching)
  • ✅ Rule engines (business logic pattern matching)
  • ✅ Log processing (keyword search)
  • ✅ Dynamic patterns (21x faster compilation)
  • ✅ Memory-constrained environments (15x less memory)
  • ✅ Low-latency applications (competitive performance)

🚀 Quick Start

use rexile::Pattern;

// Literal matching with SIMD acceleration
let pattern = Pattern::new("hello").unwrap();
assert!(pattern.is_match("hello world"));
assert_eq!(pattern.find("say hello"), Some((4, 9)));

// Multi-pattern matching (aho-corasick fast path)
let multi = Pattern::new("foo|bar|baz").unwrap();
assert!(multi.is_match("the bar is open"));

// Digit matching (DigitRun fast path - 3.57x faster than regex!)
let digits = Pattern::new("\\d+").unwrap();
let matches = digits.find_all("Order #12345 costs $67.89");
// Returns: [(7, 12), (20, 22), (23, 25)]

// Identifier matching (IdentifierRun fast path)
let ident = Pattern::new("[a-zA-Z_]\\w*").unwrap();
assert!(ident.is_match("variable_name_123"));

// Quoted strings (QuotedString fast path - 2.44x faster!)
let quoted = Pattern::new("\"[^\"]+\"").unwrap();
assert!(quoted.is_match("say \"hello world\""));

// Word boundaries
let word = Pattern::new("\\btest\\b").unwrap();
assert!(word.is_match("this is a test"));
assert!(!word.is_match("testing"));

// Anchors
let exact = Pattern::new("^hello$").unwrap();
assert!(exact.is_match("hello"));
assert!(!exact.is_match("hello world"));

Cached API (Recommended for Hot Paths)

For patterns used repeatedly in hot loops:

// Automatically cached - compile once, reuse forever
assert!(rexile::is_match("test", "this is a test").unwrap());
assert_eq!(rexile::find("world", "hello world").unwrap(), Some((6, 11)));

// Perfect for parsers and lexers
// (sample input so the loop below is self-contained)
let log_lines = ["INFO service started", "ERROR disk full"];
for line in log_lines {
    if rexile::is_match("ERROR", line).unwrap() {
        // handle error
    }
}
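
How the cache is implemented is not described here. One common way to get "compile once, reuse" behaviour is a process-wide map keyed by the pattern string. The sketch below uses only the standard library; `Compiled`, `compile_cached`, and `cached_is_match` are hypothetical names for illustration, not ReXile's API, and the stand-in "compiled" form only handles literals.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled pattern (literal-only for brevity).
struct Compiled {
    literal: String,
}

impl Compiled {
    fn is_match(&self, text: &str) -> bool {
        text.contains(&self.literal)
    }
}

/// Process-wide cache: pattern source -> shared compiled form.
static CACHE: OnceLock<Mutex<HashMap<String, Arc<Compiled>>>> = OnceLock::new();

/// Returns the cached pattern, compiling it on first use only.
fn compile_cached(pattern: &str) -> Arc<Compiled> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    map.entry(pattern.to_string())
        .or_insert_with(|| Arc::new(Compiled { literal: pattern.to_string() }))
        .clone()
}

/// Convenience wrapper in the spirit of a cached `is_match` call.
fn cached_is_match(pattern: &str, text: &str) -> bool {
    compile_cached(pattern).is_match(text)
}
```

Calling `cached_is_match("ERROR", line)` in a loop pays the compile cost on the first call only; every later call is a map lookup plus the match itself.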

✨ Supported Features

Fast Path Optimizations (10 Types)

ReXile uses JIT-style specialized implementations for common patterns:

Fast Path                  Pattern Example   Performance vs regex
Literal                    "hello"           Competitive (SIMD)
LiteralPlusWhitespace      "rule "           Competitive
DigitRun                   \d+               3.57x faster ✨
IdentifierRun              [a-zA-Z_]\w*      2520x faster (vs ReXile's general matcher, not regex)
QuotedString               "[^"]+"           2.44x faster ✨
WordRun                    \w+               Competitive
Alternation                foo|bar|baz       2x slower (acceptable)
LiteralWhitespaceQuoted    Complex           Competitive
LiteralWhitespaceDigits    Complex           Competitive
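
To make the idea of a specialized fast path concrete, here is a minimal sketch of what a hand-written matcher for \d+ can look like. It is an illustration under assumptions, not ReXile's DigitRun code: the function name `find_digit_runs` is hypothetical, it treats \d as ASCII digits only, and it uses a plain byte loop rather than SIMD scanning.

```rust
/// Finds all maximal runs of ASCII digits and returns their
/// (start, end) byte offsets, end exclusive.
fn find_digit_runs(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_digit() {
            // Start of a run: consume digits greedily, as `\d+` would.
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            spans.push((start, i));
        } else {
            i += 1;
        }
    }
    spans
}
```

On the Quick Start input "Order #12345 costs $67.89" this yields the same three spans shown there: (7, 12), (20, 22), (23, 25). A matcher like this needs no automaton or backtracking state, which is where the gain from specialization comes from.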

Regex Features

Feature                Example                   Status
Literal strings        hello, world              ✅ Supported
Alternation            foo|bar|baz               ✅ Supported (aho-corasick)
Start anchor           ^start                    ✅ Supported
End anchor             end$                      ✅ Supported
Exact match            ^exact$                   ✅ Supported
Character classes      [a-z], [0-9], [^abc]      ✅ Supported
Quantifiers            *, +, ?                   ✅ Supported
Escape sequences       \d, \w, \s, \., \n, \t    ✅ Supported
Sequences              ab+c*, \d+\w*             ✅ Supported
Groups                 (abc), (?:...)            ✅ Supported
Word boundaries        \b, \B                    ✅ Supported
Bounded quantifiers    {n}, {n,m}                🚧 Planned
Capturing groups       Extract (group)           🚧 Planned
Lookahead/lookbehind   (?=...), (?<=...)         🚧 Planned
Backreferences         \1, \2                    🚧 Planned
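
As an illustration of what the word-boundary row means, \b matches at a position where exactly one side is a word character (\w). The sketch below shows that rule for ASCII text. The helper names are hypothetical, the ASCII-only definition of \w is an assumption, and `find_whole_word` assumes a non-empty word made of word characters; this is not ReXile's implementation.

```rust
/// ASCII definition of `\w`: letters, digits, underscore.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if offset `i` (0..=len) lies between a word byte and a
/// non-word byte, counting the text edges as non-word.
fn is_word_boundary(bytes: &[u8], i: usize) -> bool {
    let before = i > 0 && is_word_byte(bytes[i - 1]);
    let after = i < bytes.len() && is_word_byte(bytes[i]);
    before != after
}

/// Emulates `\bword\b` for a literal word: first occurrence that
/// has a boundary on both sides.
fn find_whole_word(text: &str, word: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    text.match_indices(word)
        .map(|(start, m)| (start, start + m.len()))
        .find(|&(s, e)| is_word_boundary(bytes, s) && is_word_boundary(bytes, e))
}
```

With this rule, "test" is found as a whole word in "this is a test" but not in "testing", matching the Quick Start example for \btest\b.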

Performance Benchmarks

Real-World GRL Benchmark

Testing 6 realistic patterns across 41 GRL files (total ~139KB). The Performance column is ReXile's time relative to regex, so values below 1.00x mean ReXile is faster:

Pattern          Description              Performance   Result
\d+              Digit sequences          0.28x         3.57x faster ✨
"[^"]+"          Quoted strings           0.41x         2.44x faster ✨
rule\s+          Rule keyword             0.95x         5% faster
salience\s+\d+   Salience declarations    1.10x         Competitive
query\s+         Query keyword (sparse)   1.44x         Expected loss
when|then        Alternation              1.99x         2x slower (acceptable)
AGGREGATE        All patterns             1.03x         Within 3% of regex! ✅
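
The two numeric columns are related by a simple inversion, assuming the Performance column is ReXile time divided by regex time (an interpretation that fits every row, though the benchmark source is the authority). A small sketch of the arithmetic, with hypothetical helper names:

```rust
/// Converts a time ratio (ReXile time / regex time) into a speedup factor.
/// Values above 1.0 mean ReXile is faster.
fn speedup_from_ratio(ratio: f64) -> f64 {
    1.0 / ratio
}

/// Expresses a time ratio as "percent slower than regex".
/// Negative values mean ReXile is faster.
fn percent_slower(ratio: f64) -> f64 {
    (ratio - 1.0) * 100.0
}
```

For example, 0.28x inverts to roughly 3.57x faster and 0.41x to roughly 2.44x faster, while the 1.03x aggregate corresponds to about 3% more time than regex.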

Perfect Performance (82/82 wins):

  • Digit patterns: 41/41 wins (3.57x faster)
  • Quoted strings: 41/41 wins (2.44x faster)

Memory Comparison

Test 1: Pattern Compilation (10 patterns):

  • regex: 1920 KB in 7.89ms
  • ReXile: 128 KB in 370µs
  • Result: 15x less memory, 21x faster ✨

Test 2: Search Operations (5 patterns × 139KB corpus):

  • Both: 0 bytes memory delta
  • Result: Equal efficiency ✅

Test 3: Stress Test (50 patterns × 500KB corpus):

  • regex: 0.62 MB peak in 46ms
  • ReXile: 0.12 MB peak in 27ms
  • Result: 5x less peak memory, 1.7x faster ✨

When ReXile Wins

  • ✅ Digit sequences (\d+) - 3.57x faster
  • ✅ Quoted strings ("[^"]+") - 2.44x faster
  • ✅ Word runs (\w+) - Competitive
  • ✅ Identifiers ([a-zA-Z_]\w*) - 2520x faster than general matcher
  • ✅ Pattern compilation - 21x faster
  • ✅ Memory usage - 15x less for compilation, 5x less peak

When regex Wins

  • ⚠️ Alternations (when|then) - ReXile 2x slower (trade-off for simplicity)
  • ⚠️ Sparse matches (query\s+) - ReXile 1.44x slower (expected)

Architecture

Pattern → Parser → AST → Fast Path Detection → Specialized Matcher
                                                        ↓
                                     DigitRun (memchr SIMD scanning)
                                     IdentifierRun (direct byte scanning)
                                     QuotedString (memchr + validation)
                                     Alternation (aho-corasick automaton)
                                     Literal (memchr SIMD)
                                     ... 5 more fast paths
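
The detection step in the pipeline above can be pictured as classifying a pattern into one of a few known shapes and falling back to a general matcher otherwise. The sketch below is a deliberately simplified illustration: it inspects the pattern text directly instead of an AST, covers only four shapes, and the `FastPath` enum and `detect_fast_path` function are hypothetical names rather than ReXile's internal types.

```rust
/// A few of the fast-path shapes named in this README, plus a fallback.
#[derive(Debug, PartialEq)]
enum FastPath {
    Literal(String),
    DigitRun,
    WordRun,
    Alternation(Vec<String>),
    General,
}

/// True for text with no regex metacharacters (simplified check).
fn is_plain_literal(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == ' ' || c == '_')
}

/// Picks a specialized matcher when the pattern has a known shape.
fn detect_fast_path(pattern: &str) -> FastPath {
    match pattern {
        r"\d+" => FastPath::DigitRun,
        r"\w+" => FastPath::WordRun,
        p if is_plain_literal(p) => FastPath::Literal(p.to_string()),
        p if p.contains('|') && p.split('|').all(is_plain_literal) => {
            FastPath::Alternation(p.split('|').map(str::to_string).collect())
        }
        _ => FastPath::General,
    }
}
```

Because the classification happens once at compile time, each search can dispatch straight to the chosen matcher without re-examining the pattern.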

Run benchmarks yourself:

cargo run --release --example per_file_grl_benchmark
cargo run --release --example memory_comparison
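
For a quick measurement of your own patterns outside the bundled examples, a rough timing helper along these lines can be enough. This is a generic sketch using only the standard library; `average_time` is a hypothetical name, and a dedicated benchmarking harness will give more reliable numbers.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the average wall-clock time per call.
fn average_time<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    // `Duration` supports division by `u32`.
    start.elapsed() / iters
}
```

Pass a closure that compiles or matches a pattern, and build with --release so the timings are comparable to the figures above.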

📦 Installation

Add to your Cargo.toml:

[dependencies]
rexile = "0.1"

🎓 Examples

Literal Search

use rexile::Pattern;

let p = Pattern::new("needle").unwrap();
assert!(p.is_match("needle in a haystack"));
assert_eq!(p.find("where is the needle?"), Some((13, 19)));

// Find all occurrences
let matches = p.find_all("needle and needle");
assert_eq!(matches, vec![(0, 6), (11, 17)]);
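
For reference, the same (start, end) spans can be produced for a plain literal with the standard library alone. The sketch below assumes matches are reported left to right without overlap, which is the usual regex convention and is consistent with the example above; `find_all_literal` is a hypothetical helper, not part of ReXile.

```rust
/// Returns the (start, end) byte offsets of every non-overlapping
/// occurrence of `needle` in `haystack`, end exclusive.
fn find_all_literal(haystack: &str, needle: &str) -> Vec<(usize, usize)> {
    haystack
        .match_indices(needle)
        .map(|(start, m)| (start, start + m.len()))
        .collect()
}
```

Offsets are byte positions, so `&haystack[start..end]` recovers each match.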

Multi-Pattern (Alternation)

// Fast multi-pattern search using aho-corasick
let keywords = Pattern::new("import|export|function|class").unwrap();
assert!(keywords.is_match("export default function"));
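
To show what a literal alternation has to compute, here is a naive reference version: search for each alternative separately and keep the leftmost hit. It scans the text once per alternative, whereas an Aho-Corasick automaton finds the same answer in a single pass. `find_first_of` is a hypothetical name used only for this sketch.

```rust
/// Returns the leftmost match among `needles` as (start, end) byte
/// offsets. On a tie at the same start, the earlier needle wins.
fn find_first_of(haystack: &str, needles: &[&str]) -> Option<(usize, usize)> {
    needles
        .iter()
        .filter_map(|n| haystack.find(*n).map(|start| (start, start + n.len())))
        .min_by_key(|&(start, _)| start)
}
```

The cost of this version grows with the number of alternatives, which is why multi-pattern automatons are preferred for keyword sets.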

Anchored Patterns

// Must start with pattern
let starts = Pattern::new("^Hello").unwrap();
assert!(starts.is_match("Hello World"));
assert!(!starts.is_match("Say Hello"));

// Must end with pattern
let ends = Pattern::new("World$").unwrap();
assert!(ends.is_match("Hello World"));
assert!(!ends.is_match("World Peace"));

// Exact match
let exact = Pattern::new("^exact$").unwrap();
assert!(exact.is_match("exact"));
assert!(!exact.is_match("not exact"));
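
When the pattern between the anchors is a plain literal, each anchored form reduces to a cheap string operation. The following sketch shows that reduction; it is illustrative only, handles neither escapes such as \$ nor multi-line mode, and `match_anchored_literal` is a hypothetical helper rather than ReXile's code.

```rust
/// Matches a literal with optional `^` / `$` anchors against `text`.
fn match_anchored_literal(pattern: &str, text: &str) -> bool {
    // Peel off a leading `^`, remembering whether it was present.
    let (at_start, rest) = match pattern.strip_prefix('^') {
        Some(r) => (true, r),
        None => (false, pattern),
    };
    // Peel off a trailing `$` the same way.
    let (at_end, literal) = match rest.strip_suffix('$') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    match (at_start, at_end) {
        (true, true) => text == literal,            // ^lit$ : exact match
        (true, false) => text.starts_with(literal), // ^lit  : prefix
        (false, true) => text.ends_with(literal),   // lit$  : suffix
        (false, false) => text.contains(literal),   // lit   : substring
    }
}
```

None of the anchored cases needs to scan the whole input, so they are cheaper than an unanchored search.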

Cached API (Best for Repeated Patterns)

// First call compiles and caches
rexile::is_match("keyword", "find keyword here").unwrap();

// Subsequent calls reuse cached pattern (zero compile cost)
rexile::is_match("keyword", "another keyword").unwrap();
rexile::is_match("keyword", "more keyword text").unwrap();

📚 More examples: see the examples/ directory.

Run examples with:

cargo run --example basic_usage
cargo run --example log_processing

🔧 Use Cases

ReXile is production-ready for:

✅ Ideal Use Cases

  • Parsers and lexers - 21x faster pattern compilation, competitive matching
  • Rule engines - Simple pattern matching in business rules (original use case!)
  • Log processing - Fast keyword and pattern extraction
  • Dynamic patterns - Applications that compile patterns at runtime
  • Memory-constrained environments - 15x less compilation memory
  • Low-latency applications - Predictable performance, no JIT warmup

🎯 Perfect Patterns for ReXile

  • Digit extraction: \d+ (3.57x faster!)
  • Quoted strings: "[^"]+" (2.44x faster!)
  • Identifiers: [a-zA-Z_]\w* (2520x faster than general matcher!)
  • Word runs: \w+
  • Keyword search: rule\s+, function\s+
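
Patterns like these suit specialization because each can be matched with a short scan. As one example, the quoted-string pattern "[^"]+" amounts to: find an opening quote, find the next quote, and require at least one character in between. The sketch below illustrates that logic with standard-library iterators; `find_quoted` is a hypothetical name and this is not ReXile's QuotedString code.

```rust
/// Finds the first `"..."` span with at least one character between
/// the quotes and returns its (start, end) byte offsets, end exclusive.
fn find_quoted(text: &str) -> Option<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut i = 0;
    // Locate the next candidate opening quote at or after `i`.
    while let Some(open) = bytes[i..].iter().position(|&b| b == b'"').map(|p| p + i) {
        // Look for the closing quote after the opening one.
        match bytes[open + 1..].iter().position(|&b| b == b'"') {
            // `""` has nothing inside, so `[^"]+` fails; retry from the second quote.
            Some(0) => i = open + 1,
            // `len` bytes of content, plus the two quote bytes.
            Some(len) => return Some((open, open + 1 + len + 1)),
            // No closing quote anywhere: no match is possible.
            None => return None,
        }
    }
    None
}
```

For the Quick Start input `say "hello world"` this returns (4, 17), the span including both quotes.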

โš ๏ธ Consider regex crate for

  • Complex alternations (ReXile 2x slower)
  • Very sparse patterns (ReXile up to 1.44x slower)
  • Unicode properties (\p{L} - not yet supported)
  • Advanced features (lookahead, backreferences - not yet supported)

🤝 Contributing

Contributions welcome! ReXile is actively maintained and evolving.

Current focus:

  • ✅ Core regex features complete
  • ✅ 10 fast path optimizations implemented
  • ✅ Production-ready performance (1.03x aggregate vs regex)
  • 🔄 Advanced features: bounded quantifiers {n,m}, capturing groups, lookahead

How to contribute:

  1. Check issues for open tasks
  2. Run tests: cargo test
  3. Run benchmarks: cargo run --release --example per_file_grl_benchmark
  4. Submit PR with benchmarks showing performance impact

Priority areas:

  • Bounded quantifiers ({n}, {n,m})
  • 📋 Capturing group extraction
  • 📋 More fast path patterns
  • 📋 Unicode support
  • 📋 Documentation improvements

📜 License

Licensed under either of the MIT license or the Apache License, Version 2.0, at your option.

🙏 Credits

Built on top of:

  • memchr by Andrew Gallant - SIMD-accelerated substring search
  • aho-corasick by Andrew Gallant - Multi-pattern matching automaton

Developed for the rust-rule-engine project, providing fast pattern matching for GRL (Grule Rule Language) parsing and business rule evaluation.

Performance Philosophy: ReXile achieves competitive performance through intelligent specialization rather than complex JIT compilation:

  • 10 hand-optimized fast paths for common patterns
  • SIMD acceleration via memchr
  • Pre-built automatons for alternations
  • Zero-copy iterator design
  • Minimal metadata overhead

Status: ✅ Production Ready (v0.1.0)

  • ✅ Performance: 1.03x aggregate vs regex (within 3%)
  • ✅ Memory: 15x less compilation, 5x less peak
  • ✅ Features: All core regex features working
  • ✅ Testing: 77 unit tests passing, comprehensive benchmarks
  • ✅ Real-world validated: GRL parsing, rule engines, log processing