§regexr
A regex engine built for tokenization and LLM text processing.
§When to Use
- Use the regex crate for general-purpose regex matching.
- Use regexr when building tokenizers or pipelines that need both lookarounds and high performance.
§Features
- Lookarounds: (?=...), (?!...), (?<=...), (?<!...)
- Unicode properties: \p{L}, \p{N}, \p{M}, etc.
- SIMD acceleration: AVX2, SSSE3, WASM v128
- JIT compilation: Cranelift backend (native targets)
- ReDoS protection: Bounded execution via memoization
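The "bounded execution via memoization" idea can be illustrated with a small standalone sketch. This page does not describe how regexr implements it, so the code below is an assumption-laden toy: `MemoMatcher` and `full_match` are hypothetical names, not regexr APIs, and the matcher only understands literal bytes, `.`, and `x*`. It records every (pattern position, text position) pair that has already failed, so a backtracking search never explores the same dead end twice.

```rust
use std::collections::HashSet;

/// Hypothetical toy matcher, not part of regexr. Supports literal bytes,
/// `.` (any byte), and `x*` (zero or more of x). The match is anchored:
/// the whole pattern must consume the whole text.
struct MemoMatcher<'a> {
    pat: &'a [u8],
    text: &'a [u8],
    /// (pattern index, text index) states already proven to fail.
    failed: HashSet<(usize, usize)>,
    /// Number of calls to `run`, used to show that the work stays bounded.
    steps: usize,
}

impl<'a> MemoMatcher<'a> {
    fn new(pat: &'a str, text: &'a str) -> Self {
        MemoMatcher {
            pat: pat.as_bytes(),
            text: text.as_bytes(),
            failed: HashSet::new(),
            steps: 0,
        }
    }

    fn run(&mut self, p: usize, t: usize) -> bool {
        self.steps += 1;
        // A state that failed once will fail again, so skip it.
        if self.failed.contains(&(p, t)) {
            return false;
        }
        let ok = if p == self.pat.len() {
            // Pattern exhausted: succeed only if the text is exhausted too.
            t == self.text.len()
        } else {
            let first = t < self.text.len()
                && (self.pat[p] == b'.' || self.pat[p] == self.text[t]);
            if p + 1 < self.pat.len() && self.pat[p + 1] == b'*' {
                // Either skip `x*` entirely, or consume one byte and retry it.
                self.run(p + 2, t) || (first && self.run(p, t + 1))
            } else {
                first && self.run(p + 1, t + 1)
            }
        };
        if !ok {
            self.failed.insert((p, t));
        }
        ok
    }
}

/// Returns whether `pat` matches all of `text`, plus the number of steps taken.
fn full_match(pat: &str, text: &str) -> (bool, usize) {
    let mut m = MemoMatcher::new(pat, text);
    let matched = m.run(0, 0);
    (matched, m.steps)
}
```

Calling `full_match("a*a*a*a*b", &"a".repeat(20))` returns `false`. Each failing state is evaluated at most once, so the step count is bounded by roughly twice the number of (pattern position, text position) pairs, rather than growing with the number of ways the stars can split the input.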
§Quick Start
use regexr::Regex;
let re = Regex::new(r"\d{3}-\d{2}-\d{4}").unwrap();
assert!(re.is_match("123-45-6789"));
// Lookahead
let re = Regex::new(r"foo(?=bar)").unwrap();
assert!(re.is_match("foobar"));
// Find all matches
for m in re.find_iter("foobar foobaz") {
    println!("Found: {}", m.as_str());
}
// Capture groups
let re = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
if let Some(caps) = re.captures("user@example.com") {
    println!("User: {}, Domain: {}", &caps[1], &caps[2]);
}
§Feature Flags
- simd: Native SIMD (AVX2/SSSE3)
- jit: JIT compilation via Cranelift (native only)
- wasm-simd: WASM SIMD (v128)
- wasm-slim: Minimal WASM build
- advanced-cache: Advanced LRU cache with moka (high-concurrency scenarios)
- parallel: Parallel execution with rayon
- full: All optimizations
Modules§
- analyzer
- Pattern analysis and classification.
- backtrack
- Backtracking regex engine (Layer 2).
- bytes
- Bytes-based regex matching for binary data and LLM tokenization.
- bytes_factory
- Engine factory for bytes-mode regex execution.
- cache
- Pluggable regex caching.
- engine
- Multi-engine regex execution system.
- parser
- PCRE pattern parser.
- simd
- SIMD-accelerated scanning.
- util
- Utility functions and types.
Structs§
- Captures
- Capture groups from a match.
- Error
- An error that occurred during regex compilation or matching.
- Match
- A single match in the input text.
- Regex
- A compiled regular expression.
- RegexBuilder
- A builder for configuring and compiling a regex.
Enums§
- EngineChoice
- Engine selection for regex compilation.
- ErrorKind
- The kind of regex error.
Functions§
- compile
- Compile a regex pattern with default options.
- escape
- Escape all regex metacharacters in a string.
- is_match
- Check if a pattern matches anywhere in the input.
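To make the idea behind escape concrete, here is a minimal standalone sketch of metacharacter escaping. It is not regexr's implementation: `escape_sketch` is a hypothetical name, and the set of characters it treats as metacharacters is an assumption, since this page does not list which ones regexr escapes.

```rust
/// Hypothetical helper, not regexr's `escape`. Prefixes each character from an
/// assumed metacharacter set with a backslash so it is matched literally.
fn escape_sketch(text: &str) -> String {
    // Assumed metacharacter set; the real crate may escape more or fewer.
    const META: &str = r"\.+*?()|[]{}^$";
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if META.contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

Escaping like this matters when user-supplied text is spliced into a larger pattern: `escape_sketch("a.b*c")` yields `a\.b\*c`, so the dot and star are matched literally instead of acting as operators.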
Type Aliases§
- Result
- A specialized Result type for regexr operations.