fuzzy-regex
A high-performance fuzzy regular expression engine written in Rust that combines traditional regex constructs with approximate string matching using Damerau-Levenshtein automata and the Bitap algorithm.
Features
- Fuzzy Matching: Match strings with configurable edit distance tolerance (insertions, deletions, substitutions, transpositions)
- Full Regex Support: Character classes, quantifiers, groups, alternation, anchors, lookahead
- Per-Segment Fuzziness: Control fuzziness for individual parts of a pattern
- Capture Groups: Named and numbered capture groups with fuzzy matching
- Similarity Scoring: Get match quality scores (0.0 - 1.0)
- Streaming API: Process large files and network streams incrementally
- High Performance: Bitap algorithm for patterns ≤64 chars, SIMD optimizations
- Unicode Support: Full Unicode support including case-insensitive matching
Installation
Add to your Cargo.toml:
[]
= "0.1"
Feature Flags
simd(default): Enable SIMD optimizations for faster matchingmimalloc: Use mimalloc allocator for better performance
Documentation
MSRV
The minimum supported Rust version is 1.88.
Quick Start
use FuzzyRegex;
// Simple fuzzy matching - allows up to 2 edits
let re = new.unwrap;
assert!; // Exact match
assert!; // 1 deletion
assert!; // 1 insertion
assert!; // 1 substitution
assert!; // 1 transposition
Unicode Support
By default, \w, \d, \s match ASCII characters only. Enable Unicode mode with (?u):
use FuzzyRegexBuilder;
// Without Unicode (ASCII only)
let re1 = new.unwrap;
re1.is_match; // true
re1.is_match; // false (Cyrillic not matched)
// With Unicode mode
let re2 = new.unwrap;
re2.is_match; // true
re2.is_match; // true (Cyrillic matched)
// Or via builder
let re3 = new
.unicode
.build
.unwrap;
Pattern Syntax
Fuzziness Markers
| Syntax | Description |
|---|---|
(?:text){e<=2} |
Allow up to 2 total edits |
(?:text){i<=1} |
Allow up to 1 insertion |
(?:text){d<=1} |
Allow up to 1 deletion |
(?:text){s<=1} |
Allow up to 1 substitution |
(?:text){t<=1} |
Allow up to 1 transposition |
(?:text){i<=1,d<=1,s<=1} |
Separate limits per edit type |
(?:text){e<=1:[a-z]} |
Restrict edits to character class |
(?:text){c<=3} |
Cost-based: total cost ≤ 3 |
(?:text){2i+1d+1s+1t<=4} |
Weighted cost constraint |
text~2 |
Shorthand: allow up to 2 edits |
Standard Regex Constructs
| Syntax | Description |
|---|---|
[a-z], [^0-9] |
Character classes |
\d, \w, \s |
Predefined classes (digit, word, whitespace) |
Note: By default matches ASCII only. Use (?u) for Unicode |
|
\D, \W, \S |
Negated predefined classes |
. |
Any character (except newline by default) |
*, +, ? |
Quantifiers (0+, 1+, 0-1) |
*?, +?, ?? |
Lazy quantifiers |
{n}, {n,}, {n,m} |
Repetition counts |
(group) |
Capture group |
(?:group) |
Non-capturing group |
(?<name>...) |
Named capture group |
a|b |
Alternation |
^, $ |
Start/end anchors |
\b, \B |
Word boundary / non-boundary |
(?=...), (?!...) |
Positive/negative lookahead |
(?<=...), (?<!...) |
Positive/negative lookbehind (fixed-length) |
\1, \2 |
Backreferences |
Inline Flags
| Flag | Description |
|---|---|
(?i) |
Case insensitive |
(?m) |
Multi-line (^ and $ match line boundaries) |
(?s) |
Dot-all (. matches newlines) |
(?x) |
Verbose mode (ignore whitespace, allow comments) |
(?U) |
Ungreedy (swap greedy/lazy behavior) |
(?u) |
Unicode mode (\w, \d, \s match Unicode chars) |
(?b) |
BESTMATCH - find best match within alternatives |
(?e) |
ENHANCEMATCH - enhance match with more edits |
(?p) |
POSIX - find longest match at leftmost position |
Advanced Features
| Syntax | Description |
|---|---|
\K |
Reset match start position |
(?>...) |
Atomic group (no backtracking) |
*+, ++, ?+ |
Possessive quantifiers |
(?R), (?1) |
Recursive patterns |
(?<name>...) |
Named capture group with fuzzy modifier |
\L<name> |
Named list reference (via set_word_list) |
\G |
Match at end of previous match position |
API Methods
find()- Find first matchfind_iter()- Find all matches iterativelyfind_rev()- Find rightmost matchfind_iter_rev()- Find all matches in reverse ordercaptures()- Get capture groups with positionsfullmatch()- Match entire string from start to endis_full_match()- Check if entire string matchessplit()- Split by matchesreplace()- Replace matches with replacement stringfind_with_timeout()- Match with timeout support
Partial Matches
Match incomplete/streaming input with partial matching:
use FuzzyRegexBuilder;
let re = new
.partial
.build
.unwrap;
// Match at end of text - marked as partial
let m = re.find.unwrap;
assert!; // Match reached end of input
Examples
Basic Fuzzy Search
use FuzzyRegexBuilder;
let re = new
.similarity
.build
.unwrap;
// Finds "the" even when misspelled as "teh"
let m = re.find.unwrap;
assert_eq!;
Fuzzy Search with Context
use FuzzyRegex;
// Mix exact and fuzzy matching
let re = new.unwrap;
assert!; // Exact
assert!; // Typo in "quick"
Capture Groups
use FuzzyRegex;
let re = new.unwrap;
let caps = re.captures.unwrap;
assert_eq!;
assert_eq!;
Find All Matches
use FuzzyRegex;
let re = new.unwrap;
let matches: = re.find_iter.collect;
assert_eq!;
assert_eq!;
assert_eq!;
Replace
use FuzzyRegex;
let re = new.unwrap;
// Replace first occurrence
let result = re.replace;
assert_eq!;
// Replace all occurrences
let result = re.replace_all;
assert_eq!;
Streaming API
Process large files or network streams without loading everything into memory:
use FuzzyRegex;
use BufReader;
use File;
let re = new.unwrap;
let mut stream = re.stream;
// Feed data in chunks
for m in stream.feed
for m in stream.feed
// Or process a reader directly
let file = open.unwrap;
for m in re.stream.search_reader
Byte-Level Search
use FuzzyRegex;
let re = new.unwrap;
// Search byte slices directly
if let Some = re.find_bytes
// Check if pattern supports fast streaming
if re.supports_streaming
Case Insensitive Matching
use FuzzyRegexBuilder;
let re = new
.case_insensitive
.build
.unwrap;
assert!;
assert!;
assert!; // Case insensitive + fuzzy
DNA Sequence Matching
use FuzzyRegex;
// Find DNA sequences allowing for sequencing errors
let re = new.unwrap;
let dna = "NNNNACGTACGTNNNN";
let m = re.find.unwrap;
assert_eq!;
// Also matches with up to 2 errors
assert!; // 1 substitution
Character Class Restrictions
Restrict which characters can be used for edits:
use FuzzyRegex;
// Only allow lowercase letters as substitutions
let re = new.unwrap;
assert!; // 'a' is in [a-z]
assert!; // '3' is not in [a-z]
Lookbehind
Match only after a specific pattern:
use FuzzyRegex;
// Positive lookbehind - match "world" only when preceded by "hello "
let re = new.unwrap;
assert!;
assert!;
// Negative lookbehind - match "world" only when NOT preceded by "hello "
let re = new.unwrap;
assert!;
assert!;
// Variable-length lookbehind (fuzzy)
let re = new.unwrap;
assert!; // "hello" with 1 error before "world"
Cost-Based Matching
Assign different costs to edit operations for fine-grained control:
use FuzzyRegex;
// Simple cost: all operations cost 1, total cost <= 2
let re = new.unwrap;
assert!; // 1 sub, cost=1
assert!; // 1 del, cost=1
// Weighted costs: insertions cost 2, others cost 1
let re = new.unwrap;
assert!; // 1 ins, cost=2
assert!; // 1 del, cost=1
assert!; // 1 transposition, cost=1
Performance
Algorithm Selection
The library automatically selects the best algorithm:
- Bitap Algorithm: Used for patterns ≤64 characters. O(n×k) time complexity where n is text length and k is max edits.
- Levenshtein NFA: Used for longer patterns or complex regex features.
- DFA Fast Path: For patterns without capture groups or lazy quantifiers.
Optimization Tips
- Use specific edit limits:
{e<=1}is faster than{e<=5} - Prefer shorter patterns: Bitap is very fast for short patterns
- Use streaming for large texts: Avoids loading entire file into memory
- Enable SIMD: Enabled by default, provides ~2x speedup on supported platforms
- Consider mimalloc: Enable the
mimallocfeature for memory-intensive workloads
API Overview
Main Types
| Type | Description |
|---|---|
FuzzyRegex |
Compiled regex pattern |
FuzzyRegexBuilder |
Builder for customized regex construction |
Match |
A single match with position and similarity |
Captures |
Match with capture group information |
StreamingMatcher |
Stateful matcher for incremental processing |
StreamingMatch |
Match result from streaming search |
Key Methods
// Construction
new // Searching
re.is_match .find .find_iter .captures // Replacing
re.replace .replace_all // Streaming
re.stream .find_bytes .find_iter_bytes // Builder options
builder.similarity
builder.case_insensitive
builder.multi_line
builder.dot_all
Compatibility
This crate provides a compatibility layer for projects migrating from fuzzy-aho-corasick:
use FuzzyAhoCorasickBuilder;
use FuzzyLimits;
let searcher = new
.fuzzy
.build
.unwrap;
for m in searcher.find_iter
License
MIT License