Regex Specificity
A heuristic-based crate to calculate the specificity of a regular expression pattern against a specific string.
Concept
Specificity measures how "precise" a match is. For example, the pattern abc is more specific to the string "abc" than the pattern a.c or .*.
The calculation follows these principles:
- Positional Weighting: Earlier matches contribute more to the total specificity than later ones.
- Certainty: Literals (exact characters) specificity higher than character classes or wildcards.
- Information Density: Narrower character classes (e.g.,
[a-z]) specificity higher than broader ones (e.g.,.). - Branching Penalty: Patterns with many alternatives (alternations) are penalized as they are less specific.
Usage
The get function assumes that the string provided is already a full match for the pattern.
If the pattern does not match the string, the resulting specificity will be mathematically inconsistent and meaningless for comparison purposes.
let string = "abc";
let high = get.unwrap;
let low = get.unwrap;
assert!;
Counterintuitive
Since this crate uses a greedy heuristic based on the HIR (High-level Intermediate Representation), certain patterns may yield the same specificity even if they look different.
A common example is when a wildcard .* "swallows" the entire string before other parts of the pattern can be evaluated.
let string = "cat";
let pattern1 = r".*";
let pattern2 = r".*a.*";
assert_eq!
Recommendation
If you need to distinguish between patterns with identical specificity, we recommend using the pattern length as a secondary tie-breaker:
- Mathematical: A longer pattern is often less specific because it requires more redundant components to describe the same set.
- Intuitive: You may prefer the pattern with more literals.
if result_a == result_b
License
This project is licensed under the MIT License © 2025 557.