syntax-glob
or syntax-ev
or syntax-regex
) and crate feature syntax-glob
only.
glob()-style (wildcard) pattern matching syntax support.
Supported syntax:
- parse_wildcard: ? and *.
  - Windows file name safe.
- parse_wildcard_path: ?, * and **, optionally with GlobExtConfig.
  - Windows file name safe.
  - Used by voidtools’ Everything, etc.
- parse_glob_path: ?, *, [] and **, optionally with GlobExtConfig.
  - Parsing of [] is fallible.
  - Not Windows file name safe: [] may disturb the matching of literal [] in file names.
- GlobExtConfig: Two separators (//) or a complement separator (\) as a glob star (*/**).
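To make the meaning of ? and * concrete, here is a minimal standalone sketch of wildcard matching. It is not how this module works internally (the module parses patterns into a regex HIR); the function name wildcard_match is hypothetical, it matches the whole string only, and it ignores path separators and **.

```rust
/// Hypothetical helper, not part of ib_matcher: whole-string wildcard match.
/// `?` matches exactly one character, `*` matches any run of characters.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Most recent `*`: its index in the pattern and the text index it covers up to.
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try letting it match nothing.
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Mismatch: let the last star absorb one more character and retry.
            pi = star_pi + 1;
            ti = star_ti + 1;
            backtrack = Some((star_pi, ti));
        } else {
            return false;
        }
    }
    // The text is exhausted; any pattern remainder must consist of stars only.
    p[pi..].iter().all(|&c| c == '*')
}
```

With this sketch, wildcard_match("*.mp4", "v.mp4") holds, while wildcard_match("a*b", "a--bc") does not because the whole string must be consumed.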
The following examples match glob syntax using ib_matcher::regex engines.
§Example
// cargo add ib-matcher --features syntax-glob,regex
use ib_matcher::{regex::lita::Regex, syntax::glob::{parse_wildcard_path, PathSeparator}};

let re = Regex::builder()
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"Win*\*\*.exe"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\notepad.exe"));

let re = Regex::builder()
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"Win**.exe"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\notepad.exe"));
§With IbMatcher
use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::builder().pinyin(Default::default()).build())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"win**pyss.exe"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\拼音搜索.exe"));
§Anchor modes
There are four possible anchor modes:
- Matching from the start of the string. Used by terminal auto completion.
- Matching from anywhere in the string. Used by this module.
- Matching to the end of the string. Rarely used besides matching file extensions.
- Matching the whole string (from the start to the end). Used by voidtools’ Everything.
This module matches from anywhere in the string by default. For other modes:
- To match from the start of the string only, append a * to the pattern (like foo*), which will then be considered an anchor (by surrounding_wildcard_as_anchor).
- To match the whole string only, at the moment you can combine the above with a check of the returned match length.
- To match to the end of the string, prepend a *, like *.mp4.
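The four modes can be illustrated with plain literal needles and the standard library alone. This is only a sketch of the concepts: AnchorMode, literal_match and whole_via_start are hypothetical names, not types or functions of this crate, and real patterns would contain wildcards rather than literals.

```rust
/// Hypothetical enum naming the four anchor modes described above.
#[derive(Clone, Copy, Debug)]
enum AnchorMode {
    Start,
    Anywhere,
    End,
    Whole,
}

/// Matches a literal `needle` against `haystack` under the given anchor mode.
fn literal_match(mode: AnchorMode, needle: &str, haystack: &str) -> bool {
    match mode {
        AnchorMode::Start => haystack.starts_with(needle),
        AnchorMode::Anywhere => haystack.contains(needle),
        AnchorMode::End => haystack.ends_with(needle),
        AnchorMode::Whole => haystack == needle,
    }
}

/// Whole-string matching expressed as "match from the start, then check
/// that the match length equals the haystack length".
fn whole_via_start(needle: &str, haystack: &str) -> bool {
    haystack.starts_with(needle) && needle.len() == haystack.len()
}
```

The second function mirrors the workaround in the list above: a start-anchored match whose length equals the input length is a whole-string match.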
§Surrounding wildcards as anchors
TL;DR: When not matching the whole string, enabling surrounding_wildcard_as_anchor lets patterns like *.mp4 match v.mp4 but not v.mp4_0.webp (it matches both if disabled). It is enabled by default.
Every anchor mode other than whole-string matching admits some redundant patterns. For example, when matching from anywhere, *.mp4 matches the same strings as .mp4; when matching from the start, foo* is the same as foo.
These redundant patterns are not syntax errors, but matching them literally is probably not what the user wants. For example, *.mp4 actually means the match must extend to the end, and foo* actually means the match must begin at the start; otherwise the user would just type .mp4 or foo. The wildcard forms also produce worse match highlighting (highlighting the whole string isn’t useful).
To fix these problems, one way is to only match the whole string; another is to treat leading and trailing wildcards specially. The user-visible difference is how patterns like a*b are treated: the former requires ^a.*b$, while the latter allows ^.*a.*b.*$ (which has to be written *a*b* under the former). The latter is more user-friendly (in my opinion) and can be converted to the former by adding anchor modes, so it is what is implemented here: surrounding_wildcard_as_anchor, enabled by default.
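The idea can be sketched for the simplest case, a literal core with optional surrounding stars. This is an illustration under assumptions, not the crate's implementation: the function name is hypothetical, inner wildcards are not handled, and the treatment of a pattern with stars on both sides (plain substring match) is a guess made only to keep the sketch total.

```rust
/// Hypothetical sketch: interpret a leading `*` as "match to the end" and a
/// trailing `*` as "match from the start", for a literal core without inner
/// wildcards.
fn match_with_surrounding_anchors(pattern: &str, text: &str) -> bool {
    let leading = pattern.starts_with('*');
    // A lone "*" counts as a leading star only.
    let trailing = pattern.len() > 1 && pattern.ends_with('*');
    let core = pattern.trim_matches('*');

    match (leading, trailing) {
        // `*.mp4`: the match must reach the end of the text.
        (true, false) => text.ends_with(core),
        // `foo*`: the match must begin at the start of the text.
        (false, true) => text.starts_with(core),
        // No surrounding star: match from anywhere.
        // Stars on both sides: ASSUMPTION, treated as a substring match here.
        _ => text.contains(core),
    }
}
```

Under this sketch *.mp4 matches v.mp4 but not v.mp4_0.webp, whereas the bare pattern .mp4 matches both, which is the distinction the TL;DR describes.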
Related issue: IbEverythingExt #98
§Anchors in file paths
TL;DR: If you are matching file paths, you probably want to set Regex::builder().thompson(PathSeparator::Windows.look_matcher_config()).
Another question with anchored matching arises for file paths: should the anchors match the start/end of the whole path, or of each path component (i.e. also match at separators)?
The default behavior is the former, for example:
use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::default())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"?\foo*\"),
    )
    .unwrap();
assert!(re.is_match(r"C\foobar\⑨"));
assert!(re.is_match(r"D\C\foobar\9") == false); // Doesn't match
assert!(re.is_match(r"DC\foobar\9") == false);
assert!(re.is_match(r"C\DC\foobar\9") == false);
If you want the latter behavior, i.e. special anchors that match / or \ too, you need to set look_matcher in crate::regex::nfa::thompson::Config, for example:
use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::default())
    .thompson(PathSeparator::Windows.look_matcher_config())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"?\foo*\"),
    )
    .unwrap();
assert!(re.is_match(r"C\foobar\⑨"));
assert!(re.is_match(r"D\C\foobar\9")); // Now matches
assert!(re.is_match(r"DC\foobar\9") == false);
assert!(re.is_match(r"C\DC\foobar\9") == false);
The latter behavior is used by voidtools’ Everything.
Related issue: IbEverythingExt #99
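The difference between the two behaviors can be sketched without the crate: a start anchor either holds only at position 0, or also right after a path separator. The function below is a hypothetical standalone illustration using a literal prefix. It does not reproduce how look_matcher works internally, and treating both / and \ as separators is taken from the description above.

```rust
/// Hypothetical sketch: does `prefix` occur at a start anchor of `text`?
/// With `component_anchors == false` the only anchor is the string start;
/// with `true`, every position right after `/` or `\` is an anchor as well.
fn anchored_prefix_match(text: &str, prefix: &str, component_anchors: bool) -> bool {
    if text.starts_with(prefix) {
        return true;
    }
    if !component_anchors {
        return false;
    }
    // Try every position that directly follows a separator.
    text.char_indices()
        .filter(|&(_, c)| c == '\\' || c == '/')
        .any(|(i, c)| text[i + c.len_utf8()..].starts_with(prefix))
}
```

Using the literal prefix C\foo as a stand-in for ?\foo, the sketch gives the same outcomes as the two examples above: D\C\foobar\9 is accepted only with component anchors, while DC\foobar\9 and C\DC\foobar\9 are rejected either way.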
§Character classes
Supports patterns like [abc], [a-z], [!a-z] and [[:ascii:]].
Character classes can be used to escape metacharacters: [?], [*], [[], []] match the literal characters ?, *, [, ] respectively.
§Error behavior
Parsing of [] is fallible: patterns like a[b are invalid.
At the moment, if parsing fails, the related characters are treated as literal characters.
§Examples
// Set
assert!(is_match("a[b]z", "abz"));
assert!(is_match("a[b]z", "aBz") == false);
assert!(is_match("a[bcd]z", "acz"));
// Range
assert!(is_match("a[b-z]z", "ayz"));
// Negative set
assert!(is_match("a[!b]z", "abz") == false);
assert!(is_match("a[!b]z", "acz"));
// ASCII character class
assert!(is_match("a[[:space:]]z", "a z"));
// Escape
assert!(is_match("a[?]z", "a?z"));
assert!(is_match("a[*]z", "a*z"));
assert!(is_match("a[[]z", "a[z"));
assert!(is_match("a[-]z", "a-z"));
assert!(is_match("a[]]z", "a]z"));
assert!(is_match(r"a[\d]z", r"a\z"));
// Invalid patterns
assert!(is_match("a[b", "a[bz"));
assert!(is_match("a[[b]z", "a[[b]z"));
assert!(is_match("a[!]z", "a[!]z"));
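For readers who want to see what a set, range and negated set test amounts to, here is a small standalone sketch that evaluates the body of an already well-formed class (the text between the brackets). The name class_contains is hypothetical; the sketch does not cover [[:name:]] classes, a literal ] member, or the fallback for invalid patterns, and it is not the crate's parser.

```rust
/// Hypothetical sketch: does the class body (text between `[` and `]`)
/// accept the character `c`?
/// Supports plain members, `lo-hi` ranges and a leading `!` for negation.
fn class_contains(body: &str, c: char) -> bool {
    let chars: Vec<char> = body.chars().collect();
    // A leading `!` negates the class.
    let (negated, members) = match chars.split_first() {
        Some((&'!', rest)) => (true, rest),
        _ => (false, &chars[..]),
    };

    let mut hit = false;
    let mut i = 0;
    while i < members.len() {
        if i + 2 < members.len() && members[i + 1] == '-' {
            // `lo-hi` range, inclusive on both ends.
            if members[i] <= c && c <= members[i + 2] {
                hit = true;
            }
            i += 3;
        } else {
            // Single member; a lone `-` or `\` is just a literal here.
            if members[i] == c {
                hit = true;
            }
            i += 1;
        }
    }
    hit != negated
}
```

For instance, class_contains("b-z", 'y') and class_contains("!b", 'c') hold, while class_contains("!b", 'b') does not, in line with the set, range and negative set examples above.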
Structs§
- GlobExtConfig: Support two separators (//) or a complement separator (\) as a glob star (*/**).
- GlobExtConfigBuilder: Use builder syntax to set the inputs and finish with build().
- ParseGlobPathBuilder: Use builder syntax to set the inputs and finish with call().
- ParseWildcardBuilder: Use builder syntax to set the inputs and finish with call().
- ParseWildcardPathBuilder: Use builder syntax to set the inputs and finish with call().
Enums§
- GlobPathToken: See parse_glob_path.
- GlobStar
- PathSeparator: Defaults to PathSeparator::Os, i.e. / on Unix and \ on Windows.
- WildcardPathToken: See parse_wildcard_path.
- WildcardToken: See parse_wildcard.
Functions§
- parse_glob_path: Glob path syntax flavor, including ?, *, [] and **.
- parse_wildcard: Wildcard-only glob syntax flavor, including ? and *.
- parse_wildcard_path: Wildcard-only path glob syntax flavor, including ?, * and **.