Module glob

Available on (crate features syntax-glob or syntax-ev or syntax-regex) and crate feature syntax-glob only.

glob()-style (wildcard) pattern matching syntax support.

Supported syntax:

The following examples match glob syntax using ib_matcher::regex engines.

§Example

// cargo add ib-matcher --features syntax-glob,regex
use ib_matcher::{regex::lita::Regex, syntax::glob::{parse_wildcard_path, PathSeparator}};

let re = Regex::builder()
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"Win*\*\*.exe"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\notepad.exe"));

// `**` can span path separators, so a single pattern covers both directory levels.
let re = Regex::builder()
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"Win**.exe"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\notepad.exe"));

§With IbMatcher

use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::builder().pinyin(Default::default()).build())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"win**pyss.exe"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\拼音搜索.exe"));

§Anchor modes

There are four possible anchor modes:

  • Matching from the start of the string. Used by terminal auto completion.
  • Matching from anywhere in the string. Used by this module.
  • Matching to the end of the string. Rarely used besides matching file extensions.
  • Matching the whole string (from the start to the end). Used by voidtools’ Everything.

This module will match from anywhere in the string by default. For other modes:

  • To match from the start of the string only, append a * to the pattern (like foo*); the trailing * is then treated as an anchor (by surrounding_wildcard_as_anchor).
  • To match the whole string only, for now combine the above with a check on the returned match length.
  • To match to the end of the string, prepend a *, like *.mp4.
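
The four modes can be illustrated on a plain literal needle with standard string methods. This is only a sketch of the concept, not how this module is implemented; `AnchorMode` and `literal_match` are hypothetical names that do not exist in ib-matcher.

```rust
/// The four anchor modes described above (hypothetical type, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AnchorMode {
    /// Match from the start of the string.
    Start,
    /// Match from anywhere in the string.
    Anywhere,
    /// Match to the end of the string.
    End,
    /// Match the whole string.
    Whole,
}

/// Literal-only stand-in for a matcher: tests `needle` against `haystack`
/// under the given anchor mode. No wildcards are interpreted here.
fn literal_match(mode: AnchorMode, needle: &str, haystack: &str) -> bool {
    match mode {
        AnchorMode::Start => haystack.starts_with(needle),
        AnchorMode::Anywhere => haystack.contains(needle),
        AnchorMode::End => haystack.ends_with(needle),
        AnchorMode::Whole => haystack == needle,
    }
}
```

For instance, `literal_match(AnchorMode::End, ".mp4", "v.mp4_0.webp")` is false, while the same needle under `AnchorMode::Anywhere` is found.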

§Surrounding wildcards as anchors

TL;DR: When not matching the whole string, enabling surrounding_wildcard_as_anchor lets patterns like *.mp4 match v.mp4 but not v.mp4_0.webp (both match if it is disabled). It is enabled by default.

Except when matching the whole string, the anchor modes allow some duplicate patterns. For example, when matching from anywhere, *.mp4 matches the same strings as .mp4; when matching from the start, foo* is the same as foo.

These duplicate patterns are not syntax errors, but matching them literally is probably not what the user wants. For example, *.mp4 really means the match must extend to the end, and foo* really means the match must begin at the start; otherwise the user would just type .mp4 or foo. The former patterns also produce worse match highlighting (highlighting the whole string isn't useful).

There are two ways to fix these problems: only match the whole string, or treat leading and trailing wildcards differently. For the user, the difference is how patterns like a*b are treated: the former requires ^a.*b$, while the latter allows ^.*a.*b.*$ (written *a*b* under the former). The latter is more user-friendly (in my opinion) and can be converted to the former by adding anchor modes, so it is what is implemented here: surrounding_wildcard_as_anchor, enabled by default.
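
The effect of the option can be sketched with a literal-only model in which just one leading and/or trailing * is interpreted. This is an illustration under stated assumptions, not the module's implementation: `sketch_match` is a hypothetical function, and treating a pattern wrapped in * on both sides as "contains" is an assumption.

```rust
/// Literal-core sketch of "surrounding wildcards as anchors".
/// Only a single leading and/or trailing `*` is interpreted; everything
/// between them is compared as a literal. Hypothetical helper.
fn sketch_match(pattern: &str, haystack: &str, surrounding_as_anchor: bool) -> bool {
    // Peel off one leading `*`, if any.
    let (leading, rest) = match pattern.strip_prefix('*') {
        Some(rest) => (true, rest),
        None => (false, pattern),
    };
    // Peel off one trailing `*`, if any.
    let (trailing, core) = match rest.strip_suffix('*') {
        Some(core) => (true, core),
        None => (false, rest),
    };
    if !surrounding_as_anchor {
        // Matching from anywhere: `*.mp4` and `foo*` behave like `.mp4` and `foo`.
        return haystack.contains(core);
    }
    match (leading, trailing) {
        // `*.mp4`: the match must reach the end of the string.
        (true, false) => haystack.ends_with(core),
        // `foo*`: the match must begin at the start of the string.
        (false, true) => haystack.starts_with(core),
        // No surrounding wildcard, or both (assumption): match anywhere.
        _ => haystack.contains(core),
    }
}
```

With the option on, `sketch_match("*.mp4", "v.mp4_0.webp", true)` is false; with it off, the same call returns true, mirroring the TL;DR above.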

Related issue: IbEverythingExt #98

§Anchors in file paths

TL;DR: If you are matching file paths, you probably want to set Regex::builder().thompson(PathSeparator::Windows.look_matcher_config()).

Another question with anchored matching arises for file paths: should the anchors match the start/end of the whole path, or of each path component (i.e. also match at separators)?

The default behavior is the former, for example:

use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::default())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"?\foo*\"),
    )
    .unwrap();
assert!(re.is_match(r"C\foobar\⑨"));
assert!(re.is_match(r"D\C\foobar\9") == false); // Doesn't match
assert!(re.is_match(r"DC\foobar\9") == false);
assert!(re.is_match(r"C\DC\foobar\9") == false);

If you want the latter behavior, i.e. special anchors that match / or \ too, you need to set look_matcher in crate::regex::nfa::thompson::Config, for example:

use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::default())
    .thompson(PathSeparator::Windows.look_matcher_config())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call(r"?\foo*\"),
    )
    .unwrap();
assert!(re.is_match(r"C\foobar\⑨"));
assert!(re.is_match(r"D\C\foobar\9")); // Now matches
assert!(re.is_match(r"DC\foobar\9") == false);
assert!(re.is_match(r"C\DC\foobar\9") == false);

The latter behavior is used by voidtools’ Everything.
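
The difference between the two behaviors comes down to where a start anchor is allowed to hold. A minimal sketch, assuming a match is identified by its starting byte offset; `start_anchor_ok` and `path_aware` are hypothetical names and are not part of ib-matcher:

```rust
/// Whether a match beginning at byte offset `start` satisfies a start anchor.
/// With `path_aware` off, only offset 0 qualifies; with it on, a position
/// right after `/` or `\` qualifies too. Hypothetical helper.
fn start_anchor_ok(haystack: &str, start: usize, path_aware: bool) -> bool {
    if start == 0 {
        return true;
    }
    if !path_aware {
        return false;
    }
    // Look at the byte immediately before the match start.
    matches!(
        haystack.as_bytes().get(start - 1).copied(),
        Some(b'/') | Some(b'\\')
    )
}
```

In `D\C\foobar\9` the candidate match `C\foobar\` starts at offset 2, right after a separator, so it is accepted only in the path-aware case. In `DC\foobar\9` it starts at offset 1, after `D`, so it is rejected either way.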

Related issue: IbEverythingExt #99

§Character classes

Supports patterns like [abc], [a-z], [!a-z] and [[:ascii:]].

Character classes can be used to escape metacharacters: [?], [*], [[], []] match the literal characters ?, *, [, ] respectively.

§Error behavior

Parsing of [] is fallible: patterns like a[b are invalid.

At the moment, if parsing fails, the characters involved are treated as literal characters.
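
The set, range and negation forms can be modeled with a small standalone function. This is a rough sketch rather than the parser used by this module: `class_matches` is a hypothetical helper that receives only the text between [ and ], and it does not model POSIX classes such as [[:space:]] or the invalid-pattern fallback (callers are assumed to pass a non-empty set).

```rust
/// Returns whether `c` is accepted by a class body, i.e. the text between
/// `[` and `]`. Supports plain sets (`abc`), ranges (`a-z`) and a leading
/// `!` for negation. Hypothetical helper.
fn class_matches(body: &str, c: char) -> bool {
    // A leading `!` negates the class.
    let (negated, body) = match body.strip_prefix('!') {
        Some(rest) => (true, rest),
        None => (false, body),
    };
    let chars: Vec<char> = body.chars().collect();
    let mut found = false;
    let mut i = 0;
    while i < chars.len() {
        if i + 2 < chars.len() && chars[i + 1] == '-' {
            // Range `lo-hi`, inclusive on both ends.
            if chars[i] <= c && c <= chars[i + 2] {
                found = true;
            }
            i += 3;
        } else {
            // Single literal member, including a lone `-`, `?`, `*` or `[`.
            if chars[i] == c {
                found = true;
            }
            i += 1;
        }
    }
    found != negated
}
```

Under this model `class_matches("b-z", 'y')` holds, `class_matches("!b", 'b')` does not, and `class_matches("?", '?')` shows why a class can serve as an escape.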

§Examples

// `is_match(pattern, input)` is a helper that is not defined in this snippet.
// Set
assert!(is_match("a[b]z", "abz"));
assert!(is_match("a[b]z", "aBz") == false);
assert!(is_match("a[bcd]z", "acz"));

// Range
assert!(is_match("a[b-z]z", "ayz"));

// Negative set
assert!(is_match("a[!b]z", "abz") == false);
assert!(is_match("a[!b]z", "acz"));

// ASCII character class
assert!(is_match("a[[:space:]]z", "a z"));

// Escape
assert!(is_match("a[?]z", "a?z"));
assert!(is_match("a[*]z", "a*z"));
assert!(is_match("a[[]z", "a[z"));
assert!(is_match("a[-]z", "a-z"));
assert!(is_match("a[]]z", "a]z"));
assert!(is_match(r"a[\d]z", r"a\z"));

// Invalid patterns
assert!(is_match("a[b", "a[bz"));
assert!(is_match("a[[b]z", "a[[b]z"));
assert!(is_match("a[!]z", "a[!]z"));

Structs§

GlobExtConfig
Supports two separators (//) or a complement separator (\) as a glob star (*/**).
GlobExtConfigBuilder
Use builder syntax to set the inputs and finish with build().
ParseGlobPathBuilder
Use builder syntax to set the inputs and finish with call().
ParseWildcardBuilder
Use builder syntax to set the inputs and finish with call().
ParseWildcardPathBuilder
Use builder syntax to set the inputs and finish with call().

Enums§

GlobPathToken
See parse_glob_path.
GlobStar
PathSeparator
Defaults to PathSeparator::Os, i.e. / on Unix and \ on Windows.
WildcardPathToken
See parse_wildcard_path.
WildcardToken
See parse_wildcard.

Functions§

parse_glob_path
Glob path syntax flavor, including ?, *, [] and **.
parse_wildcard
Wildcard-only glob syntax flavor, including ? and *.
parse_wildcard_path
Wildcard-only path glob syntax flavor, including ?, * and **.