ib-matcher 0.4.0

A multilingual, flexible and fast string, glob and regex matcher. Supports 拼音匹配 (Chinese pinyin matching) and ローマ字検索 (Japanese romaji matching).


Features

Every feature is optional and gated behind a Cargo feature (the examples below enable pinyin, romaji, syntax-glob, regex and regex-callback), so you don't pay the performance or binary-size cost for features you don't use.

See the documentation for details.

If you only need Chinese pinyin matching, you can use ib-pinyin instead, which is simpler and more stable.

Usage

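// cargo add ib-matcher --features pinyin,romaji
use ib_matcher::matcher::{IbMatcher, PinyinMatchConfig, RomajiMatchConfig};

// Matching is case-insensitive, including for non-ASCII letters such as "ô".
let matcher = IbMatcher::builder("la vie est drôle").build();
assert!(matcher.is_match("LA VIE EST DRÔLE"));

// Greek: the pattern ends in final sigma (ς) but still matches σ and Σ.
let matcher = IbMatcher::builder("βίος").build();
assert!(matcher.is_match("Βίοσ"));
assert!(matcher.is_match("ΒΊΟΣ"));

// Pinyin: initials ("py" for 拼音) and full spellings ("sousuo" for 搜索)
// can be mixed with plain text ("eve" for "Everything").
let matcher = IbMatcher::builder("pysousuoeve")
    .pinyin(PinyinMatchConfig::default())
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

// Romaji: the pattern ends partway through a word ("suba" is only the start
// of 素晴らしい, "subarashii"), hence is_pattern_partial(true).
let matcher = IbMatcher::builder("konosuba")
    .romaji(RomajiMatchConfig::default())
    .is_pattern_partial(true)
    .build();
assert!(matcher.is_match("この素晴らしい世界に祝福を"));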
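
To make the pinyin example above more concrete, here is a minimal, std-only sketch of the general idea: each Chinese character may be matched either by its full pinyin or by its initial letter, and everything else is compared ASCII case-insensitively. The names pinyin_of, match_here and toy_is_match, as well as the four-entry table, are hypothetical and exist only for this illustration. This is not how ib-matcher is implemented, and it ignores polyphonic characters, other notations and performance.

```rust
/// Hypothetical toy table; a real matcher needs a full pinyin dictionary.
fn pinyin_of(c: char) -> Option<&'static str> {
    match c {
        '拼' => Some("pin"),
        '音' => Some("yin"),
        '搜' => Some("sou"),
        '索' => Some("suo"),
        _ => None,
    }
}

/// Returns true if `pattern` can be fully consumed starting at the first
/// character of `hay`.
fn match_here(pattern: &str, hay: &str) -> bool {
    if pattern.is_empty() {
        return true;
    }
    let mut chars = hay.chars();
    let Some(c) = chars.next() else { return false };
    let rest = chars.as_str();

    if let Some(py) = pinyin_of(c) {
        // Option 1: the pattern spells out the full pinyin of this character.
        if let Some(p) = pattern.strip_prefix(py) {
            if match_here(p, rest) {
                return true;
            }
        }
        // Option 2: the pattern gives only the initial letter.
        if let Some(p) = pattern.strip_prefix(&py[..1]) {
            if match_here(p, rest) {
                return true;
            }
        }
    }

    // Option 3: compare literally, ignoring ASCII case.
    let mut pc = pattern.chars();
    match pc.next() {
        Some(p) => p.eq_ignore_ascii_case(&c) && match_here(pc.as_str(), rest),
        None => false,
    }
}

/// Substring-style search: try every character position as a start.
fn toy_is_match(pattern: &str, hay: &str) -> bool {
    pattern.is_empty() || hay.char_indices().any(|(i, _)| match_here(pattern, &hay[i..]))
}

fn main() {
    assert!(toy_is_match("pysousuoeve", "拼音搜索Everything"));
    assert!(toy_is_match("pinyin", "拼音"));
    assert!(!toy_is_match("yinpin", "拼音"));
}
```

The backtracking over the three options is what lets one pattern mix initials, full spellings and plain text.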

glob()-style pattern matching

See the glob module for more details. Here is a quick example:

// cargo add ib-matcher --features syntax-glob,regex,romaji
use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::builder().romaji(Default::default()).build())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call("wifi**miku"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\ja-jp\WiFiTask\ミク.exe"));
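
The general semantics of wildcard path patterns can be illustrated with a small std-only sketch. It assumes the common convention that ? and * stop at a path separator while ** may cross it. Whether ib-matcher's glob module follows exactly this convention should be checked in its documentation. wildcard_match and match_from are hypothetical helpers: they match the whole string rather than searching, and they do no case folding or romaji matching.

```rust
/// Whole-string wildcard match under the assumed convention:
/// `?` = one non-separator char, `*` = any run of non-separator chars,
/// `**` = any run of chars, separators included.
fn wildcard_match(pattern: &str, text: &str, sep: char) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    match_from(&p, &t, sep)
}

fn match_from(p: &[char], t: &[char], sep: char) -> bool {
    match p {
        // Pattern exhausted: succeed only if the text is exhausted too.
        [] => t.is_empty(),
        // `**` may consume any prefix of the remaining text.
        ['*', '*', rest @ ..] => (0..=t.len()).any(|i| match_from(rest, &t[i..], sep)),
        // `*` may consume any prefix that contains no separator.
        ['*', rest @ ..] => {
            for i in 0..=t.len() {
                if match_from(rest, &t[i..], sep) {
                    return true;
                }
                if i < t.len() && t[i] == sep {
                    break; // cannot extend across a separator
                }
            }
            false
        }
        // `?` consumes exactly one non-separator character.
        ['?', rest @ ..] => {
            matches!(t.first(), Some(&c) if c != sep) && match_from(rest, &t[1..], sep)
        }
        // Any other character must match literally.
        [c, rest @ ..] => t.first() == Some(c) && match_from(rest, &t[1..], sep),
    }
}

fn main() {
    assert!(wildcard_match(r"**.exe", r"dir\a.exe", '\\'));
    assert!(!wildcard_match(r"*.exe", r"dir\a.exe", '\\'));
    assert!(wildcard_match("a?c", "abc", '\\'));
}
```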

Regular expression

See the regex module for more details. Here is a quick example:

// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
    matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
    regex::{cp::Regex, Match},
};

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .build();

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("raki.suta")
    .unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("pysou.*?(any|every)thing")
    .unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .mix_lang(true)
    .build();
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
    .unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));
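
The spans in these assertions are byte offsets into the UTF-8 haystack, not character counts. The following std-only check shows where the numbers 3..18, 22 and 24 come from. byte_span is a hypothetical helper written for this illustration and does not use ib-matcher.

```rust
use std::ops::Range;

/// Byte range of the first literal occurrence of `needle` in `hay`.
fn byte_span(hay: &str, needle: &str) -> Option<Range<usize>> {
    hay.find(needle).map(|start| start..start + needle.len())
}

fn main() {
    // "「" takes 3 bytes, and each of the 5 characters in "らき☆すた" takes 3 bytes.
    assert_eq!(byte_span("「らき☆すた」", "らき☆すた"), Some(3..18));

    // 4 Chinese characters (3 bytes each) + 10 ASCII bytes = 22.
    assert_eq!("拼音搜索Everything".len(), 22);

    // 8 Japanese characters, 3 bytes each = 24.
    assert_eq!("葬送のフリーレン".len(), 24);
}
```

Slicing the haystack with such a span therefore always lands on character boundaries, as long as the span comes from a successful match.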

Custom matching callbacks:

// cargo add ib-matcher --features regex,regex-callback
use ib_matcher::regex::cp::Regex;

let re = Regex::builder()
    .callback("ascii", |input, at, push| {
        let haystack = &input.haystack()[at..];
        if !haystack.is_empty() && haystack[0].is_ascii() {
            push(1);
        }
    })
    .build(r"(ascii)+\d(ascii)+")
    .unwrap();
// "Ｕ" is a fullwidth letter (U+FF35), not ASCII, so no match can include it.
let hay = "that4Ｕ this4me";
assert_eq!(&hay[re.find(hay).unwrap().span()], " this4me");

Test

cargo build
cargo test --features pinyin,romaji