Crate ib_matcher

A fast, multilingual string matcher that supports Chinese pinyin matching (拼音匹配) and Japanese romaji matching (ローマ字検索).

§Usage

//! cargo add ib-matcher --features pinyin,romaji
use ib_matcher::{
    matcher::{IbMatcher, PinyinMatchConfig, RomajiMatchConfig},
    pinyin::PinyinNotation,
};

let matcher = IbMatcher::builder("pysousuoeve")
    .pinyin(PinyinMatchConfig::notations(
        PinyinNotation::Ascii | PinyinNotation::AsciiFirstLetter,
    ))
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

let matcher = IbMatcher::builder("konosuba")
    .romaji(RomajiMatchConfig::default())
    .is_pattern_partial(true)
    .build();
assert!(matcher.is_match("この素晴らしい世界に祝福を"));
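To make the first assertion less opaque, here is a toy sketch of how a pattern such as "pysousuoeve" can match "拼音搜索Everything" when both full pinyin and first-letter notations are allowed. It is not how ib_matcher is implemented. The four-entry table and the names toy_pinyin, match_here and toy_is_match are hypothetical and exist only for this illustration. The sketch also assumes the pinyin part of the pattern is written in lowercase.

```rust
/// Hypothetical stand-in for a real pinyin dictionary (four characters only).
fn toy_pinyin(c: char) -> Option<&'static str> {
    match c {
        '拼' => Some("pin"),
        '音' => Some("yin"),
        '搜' => Some("sou"),
        '索' => Some("suo"),
        _ => None,
    }
}

/// Returns true if `pattern` can be fully consumed starting at `hay[0]`.
fn match_here(pattern: &str, hay: &[char]) -> bool {
    if pattern.is_empty() {
        return true; // the whole pattern has been matched
    }
    let Some((&c, rest)) = hay.split_first() else {
        return false; // pattern left over, but no haystack left
    };

    // Option 1: literal, case-insensitive match of one character.
    let mut p_chars = pattern.chars();
    let p0 = p_chars.next().unwrap(); // safe: pattern is non-empty here
    if p0.to_lowercase().eq(c.to_lowercase()) && match_here(p_chars.as_str(), rest) {
        return true;
    }

    if let Some(py) = toy_pinyin(c) {
        // Option 2: the full pinyin of the character (e.g. "sou" for 搜).
        if let Some(tail) = pattern.strip_prefix(py) {
            if match_here(tail, rest) {
                return true;
            }
        }
        // Option 3: only the first letter of the pinyin (e.g. "p" for 拼).
        if let Some(tail) = pattern.strip_prefix(&py[..1]) {
            if match_here(tail, rest) {
                return true;
            }
        }
    }
    false
}

/// Substring-style search: try every starting position in the haystack.
fn toy_is_match(pattern: &str, hay: &str) -> bool {
    let chars: Vec<char> = hay.chars().collect();
    (0..=chars.len()).any(|i| match_here(pattern, &chars[i..]))
}
```

With this sketch, toy_is_match("pysousuoeve", "拼音搜索Everything") holds: "p" and "y" are consumed as first letters of 拼 and 音, "sou" and "suo" as the full pinyin of 搜 and 索, and "eve" as a case-insensitive prefix of "Everything". The backtracking over three options per character is the simplest thing that works here; a real matcher would presumably use a full dictionary and a much faster search.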

§Performance

The following Cargo.toml settings are recommended if best performance is desired:

[profile.release]
lto = "fat"
codegen-units = 1

These settings can improve performance by up to 5~10%.

§Features

  • pinyin — Chinese pinyin match support.

  • romaji — Japanese romaji match support.

    The dictionary currently takes ~4.8 MiB of the binary (5.5 MiB without compression), much larger than the pinyin dictionary.

  • romaji-compress-words (enabled by default) — Reduces binary size (and memory usage) by 696 KiB (771 KiB if zstd is already used) and increases romanizer build time by 1.1 ms.

  • syntax — Pattern syntax support. See syntax for details.

  • perf (enabled by default) — Enables all performance-related features. This feature is enabled by default and is intended to cover all reasonable features that improve performance, even if more are added in the future.

  • perf-unicode-case-map (enabled by default) — -37% match time, +38 KiB

  • regex — Not used at the moment.

    Build size +837.5 KiB

  • inmut-data — Makes pinyin::PinyinData interior mutable so that it can easily be used as a static variable.

  • minimal — Minimal APIs that can be used in one call. See minimal for details.

  • encoding — Support for non-UTF-8 encodings. Only UTF-16 and UTF-32 are supported at the moment.

    Non-UTF-8 Japanese romaji match is not yet supported.

Re-exports§

pub use ib_romaji as romaji; (feature: romaji)

Modules§

matcher
minimal (feature: minimal)
Minimal APIs
pinyin (feature: pinyin)
Pinyin
syntax (feature: syntax)
Parse a pattern according to the syntax used by IbEverythingExt.
unicode