memory-indexer
In-memory multilingual full-text indexer with pinyin-first search plus prefix and fuzzy recall, built for chat memory, note-taking, or local knowledge bases.
Highlights
- Out-of-the-box CJK support
  - Chinese and pinyin fuzzy search
  - Japanese/Korean n-grams with custom dictionaries
  - Mixed-script text supported
- Ranking and routing
  - BM25 with minimum-should-match
  - ASCII queries auto-route exact → pinyin → fuzzy
  - Non-ASCII queries use 2/3-gram + Levenshtein fuzzy matching
- Highlight-friendly offsets: UTF-8/UTF-16 positions supported
- Index snapshots: compressed binary format for persistence and fast loading
- Pluggable dictionaries: inject or train Japanese/Hangul dictionaries for better tokenization
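To make the routing bullets above more concrete, here is a minimal, standalone sketch of the ideas they name: choosing a path based on whether the query is ASCII, splitting text into character n-grams, and computing Levenshtein distance. It does not use or reproduce memory-indexer's internals; `Route`, `route`, `char_ngrams`, and `levenshtein` are hypothetical names introduced only for illustration, and the crate's actual thresholds and scoring may differ.

```rust
/// Hypothetical routing label; not a type exported by memory-indexer.
#[derive(Debug, PartialEq)]
enum Route {
    /// ASCII query: try exact, then pinyin, then fuzzy.
    ExactThenPinyinThenFuzzy,
    /// Non-ASCII query: n-gram candidates checked with edit distance.
    NgramFuzzy,
}

/// Pick a route purely from the query's character set (assumed rule).
fn route(query: &str) -> Route {
    if query.is_ascii() {
        Route::ExactThenPinyinThenFuzzy
    } else {
        Route::NgramFuzzy
    }
}

/// Character n-grams, counted in Unicode scalar values rather than bytes.
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Levenshtein edit distance over chars, using a two-row table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

Working on `char` values instead of bytes keeps CJK text from being split in the middle of a code point, which is why n-grams rather than whitespace tokens are a natural fit for non-ASCII queries.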
Quick start
use ;
let mut index = default;
index.add_doc;
index.add_doc;
// Auto chooses between exact / pinyin / fuzzy
let hits = index.search_hits;
// Explicit modes
let fuzzy = index.search_with_mode;
let pinyin_prefix = index.search_with_mode_hits;
// Highlight spans (UTF-16 positions by default)
let spans = index.get_matches;
// Snapshot persistence
let snapshot = index.get_snapshot_data.unwrap;
// index.load_snapshot("kb", snapshot);
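Because highlight spans are reported in UTF-16 positions by default, Rust callers that slice `&str` values (which are indexed by UTF-8 bytes) may need to convert between the two. The helper below is a hedged sketch using only the standard library; `utf8_to_utf16_offset` is a hypothetical name, not part of memory-indexer's API.

```rust
/// Convert a UTF-8 byte offset into a UTF-16 code-unit offset.
/// Returns None when `byte_offset` is past the end of `text`
/// or falls inside a multi-byte character.
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    // Sum the UTF-16 length of every char before the offset:
    // 1 unit for BMP characters, 2 for characters needing a surrogate pair.
    Some(
        text[..byte_offset]
            .chars()
            .map(char::len_utf16)
            .sum::<usize>(),
    )
}
```

UTF-16 offsets line up with string indexing in JavaScript and many UI toolkits, which is a plausible reason for making them the default for highlighting.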
Development
- Tests: cargo test
- Benchmarks: cargo bench
License
AGPL-3.0-or-later