gramdex: k-gram indexing primitives for approximate string matching.
This crate is about candidate generation for fuzzy matching:
- build an index mapping grams -> candidate document ids (or string ids)
- query by grams to get a bounded candidate set
- verify candidates with an exact checker (edit distance / substring / etc.)
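The three steps above can be sketched end to end with only the standard library. This is a minimal illustration, not gramdex's API: `kgrams`, `levenshtein`, and `fuzzy_search` are hypothetical names. The sketch assumes character trigrams, a `HashMap` from gram to a set of `u32` ids, and edit distance as the exact checker.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: every window of `k` consecutive chars.
fn kgrams(s: &str, k: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    if k == 0 || chars.len() < k {
        return Vec::new();
    }
    chars.windows(k).map(|w| w.iter().collect()).collect()
}

/// Exact checker: two-row dynamic-programming Levenshtein distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Build -> query -> verify, with `max_dist` as the edit-distance bound.
fn fuzzy_search(docs: &[&str], query: &str, k: usize, max_dist: usize) -> Vec<u32> {
    // 1. Build: one posting set of doc ids per gram.
    let mut index: HashMap<String, BTreeSet<u32>> = HashMap::new();
    for (id, doc) in docs.iter().enumerate() {
        for g in kgrams(doc, k) {
            index.entry(g).or_default().insert(id as u32);
        }
    }
    // 2. Query: union the posting sets of the query's grams.
    let mut candidates: BTreeSet<u32> = BTreeSet::new();
    for g in kgrams(query, k) {
        if let Some(ids) = index.get(&g) {
            candidates.extend(ids.iter().copied());
        }
    }
    // 3. Verify: the index only proposes ids; the exact checker decides.
    candidates
        .into_iter()
        .filter(|&id| levenshtein(query, docs[id as usize]) <= max_dist)
        .collect()
}

fn main() {
    let docs = ["apple", "apply", "ample", "banana"];
    println!("{:?}", fuzzy_search(&docs, "appel", 3, 2)); // [0, 1]
}
```

A union of posting sets is only lossless under a length condition. Each edit can destroy at most `k` of the query's grams. So when the query has at least `k * (max_dist + 1)` chars, every true match must share at least one gram with it. Shorter queries can miss matches unless the caller falls back to a scan.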
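The tokenization policy for grams matters: the same string yields different grams depending on whether it is split into bytes, Unicode scalar values, or grapheme clusters. This crate provides a Unicode-scalar (Rust char) k-gram helper as a safe default, since it never splits a code point. Callers that need byte-grams or grapheme-cluster grams can supply their own gram stream.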
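The difference between scalar grams and byte grams shows up as soon as the text contains non-ASCII characters. The sketch below uses hypothetical helpers `scalar_kgrams` and `byte_kgrams`, which are not the crate's `char_kgrams`, whose exact signature is not shown here. In UTF-8, `é` (U+00E9) takes two bytes, so a byte window can cut through it. Scalar grams avoid that, but they still treat a decomposed `e` + U+0301 as two units. Grouping by grapheme cluster needs an external segmenter such as the `unicode-segmentation` crate.

```rust
/// Hypothetical helper: windows over Unicode scalar values (`char`s).
fn scalar_kgrams(s: &str, k: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    if k == 0 || chars.len() < k {
        return Vec::new();
    }
    chars.windows(k).map(|w| w.iter().collect()).collect()
}

/// Hypothetical helper: windows over raw UTF-8 bytes.
fn byte_kgrams(s: &str, k: usize) -> Vec<Vec<u8>> {
    if k == 0 {
        return Vec::new(); // `slice::windows(0)` would panic
    }
    s.as_bytes().windows(k).map(|w| w.to_vec()).collect()
}

fn main() {
    let s = "h\u{e9}llo"; // "héllo": 5 chars, 6 bytes
    println!("{:?}", scalar_kgrams(s, 3)); // ["hél", "éll", "llo"]
    let bytes = byte_kgrams(s, 3);
    // Count byte grams that are not valid UTF-8 on their own.
    let broken = bytes
        .iter()
        .filter(|g| std::str::from_utf8(g.as_slice()).is_err())
        .count();
    println!("{} byte grams, {} split a code point", bytes.len(), broken); // 4 byte grams, 1 split a code point
}
```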
Structs
- GramDex - A minimal grams->docs candidate index.
- PlannerConfig - Configuration for candidate planning / bailout.
Enums
- CandidatePlan - Planner output for candidate generation.
- Error - Errors for gram indexing.
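The planning-with-bailout idea behind PlannerConfig and CandidatePlan can be sketched as follows. The names `PlanConfig`, `Plan`, `plan`, and `max_candidates` are hypothetical, and the crate's real types may carry different fields and variants. The assumed policy is to union postings and give up on the index, signalling a full scan, once the candidate set exceeds a limit or the query yields no grams at all.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical knobs; the crate's real PlannerConfig may differ.
struct PlanConfig {
    /// Bail out once the candidate set grows beyond this size.
    max_candidates: usize,
}

/// Hypothetical planner output, loosely mirroring the idea of CandidatePlan.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Verify only these ids.
    Candidates(Vec<u32>),
    /// The index cannot prune usefully: verify every document.
    ScanAll,
}

fn plan(index: &HashMap<String, BTreeSet<u32>>, query_grams: &[String], cfg: &PlanConfig) -> Plan {
    if query_grams.is_empty() {
        // e.g. the query is shorter than k; an empty candidate set would silently miss matches
        return Plan::ScanAll;
    }
    let mut out = BTreeSet::new();
    for g in query_grams {
        if let Some(ids) = index.get(g) {
            out.extend(ids.iter().copied());
            if out.len() > cfg.max_candidates {
                return Plan::ScanAll; // a frequent gram made the filter unselective
            }
        }
    }
    Plan::Candidates(out.into_iter().collect())
}

fn main() {
    let mut index: HashMap<String, BTreeSet<u32>> = HashMap::new();
    index.entry("app".to_string()).or_default().extend([0u32, 1]);
    index.entry("ple".to_string()).or_default().extend([0u32, 2]);
    let query = vec!["app".to_string(), "ple".to_string()];
    println!("{:?}", plan(&index, &query, &PlanConfig { max_candidates: 8 })); // Candidates([0, 1, 2])
    println!("{:?}", plan(&index, &query, &PlanConfig { max_candidates: 2 })); // ScanAll
}
```

Returning an explicit `ScanAll` rather than an oversized candidate list lets the caller choose a cheaper full pass when the index would not save work.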
Functions
- char_kgrams - Produce Unicode-scalar k-grams (sliding window over Rust char).
- char_trigrams - Produce Unicode-scalar trigrams (a convenience wrapper over char_kgrams).
- trigram_jaccard - Exact trigram Jaccard similarity over Unicode-scalar trigrams.
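Trigram Jaccard similarity is |A ∩ B| / |A ∪ B|, where A and B are the trigram sets of the two strings. "Exact" here means the sets are compared in full rather than estimated, for example with MinHash. The sketch below is hypothetical and is not the crate's `trigram_jaccard`. It assumes set semantics, so repeated trigrams collapse. It also makes its own choice for strings shorter than three chars, falling back to plain equality. How the crate handles that case is not stated.

```rust
use std::collections::HashSet;

/// Set of Unicode-scalar trigrams (duplicates collapse).
fn trigram_set(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(3).map(|w| w.iter().collect()).collect()
}

/// Hypothetical sketch: |A ∩ B| / |A ∪ B| over trigram sets.
fn jaccard_sketch(a: &str, b: &str) -> f64 {
    let (sa, sb) = (trigram_set(a), trigram_set(b));
    let union = sa.union(&sb).count();
    if union == 0 {
        // Assumption: both strings have fewer than 3 chars; fall back to equality.
        return if a == b { 1.0 } else { 0.0 };
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

fn main() {
    // {app, ppl, ple} vs {app, ppl, ply}: 2 shared out of 4 distinct.
    println!("{}", jaccard_sketch("apple", "apply")); // 0.5
    println!("{}", jaccard_sketch("night", "nacht")); // 0
}
```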
Type Aliases
- DocId - Document id type.