Crate gramdex

gramdex: k-gram indexing primitives for approximate string matching.

This crate is about candidate generation for fuzzy matching:

  • build an index mapping grams -> candidate document ids (or string ids)
  • query by grams to get a bounded candidate set
  • verify candidates with an exact checker (edit distance / substring / etc.)
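The three steps above can be sketched end to end in plain Rust. This is a hypothetical, self-contained illustration of the technique, not the gramdex API: `MiniGramIndex`, `kgrams`, and `levenshtein` are names invented for the sketch. It assumes trigrams over Unicode scalars and uses Levenshtein distance as the exact checker.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical id type for this sketch (gramdex defines its own `DocId`).
type Id = u32;

/// Sliding window of `k` Unicode scalars over `s`; empty if `s` is shorter than `k`.
fn kgrams(s: &str, k: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    if k == 0 || chars.len() < k {
        return Vec::new();
    }
    chars.windows(k).map(|w| w.iter().collect::<String>()).collect()
}

/// Levenshtein distance over chars, used here as the exact verifier.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical grams -> ids index (not gramdex's `GramDex`).
struct MiniGramIndex {
    k: usize,
    docs: Vec<String>,
    postings: HashMap<String, HashSet<Id>>,
}

impl MiniGramIndex {
    fn new(k: usize) -> Self {
        MiniGramIndex { k, docs: Vec::new(), postings: HashMap::new() }
    }

    /// Step 1: record every gram of `text` under a fresh id.
    fn add(&mut self, text: &str) -> Id {
        let id = self.docs.len() as Id;
        for g in kgrams(text, self.k) {
            self.postings.entry(g).or_default().insert(id);
        }
        self.docs.push(text.to_string());
        id
    }

    /// Step 2: union the posting lists of the query's grams (sorted, deduplicated).
    fn candidates(&self, query: &str) -> Vec<Id> {
        let mut set: HashSet<Id> = HashSet::new();
        for g in kgrams(query, self.k) {
            if let Some(ids) = self.postings.get(&g) {
                set.extend(ids.iter().copied());
            }
        }
        let mut out: Vec<Id> = set.into_iter().collect();
        out.sort_unstable();
        out
    }

    /// Step 3: keep only candidates within `max_dist` edits of the query.
    fn search(&self, query: &str, max_dist: usize) -> Vec<Id> {
        self.candidates(query)
            .into_iter()
            .filter(|&id| levenshtein(query, &self.docs[id as usize]) <= max_dist)
            .collect()
    }
}

fn main() {
    let mut idx = MiniGramIndex::new(3);
    for word in ["kitten", "sitting", "mitten", "banana"] {
        idx.add(word);
    }
    println!("candidates: {:?}", idx.candidates("kitten")); // [0, 1, 2]
    println!("verified:   {:?}", idx.search("kitten", 1)); // [0, 2]
}
```

Candidate generation in this sketch is a plain union of posting lists, so a string that shares no trigram with the query (for example, one shorter than three characters) is never verified; a real index needs a filtering rule that matches its distance bound.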

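Tokenization policy for grams matters: the same text yields different grams when it is split by bytes, by Unicode scalars, or by grapheme clusters. This crate provides a Unicode-scalar (Rust char) k-gram helper as a safe default. Callers who need byte-grams or grapheme clusters can supply their own gram stream instead.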
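A minimal sketch of why the policy matters, comparing Unicode-scalar windows with byte windows over the UTF-8 encoding. `char_kgrams_sketch` and `byte_kgrams_sketch` are hypothetical helpers, not the crate's `char_kgrams`. Grapheme-cluster grams are left out because segmenting graphemes needs a crate outside std (such as unicode-segmentation).

```rust
/// Hypothetical: k-grams over Unicode scalars; every window is valid text.
fn char_kgrams_sketch(s: &str, k: usize) -> Vec<String> {
    if k == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    let chars: Vec<char> = s.chars().collect();
    chars.windows(k).map(|w| w.iter().collect::<String>()).collect()
}

/// Hypothetical: k-grams over UTF-8 bytes; a window may split a multi-byte scalar.
fn byte_kgrams_sketch(s: &str, k: usize) -> Vec<Vec<u8>> {
    if k == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    s.as_bytes().windows(k).map(|w| w.to_vec()).collect()
}

fn main() {
    // "café" with a precomposed U+00E9: 4 scalars, 5 bytes ('é' is 2 bytes in UTF-8).
    let s = "caf\u{e9}";
    let chars = char_kgrams_sketch(s, 3);
    let bytes = byte_kgrams_sketch(s, 3);
    // Count byte windows that cut a scalar in half and so are not valid UTF-8.
    let invalid = bytes
        .iter()
        .filter(|w| std::str::from_utf8(w.as_slice()).is_err())
        .count();
    println!("char trigrams: {:?}", chars);
    println!("byte trigrams: {} ({} not valid UTF-8)", bytes.len(), invalid); // 3 (1 not valid UTF-8)
}
```

Byte windows are cheap to produce, but they can match on fragments of a character; scalar windows always correspond to whole characters of the input.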

Structs

GramDex
A minimal grams->docs candidate index.
PlannerConfig
Configuration for candidate planning / bailout.

Enums

CandidatePlan
Planner output for candidate generation.
Error
Errors for gram indexing.
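The page does not spell out what `PlannerConfig` and `CandidatePlan` contain, so the following is only a guess at the general idea of candidate planning with a bailout: if the query yields too few grams, or the union of posting lists grows past a limit, fall back to a full scan instead of returning an unbounded candidate set. `PlanSketch`, `BailoutSketch`, `min_query_grams`, and `max_candidates` are hypothetical names, not the crate's types or fields.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical planner output: a bounded candidate list, or "give up and scan".
#[derive(Debug, PartialEq)]
enum PlanSketch {
    Candidates(Vec<u32>),
    ScanAll,
}

/// Hypothetical bailout thresholds.
struct BailoutSketch {
    min_query_grams: usize, // too few grams means the filter is too weak to trust
    max_candidates: usize,  // past this size, a candidate list stops paying off
}

fn plan(postings: &HashMap<&str, Vec<u32>>, query_grams: &[&str], cfg: &BailoutSketch) -> PlanSketch {
    if query_grams.len() < cfg.min_query_grams {
        return PlanSketch::ScanAll;
    }
    // BTreeSet keeps the ids sorted and deduplicated.
    let mut set = BTreeSet::new();
    for g in query_grams {
        if let Some(ids) = postings.get(g) {
            set.extend(ids.iter().copied());
            // Bail out as soon as the union exceeds the limit.
            if set.len() > cfg.max_candidates {
                return PlanSketch::ScanAll;
            }
        }
    }
    PlanSketch::Candidates(set.into_iter().collect())
}

fn main() {
    let mut postings: HashMap<&str, Vec<u32>> = HashMap::new();
    postings.insert("abc", vec![1, 2]);
    postings.insert("bcd", vec![2, 3]);
    postings.insert("xyz", vec![4, 5, 6, 7, 8]);
    let cfg = BailoutSketch { min_query_grams: 1, max_candidates: 4 };
    println!("{:?}", plan(&postings, &["abc", "bcd"], &cfg)); // Candidates([1, 2, 3])
    println!("{:?}", plan(&postings, &["xyz"], &cfg)); // ScanAll
}
```

Checking the size inside the loop lets the planner stop as soon as the limit is crossed, rather than materializing the full union first.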

Functions

char_kgrams
Produce Unicode-scalar k-grams (sliding window over Rust char).
char_trigrams
Produce Unicode-scalar trigrams (a convenience wrapper over char_kgrams).
trigram_jaccard
Exact trigram Jaccard similarity over Unicode-scalar trigrams.
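Trigram Jaccard similarity is the size of the intersection of two trigram sets divided by the size of their union. A minimal sketch over Unicode-scalar trigrams follows. `trigram_set` and `jaccard_sketch` are hypothetical names, and returning 0.0 when neither string has any trigram (both shorter than three characters) is an assumption of this sketch, not necessarily what `trigram_jaccard` does.

```rust
use std::collections::HashSet;

/// Set of Unicode-scalar trigrams of `s`; repeated trigrams collapse to one entry.
fn trigram_set(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Hypothetical exact Jaccard: |A ∩ B| / |A ∪ B| over the two trigram sets.
fn jaccard_sketch(a: &str, b: &str) -> f64 {
    let ta = trigram_set(a);
    let tb = trigram_set(b);
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0; // assumption: no trigrams on either side gives no evidence of similarity
    }
    let inter = ta.intersection(&tb).count();
    inter as f64 / union as f64
}

fn main() {
    // {kit, itt, tte, ten} vs {mit, itt, tte, ten}: 3 shared out of 5 distinct.
    println!("{}", jaccard_sketch("kitten", "mitten")); // 0.6
    println!("{}", jaccard_sketch("night", "nacht")); // 0
}
```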

Type Aliases

DocId
Document id type.