gramdex 0.1.1

k-gram / trigram indexing primitives for approximate string matching.
Documentation
  • Coverage
  • 100%
    31 out of 31 items documented6 out of 24 items with examples
  • Size
  • Source code size: 33.4 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 2.59 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 17s Average build duration of successful builds.
  • all releases: 17s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • Homepage
  • arclabs561/gramdex
    0 0 0
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • arclabs561

gramdex

crates.io Documentation CI

gramdex provides small, dependency-light primitives for approximate string matching:

  • Unicode-scalar (Rust char) (k)-gram generation
  • A minimal grams → document-ids index (GramDex) for candidate generation
  • An exact (verification) trigram Jaccard helper (trigram_jaccard)

Quickstart

[dependencies]
gramdex = "0.1.0"
use gramdex::{GramDex, trigram_jaccard};

let mut ix = GramDex::new();
ix.add_document_trigrams(1, "hello");
ix.add_document_trigrams(2, "yellow");

let candidates = ix.candidates_union_trigrams("mellow");
let mut verified: Vec<u32> = candidates
    .into_iter()
    .filter(|&doc| match doc {
        1 => trigram_jaccard("mellow", "hello") >= 0.2,
        2 => trigram_jaccard("mellow", "yellow") >= 0.2,
        _ => false,
    })
    .collect();
verified.sort_unstable();
assert_eq!(verified, vec![2]);

Best starting points

  • Gram generation: char_kgrams / char_trigrams
  • Candidate index: GramDex (union candidates, scored candidates, bailout planning)
  • Verification: trigram_jaccard

Design notes

  • This crate focuses on candidate generation; you bring your own verification policy.
  • Offsets/spans are naturally expressed in Unicode scalar values (char count), not bytes.

License

Licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.