gramdex 0.1.0

k-gram / trigram indexing primitives for approximate string matching.
# gramdex

`gramdex` provides small, dependency-light primitives for approximate string matching:

- Unicode-scalar (Rust `char`) k-gram generation
- A minimal inverted index from grams to document IDs (`GramDex`) for candidate generation
- An exact trigram Jaccard similarity helper (`trigram_jaccard`) for verifying candidates
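For intuition about what char-based k-grams and trigram Jaccard similarity mean, here is a minimal standard-library sketch. `kgrams_sketch` and `trigram_jaccard_sketch` are hypothetical names, not gramdex's API. The sketch assumes unpadded grams and set (not multiset) Jaccard, so its scores may differ from `trigram_jaccard`, whose exact gram definition (padding, handling of very short strings) is not described here.

```rust
use std::collections::HashSet;

/// Hypothetical helper: every window of `k` consecutive Unicode scalar values
/// (Rust `char`s) in `s`, in order. No padding is added, so a string with
/// fewer than `k` chars yields no grams (gramdex may handle this differently).
fn kgrams_sketch(s: &str, k: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    if k == 0 || chars.len() < k {
        return Vec::new();
    }
    chars.windows(k).map(|w| w.iter().collect::<String>()).collect()
}

/// Hypothetical helper: Jaccard similarity |A ∩ B| / |A ∪ B| over the
/// *sets* of unpadded trigrams of `a` and `b`.
fn trigram_jaccard_sketch(a: &str, b: &str) -> f64 {
    let ga: HashSet<String> = kgrams_sketch(a, 3).into_iter().collect();
    let gb: HashSet<String> = kgrams_sketch(b, 3).into_iter().collect();
    let union = ga.union(&gb).count();
    if union == 0 {
        // Convention chosen for this sketch: two gram-less strings score 0.
        return 0.0;
    }
    ga.intersection(&gb).count() as f64 / union as f64
}

fn main() {
    // "café" is 4 chars but 5 bytes; windows are taken over chars, not bytes.
    assert_eq!(kgrams_sketch("café", 3), vec!["caf", "afé"]);
    // {abc, bcd} vs {abc, bce}: 1 shared trigram out of 3 distinct ones.
    assert_eq!(trigram_jaccard_sketch("abcd", "abce"), 1.0 / 3.0);
}
```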

## Quickstart

```toml
[dependencies]
gramdex = "0.1.0"
```

```rust
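use gramdex::{GramDex, trigram_jaccard};

// Index two documents by their trigrams.
let mut ix = GramDex::new();
ix.add_document_trigrams(1, "hello");
ix.add_document_trigrams(2, "yellow");

// Candidate generation: documents sharing at least one trigram with the query.
let candidates = ix.candidates_union_trigrams("mellow");

// Verification: the index stores only document IDs, so re-score each
// candidate against its original text and keep those above a threshold
// (0.2 here is just an example value).
let mut verified: Vec<u32> = candidates
    .into_iter()
    .filter(|&doc| match doc {
        1 => trigram_jaccard("mellow", "hello") >= 0.2,
        2 => trigram_jaccard("mellow", "yellow") >= 0.2,
        _ => false,
    })
    .collect();
verified.sort_unstable();
assert_eq!(verified, vec![2]);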
```

## Best starting points

- **Gram generation**: `char_kgrams` / `char_trigrams`
- **Candidate index**: `GramDex` (union candidates, scored candidates, bailout planning)
- **Verification**: `trigram_jaccard`
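The two-stage flow behind `GramDex` (union candidates, then verification) can be sketched with a plain inverted index. `MiniGramIndex`, `trigram_set` and `jaccard` are hypothetical, standard-library-only stand-ins, not gramdex types. They use unpadded trigrams and an illustrative threshold of 0.5, so they do not reproduce gramdex's scoring, nor its scored-candidate or bailout-planning features.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical helper: the set of unpadded char trigrams of `s`.
fn trigram_set(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Hypothetical inverted index: gram -> ids of documents containing it.
/// This is NOT gramdex's `GramDex`; it only illustrates the idea.
#[derive(Default)]
struct MiniGramIndex {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl MiniGramIndex {
    fn add(&mut self, doc: u32, text: &str) {
        for g in trigram_set(text) {
            self.postings.entry(g).or_default().insert(doc);
        }
    }

    /// Union candidates: every document sharing at least one trigram with
    /// the query (the union of the query grams' posting lists).
    fn candidates_union(&self, query: &str) -> BTreeSet<u32> {
        let mut out = BTreeSet::new();
        for g in trigram_set(query) {
            if let Some(ids) = self.postings.get(&g) {
                out.extend(ids.iter().copied());
            }
        }
        out
    }
}

/// Set Jaccard similarity; two empty sets score 0 by convention here.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        0.0
    } else {
        a.intersection(b).count() as f64 / union as f64
    }
}

fn main() {
    // The index keeps only ids, so the caller keeps the raw texts.
    let docs: HashMap<u32, &str> =
        HashMap::from([(1, "hello"), (2, "yellow"), (3, "zebra")]);
    let mut ix = MiniGramIndex::default();
    for (&id, &text) in &docs {
        ix.add(id, text);
    }

    let query = "mellow";
    let candidates = ix.candidates_union(query);
    // "zebra" shares no trigram with "mellow", so it never becomes a candidate.
    assert_eq!(candidates.iter().copied().collect::<Vec<_>>(), vec![1, 2]);

    // Verification: "hello" scores 2/5 = 0.4, "yellow" scores 3/5 = 0.6.
    let q = trigram_set(query);
    let verified: Vec<u32> = candidates
        .into_iter()
        .filter(|id| jaccard(&q, &trigram_set(docs[id])) >= 0.5)
        .collect();
    assert_eq!(verified, vec![2]);
}
```

Union candidates favour recall: a single shared gram is enough to surface a document, so the verification threshold is what controls precision.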

## Design notes

- This crate focuses on **candidate generation**; you bring your own verification policy.
- Because grams are built over `char`s, offsets and spans are expressed in **Unicode scalar values** (`char` counts), not bytes.
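Since Rust `&str` slicing is indexed by bytes, a span counted in `char`s has to be converted before slicing. A minimal sketch, assuming half-open `[start, end)` char spans; `char_span_to_bytes` is a hypothetical helper, not part of gramdex:

```rust
use std::ops::Range;

/// Hypothetical helper: convert a half-open span measured in chars
/// (Unicode scalar values) into the byte range needed to slice a `&str`.
/// Returns `None` if `start > end` or the span runs past the end of `s`.
fn char_span_to_bytes(s: &str, start: usize, end: usize) -> Option<Range<usize>> {
    if start > end {
        return None;
    }
    // Byte offset of every char boundary, plus the end of the string.
    let mut bounds: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    bounds.push(s.len());
    Some(*bounds.get(start)?..*bounds.get(end)?)
}

fn main() {
    let s = "naïve café";
    // 10 chars but 12 bytes: 'ï' and 'é' take 2 bytes each in UTF-8.
    assert_eq!(s.chars().count(), 10);
    assert_eq!(s.len(), 12);

    // The char trigram starting at char 6 is "caf", which starts at byte 7.
    let r = char_span_to_bytes(s, 6, 9).unwrap();
    assert_eq!(r, 7..10);
    assert_eq!(&s[r], "caf");
}
```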

## License

Licensed under either of:

- Apache License, Version 2.0 (`LICENSE-APACHE`)
- MIT license (`LICENSE-MIT`)

at your option.