amt-phonetic 1.0.0

Articulatory Moment Transform — language-agnostic phonetic name matching

amt-phonetic (Rust)


Articulatory Moment Transform — language-agnostic phonetic token matching.

The crate is published as `amt-phonetic`, but its library is imported as `amt`.

Designed and benchmarked for personal names across Latin, Arabic, CJK, Cyrillic, Devanagari, and Hebrew scripts. The core encoder generalizes to other short tokens (places, brands, drugs); see the top-level README for the caveats of the name-specific preprocessing (stripping the ال and AL/EL/UL/AS/ES prefixes, and dropping a silent trailing H).
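The preprocessing rules themselves are documented in the top-level README, not here. Purely as an illustration, the sketch below shows what rules of that shape can look like. `preprocess_name` and its helpers are hypothetical, and the rules are assumptions: AL/EL/UL/AS/ES are treated as romanizations of the Arabic article and only stripped when followed by `-` or a space, and an `h` is treated as silent when it follows a final vowel. The crate's actual rules may differ.

```rust
/// Hypothetical name preprocessing (illustrative rules, NOT the crate's code).
pub fn preprocess_name(token: &str) -> String {
    let trimmed = token.trim();

    // Assumed rule 1: drop the Arabic definite article written attached to the word.
    if let Some(rest) = trimmed.strip_prefix("ال") {
        if !rest.is_empty() {
            return rest.to_string();
        }
    }

    let lower = trimmed.to_lowercase();
    // Assumed rule 2: drop one romanized article followed by '-' or ' '.
    let without_article = strip_romanized_article(&lower);
    // Assumed rule 3: drop a silent 'h' that follows a final vowel.
    drop_silent_trailing_h(without_article)
}

fn strip_romanized_article(s: &str) -> &str {
    const PREFIXES: [&str; 10] = [
        "al-", "el-", "ul-", "as-", "es-", "al ", "el ", "ul ", "as ", "es ",
    ];
    for prefix in PREFIXES.iter() {
        // Only strip when something is left, so a bare "al" stays intact.
        if let Some(rest) = s.strip_prefix(*prefix) {
            if !rest.is_empty() {
                return rest;
            }
        }
    }
    s
}

fn drop_silent_trailing_h(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    if n > 2 && chars[n - 1] == 'h' && "aeiou".contains(chars[n - 2]) {
        chars[..n - 1].iter().collect()
    } else {
        s.to_string()
    }
}
```

With these rules, `"Al-Rashid"` becomes `"rashid"` and `"Fatimah"` becomes `"fatima"`, while `"Alan"` is untouched because no separator follows the `al`. A rule this blunt also rewrites names whose final H is pronounced (e.g. "Shah"), which illustrates why such preprocessing is name-specific and comes with caveats.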

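```toml
[dependencies]
amt-phonetic = "1.0"
```

```rust
use amt::{encode_token, matches, similarity, BKTree};

fn main() {
    assert!(matches("Khaled", "Khalid"));
    assert!(matches("Khaled", "خالد"));            // Latin ↔ Arabic
    assert!(matches("Gamal", "Jamal"));            // Egyptian ↔ Standard Arabic
    assert!(!matches("Khaled", "Robert"));

    let s: f32 = similarity("Khaled Sameer", "khaled samir"); // ≈ 1.0
    println!("similarity = {s:.3}");

    // Example data for the index; not part of the crate.
    let customer_names: Vec<String> = vec!["Khaled".into(), "Samir".into(), "Robert".into()];

    // Index every spectral fingerprint of every name for fuzzy lookup.
    let mut tree: BKTree<String> = BKTree::new();
    for name in &customer_names {
        let code = encode_token(name);
        for &sp in &code.spectrals {
            tree.add(sp, name.clone());
        }
    }

    // `hits`: entries whose fingerprint lies within radius 4 of the query's first spectral.
    let q = encode_token("Khaleed");
    let hits = tree.query(q.spectrals[0], 4);
}
```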
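The example above scores `"Khaled Sameer"` against `"khaled samir"` as whole names. One common way to lift a token-level score to multi-token names is best-match averaging; the sketch below assumes that strategy and takes the token scorer as a parameter so it does not depend on the crate. `name_similarity` is a hypothetical helper, and the crate's own `similarity` may combine tokens differently.

```rust
/// Hypothetical helper: lift a token-level score in [0, 1] to whole names.
/// For each token of the shorter name, take its best score against any token
/// of the longer name, then average those best scores.
pub fn name_similarity<F>(a: &str, b: &str, token_sim: F) -> f32
where
    F: Fn(&str, &str) -> f32,
{
    let ta: Vec<&str> = a.split_whitespace().collect();
    let tb: Vec<&str> = b.split_whitespace().collect();
    if ta.is_empty() || tb.is_empty() {
        return 0.0;
    }
    // Average over the shorter name so extra tokens in the longer one
    // (e.g. a middle name) do not lower the score.
    let (short, long) = if ta.len() <= tb.len() { (&ta, &tb) } else { (&tb, &ta) };
    let total: f32 = short
        .iter()
        .map(|&x| {
            long.iter()
                .map(|&y| token_sim(x, y).clamp(0.0, 1.0))
                .fold(0.0_f32, f32::max)
        })
        .sum();
    total / short.len() as f32
}
```

With a plain case-insensitive equality scorer, `"Khaled Sameer"` vs `"khaled samir"` scores only 0.5, because `"Sameer"` and `"samir"` differ as strings; a phonetic token scorer is what closes that gap. Note that best-match averaging does not enforce a one-to-one token alignment, so two tokens of the shorter name may both match the same token of the longer one.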

Features

| Flag | Default | What it does |
|------|---------|--------------|
| `smallvec` | on | Stack-allocates small class, spectral, and bloom tuples. |

Disable it with `default-features = false` (e.g. `amt-phonetic = { version = "1.0", default-features = false }`) if you cannot pull in `smallvec`.

API surface

| Item | Purpose |
|------|---------|
| `encode_token(s)` | Encode one token → `Code { spectrals, blooms, .. }` |
| `encode(name)` | Encode a multi-token name → `Vec<Code>` |
| `matches(a, b)` | Boolean phonetic match |
| `similarity(a, b) -> f32` | Graded similarity in [0, 1] |
| `token_distance(&a, &b)` | Token-level distance |
| `BKTree<T>` | Metric tree for radius-bounded fuzzy search |
| `Code` / `class_of(c)` | Inspect raw fingerprints / the sonority class of a char |
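`BKTree<T>` is described here only as a metric tree for radius-bounded search. To show the idea, below is a minimal, hypothetical `MiniBkTree` keyed by `u64` fingerprints under Hamming distance; both the key type and the metric are assumptions for illustration, not the crate's definitions. Each child hangs off its parent by its distance to the parent, and a query uses the triangle inequality: if the query is at distance `d` from a node, only children on edges in `[d - radius, d + radius]` can contain hits.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Assumed metric: Hamming distance between two 64-bit fingerprints.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// One BK-tree node: every child is keyed by its distance to this node.
struct Node<T> {
    key: u64,
    values: Vec<T>,                  // payloads sharing this exact key
    children: HashMap<u32, Node<T>>, // edge distance -> subtree
}

impl<T> Node<T> {
    fn leaf(key: u64, value: T) -> Self {
        Node { key, values: vec![value], children: HashMap::new() }
    }

    fn add(&mut self, key: u64, value: T) {
        let d = hamming(self.key, key);
        if d == 0 {
            self.values.push(value);
            return;
        }
        // Descend along the edge labelled `d`, or create a new leaf there.
        match self.children.entry(d) {
            Entry::Occupied(mut e) => e.get_mut().add(key, value),
            Entry::Vacant(e) => {
                e.insert(Node::leaf(key, value));
            }
        }
    }

    fn collect<'a>(&'a self, key: u64, radius: u32, out: &mut Vec<&'a T>) {
        let d = hamming(self.key, key);
        if d <= radius {
            out.extend(self.values.iter());
        }
        // Triangle inequality: only edges in [d - radius, d + radius] can hold hits.
        let (lo, hi) = (d.saturating_sub(radius), d.saturating_add(radius));
        for (&edge, child) in &self.children {
            if (lo..=hi).contains(&edge) {
                child.collect(key, radius, out);
            }
        }
    }
}

/// Hypothetical minimal BK-tree (not the crate's `BKTree`).
pub struct MiniBkTree<T> {
    root: Option<Node<T>>,
}

impl<T> MiniBkTree<T> {
    pub fn new() -> Self {
        MiniBkTree { root: None }
    }

    pub fn add(&mut self, key: u64, value: T) {
        if let Some(node) = self.root.as_mut() {
            node.add(key, value);
            return;
        }
        self.root = Some(Node::leaf(key, value));
    }

    /// All payloads whose key lies within `radius` of `key` (unordered).
    pub fn query(&self, key: u64, radius: u32) -> Vec<&T> {
        let mut out = Vec::new();
        if let Some(root) = &self.root {
            root.collect(key, radius, &mut out);
        }
        out
    }
}
```

Storing several payloads per node mirrors the usage example above, where many names can share a fingerprint. Results come back in `HashMap` iteration order, so sort them if a stable order matters.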

Benchmarks

Run the in-tree throughput benchmark (Criterion):

```sh
cargo bench
```

End-to-end corpus and recall benchmarks (these require regenerated data; see ../benchmarks/README.md):

```sh
cargo run --release --example bench_corpus
cargo run --release --example bench_recall
```
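The recall benchmark's exact definition lives in ../benchmarks/README.md and is not reproduced here. Assuming the usual meaning (the share of known true-match pairs that a matcher accepts), a hypothetical `recall` helper can be sketched as follows; it takes the matcher as a parameter and does not call the crate.

```rust
/// Hypothetical recall metric: the fraction of known true-match pairs that
/// `is_match` accepts. Returns 0.0 for an empty pair list.
pub fn recall<F>(true_pairs: &[(&str, &str)], is_match: F) -> f64
where
    F: Fn(&str, &str) -> bool,
{
    if true_pairs.is_empty() {
        return 0.0;
    }
    let accepted = true_pairs.iter().filter(|&&(a, b)| is_match(a, b)).count();
    accepted as f64 / true_pairs.len() as f64
}
```

With the crate, the matcher could be a closure such as `|a, b| matches(a, b)`, assuming `matches` accepts `&str` arguments as in the examples above. Recall alone rewards a matcher that accepts everything, so it is normally reported next to a false-positive measure such as precision.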

Algorithm

See the whitepaper for the full mathematical treatment, sonority classes, and head-to-head recall numbers vs Soundex, Metaphone, NYSIIS, Beider-Morse, and friends.
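The whitepaper defines the crate's actual sonority classes. For intuition only, the sketch below maps ASCII Latin letters onto a textbook sonority scale (stops, fricatives, nasals, liquids, glides, vowels, from least to most sonorous). `Sonority`, `sonority_class`, and `class_sequence` are hypothetical names, and the letter-to-class table is a rough orthographic approximation, not what `class_of` returns.

```rust
/// Hypothetical sonority scale, declared from least to most sonorous so the
/// derived `Ord` follows the hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Sonority {
    Stop,
    Fricative,
    Nasal,
    Liquid,
    Glide,
    Vowel,
}

/// Rough class of an ASCII Latin letter; `None` for anything else.
/// Ambiguous letters (c, j, x) are lumped with stops as a simplification.
pub fn sonority_class(c: char) -> Option<Sonority> {
    use Sonority::*;
    match c.to_ascii_lowercase() {
        'a' | 'e' | 'i' | 'o' | 'u' => Some(Vowel),
        'w' | 'y' => Some(Glide),
        'l' | 'r' => Some(Liquid),
        'm' | 'n' => Some(Nasal),
        'f' | 'v' | 's' | 'z' | 'h' => Some(Fricative),
        'p' | 'b' | 't' | 'd' | 'k' | 'g' | 'q' | 'c' | 'j' | 'x' => Some(Stop),
        _ => None,
    }
}

/// Class sequence of a token, skipping characters outside the table.
pub fn class_sequence(token: &str) -> Vec<Sonority> {
    token.chars().filter_map(sonority_class).collect()
}
```

Even this crude table maps "Khaled" and "Khalid" to the same class sequence, which hints at why coarse articulatory classes tolerate vowel-spelling variation. Non-Latin scripts would need their own tables or a transliteration step.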

License

MIT — see LICENSE.