amt-phonetic (Rust)

Articulatory Moment Transform — language-agnostic phonetic token matching.

The crate is published as `amt-phonetic`; the library is imported as `amt`.

Designed and benchmarked for personal names across Latin, Arabic, CJK, Cyrillic, Devanagari, and Hebrew scripts. The core encoder generalizes to other short tokens (places, brands, drugs); see the top-level README for the caveats around the name-specific preprocessing (ال / AL-EL-UL-AS-ES prefix stripping, silent trailing H).

[dependencies]
amt-phonetic = "1.0"

use amt::{encode_token, matches, similarity, BKTree};

assert!(matches("Khaled", "Khalid"));
assert!(matches("Khaled", "خالد"));            // Latin ↔ Arabic
assert!(matches("Gamal", "Jamal"));            // Egyptian ↔ Standard
assert!(!matches("Khaled", "Robert"));

let s: f32 = similarity("Khaled Sameer", "khaled samir"); // ≈ 1.0

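// `customer_names` is assumed to be a collection of `String`s defined elsewhere.
let mut tree: BKTree<String> = BKTree::new();
for name in &customer_names {
    let code = encode_token(name);
    // Index the name under each of its spectral fingerprints.
    for &sp in &code.spectrals {
        tree.add(sp, name.clone());
    }
}

// Look up candidates near the first spectral of the query token.
let q = encode_token("Khaleed");
let hits = tree.query(q.spectrals[0], 4);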

Features

Flag	Default	What it does
`smallvec`	on	Stack-allocate small class / spectral / bloom tuples.

Disable with default-features = false if you cannot pull in smallvec.
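
As a sketch, a manifest entry with default features turned off could look like the following. It reuses the version from the install snippet above and relies only on standard Cargo syntax:

```toml
[dependencies]
amt-phonetic = { version = "1.0", default-features = false }
```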

API surface

Item	Purpose
`encode_token(s)`	Encode one token → `Code { spectrals, blooms, .. }`
`encode(name)`	Encode multi-token name → `Vec<Code>`
`matches(a, b)`	Boolean phonetic match
`similarity(a, b) -> f32`	Graded similarity in `[0, 1]`
`token_distance(&a, &b)`	Token-level distance
`BKTree<T>`	Metric tree for radius-bounded fuzzy search
`Code` / `class_of(c)`	Inspect raw fingerprints / sonority class of a char
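
To make "metric tree for radius-bounded fuzzy search" concrete, here is a minimal, self-contained BK-tree sketch. It is not the crate's `BKTree`: the `SketchBkTree` type, the `u64` keys, the Hamming metric, and the fingerprint values in `demo` are all assumptions chosen for illustration, and the crate's actual key type and distance function may differ.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hamming distance between two 64-bit keys (assumed metric for this sketch).
/// It satisfies the triangle inequality, which is what makes the pruning valid.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// One node: a key, the values stored under it, and children indexed by
/// their distance to this node's key.
struct Node<T> {
    key: u64,
    values: Vec<T>,
    children: HashMap<u32, Node<T>>,
}

impl<T> Node<T> {
    fn leaf(key: u64, value: T) -> Self {
        Node { key, values: vec![value], children: HashMap::new() }
    }

    fn insert(&mut self, key: u64, value: T) {
        let d = hamming(key, self.key);
        if d == 0 {
            // Same key: store the value alongside the existing ones.
            self.values.push(value);
            return;
        }
        match self.children.entry(d) {
            Entry::Occupied(mut e) => e.get_mut().insert(key, value),
            Entry::Vacant(e) => {
                e.insert(Node::leaf(key, value));
            }
        }
    }

    fn search<'a>(&'a self, key: u64, radius: u32, out: &mut Vec<(u32, &'a T)>) {
        let d = hamming(key, self.key);
        if d <= radius {
            out.extend(self.values.iter().map(move |v| (d, v)));
        }
        // Triangle inequality: only subtrees whose edge distance lies in
        // [d - radius, d + radius] can contain a key within `radius`.
        let lo = d.saturating_sub(radius);
        let hi = d.saturating_add(radius);
        for (&edge, child) in &self.children {
            if edge >= lo && edge <= hi {
                child.search(key, radius, out);
            }
        }
    }
}

/// Hypothetical stand-in used only in this sketch.
pub struct SketchBkTree<T> {
    root: Option<Node<T>>,
}

impl<T> SketchBkTree<T> {
    pub fn new() -> Self {
        SketchBkTree { root: None }
    }

    pub fn add(&mut self, key: u64, value: T) {
        match self.root.as_mut() {
            Some(root) => root.insert(key, value),
            None => self.root = Some(Node::leaf(key, value)),
        }
    }

    /// Returns every stored value whose key is within `radius`, nearest first.
    pub fn query(&self, key: u64, radius: u32) -> Vec<(u32, &T)> {
        let mut out = Vec::new();
        if let Some(root) = &self.root {
            root.search(key, radius, &mut out);
        }
        out.sort_by_key(|&(d, _)| d);
        out
    }
}

/// Usage with made-up fingerprints (not real encoder output).
pub fn demo() -> Vec<(u32, &'static str)> {
    let mut tree = SketchBkTree::new();
    tree.add(0b1011_0000, "khaled");
    tree.add(0b1011_0001, "khalid");
    tree.add(0b0100_1111, "robert");
    let hits = tree.query(0b1011_0011, 2);
    hits.into_iter().map(|(d, name)| (d, *name)).collect()
}
```

In `demo`, the query key differs from the first two keys by two bits and one bit respectively, and from the third by six, so a radius of 2 returns the first two entries and prunes the third subtree without visiting it. That pruning is the reason to prefer a metric tree over a linear scan when the index is large.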

Benchmarks

Run the in-tree throughput benchmark (Criterion):

cargo bench

End-to-end corpus and recall benchmarks (these require regenerated data; see ../benchmarks/README.md):

cargo run --release --example bench_corpus
cargo run --release --example bench_recall

Algorithm

See the whitepaper for the full mathematical treatment, sonority classes, and head-to-head recall numbers vs Soundex, Metaphone, NYSIIS, Beider-Morse, and friends.

License